These alerts monitor the Kubernetes Job created by the Mattermost backup CronJob.
Alerts:
* Success → Backup completed successfully. Mattermost data is safely stored in S3/MinIO.
* Failure → The Mattermost database may not be backed up. This could affect disaster recovery if a restore is needed.
1. Check Kubernetes Job status:
kubectl get job mattermost-backup-job -n <NAMESPACE>
kubectl describe job mattermost-backup-job -n <NAMESPACE>
2. Check logs of the Job pod:
kubectl logs job/mattermost-backup-job -n <NAMESPACE>
3. Verify backup in S3/MinIO:
mc ls <MINIO_ALIAS>/mattermost-backups/
mc stat <MINIO_ALIAS>/mattermost-backups/<backup_file>
4. Check PVC mounts if used:
kubectl get pvc -n <NAMESPACE>
kubectl describe pvc <PVC_NAME> -n <NAMESPACE>
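The status check in step 1 can also be scripted. The following is a minimal sketch, assuming the Job is named `mattermost-backup-job` as in the commands above; the function name `backup_job_state` is hypothetical. It reads the standard `Complete` and `Failed` Job conditions through kubectl's JSONPath output.

```shell
#!/usr/bin/env bash
# Sketch only: backup_job_state is a hypothetical helper, not part of any tool.
# It prints Succeeded, Failed, InProgress, or Unknown for a given Job.

backup_job_state() {
  local job="$1" ns="$2" complete failed

  # Each condition's status is "True" once set; the output is empty otherwise.
  complete=$(kubectl get job "$job" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}') \
    || { echo "Unknown"; return 1; }
  failed=$(kubectl get job "$job" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}') \
    || { echo "Unknown"; return 1; }

  if [ "$complete" = "True" ]; then
    echo "Succeeded"
  elif [ "$failed" = "True" ]; then
    echo "Failed"
  else
    echo "InProgress"
  fi
}

# Example (replace <NAMESPACE> first):
#   backup_job_state mattermost-backup-job <NAMESPACE>
```

`Unknown` is printed when kubectl itself fails, for example when the Job does not exist in the namespace.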
* Pod in CrashLoopBackOff, OOMKilled, or Failed
* PVC mount unavailable or insufficient space
* Backup storage credentials missing or misconfigured
* Network issues preventing upload to S3/MinIO
* Disk space or permissions issues on the node
* CronJob manifest misconfiguration
* Database credentials invalid or inaccessible
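The pod-level conditions in the list above (CrashLoopBackOff, OOMKilled, Failed) can be read directly from pod status. The sketch below assumes the pods carry the `job-name` label that the Job controller normally sets; the function name `job_pod_reasons` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: job_pod_reasons is a hypothetical helper.
# Prints one line per pod: name, phase, waiting reason, terminated reason.
# Assumes pods are labelled job-name=<job>, as the Job controller normally does.

job_pod_reasons() {
  local job="$1" ns="$2"
  kubectl get pods -n "$ns" -l "job-name=$job" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\t"}{.status.containerStatuses[*].state.terminated.reason}{"\n"}{end}'
}

# Example (replace <NAMESPACE> first):
#   job_pod_reasons mattermost-backup-job <NAMESPACE> | grep -E 'OOMKilled|CrashLoopBackOff'
```

A waiting reason of `CrashLoopBackOff` or a terminated reason of `OOMKilled` points at the pod itself rather than at storage, network, or credentials.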
1. Inspect Job pod logs to identify errors.
2. Verify S3/MinIO credentials and connectivity.
3. Check PVC status and node disk availability.
4. Verify database credentials and connectivity.
5. Retry backup manually if needed:
kubectl create job --from=cronjob/mattermost-backup-job mattermost-backup-job-manual -n <NAMESPACE>
6. Correct any misconfigurations in CronJob YAML, database, or backup storage policies.
7. Escalate to SRE or admin team if repeated failures occur.
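The manual retry in step 5 can be wrapped so that it waits for the result and shows recent logs if the run does not finish. This is a sketch under assumptions: the function name `run_manual_backup`, the timestamp suffix, and the 600s default timeout are illustrative choices, not values from this runbook.

```shell
#!/usr/bin/env bash
# Sketch only: run_manual_backup is a hypothetical helper.
# Creates a one-off Job from the CronJob, then waits for it to complete.

run_manual_backup() {
  local cronjob="$1" ns="$2" timeout="${3:-600s}" job

  # A timestamp suffix keeps repeated manual runs from colliding on the name.
  job="${cronjob}-manual-$(date +%s)"

  kubectl create job --from="cronjob/${cronjob}" "$job" -n "$ns" || return 1

  # Note: this waits only for the Complete condition, so a Job that fails
  # outright is reported when the timeout expires, not earlier.
  if kubectl wait --for=condition=complete "job/${job}" -n "$ns" --timeout="$timeout"; then
    echo "backup job ${job} completed"
  else
    echo "backup job ${job} did not complete within ${timeout}" >&2
    kubectl logs "job/${job}" -n "$ns" --tail=50
    return 1
  fi
}

# Example (replace <NAMESPACE> first):
#   run_manual_backup mattermost-backup-job <NAMESPACE>
```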
* Escalate if two or more consecutive backup runs fail.
* Notify the on-call engineer if production Mattermost data may not be recoverable.
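The escalation rule above can be expressed as a small check over recent run results. This sketch assumes the results are passed newest first as the strings `Succeeded` or `Failed`; how they are collected is left open, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the escalation rule.
# Arguments are run results, newest first, e.g.: Failed Failed Succeeded

# Print how many of the most recent runs failed in a row.
consecutive_failures() {
  local count=0 status
  for status in "$@"; do
    [ "$status" = "Failed" ] || break
    count=$((count + 1))
  done
  echo "$count"
}

# Succeed (exit 0) when more than one consecutive run has failed.
should_escalate() {
  [ "$(consecutive_failures "$@")" -ge 2 ]
}

# Example:
#   should_escalate Failed Failed Succeeded && echo "escalate to SRE"
```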
* MattermostBackupSucceeded
* MattermostBackupFailed
* HostOutOfDiskSpace (node running backup Job)
* KubernetesPodCrashLooping
* Kubernetes → Jobs & CronJobs (namespace: <NAMESPACE>)
* Grafana → Backup Job status metrics
* S3/MinIO → Backup object listings