Mattermost Backup CronJob

Meaning

These alerts monitor the Kubernetes Job created by the Mattermost backup CronJob.

Alerts:

* MattermostBackupSucceeded
* MattermostBackupFailed

Impact

* Success → Backup completed successfully. Mattermost data is safely stored in S3/MinIO.
* Failure → Mattermost database may not be backed up. Could affect disaster recovery if restoration is needed.

Diagnosis

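1. Check the status of the CronJob and the Jobs it created. Jobs spawned by a CronJob are named mattermost-backup-job-<suffix>, so list them first and use the newest name as <JOB_NAME>:

kubectl get cronjob mattermost-backup-job -n <NAMESPACE>
kubectl get jobs -n <NAMESPACE> --sort-by=.metadata.creationTimestamp
kubectl describe job <JOB_NAME> -n <NAMESPACE>

2. Check logs of the Job pod:

kubectl logs job/<JOB_NAME> -n <NAMESPACE>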

3. Verify backup in S3/MinIO:

mc ls <MINIO_ALIAS>/mattermost-backups/
mc stat <MINIO_ALIAS>/mattermost-backups/<backup_file>

4. Check PVC mounts if used:

kubectl get pvc -n <NAMESPACE>
kubectl describe pvc <PVC_NAME> -n <NAMESPACE>
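
The Job checks in steps 1 and 2 can be combined into one helper. The following is a minimal sketch, not part of the deployed tooling. It assumes the CronJob is named mattermost-backup-job (as in the manual retry command) and that its Jobs keep the default <cronjob-name>-<suffix> naming. The function names latest_backup_job and inspect_latest_backup are hypothetical.

```shell
#!/bin/sh
# Sketch: find the newest Job spawned by the backup CronJob and show its state.
# Assumption: Jobs keep the default "<cronjob-name>-<suffix>" naming.

CRONJOB_NAME="mattermost-backup-job"

# Reads "job.batch/<name>" lines (oldest first) on stdin and prints the
# name of the last one that belongs to the backup CronJob.
latest_backup_job() {
  sed -n "s|^job\.batch/\(${CRONJOB_NAME}-.*\)\$|\1|p" | tail -n 1
}

# Prints the description (status and events) and recent logs of the newest
# backup Job in the given namespace.
inspect_latest_backup() {
  ns="$1"
  job="$(kubectl get jobs -n "$ns" --sort-by=.metadata.creationTimestamp -o name | latest_backup_job)"
  if [ -z "$job" ]; then
    echo "no Job found for CronJob ${CRONJOB_NAME} in ${ns}" >&2
    return 1
  fi
  kubectl describe job "$job" -n "$ns"
  kubectl logs "job/${job}" -n "$ns" --tail=100
}
```

Usage: source the file, then run inspect_latest_backup <NAMESPACE>. A manually created Job whose name starts with the same prefix is also matched, which is usually the desired behavior after a retry.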

Possible Causes of Failure

* Pod in CrashLoopBackOff, OOMKilled, or Failed
* PVC mount unavailable or insufficient space
* Backup storage credentials missing or misconfigured
* Network issues preventing upload to S3/MinIO
* Disk space or permissions issues on the node
* CronJob manifest misconfiguration
* Database credentials invalid or inaccessible
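
To narrow down which of these causes applies, the container termination reason reported by Kubernetes is a useful first signal. The sketch below is illustrative only. The helper names explain_reason and job_termination_reasons are hypothetical, the mapping to causes is a rough guide rather than a definitive diagnosis, and the job-name pod label is assumed to be present (the Job controller normally sets it).

```shell
#!/bin/sh
# Sketch: map a container termination reason to a likely cause from the list above.

# Prints a short hint for a termination reason such as OOMKilled or Error.
explain_reason() {
  case "$1" in
    OOMKilled) echo "container exceeded its memory limit" ;;
    Error) echo "backup process exited non-zero; check logs for credential, network, or database errors" ;;
    ContainerCannotRun) echo "container could not start; check the CronJob manifest and mounts" ;;
    Completed) echo "container exited cleanly" ;;
    *) echo "unrecognized reason: $1" ;;
  esac
}

# Prints the termination reasons (space-separated) of the containers in the
# pods that belong to the given Job.
job_termination_reasons() {
  ns="$1"
  job="$2"
  kubectl get pods -n "$ns" -l "job-name=${job}" \
    -o jsonpath='{.items[*].status.containerStatuses[*].state.terminated.reason}'
}
```

Usage: for r in $(job_termination_reasons <NAMESPACE> <JOB_NAME>); do explain_reason "$r"; done. A container that is currently restarting reports its last exit under lastState rather than state, so an empty result does not rule out a crash.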

Mitigation

1. Inspect Job pod logs to identify errors.
2. Verify S3/MinIO credentials and connectivity.
3. Check PVC status and node disk availability.
4. Verify database credentials and connectivity.
5. Retry backup manually if needed:

kubectl create job --from=cronjob/mattermost-backup-job mattermost-backup-job-manual -n <NAMESPACE>

6. Correct any misconfigurations in CronJob YAML, database, or backup storage policies.
7. Escalate to SRE or admin team if repeated failures occur.
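
The manual retry in step 5 uses a fixed Job name, so a second retry fails with an AlreadyExists error unless the earlier manual Job has been deleted. The following sketch shows one way around that, assuming a timestamp suffix is acceptable in your naming scheme. The functions manual_job_name and run_manual_backup and the 30m default timeout are hypothetical choices, not part of the existing setup.

```shell
#!/bin/sh
# Sketch: retry the backup with a unique Job name and wait for the result.

CRONJOB_NAME="mattermost-backup-job"

# Builds a unique Job name by appending a UTC timestamp.
manual_job_name() {
  echo "${CRONJOB_NAME}-manual-$(date -u +%Y%m%d%H%M%S)"
}

# Creates a one-off Job from the CronJob template and waits for completion.
# Arguments: namespace, optional timeout (default 30m, an assumed value).
run_manual_backup() {
  ns="$1"
  timeout="${2:-30m}"
  job="$(manual_job_name)"
  kubectl create job --from="cronjob/${CRONJOB_NAME}" "$job" -n "$ns" || return 1
  if kubectl wait --for=condition=complete "job/${job}" -n "$ns" --timeout="$timeout"; then
    echo "backup Job ${job} completed"
  else
    echo "backup Job ${job} did not complete within ${timeout}" >&2
    kubectl logs "job/${job}" -n "$ns" --tail=50
    return 1
  fi
}
```

Usage: run_manual_backup <NAMESPACE>. Because kubectl wait only watches for the Complete condition here, a Job that fails early is reported only once the timeout expires, so choose a timeout close to the normal backup duration.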

Escalation

* Escalate if backups fail on two or more consecutive runs.
* Notify the on-call engineer if production Mattermost data may not be recoverable.
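
The consecutive-failure rule can be checked mechanically. The sketch below assumes you can produce one status word per run (Complete or Failed), oldest first, from whatever run history you keep. Both the input format and the function names are hypothetical. Note that a CronJob retains only one failed Job by default (failedJobsHistoryLimit is 1), so the Job listing alone may not show enough history.

```shell
#!/bin/sh
# Sketch: decide whether the escalation threshold has been reached.

# Reads one status per line (oldest first) on stdin and prints how many
# runs at the end of the history failed in a row.
consecutive_failures() {
  awk '{ if ($1 == "Failed") n++; else n = 0 } END { print n + 0 }'
}

# Succeeds (exit 0) when the most recent runs include two or more
# consecutive failures.
should_escalate() {
  [ "$(consecutive_failures)" -ge 2 ]
}
```

Usage: printf 'Complete\nFailed\nFailed\n' | should_escalate && echo "escalate".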

* MattermostBackupSucceeded
* MattermostBackupFailed
* HostOutOfDiskSpace (node running backup Job)
* KubernetesPodCrashLooping

* Kubernetes → Jobs & CronJobs (namespace: <NAMESPACE>)
* Grafana → Backup Job status metrics
* S3/MinIO → Backup object listings