====== Serpbear Backup CronJob ======
===== Meaning =====
These alerts monitor the Kubernetes Job created by the Serpbear backup CronJob.
Alerts:
* `SerpbearBackupSucceeded` → Backup Job succeeded; Serpbear PVC data was archived and uploaded to MinIO.
* `SerpbearBackupFailed` → Backup Job failed; the PVC data was not archived, or the archive was not uploaded to MinIO.
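The actual alert rules are not part of this page. The sketch below roughly illustrates the semantics above. It assumes the outcome is read from the Job's own `status.conditions`, where a finished Job carries a `Complete` or `Failed` condition with status `"True"`. The helper `backup_alert_for_job` is hypothetical and maps a Job object to the alert that would describe it:

```python
from __future__ import annotations

from typing import Any, Optional


def backup_alert_for_job(job: dict[str, Any]) -> Optional[str]:
    """Map a Job object (as parsed from `kubectl get job ... -o json`) to the
    alert its outcome corresponds to, or None while the Job is still running."""
    conditions = job.get("status", {}).get("conditions") or []
    for condition in conditions:
        # Only conditions whose status is "True" describe the current outcome.
        if condition.get("status") != "True":
            continue
        if condition.get("type") == "Complete":
            return "SerpbearBackupSucceeded"
        if condition.get("type") == "Failed":
            return "SerpbearBackupFailed"
    return None
```

The input is the parsed JSON of `kubectl get job <job-name> -n <namespace> -o json`. A Job with neither condition set is treated as still in progress.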
===== Impact =====
* Success → The latest Serpbear PVC data is archived in MinIO; no action is required.
* Failure → The latest Serpbear PVC data may not be backed up. A restore would fall back to the most recent successful backup, losing any changes made since then.
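The amount of data at risk after a failed run depends on the age of the newest backup object already in MinIO. The following is a minimal sketch that assumes the objects' last-modified times have already been collected (for example, from `mc ls` output) as datetimes in one consistent timezone. `newest_backup_age`, `backup_is_stale` and the `max_age` threshold are hypothetical. A sensible threshold depends on the CronJob schedule, which this page does not state.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Optional


def newest_backup_age(last_modified: list[datetime], now: datetime) -> Optional[timedelta]:
    """Age of the most recent backup object, or None if there are no backups."""
    if not last_modified:
        return None
    return now - max(last_modified)


def backup_is_stale(last_modified: list[datetime], now: datetime, max_age: timedelta) -> bool:
    """True when no backup exists or the newest one is older than max_age."""
    age = newest_backup_age(last_modified, now)
    return age is None or age > max_age
```

An empty bucket counts as stale because no restore point exists at all.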
===== Diagnosis =====
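1. Check the Kubernetes Job status. Jobs spawned by the CronJob are named after it with a numeric suffix, so list them first and use the newest name as <job-name>:
<code bash>
kubectl get jobs -n <namespace> --sort-by=.metadata.creationTimestamp
kubectl describe job <job-name> -n <namespace>
</code>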
2. Check the logs of the Job pod:
<code bash>
kubectl logs job/<job-name> -n <namespace>
</code>
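3. Verify the backup in MinIO, where <alias> is the mc alias configured for the MinIO server:
<code bash>
mc ls <alias>/serpbear-backups/
mc stat <alias>/serpbear-backups/<object-name>
</code>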
4. Check the PVC status (it should be Bound):
<code bash>
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
</code>
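When several Jobs exist, the newest one spawned by the CronJob is the one to inspect. The sketch below assumes the parsed output of `kubectl get jobs -n <namespace> -o json` and identifies the CronJob's Jobs through their `ownerReferences`. The function name `latest_job_for_cronjob` is hypothetical, and `serpbear-backup-job` is taken from the manual-retry command in Mitigation.

```python
from __future__ import annotations

from typing import Any, Optional


def latest_job_for_cronjob(job_list: dict[str, Any], cronjob_name: str) -> Optional[dict[str, Any]]:
    """Return the most recently created Job owned by the given CronJob, or None."""
    owned = [
        job
        for job in job_list.get("items", [])
        if any(
            ref.get("kind") == "CronJob" and ref.get("name") == cronjob_name
            for ref in job.get("metadata", {}).get("ownerReferences", [])
        )
    ]
    if not owned:
        return None
    # creationTimestamp is RFC 3339 in UTC (e.g. "2024-05-02T02:00:00Z"),
    # so plain string comparison orders Jobs by creation time.
    return max(owned, key=lambda job: job["metadata"]["creationTimestamp"])
```

The returned object's `metadata.name` is the <job-name> to use with `kubectl describe` and `kubectl logs`.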
===== Possible Causes of Failure =====
* Job pod in CrashLoopBackOff, OOMKilled, or in the Failed phase
* PVC mount unavailable or insufficient space
* MinIO credentials missing or misconfigured
* Network issues preventing upload to MinIO
* Insufficient disk space or file-permission errors on the node running the Job
* CronJob manifest misconfiguration
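The first cause above can be checked mechanically from the Job pod's status. This is a minimal sketch that assumes the parsed output of `kubectl get pod <pod-name> -n <namespace> -o json`. It reads the pod `phase` and each container's current and previous (`lastState`) waiting/terminated `reason`. The helper `pod_failure_reasons` and the watched-reason set are illustrative choices.

```python
from __future__ import annotations

from typing import Any

# Container reasons that point at the failure modes listed above.
WATCHED_REASONS = {"CrashLoopBackOff", "OOMKilled", "Error"}


def pod_failure_reasons(pod: dict[str, Any]) -> list[str]:
    """Collect failure indicators from a pod object, in discovery order, without duplicates."""
    status = pod.get("status", {})
    found: list[str] = []
    if status.get("phase") == "Failed":
        found.append("Failed")
    for container in status.get("containerStatuses") or []:
        for state_key in ("state", "lastState"):
            state = container.get(state_key) or {}
            reason = (state.get("waiting") or {}).get("reason") or (
                state.get("terminated") or {}
            ).get("reason")
            if reason in WATCHED_REASONS and reason not in found:
                found.append(reason)
    return found
```

`lastState` is worth checking because a container that was OOMKilled and restarted shows the kill only in its previous state. CrashLoopBackOff can appear only if the Job's pod template uses `restartPolicy: OnFailure`. With `Never`, the pod simply fails and the Job creates a new one.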
===== Mitigation =====
1. Inspect Job pod logs to identify errors.
2. Verify MinIO credentials and connectivity.
3. Check PVC status and node disk availability.
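4. Retry the backup manually if needed. Job names must be unique within the namespace, so delete or rename any earlier manual Job first:
<code bash>
kubectl create job --from=cronjob/serpbear-backup-job serpbear-backup-job-manual -n <namespace>
</code>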
5. Correct any misconfiguration in the CronJob manifest, the PVC, or the MinIO bucket policy.
6. Escalate to the SRE or admin team if failures recur.
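Re-running the manual-retry command with the same fixed name fails once a Job with that name exists. One way around this is to add a timestamp to the name. The following is a minimal sketch with a hypothetical helper, `manual_job_name`. It keeps the result within a DNS-1123 label (at most 63 characters) because the Job controller copies the Job name into a label on its pods, and label values are limited to 63 characters.

```python
import re
from datetime import datetime, timezone

# DNS-1123 label: lowercase alphanumerics and '-', alphanumeric at both ends.
_DNS1123_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def manual_job_name(cronjob_name: str, when: datetime) -> str:
    """Build a unique, UTC-timestamped name for a manually triggered backup Job."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    name = f"{cronjob_name}-manual-{stamp}"
    if len(name) > 63 or not _DNS1123_LABEL.match(name):
        raise ValueError(f"invalid Job name: {name!r}")
    return name
```

For example, `manual_job_name("serpbear-backup-job", datetime.now(timezone.utc))` returns a name such as `serpbear-backup-job-manual-20240502093005`, which can replace `serpbear-backup-job-manual` in the retry command.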
===== Escalation =====
* Escalate if the backup fails on two or more consecutive runs.
* Notify the on-call engineer if Serpbear data may not be recoverable.
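The consecutive-failure rule above can be expressed as a small check over recent run outcomes. This is a minimal sketch that assumes the outcomes are already listed newest first as `"succeeded"`, `"failed"` or `"running"`. The functions and outcome labels are hypothetical. Runs still in progress are skipped so that they neither extend nor break a failure streak.

```python
from __future__ import annotations


def consecutive_failures(outcomes: list[str]) -> int:
    """Count failed runs at the head of `outcomes` (newest first).

    Runs still in progress ("running") are skipped; the first
    "succeeded" ends the streak.
    """
    streak = 0
    for outcome in outcomes:
        if outcome == "running":
            continue
        if outcome != "failed":
            break
        streak += 1
    return streak


def should_escalate(outcomes: list[str]) -> bool:
    """Escalate when two or more consecutive runs have failed."""
    return consecutive_failures(outcomes) >= 2
```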
===== Related Alerts =====
* SerpbearBackupSucceeded
* SerpbearBackupFailed
* HostOutOfDiskSpace (node running the backup Job)
* KubernetesPodCrashLooping
===== Related Dashboards =====
* Kubernetes → Jobs & CronJobs (namespace: <namespace>)
* Grafana → Backup Job status metrics
* MinIO → Backup object listings