====== Serpbear Backup CronJob ======

===== Meaning =====

These alerts monitor the Kubernetes Jobs created by the Serpbear backup CronJob.

Alerts:
* `SerpbearBackupSucceeded` → The backup Job succeeded. The Serpbear PVC data was archived and uploaded to MinIO.
* `SerpbearBackupFailed` → The backup Job failed. Either the PVC data was not archived or the archive was not uploaded to MinIO.

===== Impact =====

* Success → The backup completed and the Serpbear data is stored in MinIO.
* Failure → The Serpbear PVC data may not be backed up, so there may be no recent restore point if disaster recovery is needed.

===== Diagnosis =====

1. Check the Kubernetes Job status. The CronJob names each Job it creates `serpbear-backup-job-<suffix>`, so list the Jobs first and use the actual Job name in these commands:
```bash
kubectl get job serpbear-backup-job -n
kubectl describe job serpbear-backup-job -n
```

2. Check the logs of the Job pod:
```bash
kubectl logs job/serpbear-backup-job -n
```

3. Verify that the backup exists in MinIO:
```bash
mc ls /serpbear-backups/
mc stat /serpbear-backups/
```

4. Check the PVC status and mounts:
```bash
kubectl get pvc -n
kubectl describe pvc -n
```

===== Possible Causes of Failure =====

* Pod in the CrashLoopBackOff, OOMKilled, or Failed state
* PVC mount unavailable, or insufficient space on the volume
* MinIO credentials missing or misconfigured
* Network issues preventing the upload to MinIO
* Disk space or permission issues on the node
* Misconfigured CronJob manifest

===== Mitigation =====

1. Inspect the Job pod logs to identify the error.
2. Verify the MinIO credentials and connectivity.
3. Check the PVC status and the node's disk availability.
4. If needed, retry the backup manually. This fails if a Job named `serpbear-backup-job-manual` already exists, so delete the old one first or choose a new name:
```bash
kubectl create job --from=cronjob/serpbear-backup-job serpbear-backup-job-manual -n
```
5. Correct any misconfiguration in the CronJob YAML, the PVC, or the MinIO bucket policy.
6. Escalate to the SRE or admin team if failures repeat.

===== Escalation =====

* Escalate if backups fail on two or more consecutive runs.
* Notify the on-call engineer if Serpbear data may not be recoverable.
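The escalation rule above depends on knowing how many consecutive runs have failed. The following is a minimal sketch in Python.

It assumes two things:
* the Job list is fetched with `kubectl get jobs -n <namespace> -o json` (substitute the Serpbear namespace);
* the CronJob is named `serpbear-backup-job`, as in the manual-retry command.

The sketch keeps the Jobs whose `ownerReferences` point at that CronJob and orders them by `creationTimestamp`. It then counts the failed runs at the end of the history. The script name `check_backup_escalation.py` and the exit-code convention are hypothetical.

One caveat: the CronJob retains only `failedJobsHistoryLimit` failed Jobs, and that limit defaults to 1. The count can therefore never exceed that limit, so set it to at least 2 for this check to be meaningful.

```python
"""Hypothetical helper: count consecutive failed runs of the Serpbear backup CronJob.

Sketch only. Assumes stdin carries the output of
    kubectl get jobs -n <namespace> -o json
and that the CronJob is named serpbear-backup-job.
"""
import json
import sys

CRONJOB_NAME = "serpbear-backup-job"  # taken from the manual-retry command


def job_outcome(job: dict) -> str:
    """Classify a Job as 'succeeded', 'failed', or 'active' from its conditions."""
    conditions = job.get("status", {}).get("conditions") or []
    for cond in conditions:
        if cond.get("status") != "True":
            continue
        if cond.get("type") == "Complete":
            return "succeeded"
        if cond.get("type") == "Failed":
            return "failed"
    # No terminal condition yet: the Job is still pending or running.
    return "active"


def backup_jobs(job_list: dict) -> list:
    """Return the Jobs owned by the backup CronJob, oldest first."""
    owned = [
        job
        for job in job_list.get("items", [])
        if any(
            ref.get("kind") == "CronJob" and ref.get("name") == CRONJOB_NAME
            for ref in job.get("metadata", {}).get("ownerReferences", [])
        )
    ]
    # creationTimestamp is RFC 3339 in UTC (e.g. "2024-01-01T02:00:00Z"),
    # so plain string ordering matches chronological ordering.
    return sorted(owned, key=lambda job: job["metadata"]["creationTimestamp"])


def consecutive_failures(job_list: dict) -> int:
    """Count failed runs at the end of the history, ignoring active Jobs."""
    count = 0
    for job in reversed(backup_jobs(job_list)):
        outcome = job_outcome(job)
        if outcome == "active":
            continue
        if outcome == "succeeded":
            break
        count += 1
    return count


def main() -> int:
    failures = consecutive_failures(json.load(sys.stdin))
    print(f"{CRONJOB_NAME}: {failures} consecutive failed run(s)")
    # Runbook rule: escalate when two or more consecutive runs fail.
    return 1 if failures >= 2 else 0


if __name__ == "__main__":
    sys.exit(main())
```

Usage: `kubectl get jobs -n <namespace> -o json | python3 check_backup_escalation.py`. A non-zero exit status means two or more consecutive runs failed. Active Jobs are skipped rather than counted, so a retry that is still running does not reset or inflate the count.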
===== Related Alerts =====

* SerpbearBackupSucceeded
* SerpbearBackupFailed
* HostOutOfDiskSpace (node running the backup Job)
* KubernetesPodCrashLooping

===== Related Dashboards =====

* Kubernetes → Jobs & CronJobs (namespace: )
* Grafana → Backup Job status metrics
* MinIO → Backup object listings