
Cal.com PostgreSQL Backup CronJob

Meaning

These alerts monitor the Kubernetes Job created by the Cal.com PostgreSQL backup CronJob.

Alerts:

  • `CalcomPostgresqlDBCronJobBackupSucceeded` → Backup Job succeeded; backup has been created and uploaded successfully to MinIO.
  • `CalcomPostgresqlDBCronJobBackupFailed` → Backup Job failed; backup was not created or failed to upload.

Impact

  • Success → Backup completed; data is safely stored in MinIO. No action required.
  • Failure → Database backup may be missing, affecting recovery point objectives (RPO). Immediate investigation is required.

Diagnosis

1. Check Kubernetes Job status:

kubectl get job calcom-postgresql-backup-job -n calcom
kubectl describe job calcom-postgresql-backup-job -n calcom

2. Check logs of the Job pod:

kubectl logs job/calcom-postgresql-backup-job -n calcom

3. Verify backup in MinIO:

# Using mc (MinIO client)
mc ls <MINIO_ALIAS>/calcom-backups/
mc stat <MINIO_ALIAS>/calcom-backups/<backup_file>

4. Check for recent failures or retries:

kubectl get jobs -n calcom --sort-by=.status.startTime
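
Steps 1 and 3 above can be scripted. The following is a minimal sketch, not part of the deployed tooling: the function names (`classify_job`, `job_state`, `latest_backup`) are hypothetical. It assumes the Job exposes the standard `Complete` and `Failed` conditions, and that `mc ls` prints lines starting with a bracketed timestamp and ending with the object name.

```shell
# Map Job condition values to a one-word state.
# $1: status of the "Complete" condition, $2: status of the "Failed" condition
# (each is "True" when set, empty when the condition is absent).
classify_job() {
  if [ "$1" = "True" ]; then
    echo "succeeded"
  elif [ "$2" = "True" ]; then
    echo "failed"
  else
    echo "in-progress"
  fi
}

# Read both conditions from the cluster and classify the Job.
# Requires kubectl access; the defaults are the names used in this runbook.
job_state() {
  js_job="${1:-calcom-postgresql-backup-job}"
  js_ns="${2:-calcom}"
  js_complete=$(kubectl get job "$js_job" -n "$js_ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}')
  js_failed=$(kubectl get job "$js_job" -n "$js_ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Failed")].status}')
  classify_job "$js_complete" "$js_failed"
}

# Print the object name from the newest line of `mc ls` output on stdin.
# Assumption: every line starts with a timestamp like "[2024-01-02 02:00:05 UTC]"
# in the same time zone, and object names contain no spaces.
latest_backup() {
  LC_ALL=C sort | tail -n 1 | awk '{print $NF}'
}
```

For example, `job_state` prints the state of the default Job, and `mc ls <MINIO_ALIAS>/calcom-backups/ | latest_backup` prints the most recent object name, which can then be passed to `mc stat`.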

Possible Causes of Failure

  • Pod in CrashLoopBackOff or OOMKilled
  • MinIO credentials missing or misconfigured
  • Network issues between Kubernetes and MinIO
  • Insufficient permissions to write to the MinIO bucket
  • Disk space issues on the node running the Job
  • CronJob manifest misconfiguration
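
To narrow down which of these causes applies, the container termination reason of the Job's pods is a useful first signal. The sketch below is illustrative only: `suggest_cause` and `pod_termination_reasons` are hypothetical helper names, and the mapping from reason to next step is an assumption rather than an established policy. It relies on the `job-name` label that the Job controller sets on the pods it creates.

```shell
# Suggest which cause to check first, given a container termination reason
# as reported in the pod status (for example "OOMKilled" or "Error").
suggest_cause() {
  case "$1" in
    OOMKilled) echo "memory: container exceeded its memory limit" ;;
    Error)     echo "exit-code: container exited non-zero, check logs for MinIO credential, permission, or network errors" ;;
    "")        echo "none: no termination reason recorded, check pod events and the CronJob manifest" ;;
    *)         echo "other: unrecognized reason '$1', inspect the pod with kubectl describe" ;;
  esac
}

# Print "<pod name><TAB><termination reason>" for pods created by the Job.
# Requires kubectl access; the defaults are the names used in this runbook.
pod_termination_reasons() {
  kubectl get pods -n "${2:-calcom}" -l "job-name=${1:-calcom-postgresql-backup-job}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.terminated.reason}{"\n"}{end}'
}
```

Running `pod_termination_reasons` lists each pod with its reason, and `suggest_cause OOMKilled` prints the corresponding hint.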

Mitigation

1. Inspect the Job pod logs to identify the error.
2. Ensure MinIO credentials are valid and accessible from the pod.
3. Check Kubernetes node resources (CPU, memory, disk) where the Job ran.
4. Retry the backup manually if needed:

kubectl create job --from=cronjob/calcom-pgdb-backup calcom-postgresql-backup-job-manual -n calcom

5. Correct any misconfiguration in the CronJob YAML or MinIO bucket policy.
6. Escalate to the SRE or DBA team if the failure impacts production backups.
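
The manual retry in step 4 fails if a Job with the same name already exists, so a timestamped name can help when retrying more than once. This is a sketch under assumptions: `manual_job_name` and `run_manual_backup` are hypothetical helpers, and the 600-second timeout is an arbitrary placeholder that should be adjusted to the expected backup duration.

```shell
# Build a unique name for a manual backup Job.
# $1: timestamp suffix; defaults to the current UTC time (YYYYmmddHHMMSS).
manual_job_name() {
  echo "calcom-postgresql-backup-job-manual-${1:-$(date -u +%Y%m%d%H%M%S)}"
}

# Create the Job from the CronJob, then block until it reports Complete.
# Note: if the Job fails, kubectl wait keeps waiting until the timeout expires
# and then returns non-zero.
run_manual_backup() {
  rmb_ns="${1:-calcom}"
  rmb_name=$(manual_job_name)
  kubectl create job --from=cronjob/calcom-pgdb-backup "$rmb_name" -n "$rmb_ns" || return 1
  kubectl wait --for=condition=complete "job/$rmb_name" -n "$rmb_ns" --timeout=600s
}
```

Calling `run_manual_backup` creates the Job in the `calcom` namespace and returns non-zero if creation fails or the Job does not complete in time.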

Escalation

  • Escalate if consecutive backup Jobs fail.
  • Page the on-call engineer if production database backups are missing.
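
Whether backup Jobs are failing consecutively can be checked by counting the failures at the end of the Job history. The helper below is a hypothetical sketch: it expects one state per line on stdin, oldest first, using the words `succeeded` and `failed`. Any other state resets the count. This runbook does not define how many consecutive failures trigger escalation, so the function only reports the number.

```shell
# Count how many of the most recent runs failed in a row.
# Input: one state per line on stdin, oldest first.
# Output: the length of the trailing run of "failed" lines.
count_trailing_failures() {
  ctf_n=0
  while IFS= read -r ctf_state; do
    if [ "$ctf_state" = "failed" ]; then
      ctf_n=$((ctf_n + 1))
    else
      ctf_n=0
    fi
  done
  echo "$ctf_n"
}
```

For instance, `printf 'succeeded\nfailed\nfailed\n' | count_trailing_failures` prints `2`.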

  • `CalcomPostgresqlDBCronJobBackupSucceeded`
  • `CalcomPostgresqlDBCronJobBackupFailed`
  • `KubernetesPodCrashLooping`
  • `HostOutOfDiskSpace` (node running the backup Job)

  • Kubernetes → Jobs & CronJobs (namespace: calcom)
  • Grafana → Backup Job status metrics
  • MinIO → Backup object listings

runbooks/backup/calcom.txt · Last modified: by admin