runbooks:coustom_alerts:kubernetespodcrashlooping
KubernetesPodCrashLooping
Meaning
This alert is triggered when a Kubernetes pod has restarted more than 10 times in the last 6 hours. It indicates that the pod is crash looping and unable to run stably.
Impact
Crash looping pods can cause:
- Service degradation or unavailability
- Increased load on the node due to repeated restarts
- Potential cascading failures if other pods or services depend on it
- Deployment or update instability
This alert fires at warning severity, but treat it as critical if it affects production workloads or multiple pods.
Diagnosis
Check pod status:
kubectl get pod {{ $labels.pod }} -n {{ $labels.namespace }}
kubectl describe pod {{ $labels.pod }} -n {{ $labels.namespace }}
Check container restart count:
kubectl get pod {{ $labels.pod }} -n {{ $labels.namespace }} -o jsonpath='{.status.containerStatuses[*].restartCount}'
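The jsonpath query above prints bare numbers, which is hard to read for multi-container pods. A minimal sketch that pairs each count with its container name is shown here; the function name `restarting_containers` is hypothetical, and the pod and namespace are passed as arguments taken from the alert labels.

```shell
# Print "<container>: <n> restarts" for every container that has restarted.
# $1 = pod name, $2 = namespace (values from the alert labels).
# Hypothetical helper; it only wraps kubectl's jsonpath output with awk.
restarting_containers() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}' |
    awk -F'\t' '$2 > 0 { print $1 ": " $2 " restarts" }'
}
```

Call it as `restarting_containers <pod> <namespace>`. Init containers are reported separately under `.status.initContainerStatuses` and are not covered by this sketch.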
Inspect pod logs to identify the root cause:
kubectl logs {{ $labels.pod }} -n {{ $labels.namespace }} --previous
kubectl logs {{ $labels.pod }} -n {{ $labels.namespace }} --all-containers
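In a crash loop the logs of the previous (crashed) container instance are usually the most useful ones. The following is a rough sketch, assuming it is acceptable to fetch them container by container; `previous_logs` is a hypothetical helper name and the default of 50 lines is an arbitrary choice.

```shell
# Show the tail of the previous instance's logs for each container in a pod.
# $1 = pod name, $2 = namespace, $3 = number of lines (optional, default 50).
previous_logs() {
  pod=$1; ns=$2; lines=${3:-50}
  # .spec.containers[*].name yields a space-separated list of container names.
  for c in $(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].name}'); do
    echo "== $c (previous instance, last $lines lines) =="
    # kubectl logs fails if the container has never restarted; report that instead.
    kubectl logs "$pod" -n "$ns" -c "$c" --previous --tail="$lines" --timestamps ||
      echo "no previous logs for $c"
  done
}
```

Call it as `previous_logs <pod> <namespace>`, or add a third argument to change the number of lines.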
Check events for errors:
kubectl get events -n {{ $labels.namespace }} --sort-by=.lastTimestamp
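The namespace-wide event list can be noisy. One way to narrow it to the affected pod is a field selector, sketched below; `pod_warning_events` is a hypothetical helper name, and restricting to `type=Warning` is an assumption about what is most relevant here.

```shell
# List only Warning events that refer to one pod, oldest first.
# $1 = pod name, $2 = namespace (values from the alert labels).
pod_warning_events() {
  kubectl get events -n "$2" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1,type=Warning" \
    --sort-by=.lastTimestamp
}
```

Call it as `pod_warning_events <pod> <namespace>`. Events such as BackOff, Unhealthy, or FailedMount typically show up in this view.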
Possible Causes
- Application crashes due to bugs or misconfiguration
- Memory limit exceeded (container is OOMKilled), or CPU throttling that slows the application enough to fail its probes
- Missing or incompatible dependencies
- Failed liveness or startup probes causing restarts (a failing readiness probe only removes the pod from service endpoints and does not restart it)
- Misconfigured environment variables or secrets
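To narrow down which of the causes above applies, the last termination reason and exit code of each container are a useful starting point. The sketch below is an illustration only: `explain_termination` and `diagnose_restarts` are hypothetical helper names, and the hints are rules of thumb rather than a definitive diagnosis.

```shell
# Map a termination reason and exit code to a rough hint.
# $1 = reason, $2 = exit code, both from .lastState.terminated.
explain_termination() {
  case "$1:$2" in
    OOMKilled:*) echo "killed by the OOM killer, usually because it exceeded its memory limit" ;;
    *:137)       echo "killed by SIGKILL (128+9)" ;;
    *:143)       echo "terminated by SIGTERM (128+15)" ;;
    *:)          echo "no previous termination recorded" ;;
    *:0)         echo "exited with code 0; check whether the process is meant to keep running" ;;
    *)           echo "exited with code $2; check the previous container logs" ;;
  esac
}

# Print one hint per container. $1 = pod name, $2 = namespace.
# "|" is used as the delimiter so that empty fields are preserved by read.
diagnose_restarts() {
  kubectl get pod "$1" -n "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"|"}{.lastState.terminated.reason}{"|"}{.lastState.terminated.exitCode}{"\n"}{end}' |
    while IFS='|' read -r name reason code; do
      echo "$name: $(explain_termination "$reason" "$code")"
    done
}
```

Call it as `diagnose_restarts <pod> <namespace>`. An exit code of 137 without an OOMKilled reason often points at a forced kill, for example after a failed liveness probe, but confirm this against the pod events.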
Mitigation
- Investigate logs and fix the root cause
- Adjust pod resource requests and limits
- Verify that the container image tag or digest is the intended one and that the image pulls and starts correctly
- Restart the pod after applying fixes:
kubectl delete pod {{ $labels.pod }} -n {{ $labels.namespace }}
- Update the owning Deployment or StatefulSet if the configuration error is in its pod template
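When the pod belongs to a Deployment and the cause is a memory limit that is too low, the limit can be changed on the Deployment itself, which triggers a new rollout without deleting pods by hand. This is a sketch under those assumptions: `raise_memory_limit` is a hypothetical helper name, and the Deployment name, container name, new limit, and 120s timeout are placeholders to adapt.

```shell
# Raise the memory limit of one container in a Deployment and wait for the rollout.
# $1 = Deployment name, $2 = namespace, $3 = container name, $4 = new limit (e.g. 512Mi).
raise_memory_limit() {
  kubectl set resources deployment "$1" -n "$2" -c "$3" --limits="memory=$4" &&
    kubectl rollout status "deployment/$1" -n "$2" --timeout=120s
}
```

Call it as `raise_memory_limit <deployment> <namespace> <container> 512Mi`. If the change is also kept in version control or a Helm chart, update it there as well so that the next deploy does not revert it.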
Escalation
- Escalate if crash looping persists after mitigation
- Page on-call engineer if production workloads are impacted
- Monitor other pods in the same namespace for related issues
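To check whether other pods in the same namespace are affected, their restart counts can be summed and ranked. A minimal sketch follows; `pods_over_restart_threshold` is a hypothetical helper name, and the default threshold of 1 is an arbitrary choice.

```shell
# List pods in a namespace whose containers restarted at least N times in total,
# highest count first. $1 = namespace, $2 = minimum total restarts (default 1).
pods_over_restart_threshold() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].restartCount}{"\n"}{end}' |
    awk -v min="${2:-1}" '{
      total = 0
      for (i = 2; i <= NF; i++) total += $i   # sum restart counts of all containers
      if (total >= min) print total, $1
    }' |
    sort -rn
}
```

Call it as `pods_over_restart_threshold <namespace> 10` to mirror the alert threshold. Note that these counts cover each container's lifetime, not only the last 6 hours.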
Related Alerts
- KubernetesPodNotHealthy
- PodOOMKilled
- PodPending
- KubernetesNodeMemoryPressure
Related Dashboards
- Grafana → Kubernetes / Pod Restarts
- Grafana → Namespace Health Overview
runbooks/coustom_alerts/kubernetespodcrashlooping.txt · Last modified: by admin
