runbooks:coustom_alerts:KubernetesPodNotHealthy
This alert is triggered when a Kubernetes pod has been in a non-running state (`Pending`, `Unknown`, or `Failed`) for more than 4 hours. It indicates that the pod is unhealthy and not serving its intended workload.
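To see which pods currently match the alert's phases, a minimal sketch along these lines may help. It assumes the alert covers exactly `Pending`, `Unknown`, and `Failed`, so it filters out the two remaining phases (`Running` and `Succeeded`). The function name `list_non_running_pods` is illustrative and not part of this runbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pods whose phase is neither Running nor Succeeded,
# which leaves Pending, Unknown, and Failed.
# With no argument it searches all namespaces; otherwise only the given one.
list_non_running_pods() {
  local ns="${1:-}"
  local selector='status.phase!=Running,status.phase!=Succeeded'
  if [ -n "$ns" ]; then
    kubectl get pods -n "$ns" --field-selector="$selector"
  else
    kubectl get pods --all-namespaces --field-selector="$selector"
  fi
}
```

This reports the current phase only. It does not check how long a pod has been in that phase, so the output can include pods that have not yet triggered the alert.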
A pod that remains non-running for an extended period directly affects the applications that depend on it, which is why this alert is treated as critical.
Check pod status:
kubectl get pod {{ $labels.pod }} -n {{ $labels.namespace }}
kubectl describe pod {{ $labels.pod }} -n {{ $labels.namespace }}
Check events for reasons of failure:
kubectl get events -n {{ $labels.namespace }} --sort-by=.lastTimestamp
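In a busy namespace the full event list can be noisy. As a sketch, the events can be narrowed to the affected pod with a field selector on the involved object. The helper name `pod_events` and its argument order are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show only the events whose involved object is the
# given pod, sorted by last timestamp (oldest first).
# Usage: pod_events <pod> <namespace>
pod_events() {
  local pod="$1" ns="$2"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}
```

Events are retained only for a limited time, so for a pod that has been unhealthy for hours the list may be empty even though the failure is ongoing.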
Check logs for container errors:
kubectl logs {{ $labels.pod }} -n {{ $labels.namespace }} --all-containers
For multi-container pods, check individual container states:
kubectl get pod {{ $labels.pod }} -n {{ $labels.namespace }} -o json | jq '.status.containerStatuses'
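Where `jq` is not installed, roughly the same information can be pulled out with kubectl's built-in JSONPath output. The sketch below prints one line per container with its name, readiness, restart count, and waiting reason. The helper name `container_summary` is hypothetical, and the choice of columns is an assumption about what is most useful during triage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one tab-separated line per container:
#   name, ready flag, restart count, waiting reason.
# The waiting reason column is empty when the container is not in the
# waiting state (kubectl ignores missing template keys by default).
# Usage: container_summary <pod> <namespace>
container_summary() {
  local pod="$1" ns="$2"
  kubectl get pod "$pod" -n "$ns" -o \
    jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}
```

A pod that has not been scheduled yet may have no `containerStatuses` at all, in which case this prints nothing and the pod events are the better source.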
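If the pod is managed by a controller (for example a Deployment, StatefulSet, or DaemonSet), delete it so that the controller recreates it:
kubectl delete pod {{ $labels.pod }} -n {{ $labels.namespace }}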
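Deleting a pod only helps when something will recreate it. A bare pod with no owner is simply gone afterwards. One possible guard, sketched below, reads the pod's `ownerReferences` first and refuses to delete when there are none. The function name `delete_pod_if_owned` and its messages are illustrative, not part of this runbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete a pod only if it has at least one owner
# (for example a ReplicaSet, StatefulSet, DaemonSet, or Job) that is
# expected to create a replacement.
# Usage: delete_pod_if_owned <pod> <namespace>
delete_pod_if_owned() {
  local pod="$1" ns="$2" owners
  # Space-separated list of owner kinds; empty for a bare pod.
  owners="$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.metadata.ownerReferences[*].kind}')" || return 1
  if [ -z "$owners" ]; then
    echo "Pod $pod has no owner; deleting it would remove it permanently." >&2
    return 1
  fi
  echo "Pod $pod is owned by: $owners"
  kubectl delete pod "$pod" -n "$ns"
}
```

If the underlying cause (an unschedulable node, a missing image, a failing probe) is still present, the replacement pod will likely end up in the same state, so this is best done after the diagnosis steps.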