KubernetesPodNotHealthy

Meaning

This alert is triggered when a Kubernetes pod has been in a non-running state (`Pending`, `Unknown`, or `Failed`) for more than 4 hours. It indicates that the pod is unhealthy and not serving its intended workload.
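To see which pods currently match the phases named above, the filter can be sketched in shell. This is a minimal illustration, not the alert rule itself: `filter_unhealthy` is a hypothetical helper name, and it looks only at the current phase, not at how long the pod has been in it.

```shell
#!/bin/sh
# filter_unhealthy (hypothetical helper): reads lines of
# "<namespace> <pod> <phase>" on stdin and prints only the pods whose
# phase is one of the three this alert watches.
filter_unhealthy() {
  awk '$3 == "Pending" || $3 == "Unknown" || $3 == "Failed" {
    print $1 "/" $2 " (" $3 ")"
  }'
}

# Example against a live cluster (needs kubectl access):
#   kubectl get pods -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase \
#     | filter_unhealthy
```

Pods in `Running` or `Succeeded` are dropped, so the output is the candidate list for this alert.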

Impact

A pod remaining non-running for extended periods can cause:

Application downtime
Service degradation or unavailability
Failed deployments or incomplete updates
Potential cascading failures if other pods depend on it

Treat this alert as critical: a pod that stays out of `Running` for this long is not serving its workload, so the application it belongs to is running with reduced or no capacity.

Diagnosis

Check pod status:

kubectl get pod {{ $labels.pod }} -n {{ $labels.namespace }}
kubectl describe pod {{ $labels.pod }} -n {{ $labels.namespace }}

Check recent events for the reason of the failure:

kubectl get events -n {{ $labels.namespace }} --sort-by=.lastTimestamp

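Check logs for container errors (a pod stuck in `Pending` has no logs yet; for a container that has restarted, add `--previous` to see the prior run):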

kubectl logs {{ $labels.pod }} -n {{ $labels.namespace }} --all-containers

For multi-container pods, check individual container states:

kubectl get pod {{ $labels.pod }} -n {{ $labels.namespace }} -o json | jq '.status.containerStatuses'

Possible Causes

CrashLoopBackOff due to application errors
ImagePullBackOff or missing images
Insufficient resources on the node (CPU/memory/disk)
Pod scheduling failures due to node constraints
Configuration errors or misconfigured readiness/liveness probes
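The causes above usually show up as a reason string in the pod's container statuses, conditions, or status. A rough sketch of mapping those strings back to this list follows; `explain_reason` is a hypothetical helper, and the hint wording is illustrative rather than anything Kubernetes prints.

```shell
#!/bin/sh
# explain_reason (hypothetical helper): maps a reason string reported by
# Kubernetes to one of the causes listed above.
explain_reason() {
  case "$1" in
    CrashLoopBackOff)
      echo "application error: the container keeps exiting, check its logs" ;;
    ImagePullBackOff|ErrImagePull)
      echo "image problem: check the image name, tag, and registry credentials" ;;
    OOMKilled|Evicted)
      echo "insufficient resources: check node pressure and requests/limits" ;;
    Unschedulable)
      echo "scheduling failure: check node capacity, taints, and affinity rules" ;;
    CreateContainerConfigError)
      echo "configuration error: check referenced ConfigMaps and Secrets" ;;
    *)
      echo "unrecognized reason: inspect the describe output and events" ;;
  esac
}

# Example (POD and NAMESPACE are placeholders for the alert labels):
#   reason=$(kubectl get pod "$POD" -n "$NAMESPACE" \
#     -o jsonpath='{.status.containerStatuses[*].state.waiting.reason}')
#   explain_reason "$reason"
```

The jsonpath in the example reads only waiting reasons; terminated reasons such as `OOMKilled` sit under `state.terminated` or `lastState.terminated`, and `Unschedulable` appears in the pod's conditions.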

Mitigation

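Investigate logs and, if appropriate, restart the pod by deleting it. This only works for pods managed by a controller (Deployment, StatefulSet, DaemonSet, Job), which creates a replacement; a bare pod is removed and not recreated: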

kubectl delete pod {{ $labels.pod }} -n {{ $labels.namespace }}

Resolve resource constraints (increase node capacity, adjust limits/requests)
Fix configuration issues, container image problems, or dependency failures
Reschedule pods to healthy nodes using taints/tolerations or affinity rules
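Before deleting a pod as a restart, it can help to confirm that something will recreate it. A small sketch, assuming the owner kind has already been read from the pod's `ownerReferences`; `restart_plan` is a hypothetical helper and its messages are illustrative.

```shell
#!/bin/sh
# restart_plan (hypothetical helper): takes the owner kind of a pod
# (empty string if the pod has no owner) and prints a suggested action.
restart_plan() {
  owner_kind="$1"
  if [ -n "$owner_kind" ]; then
    echo "delete: $owner_kind should create a replacement"
  else
    echo "keep: bare pod, save its manifest before deleting"
  fi
}

# Example (POD and NAMESPACE are placeholders for the alert labels):
#   owner=$(kubectl get pod "$POD" -n "$NAMESPACE" \
#     -o jsonpath='{.metadata.ownerReferences[*].kind}')
#   restart_plan "$owner"
```

For a bare pod, exporting the manifest first with `kubectl get pod ... -o yaml` leaves a way to recreate it after the delete.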

Escalation

Escalate if pod remains unhealthy after mitigation
Page on-call engineer if production services are impacted
Monitor related pods or services for cascading failures

Related Alerts

PodCrashLoopBackOff
PodPending
KubernetesNodeMemoryPressure
KubernetesNodeDiskPressure

Related Dashboards

Grafana → Kubernetes / Pods Overview
Grafana → Namespace Health
