KubernetesNodeNotReady

Meaning

This alert is triggered when a Kubernetes node remains in the NotReady state for more than 10 minutes. A NotReady node cannot reliably run or manage pods.

Impact

A node in NotReady state can cause:

  • Pods being evicted or stuck in Pending state
  • Reduced cluster capacity
  • Application downtime if replicas are insufficient
  • Scheduling failures for new workloads

This alert is marked critical because prolonged node unavailability threatens cluster stability.

Diagnosis

Check node status:

kubectl get nodes
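On a large cluster it can help to narrow the listing to the affected nodes. The helper below is a minimal sketch: `not_ready_nodes` is a hypothetical name, and it assumes the default `kubectl get nodes` column layout, where the second column is STATUS.

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# "Ready,SchedulingDisabled" (a cordoned but healthy node) is not reported.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Counting the output lines (`| wc -l`) gives a quick indication of whether one node or several are affected.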

Inspect node conditions and events:

kubectl describe node <NODE_NAME>
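The describe output is long; the node conditions alone are often enough to see what the kubelet last reported. As a sketch, the hypothetical helper `abnormal_conditions` below filters a tab-separated condition list and keeps only entries that look unhealthy. It assumes the usual convention that Ready should be True and every other condition (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable) should be False.

```shell
# Read "TYPE<TAB>STATUS<TAB>REASON" lines on stdin and print the ones that
# indicate a problem: Ready must be "True", every other condition "False".
# A status of "Unknown" is reported for any condition.
abnormal_conditions() {
  awk -F '\t' '
    $1 == "Ready" && $2 != "True"  { print; next }
    $1 != "Ready" && $2 != "False" { print }
  '
}

# Usage (requires cluster access):
#   kubectl get node <NODE_NAME> \
#     -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}' \
#     | abnormal_conditions
```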

Check recent cluster-wide events:

kubectl get events --all-namespaces --sort-by=.lastTimestamp
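The event stream can be noisy. One possible way to restrict it to a single node is a field selector on the involved object, sketched here with a hypothetical wrapper named `node_events`.

```shell
# List events whose involved object is the given node, oldest first.
# Argument: node name.
node_events() {
  kubectl get events --all-namespaces \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# Usage (requires cluster access):
#   node_events <NODE_NAME>
```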

Verify kubelet status on the node (if SSH access is available):

systemctl status kubelet
journalctl -u kubelet -n 100
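To skim the kubelet log for failures only, the lines can be filtered by severity. This sketch assumes the kubelet writes its default klog text format, where each message starts with a severity letter followed by the date (for example `E0102`); it will not match if JSON logging is enabled. `kubelet_errors` is a hypothetical helper name.

```shell
# Keep only klog error (E) and fatal (F) lines from kubelet log text on stdin.
# Returns success even when nothing matches.
kubelet_errors() {
  grep -E '^[EF][0-9]{4} ' || true
}

# Usage (on the node; -o cat prints the message without the journal prefix):
#   journalctl -u kubelet -n 100 -o cat | kubelet_errors
```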

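Check system resource pressure. kubectl top requires metrics-server and may show <unknown> for a node that has stopped reporting; run df and free on the node itself:

kubectl top node <NODE_NAME>
df -h
free -m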
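For disk pressure in particular, the df output can be reduced to the filesystems that are nearly full. The sketch below uses a hypothetical helper, `full_filesystems`, and a default threshold of 85%, which is an arbitrary illustrative value rather than a kubelet setting. It expects POSIX-format output (`df -P`) and does not handle mount points containing spaces.

```shell
# Read `df -P` output on stdin; print "MOUNT USE%" for filesystems at or above
# the threshold (first argument, default 85; an arbitrary illustrative value).
full_filesystems() {
  threshold="${1:-85}"
  awk -v limit="$threshold" '
    NR > 1 {
      use = $5
      sub(/%$/, "", use)          # strip the trailing percent sign
      if (use + 0 >= limit + 0) print $6, $5
    }
  '
}

# Usage (on the node):
#   df -P | full_filesystems 90
```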

Possible Causes

  • Kubelet service stopped or unhealthy
  • Node lost network connectivity
  • Disk, memory, or CPU pressure
  • Kernel panic or OS-level issues
  • Cloud provider instance failure or maintenance

Mitigation

  1. Restart kubelet service:
systemctl restart kubelet
  2. Resolve resource pressure (disk cleanup, memory leaks)
  3. Verify networking and DNS configuration
  4. Reboot the node if necessary
  5. If node cannot recover, drain and replace it

Drain node safely:

kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data

After recovery:

kubectl uncordon <NODE_NAME>
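Before returning the node to service it may be worth confirming that it actually reports Ready. A minimal sketch, with a hypothetical function name and an illustrative default timeout of 300s:

```shell
# Block until the node reports Ready (or the timeout expires), then uncordon.
# Arguments: node name, optional timeout (default 300s; illustrative value).
# The node is left cordoned if it does not become Ready in time.
ready_then_uncordon() {
  node="$1"
  timeout="${2:-300s}"
  kubectl wait --for=condition=Ready "node/$node" --timeout="$timeout" &&
    kubectl uncordon "$node"
}

# Usage (requires cluster access):
#   ready_then_uncordon <NODE_NAME> 120s
```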

Escalation

  • If node remains NotReady after mitigation, escalate to the infrastructure team
  • If multiple nodes are affected, treat as a cluster-level incident
  • Page on-call engineer if production workloads are impacted

Related Alerts

  • KubeletDown
  • NodeDown
  • HighDiskIOWait
  • HighCPUUsage
  • HighMemoryUsage

Related Dashboards

  • Grafana → Kubernetes / Nodes
  • Grafana → Node Exporter Full

runbooks/coustom_alerts/kubernetesnodenotready.txt · Last modified: by admin