KubeNodeNotReady

Meaning

This alert is triggered when a Kubernetes node reports a `NotReady` status for more than 2 minutes. A node in `NotReady` state cannot reliably run or manage pods.

Impact

This alert indicates a node-level availability issue.

Possible impacts include:

  • Pods on the node may be evicted or rescheduled
  • Reduced cluster capacity
  • Increased load on remaining nodes
  • Application performance degradation or partial outages

This alert has warning severity, but it may become critical if the condition persists or affects multiple nodes.

Diagnosis

Check node status:

kubectl get nodes
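
On a large cluster it can help to narrow that output to the unhealthy nodes. The following is a minimal sketch, assuming the default `kubectl get nodes` column layout (NAME, STATUS, ...); the function name `not_ready_nodes` is hypothetical and not part of this runbook's tooling.

```shell
# Print the names of nodes whose STATUS column does not start with "Ready".
# Cordoned but healthy nodes ("Ready,SchedulingDisabled") are not listed.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}
```

Running `not_ready_nodes` with no arguments prints one node name per line, or nothing if every node is Ready.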

Describe the affected node to inspect conditions and events:

kubectl describe node {{ $labels.node }}

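Check recent node-related events, searching all namespaces so the result does not depend on the current context, sorted oldest to newest:

kubectl get events --all-namespaces --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp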

Verify kubelet health on the node (if SSH access is available):

systemctl status kubelet
journalctl -u kubelet --since "15 min ago"

Check node resource pressure:

kubectl describe node {{ $labels.node }} | grep -i pressure
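
The `grep` above matches only lines containing "pressure". As a possible alternative, the node conditions can be read in structured form with a JSONPath query. This is a sketch under the assumption that the node exposes the standard conditions (Ready, MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable); the function names are hypothetical.

```shell
# Print one line per node condition: TYPE, STATUS, REASON (tab-separated).
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Print only the conditions that point to a problem:
# Ready that is not "True", or any other condition that is not "False".
unhealthy_conditions() {
  node_conditions "$1" | awk -F'\t' '
    $1 == "Ready" && $2 != "True"  { print; next }
    $1 != "Ready" && $2 != "False" { print }
  '
}
```

For example, `unhealthy_conditions <node-name>` prints nothing for a healthy node; `<node-name>` stands for the value that `{{ $labels.node }}` takes in the alert.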

Possible Causes

  • Kubelet process stopped or unhealthy
  • Network connectivity issues
  • Disk, memory, or PID pressure on the node
  • Node reboot or hardware failure
  • Cloud provider instance issue

Mitigation

  1. Restart the kubelet service if it is not running
  2. Resolve disk, memory, or PID pressure conditions
  3. Restore network connectivity
  4. Reboot the node if required and safe
  5. If the node is unstable, drain it for investigation:
kubectl drain {{ $labels.node }} --ignore-daemonsets

After the node becomes healthy:

kubectl uncordon {{ $labels.node }}
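
To avoid uncordoning a node that is still unhealthy, the Ready check and the uncordon can be combined. The sketch below relies on `kubectl wait`; the function name `uncordon_when_ready` and the 300s default timeout are illustrative assumptions, not values defined by this runbook.

```shell
# Wait until the node reports Ready, then make it schedulable again.
# Usage: uncordon_when_ready <node-name> [timeout]   (timeout defaults to 300s)
uncordon_when_ready() {
  local node="$1"
  local timeout="${2:-300s}"
  if kubectl wait --for=condition=Ready "node/${node}" --timeout="${timeout}"; then
    kubectl uncordon "${node}"
  else
    echo "node ${node} did not become Ready within ${timeout}; leaving it cordoned" >&2
    return 1
  fi
}
```

If the wait times out, the function returns a non-zero status and the node stays cordoned, so the investigation can continue without new pods being scheduled onto it.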

Escalation

  • If the node remains NotReady for more than 10 minutes, escalate to the platform team
  • If multiple nodes are affected, treat as a cluster-level incident
  • If production workloads are impacted, page the on-call engineer
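
The 10-minute threshold above can be checked against the time the node's Ready condition last changed. This is a rough sketch, assuming GNU `date` (for `-d`) and a node that is currently NotReady, since `lastTransitionTime` records only the most recent change of the Ready condition; the function names are hypothetical.

```shell
# Seconds since the node's Ready condition last changed state.
ready_transition_age() {
  local ts
  ts="$(kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')"
  [ -n "$ts" ] || return 1
  echo $(( $(date -u +%s) - $(date -u -d "$ts" +%s) ))
}

# Exit status 0 when the last transition is at least <threshold> seconds old.
# Usage: should_escalate <node-name> [threshold-seconds]   (defaults to 600)
should_escalate() {
  local age
  age="$(ready_transition_age "$1")" || return 2
  [ "$age" -ge "${2:-600}" ]
}
```

For example, `should_escalate <node-name> && echo "escalate to the platform team"` prints the reminder only once the node has been in its current state for 10 minutes or longer.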

Related Alerts

  • NodeDown
  • KubeletDown
  • HighDiskUsage
  • HighMemoryUsage

Dashboards

  • Grafana → Kubernetes / Nodes
  • Grafana → Node Health Overview