KubeNodeNotReady

Meaning

This alert is triggered when a Kubernetes node reports a `NotReady` status for more than 2 minutes. A node in `NotReady` state cannot reliably run or manage pods.

Impact

This alert indicates a node-level availability issue.

Possible impacts include:

This alert is a warning, but may become critical if the condition persists or affects multiple nodes.

Diagnosis

Check node status:

kubectl get nodes
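On a large cluster it can help to narrow the node listing to the unhealthy entries. The following is a minimal sketch, assuming the default column layout of `kubectl get nodes --no-headers` (name first, status second); the function name `filter_not_ready` is hypothetical and not part of kubectl.

```shell
# Hypothetical helper: print the names of nodes whose STATUS column
# does not start with "Ready" (for example "NotReady" or
# "NotReady,SchedulingDisabled").
# Reads `kubectl get nodes --no-headers` output on stdin.
filter_not_ready() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | filter_not_ready
```

Cordoned but healthy nodes (`Ready,SchedulingDisabled`) are deliberately left out, since they still report a `Ready` condition.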

Describe the affected node to inspect conditions and events:

kubectl describe node {{ $labels.node }}

Check recent node-related events:

kubectl get events --field-selector involvedObject.kind=Node

Verify kubelet health on the node (if SSH access is available):

systemctl status kubelet
journalctl -u kubelet --since "15 min ago"

Check node resource pressure:

kubectl describe node {{ $labels.node }} | grep -i pressure
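The `grep` above prints every pressure condition, including those that are `False`. As a sketch, the output can be reduced to the conditions that are currently active. This assumes the Conditions table of `kubectl describe node` keeps the condition type in the first column and its status in the second; the function name `active_pressure` is hypothetical.

```shell
# Hypothetical helper: print only the pressure conditions
# (MemoryPressure, DiskPressure, PIDPressure) whose status is "True".
# Reads `kubectl describe node <name>` output on stdin.
active_pressure() {
  awk '$1 ~ /Pressure$/ && $2 == "True" { print $1 }'
}

# Usage (requires cluster access; replace <node-name> with the affected node):
#   kubectl describe node <node-name> | active_pressure
```

Empty output means that no pressure condition is reported as active, which points the investigation toward the kubelet or the network instead.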

Possible Causes

Mitigation

  1. Restart the kubelet service if it is not running
  2. Resolve disk, memory, or PID pressure conditions
  3. Restore network connectivity
  4. Reboot the node if required and safe
  5. If the node is unstable, drain it for investigation:
kubectl drain {{ $labels.node }} --ignore-daemonsets

After the node becomes healthy:

kubectl uncordon {{ $labels.node }}
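Before uncordoning, it may be worth confirming that the node has actually returned to `Ready`. The sketch below wraps `kubectl wait` for that purpose; the function name `wait_node_ready` and the default timeout of 5 minutes are assumptions chosen for illustration, not values taken from this runbook.

```shell
# Hypothetical helper: block until the node reports condition Ready=True,
# or fail once the timeout expires.
# Arguments: node name, optional timeout (defaults to 5m, an arbitrary choice).
wait_node_ready() {
  node="$1"
  timeout="${2:-5m}"
  kubectl wait --for=condition=Ready "node/${node}" --timeout="${timeout}"
}

# Usage (requires cluster access; replace <node-name> with the affected node):
#   wait_node_ready <node-name> && kubectl uncordon <node-name>
```

Chaining the two commands with `&&` means the node is only uncordoned if the wait succeeds.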

Escalation