runbooks:coustom_alerts:KubernetesNodeNotReady
This alert is triggered when a Kubernetes node remains in the NotReady state for more than 10 minutes. A NotReady node cannot reliably run or manage pods.
A node in NotReady state can cause:
This alert is marked critical because prolonged node unavailability threatens cluster stability.
Check node status:
kubectl get nodes
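On a large cluster the full node list can be noisy. As a minimal sketch, the hypothetical helper below (`not_ready_nodes` is not part of kubectl) filters that output down to the nodes whose STATUS column does not start with `Ready`. It assumes the default column layout of `kubectl get nodes --no-headers`, with the node name first and the status second.

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints the name of every node whose STATUS does not start with "Ready".
# "Ready,SchedulingDisabled" (a cordoned but healthy node) is not reported.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | not_ready_nodes
```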
Inspect node conditions and events:
kubectl describe node <NODE_NAME>
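The Conditions section of the describe output is usually the quickest pointer to the cause. A compact alternative, sketched here under the assumption that the node reports the standard conditions (`Ready`, `MemoryPressure`, `DiskPressure`, `PIDPressure`, `NetworkUnavailable`), is to print each condition as `Type=Status` with a JSONPath query and flag the ones that deviate from the healthy value. The helper name `unhealthy_conditions` is hypothetical.

```shell
# Hypothetical helper: reads "Type=Status" lines on stdin and prints the
# conditions that are not in their healthy state.
# Assumption: "Ready" is healthy when True; every other condition is
# healthy when False.
unhealthy_conditions() {
  awk -F= '
    NF == 0                        { next }                    # skip blank lines
    $1 == "Ready" && $2 != "True"  { print $1 "=" $2; next }
    $1 != "Ready" && $2 != "False" { print $1 "=" $2 }
  '
}

# Usage (requires cluster access):
#   kubectl get node <NODE_NAME> \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | unhealthy_conditions
```

An empty result suggests the conditions themselves look healthy, in which case the kubelet logs and node events are the next place to look.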
Check recent cluster-wide events:
kubectl get events --all-namespaces --sort-by=.lastTimestamp
Verify kubelet status on the node (if SSH access is available):
systemctl status kubelet
journalctl -u kubelet -n 100
Check system resource pressure:
kubectl top node <NODE_NAME>
df -h
free -m
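Disk usage is easy to misread in a long `df` listing. The sketch below picks out filesystems at or above a usage threshold from `df -P` output. The helper name `full_filesystems` and the default threshold of 85 percent are illustrative assumptions, not values taken from this alert's configuration, and mount points containing spaces are not handled.

```shell
# Hypothetical helper: reads `df -P` output on stdin and prints the mount
# point and usage of every filesystem at or above the given percentage.
# Assumption: POSIX column layout (Capacity in column 5, mount in column 6).
full_filesystems() {
  threshold="${1:-85}"   # illustrative default, adjust to your environment
  awk -v t="$threshold" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)            # strip the percent sign
      if (use + 0 >= t + 0) print $6, $5
    }
  '
}

# Usage (on the node):
#   df -P | full_filesystems 85
```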
Restart the kubelet if it is stopped or unhealthy:
systemctl restart kubelet
If the node does not recover, drain it safely so its pods are rescheduled elsewhere:
kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data
After recovery, allow scheduling on the node again:
kubectl uncordon <NODE_NAME>
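To avoid uncordoning a node that is still flapping, the two steps can be chained so that scheduling is re-enabled only once the node reports Ready. This is a minimal sketch: the function name `uncordon_when_ready` and the 300s default timeout are assumptions made for illustration.

```shell
# Hypothetical helper: wait for the node to report Ready, then uncordon it.
# Leaves the node cordoned and returns non-zero if the wait times out.
uncordon_when_ready() {
  node="$1"
  timeout="${2:-300s}"   # illustrative default
  if kubectl wait --for=condition=Ready "node/${node}" --timeout="${timeout}"; then
    kubectl uncordon "${node}"
  else
    echo "node ${node} did not become Ready within ${timeout}; leaving it cordoned" >&2
    return 1
  fi
}

# Usage (requires cluster access):
#   uncordon_when_ready <NODE_NAME> 300s
```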