KubeNodeNotReady

Meaning

This alert fires when a Kubernetes node has been in the `NotReady` state for more than 2 minutes, meaning its `Ready` condition is `False` or `Unknown`. A node in `NotReady` state cannot reliably run or manage pods.

Impact

This alert indicates a node-level availability issue.

Possible impacts include:

Pods on the node may be evicted or rescheduled
Reduced cluster capacity
Increased load on remaining nodes
Application performance degradation or partial outages

This alert has warning severity, but the situation can become critical if the condition persists or affects multiple nodes.

Diagnosis

Check node status:

kubectl get nodes
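
On a large cluster it can help to filter that listing down to the affected nodes. The following is a minimal sketch, assuming the default `kubectl get nodes` column layout (NAME, then STATUS); the function name `list_not_ready_nodes` is hypothetical and not part of any tool.

```shell
# Print the names of nodes whose STATUS column starts with "NotReady".
# The status can carry suffixes such as "NotReady,SchedulingDisabled",
# so the match is anchored at the start rather than compared exactly.
list_not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 ~ /^NotReady/ { print $1 }'
}
```

Running `list_not_ready_nodes` with more than one line of output suggests the problem is not limited to a single node.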

Describe the affected node to inspect conditions and events:

kubectl describe node {{ $labels.node }}

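Check recent events for the affected node (node events are recorded in the `default` namespace, so query all namespaces in case the current context points elsewhere):

kubectl get events --all-namespaces --field-selector involvedObject.kind=Node,involvedObject.name={{ $labels.node }}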

Verify kubelet health on the node (if SSH access is available):

systemctl status kubelet
journalctl -u kubelet --since "15 min ago"

Check node resource pressure:

kubectl describe node {{ $labels.node }} | grep -i pressure
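
As an alternative to grepping the `describe` output, the node conditions can be read directly from the node object. This is a rough sketch: the function names `node_conditions` and `unhealthy_conditions` are hypothetical, and the node name is passed as the first argument instead of the alert template placeholder.

```shell
# Print each node condition as "<type>=<status><TAB><reason>", one per line.
# $1: node name (the value the alert carries in its node label).
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\t"}{.reason}{"\n"}{end}'
}

# Print only the conditions that point at a problem:
#   - Ready is healthy when "True"
#   - every other condition (MemoryPressure, DiskPressure, PIDPressure, ...)
#     is healthy when "False"
# A status of "Unknown" is therefore always reported.
unhealthy_conditions() {
  node_conditions "$1" | awk -F'[=\t]' '
    $1 == "Ready" && $2 != "True"  { print; next }
    $1 != "Ready" && $2 != "False" { print }
  '
}
```

For example, `unhealthy_conditions my-node` prints nothing for a healthy node. Several conditions in `Unknown` at once usually means the control plane has stopped hearing from the kubelet, which points toward the kubelet or network causes listed below.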

Possible Causes

Kubelet process stopped or unhealthy
Network connectivity issues
Disk, memory, or PID pressure on the node
Node reboot or hardware failure
Cloud provider instance issue

Mitigation

Restart the kubelet service if it is not running
Resolve disk, memory, or PID pressure conditions
Restore network connectivity
Reboot the node if required and safe to do so
If the node is unstable, drain it for investigation:

kubectl drain {{ $labels.node }} --ignore-daemonsets

After the node becomes healthy, allow scheduling on it again:

kubectl uncordon {{ $labels.node }}
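
On a `NotReady` node a drain may wait a long time, because an unresponsive kubelet cannot confirm that pods have terminated. A bounded variant of the drain step might look like the sketch below. The function name `drain_node` and the 5 minute default are illustrative assumptions. Note that `--delete-emptydir-data` discards data held in `emptyDir` volumes, so include it only when that loss is acceptable.

```shell
# Cordon the node, then evict its pods with a bounded wait.
# $1: node name
# $2: optional drain timeout (default 5m, an illustrative value)
drain_node() {
  # Stop new pods from being scheduled before evicting existing ones.
  kubectl cordon "$1" || return 1
  # --ignore-daemonsets:     DaemonSet pods cannot be evicted, skip them
  # --delete-emptydir-data:  allow eviction of pods using emptyDir volumes
  # --timeout:               give up instead of waiting indefinitely
  kubectl drain "$1" \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --timeout="${2:-5m}"
}
```

Usage: `drain_node my-node 10m`. A non-zero exit status means the cordon failed or the drain did not finish within the timeout.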

Escalation

If the node remains `NotReady` for more than 10 minutes, escalate to the platform team
If multiple nodes are affected, treat as a cluster-level incident
If production workloads are impacted, page the on-call engineer
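
To check the 10 minute threshold above, the time since the node's `Ready` condition last changed can be computed from its `lastTransitionTime`. This is a sketch under two assumptions: GNU `date` is available (for the `-d` option), and the function name `ready_transition_age_minutes` is hypothetical.

```shell
# Print how many whole minutes ago the node's Ready condition last changed.
# $1: node name. Assumes GNU date, which can parse the RFC 3339 timestamp.
ready_transition_age_minutes() {
  since="$(kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')" || return 1
  # An empty result means the node has no Ready condition to inspect.
  [ -n "$since" ] || return 1
  then_s="$(date -u -d "$since" +%s)" || return 1
  now_s="$(date -u +%s)"
  echo $(( (now_s - then_s) / 60 ))
}

# Example check against the 10 minute escalation threshold:
#   if [ "$(ready_transition_age_minutes my-node)" -gt 10 ]; then
#     echo "escalate to the platform team"
#   fi
```

The value is only meaningful while the node is still `NotReady`; once it recovers, the same field records when it became `Ready` again.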

Related Alerts

NodeDown
KubeletDown
HighDiskUsage
HighMemoryUsage

Related Dashboards

Grafana → Kubernetes / Nodes
Grafana → Node Health Overview
