KubernetesNodeNetworkUnavailable
Meaning
This alert fires when a Kubernetes node reports the NetworkUnavailable condition as True for more than 2 minutes. The condition indicates that the network for the node is not correctly configured or is unavailable, which prevents pods on that node from communicating.
Impact
NetworkUnavailable can cause:
- Pods on the node being unable to communicate with each other or external services
- Application downtime or degraded performance
- Cluster components (kubelet, kube-proxy) failing to manage pods
- Scheduling and service disruptions
This alert is critical, as networking issues directly affect node and application availability.
Diagnosis
Check node status:
kubectl get nodes
kubectl describe node <NODE_NAME>
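To narrow the node list down to the affected nodes, the condition can be read directly from each node's status. The following is a minimal sketch, assuming kubectl is configured for the affected cluster; both function names are illustrative and not part of any existing tooling.

```shell
# Keep only nodes whose NetworkUnavailable status is "True".
# Input: one "<node-name> <status>" pair per line on stdin.
filter_network_unavailable() {
  awk '$2 == "True" { print $1 }'
}

# Hypothetical helper: list affected nodes in the current cluster.
# Nodes that do not report the condition at all yield an empty status
# and are skipped by the filter.
list_network_unavailable_nodes() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="NetworkUnavailable")].status}{"\n"}{end}' |
    filter_network_unavailable
}
```

Running list_network_unavailable_nodes prints one affected node name per line, or nothing if no node reports the condition as True.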
Check network plugin status (e.g., CNI pods):
kubectl get pods -n kube-system
kubectl describe pod <CNI_POD_NAME> -n kube-system
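On a large cluster it can help to list only the kube-system pods scheduled on the affected node, so the CNI pod for that node is easy to spot. This is a sketch under the assumption that the CNI plugin runs in kube-system; the helper name and the KUBECTL override variable are illustrative.

```shell
# KUBECTL can be overridden (for example with "echo") to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: show kube-system pods scheduled on one node.
kube_system_pods_on_node() {
  if [ -z "$1" ]; then
    echo "usage: kube_system_pods_on_node <NODE_NAME>" >&2
    return 2
  fi
  "$KUBECTL" get pods -n kube-system -o wide --field-selector "spec.nodeName=$1"
}
```

The field selector filters on the server side, so only pods bound to the given node are returned.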
Check kubelet logs for network errors:
journalctl -u kubelet -n 100
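The kubelet log is noisy, so filtering it for network-related lines can speed up the check. The sketch below is one way to do that; the keyword list is an assumption and may need adjusting for the CNI plugin in use.

```shell
# Keep only log lines that mention networking or the CNI plugin.
# The keywords are an assumed starting point, not an exhaustive list.
# "|| true" keeps the exit status at 0 when nothing matches.
filter_network_log_lines() {
  grep -iE 'cni|network' || true
}

# Example usage on the node:
#   journalctl -u kubelet -n 100 --no-pager | filter_network_log_lines
```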
Check recent events for network-related issues:
kubectl get events --sort-by=.lastTimestamp
Verify node network interfaces and routes (if SSH access is available):
ip addr
ip route
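Two quick things to look for in that output are a missing default route and interfaces in the DOWN state. The helpers below are a rough sketch; the function names are illustrative, and the interface check assumes an iproute2 version that supports the brief (-br) output format.

```shell
# Succeeds if the routing table on stdin contains a default route.
has_default_route() {
  grep -q '^default '
}

# Print the names of interfaces whose state column is DOWN.
# Input: "ip -br link" output (name, state, address, flags).
list_down_interfaces() {
  awk '$2 == "DOWN" { print $1 }'
}

# Example usage on the node:
#   ip route | has_default_route || echo "no default route"
#   ip -br link | list_down_interfaces
```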
Possible Causes
- CNI plugin misconfiguration or failure
- Node network interface down or misconfigured
- Firewall or security group blocking traffic
- Cloud provider network issues
- Kubelet unable to configure networking due to errors
Mitigation
- Restart CNI plugin pods:
kubectl delete pod <CNI_POD_NAME> -n kube-system
- Restart kubelet service:
systemctl restart kubelet
- Verify network configuration and routes on the node
- Check firewall/security group rules
- If the cause is a cloud provider issue, contact provider support
- If the node cannot recover, drain it temporarily (draining also cordons the node), then uncordon it once networking is restored:
kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data
kubectl uncordon <NODE_NAME>
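The drain and uncordon steps are separated in time, and uncordoning too early sends pods back to a node that is still broken. The following sketch guards the uncordon step with a condition check. The helper names, the KUBECTL override variable, and the 300s drain timeout are all assumptions made for illustration.

```shell
# KUBECTL can be overridden (for example with "echo") to preview commands.
KUBECTL="${KUBECTL:-kubectl}"

# Drain a node; kubectl drain cordons it first.
# The 300s timeout is an assumed value, not a recommendation from this runbook.
drain_node() {
  [ -n "$1" ] || { echo "usage: drain_node <NODE_NAME>" >&2; return 2; }
  "$KUBECTL" drain "$1" --ignore-daemonsets --delete-emptydir-data --timeout=300s
}

# Uncordon only if the node no longer reports NetworkUnavailable=True.
uncordon_if_network_ready() {
  [ -n "$1" ] || { echo "usage: uncordon_if_network_ready <NODE_NAME>" >&2; return 2; }
  status=$("$KUBECTL" get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].status}') || return 1
  if [ "$status" = "True" ]; then
    echo "node $1 still reports NetworkUnavailable=True; leaving it cordoned" >&2
    return 1
  fi
  "$KUBECTL" uncordon "$1"
}
```

A node that does not report the condition at all is treated as ready here; combine this with the NotReady check from kubectl get nodes before relying on it.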
Escalation
- If NetworkUnavailable persists beyond 10 minutes, escalate to the platform/network team
- Page on-call engineer if production workloads are impacted
- If multiple nodes are affected, treat it as a cluster-wide network incident
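The single-node versus cluster-wide decision can be made from a count of affected nodes. The helper below is an illustrative sketch: it reads node names on stdin, one per line, and applies the "more than one node" rule from the list above. The function name and output wording are assumptions.

```shell
# Classify incident scope from affected node names on stdin (one per line).
# Blank lines are ignored.
classify_incident_scope() {
  count=$(awk 'NF { n++ } END { print n + 0 }')
  if [ "$count" -gt 1 ]; then
    echo "cluster-wide: $count nodes affected"
  elif [ "$count" -eq 1 ]; then
    echo "single-node: 1 node affected"
  else
    echo "none: no nodes affected"
  fi
}
```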
Related Alerts
- KubernetesNodeNotReady
- KubeletDown
- PodCrashLoopBackOff
- NodeDown
Related Dashboards
- Grafana → Kubernetes / Node Network
- Grafana → CNI Plugin Metrics
