====== KubernetesNodeNetworkUnavailable ======

===== Meaning =====

This alert fires when a Kubernetes node has reported the **NetworkUnavailable** condition with status ''True'' for more than 2 minutes. It indicates that the node's network is not correctly configured or is unavailable, which prevents pods on the node from communicating.

===== Impact =====

NetworkUnavailable can cause:

  * Pods on the node being unable to communicate with each other or with external services
  * Application downtime or degraded performance
  * Cluster components (kubelet, kube-proxy) failing to manage pods
  * Scheduling and service disruptions

This alert is **critical**, because networking issues directly affect node and application availability.

===== Diagnosis =====

Check the node status and the node's conditions:

<code bash>
kubectl get nodes
kubectl describe node <node-name>
</code>

Check the network plugin status (e.g., CNI pods and the nodes they run on):

<code bash>
kubectl get pods -n kube-system -o wide
kubectl describe pod -n kube-system <cni-pod-name>
</code>

Check the kubelet logs on the node for network errors:

<code bash>
journalctl -u kubelet -n 100
</code>

Check recent events for network-related issues:

<code bash>
kubectl get events -A --sort-by=.lastTimestamp
</code>

Verify the node's network interfaces and routes (if SSH access is available):

<code bash>
ip addr
ip route
</code>

===== Possible Causes =====

  * CNI plugin misconfiguration or failure
  * Node network interface down or misconfigured
  * Firewall or security group rules blocking traffic
  * Cloud provider network issues
  * Kubelet unable to configure networking due to errors

===== Mitigation =====

  - Restart the CNI plugin pod on the affected node: ''kubectl delete pod -n kube-system <cni-pod-name>''
  - Restart the kubelet service on the node: ''systemctl restart kubelet''
  - Verify the network configuration and routes on the node
  - Check firewall/security group rules
  - If the issue is on the cloud provider side, contact provider support
  - If the node cannot recover, cordon and drain it temporarily with ''kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data'' (drain cordons the node first); once networking is restored, return it to service with ''kubectl uncordon <node-name>''

===== Escalation =====

  * If NetworkUnavailable persists beyond 10 minutes, escalate to the platform/network team
  * Page the on-call engineer if production workloads are impacted
  * If multiple nodes are affected, treat it as a cluster-wide network incident

===== Related Alerts =====

  * KubernetesNodeNotReady
  * KubeletDown
  * PodCrashLoopBackOff
  * NodeDown

===== Related Dashboards =====

  * Grafana → Kubernetes / Node Network
  * Grafana → CNI Plugin Metrics
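The node-status check in the Diagnosis section lists every node. When only the affected ones matter, a small filter over kubectl's JSONPath output can print just the nodes whose NetworkUnavailable status is ''True''. This is a minimal sketch that assumes a POSIX shell with ''awk'' and a working kubeconfig. ''node_network_conditions'' and ''filter_network_unavailable'' are hypothetical helper names, not part of kubectl.

```shell
# Hypothetical helpers (not part of kubectl): list nodes whose
# NetworkUnavailable condition is currently "True".

# Print one "<node>\t<status>" line per node using kubectl JSONPath.
# Nodes that never reported the condition get an empty status column.
node_network_conditions() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.conditions[?(@.type=="NetworkUnavailable")]}{.status}{end}{"\n"}{end}'
}

# Read "<node>\t<status>" lines on stdin; print names of nodes with status True.
filter_network_unavailable() {
  awk -F '\t' '$2 == "True" { print $1 }'
}
```

Usage: ''node_network_conditions | filter_network_unavailable''. Because the parsing lives in its own function, you can check it against saved output without a cluster.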
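The CNI check in Diagnosis and the "restart CNI plugin pods" step in Mitigation both start by finding the kube-system pods on the affected node that are not ready. This minimal sketch uses a ''spec.nodeName'' field selector and treats a pod as unhealthy if any container reports ''ready=false'' or if no container status exists yet. That readiness heuristic is an assumption. ''kube_system_pods_on_node'' and ''filter_not_ready'' are hypothetical helper names.

```shell
# Hypothetical helpers: find kube-system pods on one node that are not ready.

# Print "<pod>\t<ready flags>" for kube-system pods on node $1.
# Multiple containers appear space-separated, e.g. "true false".
kube_system_pods_on_node() {
  kubectl get pods -n kube-system --field-selector "spec.nodeName=$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].ready}{"\n"}{end}'
}

# Read "<pod>\t<ready flags>" on stdin; print pods with any unready container
# or with no container status at all (e.g. still Pending).
filter_not_ready() {
  awk -F '\t' '$2 == "" || $2 ~ /false/ { print $1 }'
}
```

Usage: ''kube_system_pods_on_node <node-name> | filter_not_ready''. After confirming which pod belongs to the CNI plugin, restart it with ''kubectl delete pod -n kube-system <cni-pod-name>'' so its DaemonSet recreates it.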
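The kubelet log check in Diagnosis pulls the last 100 lines, and most of them are usually unrelated to networking. This sketch narrows them to lines that mention CNI, network, or route terms. The keyword list is an assumption, so adjust it for your CNI plugin. ''filter_network_log_lines'' and ''kubelet_network_errors'' are hypothetical helper names, and the second one must run on the node itself.

```shell
# Hypothetical helpers: narrow kubelet logs to network-related lines.
# The keyword list (cni|network|route) is an assumption; tune it per CNI plugin.

# Keep stdin lines matching the keywords (case-insensitive).
# "|| true" avoids a non-zero exit when nothing matches.
filter_network_log_lines() {
  grep -iE 'cni|network|route' || true
}

# Fetch the last N kubelet journal lines (default 100) on this node and filter them.
kubelet_network_errors() {
  journalctl -u kubelet -n "${1:-100}" --no-pager | filter_network_log_lines
}
```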
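The last Diagnosis step, inspecting interfaces and routes over SSH, reduces to two quick questions: is there a default route, and is any interface administratively down? This minimal sketch reads ''ip route'' and ''ip -br link'' output on stdin. It assumes iproute2's brief output format (name, state, address) and treats a missing default route as suspicious, which may not hold on every network setup. ''has_default_route'' and ''list_down_interfaces'' are hypothetical helper names.

```shell
# Hypothetical helpers for checking node networking over SSH.
# Assumes iproute2 output formats; a missing default route is only a heuristic signal.

# Succeed (exit 0) if `ip route` output on stdin contains a default route.
has_default_route() {
  grep -q '^default '
}

# Read `ip -br link` output on stdin; print interfaces whose state is DOWN.
list_down_interfaces() {
  awk '$2 == "DOWN" { print $1 }'
}
```

Usage on the node: ''ip route | has_default_route || echo "no default route"'' and ''ip -br link | list_down_interfaces''.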
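The Escalation section's 10-minute rule can be checked against the condition's ''lastTransitionTime''. That timestamp records when the status last changed, so it measures how long the condition has been ''True'' only while the current status is ''True''. This is a minimal sketch that keeps the threshold arithmetic separate from the lookup. The lookup assumes GNU ''date -d'', and ''should_escalate'' and ''network_unavailable_since'' are hypothetical helper names.

```shell
# Hypothetical helpers for the 10-minute escalation rule.

ESCALATION_THRESHOLD_SECONDS=600  # 10 minutes, per the Escalation section

# Succeed (exit 0) if at least the threshold has elapsed.
# $1 = epoch seconds of lastTransitionTime, $2 = current epoch seconds.
should_escalate() {
  [ $(( $2 - $1 )) -ge "$ESCALATION_THRESHOLD_SECONDS" ]
}

# Print node $1's NetworkUnavailable lastTransitionTime as epoch seconds.
# Only meaningful while the condition status is True. Assumes GNU date.
network_unavailable_since() {
  ts=$(kubectl get node "$1" -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].lastTransitionTime}')
  date -u -d "$ts" +%s
}
```

Usage: ''should_escalate "$(network_unavailable_since <node-name>)" "$(date +%s)" && echo "escalate to platform/network team"''.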