runbooks:coustom_alerts:kubernetesnodenetworkunavailable

Created 2025/12/13 16:36 by admin; last modified 2025/12/14 06:56 by admin.
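
====== KubernetesNodeNetworkUnavailable ======

===== Meaning =====
This alert is triggered when a Kubernetes node has reported the **NetworkUnavailable** condition with status True for more than 2 minutes.
It indicates that the node's network is not correctly configured. For example, the CNI plugin may not have initialized, or the cloud provider may not have created routes for the node. As a result, pods on the node may be unable to communicate. While the condition is True, the node lifecycle controller also applies the node.kubernetes.io/network-unavailable:NoSchedule taint, so new pods that do not tolerate it are not scheduled onto the node.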
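
The sketch below shows a quick way to see which nodes currently report the condition. It is a minimal example that assumes kubectl access to the cluster and a POSIX shell with awk. The helper name filter_network_unavailable and the variable NETWORK_UNAVAILABLE_JSONPATH are hypothetical and are not part of any tool.

<code bash>
# NETWORK_UNAVAILABLE_JSONPATH (hypothetical name): a kubectl JSONPath template that
# prints "<node name><TAB><NetworkUnavailable status>" for every node.
NETWORK_UNAVAILABLE_JSONPATH='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="NetworkUnavailable")].status}{"\n"}{end}'

# filter_network_unavailable (hypothetical helper): reads "<node><TAB><status>"
# lines on stdin and prints the names of nodes whose status is "True".
# Nodes that never reported the condition have an empty status and are skipped.
filter_network_unavailable() {
  awk -F '\t' '$2 == "True" { print $1 }'
}

# Example (requires cluster access):
#   kubectl get nodes -o jsonpath="$NETWORK_UNAVAILABLE_JSONPATH" | filter_network_unavailable
</code>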

===== Impact =====
NetworkUnavailable can cause:
  * Pods on the node being unable to reach pods on other nodes, Services, or external endpoints
  * Application downtime or degraded performance
  * Node components such as kubelet and kube-proxy failing to start new pods or to route Service traffic to them
  * Scheduling disruptions, because new pods are not placed on the node while it carries the network-unavailable taint

This alert is **critical**, as networking issues directly affect node and application availability.
===== Diagnosis =====
Check node status, including the reason and message of the NetworkUnavailable condition:

<code bash>
kubectl get nodes
kubectl describe node <NODE_NAME>
</code>
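
The following sketch prints only the status, reason, and message of the NetworkUnavailable condition. The reason often hints at which component set the condition (the CNI plugin or the cloud provider's route controller), which narrows down where to look next. It assumes kubectl access and awk. describe_network_condition and NODE_CONDITIONS_JSONPATH are hypothetical names.

<code bash>
# NODE_CONDITIONS_JSONPATH (hypothetical name): prints one
# "<type><TAB><status><TAB><reason><TAB><message>" line per node condition.
NODE_CONDITIONS_JSONPATH='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'

# describe_network_condition (hypothetical helper): reads those lines on stdin and
# prints status, reason and message of the NetworkUnavailable condition.
# Exits with status 1 if the node does not report that condition at all.
describe_network_condition() {
  awk -F '\t' '
    $1 == "NetworkUnavailable" {
      printf "status=%s reason=%s message=%s\n", $2, $3, $4
      found = 1
    }
    END { if (!found) { print "NetworkUnavailable condition not reported"; exit 1 } }
  '
}

# Example (requires cluster access):
#   kubectl get node <NODE_NAME> -o jsonpath="$NODE_CONDITIONS_JSONPATH" | describe_network_condition
</code>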
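
Check the network plugin (CNI) pods. Depending on the plugin, they may run in a namespace other than kube-system. The -o wide flag shows which node each pod runs on:

<code bash>
kubectl get pods -n kube-system -o wide
kubectl describe pod <CNI_POD_NAME> -n kube-system
</code>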
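
On a busy cluster, it can help to list only the pods on the affected node that are not fully ready. This is a minimal sketch that assumes the default kubectl get pods column order (NAME READY STATUS ...). unhealthy_pods is a hypothetical helper. Pods in the Completed state are reported as well.

<code bash>
# unhealthy_pods (hypothetical helper): reads `kubectl get pods --no-headers`
# output on stdin and prints NAME READY STATUS for every pod that is not
# Running or has fewer ready containers than total containers.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")   # "1/2" -> ready[1]="1", ready[2]="2"
    if (ready[1] != ready[2] || $3 != "Running") print $1, $2, $3
  }'
}

# Example (requires cluster access):
#   kubectl get pods -n kube-system --field-selector spec.nodeName=<NODE_NAME> --no-headers | unhealthy_pods
</code>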

Check kubelet logs for network errors (run on the node):

<code bash>
journalctl -u kubelet -n 100 --no-pager
</code>
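
A keyword filter like the one below narrows the log down to network-related lines. The keyword list is an assumption based on common runtime and CNI messages, such as NetworkPluginNotReady and "cni plugin not initialized". network_log_lines is a hypothetical helper.

<code bash>
# network_log_lines (hypothetical helper): keeps only log lines on stdin that
# mention CNI or network readiness. The keyword list is an assumption; extend it
# with messages specific to your CNI plugin.
network_log_lines() {
  grep -iE 'cni|networkready|networkpluginnotready|network plugin|network is not ready|network not ready'
}

# Example (run on the node):
#   journalctl -u kubelet --since "30 min ago" --no-pager | network_log_lines
</code>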
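
Check recent events for network-related issues, across all namespaces and for the node itself:

<code bash>
kubectl get events -A --sort-by=.lastTimestamp
kubectl get events -A --field-selector involvedObject.kind=Node,involvedObject.name=<NODE_NAME>
</code>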

Verify node network interfaces and routes (if SSH access is available):

<code bash>
ip addr
ip route
</code>
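
When the output is long, two small checks can highlight the usual suspects: interfaces in state DOWN and a missing default route. This is a minimal sketch that assumes iproute2's one-line output format (ip -o link show). down_interfaces and has_default_route are hypothetical helpers. A DOWN interface is not always a problem, since unused NICs are often down, so treat the output as a hint.

<code bash>
# down_interfaces (hypothetical helper): reads `ip -o link show` output on stdin
# (one line per interface) and prints the names of interfaces in state DOWN.
down_interfaces() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "state" && $(i + 1) == "DOWN") {
        name = $2
        sub(/:$/, "", name)   # "eth1:" -> "eth1"
        sub(/@.*/, "", name)  # "veth1@if3" -> "veth1"
        print name
        break
      }
    }
  }'
}

# has_default_route (hypothetical helper): reads `ip route` output on stdin and
# succeeds if it contains a default route.
has_default_route() {
  grep -q '^default '
}

# Example (run on the node):
#   ip -o link show | down_interfaces
#   ip route | has_default_route || echo "no default route"
</code>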
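
===== Possible Causes =====
  * CNI plugin misconfiguration or failure (for example, the CNI agent pod is crashing or no network config exists in /etc/cni/net.d on the node)
  * Node network interface down or misconfigured
  * Firewall rules or cloud security groups blocking traffic between nodes
  * Cloud provider network issues, such as routes for the node not being created
  * Kubelet unable to configure networking due to errors
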
===== Mitigation =====
Restart the CNI plugin pod on the affected node. CNI agents usually run as a DaemonSet, which recreates the deleted pod:

<code bash>
kubectl delete pod <CNI_POD_NAME> -n kube-system
</code>
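
To find the right <CNI_POD_NAME>, the sketch below lists the pods on the affected node whose names start with a given prefix. It assumes kubectl access and awk. cni_pod_on_node and POD_NODE_JSONPATH are hypothetical names. The prefix depends on your CNI plugin and must be adjusted; for example, Calico agent pods are named calico-node-....

<code bash>
# POD_NODE_JSONPATH (hypothetical name): prints "<pod name><TAB><node name>" per pod.
POD_NODE_JSONPATH='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\n"}{end}'

# cni_pod_on_node (hypothetical helper): reads "<pod><TAB><node>" lines on stdin
# and prints the pods scheduled on NODE whose names start with PREFIX.
# Usage: cni_pod_on_node NODE PREFIX
cni_pod_on_node() {
  awk -F '\t' -v node="$1" -v prefix="$2" \
    '$2 == node && index($1, prefix) == 1 { print $1 }'
}

# Example (requires cluster access; "calico-node" is only an example prefix):
#   kubectl get pods -n kube-system -o jsonpath="$POD_NODE_JSONPATH" \
#     | cni_pod_on_node <NODE_NAME> calico-node
</code>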

Restart the kubelet service (run on the node as root):

<code bash>
systemctl restart kubelet
</code>
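
After a restart, the condition can take some time to clear. A simple polling loop, sketched below for a POSIX shell, avoids re-running the check by hand. wait_until and network_condition_cleared are hypothetical helpers. The budget in the example (12 attempts, 10 seconds apart) is an arbitrary choice.

<code bash>
# wait_until (hypothetical helper): runs a command up to ATTEMPTS times, sleeping
# DELAY seconds between failed attempts. Returns 0 as soon as the command
# succeeds and 1 if every attempt fails.
# Usage: wait_until ATTEMPTS DELAY command [args...]
wait_until() {
  _wu_attempts=$1
  _wu_delay=$2
  shift 2
  _wu_i=1
  while [ "$_wu_i" -le "$_wu_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$_wu_i" -lt "$_wu_attempts" ]; then
      sleep "$_wu_delay"
    fi
    _wu_i=$((_wu_i + 1))
  done
  return 1
}

# network_condition_cleared (hypothetical helper): succeeds when node $1 reports
# NetworkUnavailable=False. Requires cluster access.
network_condition_cleared() {
  [ "$(kubectl get node "$1" -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].status}')" = "False" ]
}

# Example: poll for up to about 2 minutes.
#   wait_until 12 10 network_condition_cleared <NODE_NAME>
</code>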

  * Verify network configuration and routes on the node
  * Check firewall/security group rules
  * If the cause is a cloud provider issue, contact provider support

If the node cannot recover, cordon and drain it. Uncordon it only after the condition has cleared:

<code bash>
kubectl cordon <NODE_NAME>
kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data
# Later, once NetworkUnavailable is False again:
kubectl uncordon <NODE_NAME>
</code>

===== Escalation =====
  * If NetworkUnavailable persists beyond 10 minutes, escalate to the platform/network team
  * Page the on-call engineer if production workloads are impacted
  * If multiple nodes are affected, treat it as a cluster-wide network incident
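
The escalation rules can be encoded in a small helper, for example to drive an automated check. This sketch relies on two assumptions that the runbook does not state. First, the multi-node rule takes precedence over the 10-minute rule. Second, the duration is measured from the condition's lastTransitionTime. escalation_action is a hypothetical helper. Production impact still needs human judgment and is not modeled.

<code bash>
# escalation_action (hypothetical helper) encodes the escalation rules above.
#   $1 = seconds the NetworkUnavailable condition has been True
#   $2 = number of nodes currently reporting NetworkUnavailable=True
# Assumption: a multi-node outage takes precedence over the duration rule.
escalation_action() {
  if [ "$2" -gt 1 ]; then
    echo "cluster-wide-incident"
  elif [ "$1" -gt 600 ]; then
    # More than 10 minutes: escalate to the platform/network team.
    echo "escalate-to-platform-network-team"
  else
    echo "continue-mitigation"
  fi
}

# Example (GNU date assumed for parsing the RFC 3339 timestamp):
#   since=$(kubectl get node <NODE_NAME> -o jsonpath='{.status.conditions[?(@.type=="NetworkUnavailable")].lastTransitionTime}')
#   age=$(( $(date -u +%s) - $(date -u -d "$since" +%s) ))
#   escalation_action "$age" 1
</code>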

===== Related Alerts =====
  * KubernetesNodeNotReady
  * KubeletDown
  * PodCrashLoopBackOff
  * NodeDown

===== Related Dashboards =====
  * Grafana → Kubernetes / Node Network
  * Grafana → CNI Plugin Metrics

runbooks/coustom_alerts/kubernetesnodenetworkunavailable.txt · Last modified: by admin