Created 2025/12/13 16:37 by admin; current revision 2025/12/14 06:57 by admin.

====== KubernetesNodeOutOfPodCapacity ======

===== Meaning =====
This alert fires when a Kubernetes node has been using **more than 90% of its pod capacity** for more than 2 minutes.
It indicates that the node has almost no free allocatable pod slots left, so the scheduler can place very few additional pods on it.
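
The threshold check can be sketched as a small shell function. This is a minimal sketch, assuming utilization means running pods divided by the node's allocatable pod count (this runbook does not show the actual alert expression). The function name ''pod_capacity_check'' is hypothetical, and the 2-minute duration is left to the alerting system rather than modeled here.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: pod_capacity_check RUNNING_PODS ALLOCATABLE_PODS [THRESHOLD]
# Prints the pod utilization percentage and returns 0 ("would fire") when it is
# strictly above THRESHOLD (default 90), 1 otherwise, 2 if allocatable is not positive.
# Assumption: utilization = running pods / allocatable pods; the 2-minute
# "for" duration of the alert is not modeled.
pod_capacity_check() {
  local running="$1" allocatable="$2" threshold="${3:-90}"
  awk -v r="$running" -v a="$allocatable" -v t="$threshold" 'BEGIN {
    if (a + 0 <= 0) exit 2            # no allocatable pods reported: cannot evaluate
    pct = 100 * r / a
    printf "%.1f\n", pct
    exit (pct > t + 0 ? 0 : 1)        # awk exit status becomes the function status
  }'
}
</code>

For example, ''pod_capacity_check 100 110'' prints ''90.9'' and returns success, because 100 of 110 pod slots are in use. The comparison is strict, so exactly 90% does not count as firing in this sketch.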

===== Impact =====
A node that runs out of pod capacity can cause:
  * New pods failing to schedule on the node
  * Workload imbalance across the cluster
  * Service degradation if no other node has free pod capacity
  * Slower scheduling and scaling operations, as pending pods wait for a free slot

This alert has **warning** severity because it can precede scheduling failures and application disruptions.
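
When no node has a free pod slot, the default scheduler records a ''FailedScheduling'' event whose message includes ''Too many pods''. The sketch below lists the pods affected by that condition. It is a minimal sketch: the function name ''pods_blocked_by_capacity'' is hypothetical, and the exact event message text may differ between Kubernetes versions, so treat the match string as an assumption.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: reads "<namespace>/<pod> <event message>" lines on stdin
# and prints each pod (deduplicated, sorted) whose scheduling failed because
# nodes had no free pod slots. Assumption: the message contains "Too many pods".
pods_blocked_by_capacity() {
  grep -F 'Too many pods' | cut -d' ' -f1 | sort -u
}
# Example (requires cluster access):
#   kubectl get events --all-namespaces --field-selector reason=FailedScheduling \
#     -o jsonpath='{range .items[*]}{.involvedObject.namespace}/{.involvedObject.name}{" "}{.message}{"\n"}{end}' \
#     | pods_blocked_by_capacity
</code>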
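
===== Diagnosis =====
Check node pod capacity and current usage:

<code bash>
# Overview of all nodes (status, roles, version, IPs)
kubectl get nodes -o wide
# "Capacity" and "Allocatable" list the pod limit; "Non-terminated Pods" shows how many pods the node is running
kubectl describe node <NODE_NAME>
</code>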
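
List the pods scheduled on the node:

<code bash>
# Match the node name exactly (grep would also match node-10 when searching for node-1) and skip completed pods
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<NODE_NAME>,status.phase!=Succeeded,status.phase!=Failed
</code>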

Check the node's allocatable pod count:

<code bash>
kubectl get node <NODE_NAME> -o jsonpath='{.status.allocatable.pods}'
</code>
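
Check cluster-wide pod distribution:

<code bash>
# Count pods per node, busiest node first (unscheduled pods are counted under an empty node name)
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort | uniq -c | sort -rn
</code>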
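
To compare every node's pod count with its allocatable limit in one pass, the per-node allocatable values and the node names of non-terminated pods can be joined. This is a minimal sketch, assuming non-terminated pods approximate what the alert measures; the function name ''node_pod_report'' and the ''THRESHOLD'' variable are hypothetical.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: node_pod_report ALLOC_FILE PODS_FILE
#   ALLOC_FILE: one "<node> <allocatable pods>" pair per line
#   PODS_FILE:  one node name per line, one line per non-terminated pod
# Prints "<node> <pods>/<allocatable> <percent>%" for each node, sorted by name,
# appending " OVER" when usage is above THRESHOLD (default 90).
node_pod_report() {
  local alloc_file="$1" pods_file="$2" threshold="${THRESHOLD:-90}"
  awk -v t="$threshold" '
    NR == FNR { alloc[$1] = $2; next }   # first file: allocatable pods per node
    NF > 0    { used[$1]++ }             # second file: count pods; blank = unscheduled
    END {
      for (n in alloc) {
        if (alloc[n] + 0 <= 0) continue
        pct = 100 * used[n] / alloc[n]
        printf "%s %d/%d %.1f%%%s\n", n, used[n], alloc[n], pct, (pct > t + 0 ? " OVER" : "")
      }
    }' "$alloc_file" "$pods_file" | sort
}
# Example (bash process substitution; requires cluster access):
#   node_pod_report \
#     <(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.allocatable.pods}{"\n"}{end}') \
#     <(kubectl get pods --all-namespaces --field-selector status.phase!=Succeeded,status.phase!=Failed \
#         -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}')
</code>

Nodes flagged ''OVER'' are the ones this alert would be expected to fire for; several flagged nodes at once match the escalation criterion below.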

===== Possible Causes =====
  * Node densely packed with many small pods
  * Misconfigured deployments placing too many replicas on a single node
  * DaemonSets consuming pod slots on every node
  * Cluster Autoscaler not configured or failing
  * Affinity or anti-affinity rules restricting pods to fewer nodes
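
To check whether DaemonSets or a single workload dominate a node's pod slots, the pods on the node can be grouped by the kind of their first owner reference. This is a minimal sketch: the function name ''owner_kind_breakdown'' is hypothetical, pods managed by a Deployment show up under their ReplicaSet, and pods without an owner are counted as <none>.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: reads one owner kind per line on stdin (empty line = no owner)
# and prints "<count> <kind>", largest count first.
owner_kind_breakdown() {
  awk '{ k = (NF ? $1 : "<none>"); n[k]++ } END { for (k in n) print n[k], k }' \
    | sort -k1,1nr -k2,2
}
# Example (requires cluster access):
#   kubectl get pods --all-namespaces --field-selector spec.nodeName=<NODE_NAME> \
#     -o jsonpath='{range .items[*]}{.metadata.ownerReferences[0].kind}{"\n"}{end}' \
#     | owner_kind_breakdown
</code>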

===== Mitigation =====
  - Review and redistribute workloads across nodes
  - Scale out the cluster by adding more nodes
  - Remove unnecessary pods or workloads from the node
  - Adjust DaemonSets or affinity/anti-affinity rules
  - Enable or tune the Cluster Autoscaler if available
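
One possible way to redistribute a workload is to cordon the full node so no new pods land on it, then perform a rolling restart of a Deployment so its replacement pods are scheduled on other nodes. This is a minimal sketch, assuming the workload is a Deployment that tolerates a rolling restart; the function name ''redistribute_plan'' is hypothetical, and it only prints the commands so they can be reviewed before running. Uncordon the node (''kubectl uncordon <NODE_NAME>'') once capacity is available again.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: redistribute_plan NODE NAMESPACE DEPLOYMENT
# Prints (does not run) the kubectl commands for a cordon + rolling-restart step.
redistribute_plan() {
  local node="$1" ns="$2" deploy="$3"
  if [ -z "$node" ] || [ -z "$ns" ] || [ -z "$deploy" ]; then
    echo "usage: redistribute_plan NODE NAMESPACE DEPLOYMENT" >&2
    return 2
  fi
  # Mark the node unschedulable; pods already running on it are not evicted
  echo "kubectl cordon $node"
  # Replace every pod of the Deployment; with the node cordoned, new pods go to other nodes
  echo "kubectl rollout restart deployment/$deploy -n $ns"
  # Wait for the rollout to complete before re-checking the node
  echo "kubectl rollout status deployment/$deploy -n $ns"
}
</code>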

===== Escalation =====
  * Escalate if multiple nodes are reaching pod capacity
  * Page the on-call engineer if workloads fail to schedule and production is impacted
  * Monitor the Cluster Autoscaler, or add nodes manually if it does not scale out

===== Related Alerts =====
  * KubernetesNodeMemoryPressure
  * KubernetesNodeDiskPressure
  * KubernetesNodeNotReady
  * PodPending

===== Related Dashboards =====
  * Grafana → Kubernetes / Node Pod Capacity
  * Grafana → Cluster Pod Distribution

runbooks/coustom_alerts/kubernetesnodeoutofpodcapacity.txt · Last modified: by admin