====== KubernetesNodeOutOfPodCapacity ======

===== Meaning =====

This alert is triggered when a Kubernetes node reaches **more than 90% of its pod capacity** for more than 2 minutes. It indicates that the node has almost no free allocatable pod slots left.

===== Impact =====

A node running out of pod capacity can cause:

  * New pods failing to schedule on the node
  * Workload imbalance across the cluster
  * Potential service degradation if no other nodes are available
  * Increased latency for scheduling or scaling operations

This alert is marked **warning**, as it may precede node-level failures or application disruptions.

===== Diagnosis =====

Check node pod allocation:

  kubectl get nodes -o wide
  kubectl describe node <node-name>

Check running pods on the node:

  kubectl get pods -o wide --all-namespaces | grep <node-name>

Check node allocatable pods:

  kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}'

Check cluster-wide pod distribution:

  kubectl get pods --all-namespaces -o wide

A combined check that prints the node's pod-capacity utilization is sketched in the Examples section at the end of this runbook.

===== Possible Causes =====

  * Node is heavily loaded with many pods
  * Misconfigured deployments placing too many replicas on a single node
  * DaemonSets consuming pod slots
  * Cluster Autoscaler not configured or failing
  * Pod affinity/anti-affinity rules constraining pods to a small set of nodes

===== Mitigation =====

  - Review and redistribute workloads across nodes (see the Examples section at the end of this runbook)
  - Scale out the cluster by adding more nodes
  - Remove unnecessary pods or workloads from the node
  - Adjust DaemonSets or affinity/anti-affinity rules
  - Enable or tune the Cluster Autoscaler if available

===== Escalation =====

  * Escalate if multiple nodes are reaching pod capacity
  * Page the on-call engineer if workloads fail to schedule and production is impacted
  * Monitor the Cluster Autoscaler or add nodes manually

===== Related Alerts =====

  * KubernetesNodeMemoryPressure
  * KubernetesNodeDiskPressure
  * KubernetesNodeNotReady
  * PodPending

===== Related Dashboards =====

  * Grafana → Kubernetes / Node Pod Capacity
  * Grafana → Cluster Pod Distribution
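
===== Examples =====

To complement the diagnosis commands above, the snippet below compares the number of pods currently running on a node with its allocatable pod count. This is a minimal sketch: ''<node-name>'' is a placeholder for the node named in the alert, and the 90% figure mirrors the alert description above rather than the actual alerting rule, which is not shown in this runbook.

  # Replace the placeholder with the node named in the alert
  NODE="<node-name>"

  # Pods currently running on the node, across all namespaces
  RUNNING=$(kubectl get pods --all-namespaces --no-headers \
    --field-selector spec.nodeName=$NODE,status.phase=Running | wc -l)

  # Maximum number of pods the node can run
  ALLOCATABLE=$(kubectl get node $NODE -o jsonpath='{.status.allocatable.pods}')

  # Rough capacity utilization; values above 90 correspond to the alert condition described above
  echo "$NODE: $RUNNING/$ALLOCATABLE pods ($(( 100 * RUNNING / ALLOCATABLE ))% of pod capacity)"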
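
If the node is genuinely over-packed and other nodes have spare capacity, one way to carry out the "redistribute workloads" mitigation step is to cordon the node and evict reschedulable pods. This is a sketch under that assumption; draining is disruptive for pods without replicas or PodDisruptionBudgets, so prefer adding nodes (or letting the Cluster Autoscaler do so) when the whole cluster is near capacity.

  # Stop the scheduler from placing new pods on the full node
  kubectl cordon <node-name>

  # Evict pods so they reschedule onto nodes with free capacity;
  # DaemonSet pods are left in place and emptyDir data on the node is discarded
  kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

  # Once capacity has been rebalanced or new nodes added, allow scheduling again
  kubectl uncordon <node-name>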