====== KubernetesNodeOutOfPodCapacity ======
===== Meaning =====
This alert is triggered when a Kubernetes node reaches **more than 90% of its pod capacity** for more than 2 minutes.
It indicates that the node has almost no free allocatable pod slots left.
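To check the condition by hand, compare the number of pods currently running on the node with its allocatable pod count. The snippet below is a minimal sketch, assuming kubectl access to the cluster; <node-name> is a placeholder for the node named in the alert.
<code bash>
# Placeholder: replace <node-name> with the node from the alert labels.
NODE="<node-name>"

# Maximum number of pods the kubelet will accept on this node.
ALLOCATABLE=$(kubectl get node "$NODE" -o jsonpath='{.status.allocatable.pods}')

# Pods currently running on this node, across all namespaces.
RUNNING=$(kubectl get pods --all-namespaces \
  --field-selector spec.nodeName="$NODE",status.phase=Running \
  --no-headers | wc -l)

echo "Node $NODE is running $RUNNING of $ALLOCATABLE allocatable pods"
</code>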
===== Impact =====
A node running out of pod capacity can cause:
* New pods failing to schedule on the node
* Workload imbalance across the cluster
* Potential service degradation if no other nodes are available
* Increased latency for scheduling or scaling operations
This alert is raised with **warning** severity, as it may precede node-level failures or application disruptions.
===== Diagnosis =====
Check node pod allocation:
kubectl get nodes -o wide
kubectl describe node <node-name>
Check running pods on the node:
kubectl get pods -o wide --all-namespaces | grep <node-name>
Check node allocatable pods:
kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}'
Check cluster-wide pod distribution:
kubectl get pods --all-namespaces -o wide
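To see which nodes carry the most pods, the per-node distribution can be summarized with standard shell tools. This is a sketch; the NODE column position (field 8 of the wide output here) may vary between kubectl versions.
<code bash>
# Count running pods per node; field 8 of the wide output is the NODE column.
kubectl get pods --all-namespaces -o wide \
  --field-selector status.phase=Running --no-headers \
  | awk '{print $8}' | sort | uniq -c | sort -rn
</code>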
===== Possible Causes =====
* Node is heavily loaded with many pods
* Misconfigured deployments with too many replicas on a single node
* DaemonSets consuming pod slots on every node (see the check after this list)
* Cluster autoscaler not configured or failing
* Pod anti-affinity rules forcing pods onto fewer nodes
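To estimate how many pod slots are taken by DaemonSets on the affected node, a sketch like the following can help. It assumes jq is installed locally; <node-name> is a placeholder.
<code bash>
# Each DaemonSet typically places one pod on every eligible node.
kubectl get daemonsets --all-namespaces

# Count DaemonSet-managed pods on the affected node (requires jq).
kubectl get pods --all-namespaces -o json \
  --field-selector spec.nodeName=<node-name> \
  | jq '[.items[] | select(.metadata.ownerReferences[]?.kind == "DaemonSet")] | length'
</code>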
===== Mitigation =====
- Review and redistribute workloads across nodes (see the example after this list)
- Scale out the cluster by adding more nodes
- Remove unnecessary pods or workloads from the node
- Adjust DaemonSets or affinity/anti-affinity rules
- Enable or tune Cluster Autoscaler if available
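If workloads need to be moved right away, cordoning the node prevents new pods from scheduling onto it while the rest of the cluster absorbs the load. The commands below are a sketch; <node-name>, <pod-name>, and <namespace> are placeholders.
<code bash>
# Prevent new pods from being scheduled onto the saturated node.
kubectl cordon <node-name>

# Optionally evict a specific non-critical pod so it is rescheduled elsewhere.
kubectl delete pod <pod-name> --namespace <namespace>

# Re-enable scheduling once pod usage is back under the threshold.
kubectl uncordon <node-name>
</code>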
===== Escalation =====
* Escalate if multiple nodes are reaching pod capacity
* Page on-call engineer if workloads fail to schedule and impact production
* Monitor cluster autoscaler or take manual action to add nodes
===== Related Alerts =====
* KubernetesNodeMemoryPressure
* KubernetesNodeDiskPressure
* KubernetesNodeNotReady
* PodPending
===== Related Dashboards =====
* Grafana → Kubernetes / Node Pod Capacity
* Grafana → Cluster Pod Distribution