====== KubernetesNodeOutOfPodCapacity ======
===== Meaning =====
This alert is triggered when a Kubernetes node reaches **more than 90% of its pod capacity** for more than 2 minutes.
It indicates that the node has almost no free allocatable pod slots left.
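To check the condition by hand, compare the number of pods currently running on the node with its allocatable pod count. The snippet below is a minimal sketch, assuming kubectl access to the cluster; <node-name> is a placeholder for the node named in the alert.
<code bash>
# Placeholder: replace <node-name> with the node from the alert labels.
NODE="<node-name>"

# Maximum number of pods the kubelet will accept on this node.
ALLOCATABLE=$(kubectl get node "$NODE" -o jsonpath='{.status.allocatable.pods}')

# Pods currently running on this node, across all namespaces.
RUNNING=$(kubectl get pods --all-namespaces \
  --field-selector spec.nodeName="$NODE",status.phase=Running \
  --no-headers | wc -l)

echo "Node $NODE is running $RUNNING of $ALLOCATABLE allocatable pods"
</code>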
===== Impact =====
A node running out of pod capacity can cause:
* New pods failing to schedule on the node
* Workload imbalance across the cluster
* Potential service degradation if no other nodes are available
* Increased latency for scheduling or scaling operations
This alert is raised with **warning** severity, as it may precede node-level failures or application disruptions.
===== Diagnosis =====
Check node pod allocation:
kubectl get nodes -o wide
kubectl describe node <node-name>
Check running pods on the node:
kubectl get pods -o wide --all-namespaces | grep <node-name>
Check node allocatable pods:
kubectl get node <node-name> -o jsonpath='{.status.allocatable.pods}'
Check cluster-wide pod distribution:
kubectl get pods --all-namespaces -o wide
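To see which nodes carry the most pods, the per-node distribution can be summarized with standard shell tools. This is a sketch; the NODE column position (field 8 of the wide output here) may vary between kubectl versions.
<code bash>
# Count running pods per node; field 8 of the wide output is the NODE column.
kubectl get pods --all-namespaces -o wide \
  --field-selector status.phase=Running --no-headers \
  | awk '{print $8}' | sort | uniq -c | sort -rn
</code>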
===== Possible Causes =====
* Node is heavily loaded with many pods
* Misconfigured deployments with too many replicas on a single node
* DaemonSets consuming pod slots on every node (see the check after this list)
* Cluster autoscaler not configured or failing
* Pod anti-affinity rules forcing pods onto fewer nodes
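To estimate how many pod slots are taken by DaemonSets on the affected node, a sketch like the following can help. It assumes jq is installed locally; <node-name> is a placeholder.
<code bash>
# Each DaemonSet typically places one pod on every eligible node.
kubectl get daemonsets --all-namespaces

# Count DaemonSet-managed pods on the affected node (requires jq).
kubectl get pods --all-namespaces -o json \
  --field-selector spec.nodeName=<node-name> \
  | jq '[.items[] | select(.metadata.ownerReferences[]?.kind == "DaemonSet")] | length'
</code>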
===== Mitigation =====
- Review and redistribute workloads across nodes (see the example after this list)
- Scale out the cluster by adding more nodes
- Remove unnecessary pods or workloads from the node
- Adjust DaemonSets or affinity/anti-affinity rules
- Enable or tune Cluster Autoscaler if available
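If workloads need to be moved right away, cordoning the node prevents new pods from scheduling onto it while the rest of the cluster absorbs the load. The commands below are a sketch; <node-name>, <pod-name>, and <namespace> are placeholders.
<code bash>
# Prevent new pods from being scheduled onto the saturated node.
kubectl cordon <node-name>

# Optionally evict a specific non-critical pod so it is rescheduled elsewhere.
kubectl delete pod <pod-name> --namespace <namespace>

# Re-enable scheduling once pod usage is back under the threshold.
kubectl uncordon <node-name>
</code>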
===== Escalation =====
* Escalate if multiple nodes are reaching pod capacity
* Page on-call engineer if workloads fail to schedule and impact production
* Monitor cluster autoscaler or take manual action to add nodes
===== Related Alerts =====
* KubernetesNodeMemoryPressure
* KubernetesNodeDiskPressure
* KubernetesNodeNotReady
* PodPending
===== Related Dashboards =====
* Grafana → Kubernetes / Node Pod Capacity
* Grafana → Cluster Pod Distribution