====== KubernetesNodeMemoryPressure ======
===== Meaning =====
This alert is triggered when a Kubernetes node reports the **MemoryPressure** condition for more than 2 minutes.
MemoryPressure is set by the kubelet when available memory on the node falls below its configured eviction threshold; at that point the node may start evicting pods to reclaim memory.
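The exact trigger depends on how the alerting rule is defined in your Prometheus setup. As an illustrative sketch only, a rule built on the kube-state-metrics metric **kube_node_status_condition** could look like this, with the ''for'' duration matching the 2-minute window described above:
expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
for: 2m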
===== Impact =====
Memory pressure on a node can lead to:
* Pod evictions and restarts
* OOMKilled containers
* Degraded application performance
* Scheduling failures for new pods
This alert is **critical** because sustained memory pressure directly affects workload stability.
===== Diagnosis =====
Check node memory status:
kubectl get nodes
kubectl describe node <node-name>
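To see the MemoryPressure condition for every node at a glance, a jsonpath query such as the following prints one line per node in the form ''<node-name> <status>'':
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'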
Check node memory usage:
kubectl top node
free -m   # run directly on the node, e.g. over SSH
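It can also help to compare what the scheduler has already committed against the node's capacity; ''kubectl describe node'' prints an "Allocated resources" summary near the end of its output (''<node-name>'' is a placeholder for the affected node):
kubectl describe node <node-name> | grep -A 8 "Allocated resources"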
List pods consuming high memory:
kubectl top pod --all-namespaces --sort-by=memory
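To focus on the node that raised the alert, list the pods actually scheduled on it (replace the ''<node-name>'' placeholder with the affected node):
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>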
Check recent pod evictions:
kubectl get events --all-namespaces --sort-by=.lastTimestamp
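Eviction events can also be filtered directly; ''Evicted'' is the reason the kubelet records when it reclaims memory from a pod:
kubectl get events --all-namespaces --field-selector reason=Evicted --sort-by=.lastTimestamp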
===== Possible Causes =====
* Memory leaks in applications
* Insufficient memory requests/limits
* Sudden traffic spikes
* Misconfigured workloads or batch jobs
* Too many pods scheduled on the node
===== Mitigation =====
- Identify and restart or scale memory-heavy pods
- Set proper resource **requests and limits** (see the sketch after this list)
- Scale out workloads or add more nodes
- Increase node memory capacity if required
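As a sketch of the second and third points, resource requests/limits can be adjusted and workloads scaled with kubectl directly. The deployment name, namespace, and memory values below are placeholders and examples only; use values appropriate for your workload:
kubectl set resources deployment/<deployment-name> -n <namespace> --requests=memory=256Mi --limits=memory=512Mi
kubectl scale deployment/<deployment-name> -n <namespace> --replicas=3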
If immediate relief is needed, drain the node:
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
After mitigation and stabilization:
kubectl uncordon <node-name>
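Before closing the alert, confirm the node is schedulable again and that the MemoryPressure condition has cleared (expected status: False):
kubectl get node <node-name>
kubectl describe node <node-name> | grep MemoryPressure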
===== Escalation =====
* Escalate if memory pressure persists longer than 10 minutes
* Page the on-call engineer if pod evictions impact production
* If multiple nodes show memory pressure, treat as cluster capacity issue
===== Related Alerts =====
* HighMemoryUsage
* PodCrashLoopBackOff
* KubernetesNodeNotReady
* HighCPUUsage
===== Related Dashboards =====
* Grafana → Kubernetes / Node Memory
* Grafana → Node Exporter Full