runbooks:coustom_alerts:KubernetesNodeMemoryPressure

====== KubernetesNodeMemoryPressure ======

===== Meaning =====
This alert fires when a Kubernetes node has been reporting the **MemoryPressure** condition for more than 2 minutes.
MemoryPressure means the node's available memory has dropped below the kubelet's eviction threshold, so the kubelet may start evicting pods to reclaim memory.

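To see at a glance which nodes are currently reporting the condition, one quick check uses kubectl's JSONPath output (no extra tooling assumed):

<code bash>
# Print each node name together with its MemoryPressure condition status (True/False/Unknown)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'
</code>
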
===== Impact =====
Memory pressure on a node can lead to:
  * Pod evictions and restarts
  * OOMKilled containers
  * Degraded application performance
  * Scheduling failures for new pods

This alert is **critical** because sustained memory pressure directly affects workload stability.

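Evicted pods are usually the most visible symptom. Because evicted Pod objects outlive the corresponding events (which typically expire after about an hour), a quick cluster-wide check helps confirm whether evictions have already happened:

<code bash>
# Failed pods include those evicted by the kubelet; the STATUS column shows "Evicted"
kubectl get pods --all-namespaces --field-selector status.phase=Failed | grep -i evicted
</code>
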
===== Diagnosis =====
Check node status and conditions:

<code bash>
kubectl get nodes
kubectl describe node <NODE_NAME>
</code>

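To inspect just the MemoryPressure condition, including the kubelet's reason and message (typically **KubeletHasInsufficientMemory** when the condition is True), a JSONPath filter avoids scanning the full describe output:

<code bash>
# Show only the MemoryPressure condition object for the affected node
kubectl get node <NODE_NAME> -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")]}'
</code>
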
Check node memory usage:

<code bash>
kubectl top node <NODE_NAME>
free -m   # run this on the node itself (e.g. via SSH)
</code>

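If SSH access to the node is inconvenient, a node debug pod is one way to get a shell there. This is a sketch only, assuming a reasonably recent kubectl and that the cluster can pull the busybox image:

<code bash>
# Start an interactive debug pod on the node; the node's root filesystem is mounted at /host
kubectl debug node/<NODE_NAME> -it --image=busybox
# Inside the debug container:
chroot /host
free -m
</code>
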
List the pods consuming the most memory:

<code bash>
kubectl top pod --all-namespaces --sort-by=memory
</code>

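To narrow this down to the affected node, list the pods scheduled there and cross-reference them with the output above:

<code bash>
# All pods currently scheduled on the affected node
kubectl get pods --all-namespaces --field-selector spec.nodeName=<NODE_NAME> -o wide
</code>
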
Check recent events for evictions and memory warnings:

<code bash>
kubectl get events --all-namespaces --sort-by=.lastTimestamp
</code>

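Events can also be scoped to the node itself; reasons such as **EvictionThresholdMet** or **NodeHasInsufficientMemory** point directly at memory pressure:

<code bash>
# Only events whose subject is the affected node
kubectl get events --all-namespaces --field-selector involvedObject.kind=Node,involvedObject.name=<NODE_NAME> --sort-by=.lastTimestamp
</code>
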
===== Possible Causes =====
  * Memory leaks in applications
  * Missing or insufficient memory requests/limits (see the sketch after this list)
  * Sudden traffic spikes
  * Misconfigured workloads or batch jobs
  * Too many pods scheduled on the node

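A minimal sketch for spotting containers on the node with no memory limit at all, which are the usual suspects for unbounded growth. It assumes **jq** is installed; NODE_NAME is a placeholder:

<code bash>
# Namespace/pod: container for every container on the node without a memory limit
kubectl get pods --all-namespaces --field-selector spec.nodeName=<NODE_NAME> -o json \
  | jq -r '.items[] | . as $p | .spec.containers[]
           | select(.resources.limits.memory == null)
           | "\($p.metadata.namespace)/\($p.metadata.name): \(.name)"'
</code>
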
===== Mitigation =====
  - Identify and restart or scale memory-heavy pods
  - Set proper resource **requests and limits** (see the example after this list)
  - Scale out workloads or add more nodes
  - Increase node memory capacity if required

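As an example of the first two mitigation steps, a deployment can be given explicit memory requests/limits (which triggers a rolling restart) or scaled out. The deployment name, namespace and values below are placeholders, not recommendations:

<code bash>
# Set explicit memory requests/limits on a memory-heavy deployment (rolls its pods)
kubectl set resources deployment/<DEPLOYMENT_NAME> -n <NAMESPACE> \
  --requests=memory=256Mi --limits=memory=512Mi

# Or spread the load over more replicas (useful once more capacity is available)
kubectl scale deployment/<DEPLOYMENT_NAME> -n <NAMESPACE> --replicas=5
</code>
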
If immediate relief is needed, drain the node so its pods are rescheduled elsewhere:

<code bash>
kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data
</code>

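If evicting every pod at once is too disruptive, cordoning the node is a lighter alternative: it stops new pods from being scheduled there while existing workloads keep running:

<code bash>
kubectl cordon <NODE_NAME>
</code>
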
After mitigation, once the node has stabilized, make it schedulable again:

<code bash>
kubectl uncordon <NODE_NAME>
</code>

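To confirm the node is schedulable again and the condition has cleared (the second command should print **False**):

<code bash>
kubectl get node <NODE_NAME>
kubectl get node <NODE_NAME> -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")].status}'
</code>
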
===== Escalation =====
  * Escalate if memory pressure persists for longer than 10 minutes
  * Page the on-call engineer if pod evictions impact production
  * If multiple nodes show memory pressure, treat it as a cluster capacity issue (see the capacity check below)

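For the multi-node case, a rough cluster-wide capacity check is to compare actual memory usage with what is already requested on each node:

<code bash>
# Current usage per node (requires metrics-server)
kubectl top nodes

# Memory requests/limits already allocated on each node
kubectl describe nodes | grep -A 6 "Allocated resources"
</code>
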
===== Related Alerts =====
  * HighMemoryUsage
  * PodCrashLoopBackOff
  * KubernetesNodeNotReady
  * HighCPUUsage

===== Related Dashboards =====
  * Grafana → Kubernetes / Node Memory
  * Grafana → Node Exporter Full
