====== KubernetesNodeMemoryPressure ======

===== Meaning =====
This alert fires when a Kubernetes node reports the **MemoryPressure** condition for more than 2 minutes.
The kubelet sets MemoryPressure when available memory on the node falls below its eviction threshold, at which point it may begin evicting pods to reclaim memory.
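
To confirm the condition directly, the node's status can be queried (a minimal check; ''<node-name>'' is a placeholder for the affected node):

<code bash>
# Prints the status of the MemoryPressure condition: True, False, or Unknown
kubectl get node <node-name> \
  -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")].status}'
</code>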
| + | |||
| + | ===== Impact ===== | ||
| + | Memory pressure on a node can lead to: | ||
| + | * Pod evictions and restarts | ||
| + | * OOMKilled containers | ||
| + | * Degraded application performance | ||
| + | * Scheduling failures for new pods | ||
| + | |||
| + | This alert is **critical** because sustained memory pressure directly affects workload stability. | ||
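
One quick way to see whether workloads are already being hit is to look for containers whose last termination reason was OOMKilled (a sketch; it scans every namespace and may be slow on large clusters):

<code bash>
# Namespace, pod name, and last termination reason for each container
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
  | grep OOMKilled
</code>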
| + | |||
| + | ===== Diagnosis ===== | ||
| + | Check node memory status: | ||
| + | |||
| + | <code bash> | ||
| + | kubectl get nodes | ||
| + | kubectl describe node < | ||
| + | </ | ||
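
In the ''describe'' output, the Conditions table shows MemoryPressure and the Allocated resources section shows how much memory pods have requested on the node. To jump straight to the latter (assuming GNU grep; the line count is approximate):

<code bash>
kubectl describe node <node-name> | grep -A 6 "Allocated resources"
</code>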
| + | |||
| + | Check node memory usage: | ||
| + | |||
| + | <code bash> | ||
| + | kubectl top node < | ||
| + | free -m | ||
| + | </ | ||
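
With shell access to the node, ''MemAvailable'' in ''/proc/meminfo'' is a rough proxy for the kubelet's ''memory.available'' eviction signal (an approximation only; the kubelet actually derives the signal from cgroup statistics):

<code bash>
grep MemAvailable /proc/meminfo
</code>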
| + | |||
| + | List pods consuming high memory: | ||
| + | |||
| + | <code bash> | ||
| + | kubectl top pod --all-namespaces --sort-by=memory | ||
| + | </ | ||
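
To narrow this down to pods scheduled on the affected node (a sketch using a field selector):

<code bash>
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> -o wide
</code>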
| + | |||
| + | Check recent pod evictions: | ||
| + | |||
| + | <code bash> | ||
| + | kubectl get events --sort-by=.lastTimestamp | ||
| + | </ | ||
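
To filter for evictions directly (a sketch; add ''-n <namespace>'' instead of ''--all-namespaces'' to scope it down):

<code bash>
kubectl get events --all-namespaces --field-selector reason=Evicted --sort-by=.lastTimestamp
</code>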
| + | |||
| + | ===== Possible Causes ===== | ||
| + | * Memory leaks in applications | ||
| + | * Insufficient memory requests/ | ||
| + | * Sudden traffic spikes | ||
| + | * Misconfigured workloads or batch jobs | ||
| + | * Too many pods scheduled on the node | ||
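
A simple way to distinguish a leak from a transient spike is to sample a suspect pod's usage over time (a rough sketch; ''<pod-name>'' and ''<namespace>'' are placeholders):

<code bash>
# Sample memory usage every 30s; values that only climb suggest a leak
while true; do
  kubectl top pod <pod-name> -n <namespace> --no-headers
  sleep 30
done
</code>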
| + | |||
| + | ===== Mitigation ===== | ||
| + | - Identify and restart or scale memory-heavy pods | ||
| + | - Set proper resource **requests and limits** | ||
| + | - Scale out workloads or add more nodes | ||
| + | - Increase node memory capacity if required | ||
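
For step 2, requests and limits can be set directly on a workload (a sketch; the deployment name and sizes are placeholders to adapt to observed usage):

<code bash>
# Hypothetical deployment name and sizes; tune to the workload's real footprint
kubectl set resources deployment <deployment-name> \
  --requests=memory=256Mi --limits=memory=512Mi
</code>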
| + | |||
| + | If immediate relief is needed, drain the node: | ||
| + | |||
| + | <code bash> | ||
| + | kubectl drain < | ||
| + | </ | ||
| + | |||
| + | After mitigation and stabilization: | ||
| + | |||
| + | <code bash> | ||
| + | kubectl uncordon < | ||
| + | </ | ||
| + | |||
| + | ===== Escalation ===== | ||
| + | * Escalate if memory pressure persists longer than 10 minutes | ||
| + | * Page on-call engineer if pod evictions impact production | ||
| + | * If multiple nodes show memory pressure, treat as cluster capacity issue | ||
| + | |||
| + | ===== Related Alerts ===== | ||
| + | * HighMemoryUsage | ||
| + | * PodCrashLoopBackOff | ||
| + | * KubernetesNodeNotReady | ||
| + | * HighCPUUsage | ||
| + | |||
| + | ===== Related Dashboards ===== | ||
| + | * Grafana → Kubernetes / Node Memory | ||
| + | * Grafana → Node Exporter Full | ||
| + | |||