====== KubernetesNodeMemoryPressure ======

===== Meaning =====

This alert is triggered when a Kubernetes node reports the **MemoryPressure** condition for more than 2 minutes. MemoryPressure indicates that the node is running low on available memory and may start evicting pods.

===== Impact =====

Memory pressure on a node can lead to:

  * Pod evictions and restarts
  * OOMKilled containers
  * Degraded application performance
  * Scheduling failures for new pods

This alert is **critical** because sustained memory pressure directly affects workload stability.

===== Diagnosis =====

Check node status and the MemoryPressure condition:

<code bash>
kubectl get nodes
kubectl describe node <node-name>
</code>

Check node memory usage:

<code bash>
kubectl top node
free -m
</code>

List the pods consuming the most memory:

<code bash>
kubectl top pod --all-namespaces --sort-by=memory
</code>

Check recent pod evictions:

<code bash>
kubectl get events --sort-by=.lastTimestamp
</code>

More targeted condition and eviction checks are sketched under //Example Snippets// at the end of this page.

===== Possible Causes =====

  * Memory leaks in applications
  * Insufficient memory requests/limits
  * Sudden traffic spikes
  * Misconfigured workloads or batch jobs
  * Too many pods scheduled on the node

===== Mitigation =====

  - Identify and restart or scale memory-heavy pods
  - Set proper resource **requests and limits** (see //Example Snippets// at the end of this page)
  - Scale out workloads or add more nodes
  - Increase node memory capacity if required

If immediate relief is needed, drain the node:

<code bash>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
</code>

After mitigation and stabilization, return the node to service:

<code bash>
kubectl uncordon <node-name>
</code>

===== Escalation =====

  * Escalate if memory pressure persists for longer than 10 minutes
  * Page the on-call engineer if pod evictions impact production
  * If multiple nodes show memory pressure, treat it as a cluster capacity issue

===== Related Alerts =====

  * HighMemoryUsage
  * PodCrashLoopBackOff
  * KubernetesNodeNotReady
  * HighCPUUsage

===== Related Dashboards =====

  * Grafana → Kubernetes / Node Memory
  * Grafana → Node Exporter Full
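===== Example Snippets =====

The commands below extend the diagnosis steps above. They are a sketch, not part of the alert definition; ''<node-name>'' and similar values are placeholders. The first command lists the MemoryPressure condition for every node, the second filters recent events down to evictions only.

<code bash>
# Show the MemoryPressure condition status (True/False/Unknown) per node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'

# Show only eviction events, most recent last
kubectl get events --all-namespaces --field-selector reason=Evicted --sort-by=.lastTimestamp
</code>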
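For the "insufficient memory requests/limits" cause, the rough filter below lists pods where no container declares a memory limit. It assumes cluster-wide read access and only flags pods with no memory limit at all, not partially limited ones.

<code bash>
# List namespace/pod for pods whose containers declare no memory limit
kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits.memory}{"\n"}{end}' \
  | awk -F'\t' '$2 == "" {print $1}'
</code>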
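For mitigation step 2, one way to apply memory requests and limits without editing manifests is ''kubectl set resources''. The deployment name, namespace, and the 256Mi/512Mi values below are illustrative placeholders; choose values based on the workload's actual usage.

<code bash>
# Apply a memory request and limit to every container in the deployment
kubectl set resources deployment <deployment-name> -n <namespace> \
  --requests=memory=256Mi --limits=memory=512Mi
</code>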