KubernetesNodeMemoryPressure

Meaning

This alert is triggered when a Kubernetes node reports the MemoryPressure condition for more than 2 minutes. MemoryPressure indicates that the node is running low on available memory and may start evicting pods.
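
To list the MemoryPressure status of every node in one view, a jsonpath query such as the following can be used (a sketch; adjust the output format to taste):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'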

Impact

Memory pressure on a node can lead to:

  • Pod evictions and restarts
  • OOMKilled containers
  • Degraded application performance
  • Scheduling failures for new pods

This alert is critical because sustained memory pressure directly affects workload stability.
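
To confirm whether evictions or OOM kills are already happening, a quick scan of pod statuses and recent events can help (the grep patterns here are illustrative):

kubectl get pods --all-namespaces | grep -Ei 'evicted|oomkilled'
kubectl get events --all-namespaces | grep -Ei 'evict|oom'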

Diagnosis

Check node memory status:

kubectl get nodes
kubectl describe node <NODE_NAME>

Check node memory usage (note that free -m must be run on the node itself, for example over SSH):

kubectl top node <NODE_NAME>
free -m

List pods consuming high memory:

kubectl top pod --all-namespaces --sort-by=memory
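
To narrow this down to pods running on the affected node (using the same <NODE_NAME> placeholder as above):

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<NODE_NAME>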

Check recent pod evictions:

kubectl get events --sort-by=.lastTimestamp
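
To focus on eviction events specifically, or on events attached to the node object itself, field selectors can be added (the reason value is an assumption; a plain grep for "evict" also works):

kubectl get events --all-namespaces --field-selector reason=Evicted
kubectl get events --field-selector involvedObject.kind=Node,involvedObject.name=<NODE_NAME>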

Possible Causes

  • Memory leaks in applications
  • Insufficient memory requests/limits (see the check after this list)
  • Sudden traffic spikes
  • Misconfigured workloads or batch jobs
  • Too many pods scheduled on the node
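
To check the requests/limits and over-scheduling causes, compare what is requested on the node against its allocatable memory; the Allocated resources section of the node description is a quick way to do this:

kubectl describe node <NODE_NAME> | grep -A 8 'Allocated resources'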

Mitigation

  1. Identify and restart or scale memory-heavy pods
  2. Set proper resource requests and limits (see the sketch after this list)
  3. Scale out workloads or add more nodes
  4. Increase node memory capacity if required
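
As a sketch of steps 1 and 2 (the deployment name, namespace, replica count, and memory values below are placeholders, not taken from this environment):

kubectl rollout restart deployment <DEPLOYMENT_NAME> -n <NAMESPACE>
kubectl scale deployment <DEPLOYMENT_NAME> -n <NAMESPACE> --replicas=<N>
kubectl set resources deployment <DEPLOYMENT_NAME> -n <NAMESPACE> --requests=memory=256Mi --limits=memory=512Mi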

If immediate relief is needed, drain the node:

kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data
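
Draining cordons the node, so it is worth checking that it now reports SchedulingDisabled and that the evicted workloads have come up elsewhere:

kubectl get node <NODE_NAME>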

After mitigation and stabilization:

kubectl uncordon <NODE_NAME>
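
Before closing the alert, confirm that the MemoryPressure condition has cleared (it should report False):

kubectl describe node <NODE_NAME> | grep MemoryPressure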

Escalation

  • Escalate if memory pressure persists longer than 10 minutes
  • Page on-call engineer if pod evictions impact production
  • If multiple nodes show memory pressure, treat as cluster capacity issue

Related Alerts

  • HighMemoryUsage
  • PodCrashLoopBackOff
  • KubernetesNodeNotReady
  • HighCPUUsage

Dashboards

  • Grafana → Kubernetes / Node Memory
  • Grafana → Node Exporter Full