HighMemoryUsage

Meaning

This alert fires when memory usage on a node stays above 90% for more than 5 minutes. Usage is calculated from the total and available memory reported by node-exporter, that is, (total - available) / total.
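
As a rough illustration of that calculation, the sketch below derives the same percentage from /proc/meminfo, which is where node-exporter reads MemTotal and MemAvailable. The function names are hypothetical, and the actual alert rule expression is not shown in this runbook, so treat this as an approximation of what the alert evaluates rather than the rule itself.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not part of any tool.

# mem_used_pct TOTAL AVAILABLE
# Prints the percentage of memory in use, to one decimal place.
mem_used_pct() {
    awk -v total="$1" -v avail="$2" \
        'BEGIN { printf "%.1f\n", (1 - avail / total) * 100 }'
}

# read_meminfo_kb FIELD [FILE]
# Prints the value (in kB) of one field from /proc/meminfo or a copy of it.
read_meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Example on a Linux node:
#   mem_used_pct "$(read_meminfo_kb MemTotal)" "$(read_meminfo_kb MemAvailable)"
```

A result above 90 on the node corresponds to the condition this alert watches for.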

Impact

High memory usage can significantly affect node and application stability.

Possible impacts include:

  • Pod evictions due to memory pressure
  • Application crashes (OOMKilled)
  • Increased latency and degraded performance
  • Node becoming unresponsive under sustained pressure

This alert has warning severity, but the underlying condition can become critical if it is not addressed.

Diagnosis

Check memory usage across nodes:

kubectl top nodes
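
To narrow the list down to the nodes that matter, the output can be filtered on the MEMORY% column. This is a minimal sketch: the function name is hypothetical, it assumes metrics-server is installed, and the percentage kubectl reports may differ somewhat from the node-exporter value used by the alert.

```shell
#!/bin/sh
# nodes_over_threshold [PERCENT]
# Prints "NODE MEMORY%" for each node whose memory usage exceeds PERCENT
# (default 90). Columns of `kubectl top nodes --no-headers` are:
#   NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
nodes_over_threshold() {
    kubectl top nodes --no-headers |
        awk -v limit="${1:-90}" '{
            pct = $5
            sub(/%/, "", pct)          # strip the trailing percent sign
            if (pct + 0 > limit + 0) print $1, $5
        }'
}

# Example:
#   nodes_over_threshold 90
```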

Identify top memory-consuming pods:

kubectl top pods -A --sort-by=memory

Check node conditions for memory pressure:

kubectl describe node <NODE_NAME>
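
The describe output is long, so it can help to read just the MemoryPressure condition. The sketch below uses a JSONPath filter for that; the function name is a hypothetical wrapper.

```shell
#!/bin/sh
# node_memory_pressure NODE_NAME
# Prints the status of the node's MemoryPressure condition:
# "True" when the kubelet reports memory pressure, otherwise "False" or "Unknown".
node_memory_pressure() {
    kubectl get node "$1" \
        -o jsonpath='{.status.conditions[?(@.type=="MemoryPressure")].status}'
}

# Example:
#   node_memory_pressure <NODE_NAME>
```

A value of True means the kubelet has crossed its eviction threshold and may start evicting pods.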

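Look for recent node events, such as memory pressure or OOM kills. Node events are not recorded in the workload namespaces, so search across all namespaces:

kubectl get events -A --field-selector involvedObject.kind=Node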

If SSH access is available, inspect memory usage directly:

free -h
top
vmstat 1
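
To see which processes hold the most memory without an interactive session, something like the following may help. It is a sketch that assumes the procps version of ps (for --sort); the function name is hypothetical.

```shell
#!/bin/sh
# top_rss_processes [COUNT]
# Prints PID, resident memory in MiB, and command name for the COUNT
# processes with the largest resident set size (default 10).
top_rss_processes() {
    ps -eo pid=,rss=,comm= --sort=-rss |
        head -n "${1:-10}" |
        awk '{ printf "%s\t%.1f MiB\t%s\n", $1, $2 / 1024, $3 }'
}

# Example:
#   top_rss_processes 5
```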

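Check for pods being OOM-killed. The STATUS column shows OOMKilled only until the container restarts, so for pods with a high RESTARTS count also check Last State in kubectl describe pod <POD_NAME>:

kubectl get pods -A | grep OOMKilled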
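
A container that was OOM-killed and then restarted keeps the reason in its last terminated state. The sketch below lists such pods by reading that field; the function name is hypothetical, and it reports a pod if any of its containers was last terminated with reason OOMKilled.

```shell
#!/bin/sh
# list_oomkilled
# Prints "namespace/pod" for every pod with at least one container whose
# last termination reason was OOMKilled.
list_oomkilled() {
    kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' |
        awk -F'\t' '$3 ~ /OOMKilled/ { print $1 "/" $2 }'
}

# Example:
#   list_oomkilled
```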

Possible Causes

  • Memory leak in an application
  • Pods without memory limits
  • Sudden increase in workload
  • Insufficient node memory capacity
  • Cache growth not properly controlled
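
To check the second cause in the list above, the following sketch reports pods in which at least one container has no memory limit. The function name is hypothetical. It compares the number of containers with the number of memory limits set, and it ignores init containers.

```shell
#!/bin/sh
# pods_missing_memory_limit
# Prints "namespace/pod" for every pod where fewer memory limits are set
# than there are containers.
pods_missing_memory_limit() {
    kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\t"}{.spec.containers[*].resources.limits.memory}{"\n"}{end}' |
        awk -F'\t' '{
            containers = split($3, names, " ")   # number of containers
            limited = split($4, limits, " ")     # number with a memory limit
            if (limited < containers) print $1 "/" $2
        }'
}

# Example:
#   pods_missing_memory_limit
```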

Mitigation

  1. Identify and restart leaking or misbehaving pods if safe
  2. Set or adjust memory requests and limits for workloads
  3. Scale the application or add more nodes if required
  4. Evict non-critical workloads if needed
  5. Investigate and fix memory leaks in application code
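
For step 2, memory requests and limits on a Deployment can be changed with kubectl set resources. The wrapper below is a minimal sketch: its name and argument order are hypothetical, and suitable values depend on the workload. Changing resources updates the pod template, so the Deployment rolls out new pods.

```shell
#!/bin/sh
# set_memory_bounds NAMESPACE DEPLOYMENT REQUEST LIMIT
# Sets the memory request and limit on all containers of a Deployment.
set_memory_bounds() {
    kubectl set resources deployment "$2" -n "$1" \
        --requests="memory=$3" --limits="memory=$4"
}

# Example (values are placeholders, not recommendations):
#   set_memory_bounds <NAMESPACE> <DEPLOYMENT_NAME> 256Mi 512Mi
```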

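If the node is under sustained pressure, drain it temporarily. Draining cordons the node and evicts every pod not managed by a DaemonSet, so first confirm that the remaining nodes have enough free memory to absorb them:

kubectl drain <NODE_NAME> --ignore-daemonsets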

After recovery:

kubectl uncordon <NODE_NAME>
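
Before draining, it may be worth previewing which pods would be moved and bounding how long the drain is allowed to run. This is a sketch with hypothetical function names, and the 300-second default timeout is an assumption, not a value from this runbook.

```shell
#!/bin/sh
# pods_on_node NODE_NAME
# Lists the pods currently scheduled on the node, across all namespaces.
pods_on_node() {
    kubectl get pods -A --field-selector "spec.nodeName=$1" -o wide
}

# drain_node NODE_NAME [TIMEOUT]
# Drains the node, giving up after TIMEOUT (default 300s, an assumed value).
# Drain refuses to evict pods that use emptyDir volumes unless
# --delete-emptydir-data is added; that flag is left out here on purpose
# because it discards the data in those volumes.
drain_node() {
    kubectl drain "$1" --ignore-daemonsets --timeout="${2:-300s}"
}

# Example:
#   pods_on_node <NODE_NAME>
#   drain_node <NODE_NAME>
```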

Escalation

  • If memory usage remains above threshold for more than 15 minutes, notify the platform team
  • If pods are repeatedly OOM-killed, escalate to the application owner
  • If production services are impacted, page the on-call engineer

Related Alerts

  • HighCPUUsage
  • NodeDown
  • NodeRebootedRecently

Dashboards

  • Grafana → Node Overview
  • Grafana → Memory Usage Dashboard