HighMemoryUsage

Meaning

This alert is triggered when memory usage on a node exceeds 90% for more than 5 minutes. Memory usage is calculated based on total memory and available memory reported by node-exporter.
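
The exact alerting rule lives in the Prometheus configuration; as a minimal sketch, assuming the standard node-exporter metrics node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes, the expression looks roughly like:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90

combined with a for: 5m clause so the alert only fires after five minutes above the threshold.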

Impact

High memory usage can significantly affect node and application stability.

Possible impacts include out-of-memory (OOM) kills of containers, pod evictions once the node reports MemoryPressure, degraded application performance, and in severe cases an unstable or unresponsive node.

This alert is a warning, but it may escalate to a critical issue if not addressed.

Diagnosis

Check memory usage across nodes:

kubectl top nodes

Identify top memory-consuming pods:

kubectl top pods -A --sort-by=memory

Check node conditions for memory pressure:

kubectl describe node <NODE_NAME>
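
In the Conditions section of the output, a MemoryPressure condition with status True means the kubelet has already detected memory pressure. To check this condition across all nodes at once, a JSONPath query along these lines should work (treat the exact expression as a sketch):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'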

Look for recent memory-related events:

kubectl get events --field-selector involvedObject.kind=Node
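
To narrow this down to warnings only, or to kubelet out-of-memory events specifically (SystemOOM is the reason string used by recent kubelet versions; treat it as an assumption), filters such as these can help:

kubectl get events -A --field-selector type=Warning,involvedObject.kind=Node
kubectl get events -A --field-selector reason=SystemOOM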

If SSH access is available, inspect memory usage directly:

free -h
top
vmstat 1
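
To list the processes using the most memory (GNU ps assumed):

ps aux --sort=-%mem | head -n 15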

Check for pods being OOM-killed:

kubectl get pods -A | grep OOMKilled
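
Note that grepping pod status can miss containers that were OOM-killed and have since restarted; for a specific suspect pod, the last container state shows the reason directly:

kubectl describe pod <POD_NAME> -n <NAMESPACE> | grep -A5 'Last State'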

Possible Causes

  1. Memory leaks or misbehaving application pods
  2. Missing or undersized memory requests and limits on workloads
  3. Workload growth or traffic spikes exceeding node capacity
  4. Too many memory-intensive pods scheduled onto the same node

Mitigation

  1. Identify and restart leaking or misbehaving pods if it is safe to do so
  2. Set or adjust memory requests and limits for workloads (see the example commands after this list)
  3. Scale the application or add more nodes if required
  4. Evict non-critical workloads if needed
  5. Investigate and fix memory leaks in application code
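
For steps 1 and 2, example commands (deployment name, namespace, and memory values are placeholders to adapt to the workload):

# Restart a suspect deployment so its pods are recreated
kubectl rollout restart deployment/<DEPLOYMENT_NAME> -n <NAMESPACE>

# Set memory requests and limits on a deployment
kubectl set resources deployment/<DEPLOYMENT_NAME> -n <NAMESPACE> --requests=memory=256Mi --limits=memory=512Mi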

If the node is under sustained pressure, drain it temporarily:

kubectl drain <NODE_NAME> --ignore-daemonsets
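
Note that drain refuses to evict pods using emptyDir volumes unless told to; on recent kubectl versions the flag is --delete-emptydir-data (older versions call it --delete-local-data):

kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data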

After recovery:

kubectl uncordon <NODE_NAME>

Escalation