HighMemoryUsage
Meaning
This alert is triggered when memory usage on a node exceeds 90% for more than 5 minutes. Memory usage is calculated based on total memory and available memory reported by node-exporter.
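The calculation can be sketched as follows. This is a minimal illustration, not the actual alert rule, which this page does not show: it assumes usage is computed as (1 - available / total) * 100 and compared against the 90% threshold. The function names mem_used_pct and mem_over_threshold are hypothetical helpers.

```shell
# Hypothetical helper: print memory usage as a percentage with one decimal.
# Assumes usage = (1 - available / total) * 100; both values must use the same unit.
mem_used_pct() {
  local total="$1" available="$2"
  awk -v t="$total" -v a="$available" 'BEGIN {
    if (t <= 0) { exit 1 }              # refuse a zero or negative total
    printf "%.1f\n", (1 - a / t) * 100
  }'
}

# Hypothetical helper: succeed (exit 0) when usage is above the threshold.
# The threshold defaults to 90, matching the value stated in this runbook.
mem_over_threshold() {
  local total="$1" available="$2" threshold="${3:-90}"
  local pct
  pct="$(mem_used_pct "$total" "$available")" || return 2
  awk -v p="$pct" -v th="$threshold" 'BEGIN { exit !(p > th) }'
}
```

On a node with SSH access, the two inputs can be read from /proc/meminfo, for example: mem_used_pct "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" "$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)". The 5-minute duration condition is not modelled here.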
Impact
High memory usage can significantly affect node and application stability.
Possible impacts include:
- Pod evictions due to memory pressure
- Application crashes (OOMKilled)
- Increased latency and degraded performance
- Node becoming unresponsive under sustained pressure
This alert has warning severity, but the underlying condition can become critical if it is not addressed.
Diagnosis
Check memory usage across nodes:
kubectl top nodes
Identify top memory-consuming pods:
kubectl top pods -A --sort-by=memory
Check node conditions for memory pressure:
kubectl describe node <NODE_NAME>
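To check the MemoryPressure condition on every node at once instead of describing them one by one, something like the following sketch may help. The function names are hypothetical. The JSONPath expression assumes each node reports a condition of type MemoryPressure, which is standard on Kubernetes nodes.

```shell
# Print "<node><TAB><MemoryPressure status>" for every node (needs cluster access).
list_memory_pressure() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'
}

# Read that output on stdin and keep only nodes whose status is "True".
nodes_under_pressure() {
  awk -F'\t' '$2 == "True" { print $1 }'
}
```

Usage: list_memory_pressure | nodes_under_pressure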
Look for recent memory-related events:
kubectl get events --field-selector involvedObject.kind=Node
If SSH access is available, inspect memory usage directly:
free -h
top
vmstat 1
Check for pods being OOM-killed:
kubectl get pods -A | grep OOMKilled
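The grep above only matches pods whose current STATUS column reads OOMKilled. A pod that was OOM-killed and has since restarted shows as Running and is missed. A rough alternative is to look at each container's last termination reason. The function names below are hypothetical.

```shell
# Print "<namespace><TAB><pod><TAB><reason>,<reason>,..." with one reason slot
# per container, taken from lastState.terminated.reason (needs cluster access).
list_last_termination_reasons() {
  kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{range .status.containerStatuses[*]}{.lastState.terminated.reason}{","}{end}{"\n"}{end}'
}

# Read that output on stdin and print "<namespace>/<pod>" for every pod with
# at least one container whose last termination reason was OOMKilled.
filter_oomkilled() {
  awk -F'\t' '$3 ~ /OOMKilled/ { print $1 "/" $2 }'
}
```

Usage: list_last_termination_reasons | filter_oomkilled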
Possible Causes
- Memory leak in an application
- Pods without memory limits
- Sudden increase in workload
- Insufficient node memory capacity
- Cache growth not properly controlled
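To check the "pods without memory limits" cause, the containers' configured limits can be listed and filtered. This is a sketch with hypothetical function names. It relies on kubectl printing an empty value when a container has no resources.limits.memory set.

```shell
# Print "<namespace><TAB><pod><TAB><limit>,<limit>,..." with one slot per
# container; a container without a memory limit leaves its slot empty.
list_memory_limits() {
  kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{range .spec.containers[*]}{.resources.limits.memory}{","}{end}{"\n"}{end}'
}

# Read that output on stdin and print "<namespace>/<pod>" for pods where at
# least one slot is empty (field starts with "," or contains ",,").
pods_missing_memory_limit() {
  awk -F'\t' '$3 ~ /(^|,),/ { print $1 "/" $2 }'
}
```

Usage: list_memory_limits | pods_missing_memory_limit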
Mitigation
- Identify leaking or misbehaving pods and restart them if it is safe to do so
- Set or adjust memory requests and limits for workloads
- Scale the application or add more nodes if required
- Evict non-critical workloads if needed
- Investigate and fix memory leaks in application code
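If the node is under sustained pressure, drain it temporarily. Draining evicts the node's pods and marks it unschedulable, so first confirm that the remaining nodes have enough capacity to take the evicted workloads: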
kubectl drain <NODE_NAME> --ignore-daemonsets
After recovery, make the node schedulable again:
kubectl uncordon <NODE_NAME>
Escalation
- If memory usage remains above threshold for more than 15 minutes, notify the platform team
- If pods are repeatedly OOM-killed, escalate to the application owner
- If production services are impacted, page the on-call engineer
Related Alerts
- HighCPUUsage
- NodeDown
- NodeRebootedRecently
Related Dashboards
- Grafana → Node Overview
- Grafana → Memory Usage Dashboard