HostOutOfMemory
Meaning
This alert fires when less than 10% of a host node's memory has been available for more than 2 minutes.
It indicates that the node is at risk of running out of memory, which may lead to OOMKilled processes and system instability.
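The threshold can be checked by hand on the affected node. The following is a minimal sketch, assuming the alert compares MemAvailable against MemTotal (the exact alert expression is not shown in this runbook). `mem_available_pct` is a hypothetical helper name, not part of any existing tool.

```shell
# Hypothetical helper: print available memory as an integer percentage of total.
# Reads /proc/meminfo by default; pass another file in the same format instead.
# Assumption: the alert is based on MemAvailable / MemTotal.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0) { printf "%d\n", avail * 100 / total } else { exit 1 }
        }
    ' "${1:-/proc/meminfo}"
}
```

Running `mem_available_pct` on the node prints a single number. Under the assumption above, a value below 10 corresponds to the alert condition.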
Impact
Low memory on a host node can cause:
Application pods being OOMKilled
System processes failing
Node instability or crashes
Degraded application performance
Kubernetes scheduling failures due to resource constraints
This alert is marked warning because the condition can escalate quickly if memory continues to deplete.
Diagnosis
Check node memory usage (kubectl from a workstation, free on the node itself):
kubectl top node {{ $labels.instance }}
free -m
Check the top memory-consuming processes on the node:
top
htop
ps aux --sort=-%mem | head -n 20
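A service that forks many workers can be hard to spot in a per-process listing, because no single process stands out. The sketch below sums resident memory per command name. `rss_by_command` is a hypothetical helper; it assumes input lines of the form "RSS COMMAND" with RSS in KiB, and it uses only the first word of the command name.

```shell
# Hypothetical helper: sum RSS (KiB) per command name and print the totals
# in whole MiB, largest first. The optional argument limits the row count.
rss_by_command() {
    awk '{ rss[$2] += $1 } END { for (c in rss) printf "%d %s\n", rss[c] / 1024, c }' |
        sort -rn | head -n "${1:-10}"
}

# On the node (procps ps):
#   ps -e -o rss= -o comm= | rss_by_command 10
```

Shared pages are counted once per process in RSS, so the totals can overstate real usage for heavily forked services.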
Check pod resource usage on the node:
kubectl top pod --all-namespaces --field-selector spec.nodeName={{ $labels.instance }}
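Depending on the scrape configuration, `{{ $labels.instance }}` may carry the exporter port (for example `node-1:9100`), while kubectl expects a node name. The sketch below strips the port and lists the pods scheduled on the node. Both function names are hypothetical, and it is an assumption that the host part of the label matches the Kubernetes node name in your cluster.

```shell
# Hypothetical helper: drop a trailing ":port" from a Prometheus instance label.
# Assumption: the remaining host part equals the Kubernetes node name.
instance_to_node() {
    printf '%s\n' "${1%:*}"
}

# List every pod scheduled on the node, across all namespaces.
pods_on_node() {
    kubectl get pods --all-namespaces -o wide \
        --field-selector "spec.nodeName=$(instance_to_node "$1")"
}
```

For example, `pods_on_node node-1:9100` queries the pods on `node-1`. The result can be compared against the per-pod figures from `kubectl top pod`.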
Possible Causes
Memory leaks in applications
Memory-intensive batch jobs
Too many pods scheduled on the node
Misconfigured pod resource requests/limits
System processes consuming excessive memory
Mitigation
Identify and restart memory-heavy pods or processes
Scale workloads to other nodes
Adjust resource requests/limits for pods
Free up system memory (e.g., clear caches, stop unnecessary processes)
Add more memory to the node if possible
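Some of these steps can be scripted. The sketch below shows one possible approach, not a prescribed procedure: it cordons and drains the node to move workloads elsewhere, and adjusts the memory request and limit of a Deployment. All function names, and the namespace, deployment, and size arguments, are hypothetical placeholders. Commands are only printed unless `DRY_RUN=0` is set, so the plan can be reviewed first.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop new pods from landing on the node, then evict the existing ones.
# Draining is disruptive; check PodDisruptionBudgets and spare capacity first.
relieve_node() {
    run kubectl cordon "$1"
    run kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Set the memory request and limit on a Deployment (this triggers a rollout).
# Arguments: namespace, deployment name, memory request, memory limit.
resize_deployment() {
    run kubectl -n "$1" set resources deployment "$2" \
        --requests="memory=$3" --limits="memory=$4"
}
```

For example, `relieve_node node-1` prints the cordon and drain commands, and `DRY_RUN=0 relieve_node node-1` executes them. After the node has recovered, `kubectl uncordon node-1` makes it schedulable again.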
Escalation
Escalate if available memory remains below 10% for an extended period
Page on-call engineer if production services are affected
Monitor related nodes for similar memory pressure
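To watch other nodes for similar pressure, the output of `kubectl top nodes` can be filtered by its MEMORY% column. The sketch below uses a hypothetical helper, `nodes_over_mem_pct`, and assumes the usual five-column layout (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%). That percentage is not computed the same way as the alert's available-memory figure, so treat it as a rough signal only.

```shell
# Hypothetical helper: read `kubectl top nodes` output on stdin and print the
# nodes whose MEMORY% column is at or above the given threshold (default 90).
nodes_over_mem_pct() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            pct = $5
            sub(/%$/, "", pct)
            if (pct + 0 >= limit + 0) print $1, $5
        }
    '
}

# Usage:
#   kubectl top nodes | nodes_over_mem_pct 90
```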