====== HostOutOfMemory ======
===== Meaning =====
This alert is triggered when a host node has **less than 10% of its memory available** for more than 2 minutes.
It indicates that the node is at risk of running out of memory, which may lead to OOMKilled processes and system instability.
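The expression behind this alert can be checked directly against Prometheus. A minimal sketch, assuming the standard node_exporter memory metrics and a Prometheus server at ''http://prometheus:9090'' (a placeholder; adjust for your environment, and ''jq'' is optional pretty-printing):
<code bash>
# Approximate form of the alert expression: percent of memory still available, per node.
# The comparison filters the result to instances currently below the 10% threshold.
curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10' \
  | jq '.data.result[] | {instance: .metric.instance, available_pct: .value[1]}'
</code>
Any instance returned by this query is currently below the alert threshold.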
===== Impact =====
Low memory on a host node can cause:
* Application pods being OOMKilled
* System processes failing
* Node instability or crashes
* Degraded application performance
* Kubernetes scheduling failures due to resource constraints (see the check below)
This alert is marked **warning** because the situation can escalate quickly if available memory keeps shrinking.
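One quick way to confirm the scheduling impact is to look for recent ''FailedScheduling'' events (''reason'' is a standard event field selector):
<code bash>
# Pods that the scheduler could not place, e.g. due to insufficient memory on nodes.
kubectl get events --all-namespaces \
  --field-selector reason=FailedScheduling \
  --sort-by=.lastTimestamp | tail -n 20
</code>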
===== Diagnosis =====
Check node memory usage (in ''free'' output, the **available** column is the figure that matters, not **free**):
<code bash>
kubectl top node {{ $labels.instance }}
free -m
</code>
Check the top memory-consuming processes:
<code bash>
top
htop
ps aux --sort=-%mem | head -n 20
</code>
Check pod resource usage on the node (if ''$labels.instance'' includes a scrape port such as '':9100'', strip it before using it as a node name):
<code bash>
kubectl top pod --all-namespaces --field-selector spec.nodeName={{ $labels.instance }}
</code>
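Also check whether the kernel OOM killer has already fired on the node:
<code bash>
# Recent kernel OOM-killer activity (run on the affected host).
dmesg -T | grep -i 'killed process' | tail -n 10
journalctl -k --since '1 hour ago' | grep -i 'out of memory'
</code>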
===== Possible Causes =====
* Memory leaks in applications
* Memory-intensive batch jobs
* Too many pods scheduled on the node
* Misconfigured pod resource requests/limits (see the check after this list)
* System processes consuming excessive memory
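To see whether pod requests oversubscribe the node, inspect its allocated resources (the ''grep'' window below is a rough sketch; widen it if the output is truncated):
<code bash>
# Compare requested vs. allocatable memory on the node.
kubectl describe node {{ $labels.instance }} | grep -A 10 'Allocated resources'
</code>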
===== Mitigation =====
- Identify and restart memory-heavy pods or processes (example commands after this list)
- Scale workloads to other nodes
- Adjust resource requests/limits for pods
- Free up system memory (e.g., clear caches, restart unnecessary processes)
- Add more memory to the node if possible
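Example commands for the steps above (a sketch: ''<pod>'' and ''<namespace>'' are placeholders, and draining assumes other nodes have spare capacity):
<code bash>
# Restart a memory-heavy pod; its controller will recreate it.
kubectl delete pod <pod> -n <namespace>

# Stop new pods landing on the node, then move existing workloads elsewhere.
kubectl cordon {{ $labels.instance }}
kubectl drain {{ $labels.instance }} --ignore-daemonsets --delete-emptydir-data

# Reclaim page cache on the host (safe, but expect a temporary I/O hit).
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
</code>
Run ''kubectl uncordon {{ $labels.instance }}'' once memory pressure is resolved.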
===== Escalation =====
* Escalate if available memory remains below 10% for an extended period
* Page on-call engineer if production services are affected
* Monitor related nodes for similar memory pressure
===== Related Alerts =====
* HighMemoryUsage
* KubernetesNodeMemoryPressure
* PodOOMKilled
* HostCPUHigh
===== Related Dashboards =====
* Grafana → Node Memory Usage
* Grafana → Node Resource Overview