runbooks:coustom_alerts:hostoutofmemory
HostOutOfMemory
Meaning
This alert fires when a host node has had less than 10% of its memory available for more than 2 minutes. It indicates that the node is at risk of running out of memory, which can lead to OOM-killed processes and system instability.
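To check the same condition by hand on the node, the available-memory percentage can be computed from /proc/meminfo. The sketch below assumes the alert is based on the kernel's MemAvailable and MemTotal figures (the values node_exporter typically exposes); the function names mem_available_pct and mem_below_threshold are illustrative and not part of any existing tooling.

```shell
#!/bin/sh
# Print the percentage of memory currently available, read from a
# meminfo-format file. FILE defaults to /proc/meminfo.
# Assumption: the alert compares MemAvailable against MemTotal.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total <= 0) exit 1
            printf "%.1f\n", avail * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}

# Succeed (exit status 0) when available memory is below THRESHOLD percent.
# Usage: mem_below_threshold [THRESHOLD] [FILE]   (THRESHOLD defaults to 10)
mem_below_threshold() {
    pct=$(mem_available_pct "${2:-/proc/meminfo}") || return 2
    awk -v p="$pct" -v t="${1:-10}" 'BEGIN { exit !(p < t) }'
}
```

Running `mem_available_pct` on the affected node prints the current percentage, and `mem_below_threshold 10 && echo "below threshold"` mirrors the 10% condition described above (without the 2-minute duration).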
Impact
Low memory on a host node can cause:
- Application pods being OOMKilled
- System processes failing
- Node instability or crashes
- Degraded application performance
- Kubernetes scheduling failures due to resource constraints
This alert has severity warning because the condition can escalate quickly if available memory continues to drop.
Diagnosis
Check node memory usage (run free -m on the node itself):
kubectl top node {{ $labels.instance }}
free -m
Check the top memory-consuming processes on the node (top and htop are interactive; the ps command gives a one-shot listing):
top
htop
ps aux --sort=-%mem | head -n 20
Check pod resource usage on the node:
kubectl top pod --all-namespaces --field-selector spec.nodeName={{ $labels.instance }}
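The instance label in an alert often has the form host:port rather than a bare node name, so it may need trimming before it is passed to kubectl. The sketch below is a minimal helper under two assumptions: the label is either "host" or "host:port" (no IPv6 literals), and the host part matches the Kubernetes node name. The function names instance_to_host and pods_on_node are hypothetical.

```shell
#!/bin/sh
# Strip a trailing ":port" from a Prometheus instance label.
# Assumption: the label looks like "host:port" or plain "host".
instance_to_host() {
    printf '%s\n' "${1%:*}"
}

# List every pod scheduled on NODE, across all namespaces.
# Assumption: NODE is the name shown by `kubectl get nodes`.
pods_on_node() {
    kubectl get pods --all-namespaces -o wide \
        --field-selector "spec.nodeName=$1"
}
```

For example, `pods_on_node "$(instance_to_host "10.0.0.5:9100")"` lists the pods on that node, which can then be matched against the output of `kubectl top pod --all-namespaces --sort-by=memory`.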
Possible Causes
- Memory leaks in applications
- Memory-intensive batch jobs
- Too many pods scheduled on the node
- Misconfigured pod resource requests/limits
- System processes consuming excessive memory
Mitigation
- Identify and restart memory-heavy pods or processes
- Scale workloads to other nodes
- Adjust resource requests/limits for pods
- Free up system memory (e.g., drop caches, stop unnecessary processes)
- Add more memory to the node if possible
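Moving workloads off the node usually starts with cordoning it so no new pods are scheduled there, followed by a drain if the existing pods must be evicted. The following is a cautious sketch rather than a prescribed procedure: the run and relieve_node helpers are hypothetical, the drain flags assume a kubectl version that supports --delete-emptydir-data, and DRY_RUN defaults to 1 so that commands are only printed.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "DRY-RUN: $*"
    else
        "$@"
    fi
}

# Stop new pods from landing on NODE, then evict the pods already there.
# DaemonSet pods are left in place; emptyDir data on evicted pods is lost.
relieve_node() {
    node=$1
    [ -n "$node" ] || { echo "usage: relieve_node NODE" >&2; return 2; }
    run kubectl cordon "$node" || return 1
    run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
}
```

Calling `relieve_node node-1` prints the two kubectl commands for review; `DRY_RUN=0 relieve_node node-1` executes them. Once memory pressure has cleared, `kubectl uncordon node-1` makes the node schedulable again.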
Escalation
- Escalate if available memory remains below 10% for an extended period
- Page on-call engineer if production services are affected
- Monitor related nodes for similar memory pressure
Related Alerts
- HighMemoryUsage
- KubernetesNodeMemoryPressure
- PodOOMKilled
- HostCPUHigh
Related Dashboards
- Grafana → Node Memory Usage
- Grafana → Node Resource Overview
runbooks/coustom_alerts/hostoutofmemory.txt · Last modified: by admin
