====== HostOutOfMemory ======

===== Meaning =====

This alert fires when a host node has had **less than 10% of its memory available** for more than 2 minutes. The node is at risk of running out of memory entirely, which can lead to OOM-killed processes and system instability.

===== Impact =====

Low memory on a host node can cause:

  * Application pods being OOMKilled
  * System processes failing
  * Node instability or crashes
  * Degraded application performance
  * Kubernetes scheduling failures due to resource constraints

This alert is marked **warning**, but it can escalate quickly if available memory keeps shrinking.

===== Diagnosis =====

Check node memory usage:

<code bash>
kubectl top node {{ $labels.instance }}
free -m
</code>

Check the top memory-consuming processes:

<code bash>
top
htop
ps aux --sort=-%mem | head -n 20
</code>

Check pod resource usage on the node:

<code bash>
kubectl top pod --all-namespaces --field-selector spec.nodeName={{ $labels.instance }}
</code>

===== Possible Causes =====

  * Memory leaks in applications
  * Memory-intensive batch jobs
  * Too many pods scheduled on the node
  * Misconfigured pod resource requests/limits
  * System processes consuming excessive memory

===== Mitigation =====

  - Identify and restart memory-heavy pods or processes
  - Scale workloads out to other nodes
  - Adjust resource requests/limits for the affected pods
  - Free up system memory (e.g., clear caches, stop unnecessary processes)
  - Add more memory to the node if possible

===== Escalation =====

  * Escalate if available memory stays below 10% for an extended period
  * Page the on-call engineer if production services are affected
  * Watch related nodes for similar memory pressure

===== Related Alerts =====

  * HighMemoryUsage
  * KubernetesNodeMemoryPressure
  * PodOOMKilled
  * HostCPUHigh

===== Related Dashboards =====

  * Grafana → Node Memory Usage
  * Grafana → Node Resource Overview
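===== Example Commands =====

The exact alert rule lives in the Prometheus configuration; as a reference, the condition described under Meaning can be checked ad hoc against the Prometheus HTTP API. The server URL below is a placeholder, and the expression assumes the standard node_exporter memory metrics:

<code bash>
# Query the alert condition directly; returns the instances currently
# below 10% available memory. The URL is a placeholder for the
# cluster's Prometheus endpoint.
curl -s 'http://prometheus.example.com:9090/api/v1/query' \
  --data-urlencode 'query=(node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10'
</code>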
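When logged in on the node itself, the same percentage the alert evaluates can be read straight from ''/proc/meminfo'' (Linux hosts):

<code bash>
# Print available memory as a percentage of total,
# mirroring the alert's 10% threshold.
awk '/^MemTotal/ {t=$2} /^MemAvailable/ {a=$2} END {printf "MemAvailable: %.1f%%\n", a/t*100}' /proc/meminfo
</code>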
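For the mitigation steps, a typical sequence for relieving pressure is to cordon the node, restart the heaviest pods so they reschedule elsewhere, and uncordon once memory recovers. This is a sketch, not a fixed procedure: names in angle brackets are placeholders, and restarting a pod only helps if a controller (e.g., a Deployment) recreates it on another node:

<code bash>
kubectl cordon <node>             # stop new pods from being scheduled on the node
kubectl delete pod <pod> -n <ns>  # memory-heavy pod; its controller recreates it elsewhere
kubectl uncordon <node>           # re-enable scheduling once pressure subsides

# On the node itself, dropping page caches frees memory immediately,
# but this is a stopgap, not a fix (run as root).
sync && echo 3 > /proc/sys/vm/drop_caches
</code>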