====== HostOutOfMemory ======
===== Meaning =====
This alert is triggered when a host node has **less than 10% of its memory available** for more than 2 minutes.
It indicates that the node is at risk of running out of memory, which may lead to OOMKilled processes and system instability.
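The expression behind this alert can be checked directly against Prometheus. A minimal sketch, assuming the standard node_exporter memory metrics and a Prometheus server at ''http://prometheus:9090'' (a placeholder; adjust for your environment, and ''jq'' is optional pretty-printing):
<code bash>
# Approximate form of the alert expression: percent of memory still available, per node.
# The comparison filters the result to instances currently below the 10% threshold.
curl -s 'http://prometheus:9090/api/v1/query' \
  --data-urlencode 'query=node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10' \
  | jq '.data.result[] | {instance: .metric.instance, available_pct: .value[1]}'
</code>
Any instance returned by this query is currently below the alert threshold.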
===== Impact =====
Low memory on a host node can cause:
* Application pods being OOMKilled
* System processes failing
* Node instability or crashes
* Degraded application performance
* Kubernetes scheduling failures due to resource constraints (see the check below)
This alert is marked **warning** because the situation can escalate quickly if available memory keeps shrinking.
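One quick way to confirm the scheduling impact is to look for recent ''FailedScheduling'' events (''reason'' is a standard event field selector):
<code bash>
# Pods that the scheduler could not place, e.g. due to insufficient memory on nodes.
kubectl get events --all-namespaces \
  --field-selector reason=FailedScheduling \
  --sort-by=.lastTimestamp | tail -n 20
</code>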
===== Diagnosis =====
Check node memory usage (in ''free'' output, the **available** column is the figure that matters, not **free**):
<code bash>
kubectl top node {{ $labels.instance }}
free -m
</code>
Check the top memory-consuming processes:
<code bash>
top
htop
ps aux --sort=-%mem | head -n 20
</code>
Check pod resource usage on the node (if ''$labels.instance'' includes a scrape port such as '':9100'', strip it before using it as a node name):
<code bash>
kubectl top pod --all-namespaces --field-selector spec.nodeName={{ $labels.instance }}
</code>
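Also check whether the kernel OOM killer has already fired on the node:
<code bash>
# Recent kernel OOM-killer activity (run on the affected host).
dmesg -T | grep -i 'killed process' | tail -n 10
journalctl -k --since '1 hour ago' | grep -i 'out of memory'
</code>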
===== Possible Causes =====
* Memory leaks in applications
* Memory-intensive batch jobs
* Too many pods scheduled on the node
* Misconfigured pod resource requests/limits (see the check after this list)
* System processes consuming excessive memory
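To see whether pod requests oversubscribe the node, inspect its allocated resources (the ''grep'' window below is a rough sketch; widen it if the output is truncated):
<code bash>
# Compare requested vs. allocatable memory on the node.
kubectl describe node {{ $labels.instance }} | grep -A 10 'Allocated resources'
</code>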
===== Mitigation =====
- Identify and restart memory-heavy pods or processes (example commands after this list)
- Scale workloads to other nodes
- Adjust resource requests/limits for pods
- Free up system memory (e.g., clear caches, restart unnecessary processes)
- Add more memory to the node if possible
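Example commands for the steps above (a sketch: ''<pod>'' and ''<namespace>'' are placeholders, and draining assumes other nodes have spare capacity):
<code bash>
# Restart a memory-heavy pod; its controller will recreate it.
kubectl delete pod <pod> -n <namespace>

# Stop new pods landing on the node, then move existing workloads elsewhere.
kubectl cordon {{ $labels.instance }}
kubectl drain {{ $labels.instance }} --ignore-daemonsets --delete-emptydir-data

# Reclaim page cache on the host (safe, but expect a temporary I/O hit).
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
</code>
Run ''kubectl uncordon {{ $labels.instance }}'' once memory pressure is resolved.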
===== Escalation =====
* Escalate if available memory remains below 10% for an extended period
* Page on-call engineer if production services are affected
* Monitor related nodes for similar memory pressure
===== Related Alerts =====
* HighMemoryUsage
* KubernetesNodeMemoryPressure
* PodOOMKilled
* HostCPUHigh
===== Related Dashboards =====
* Grafana → Node Memory Usage
* Grafana → Node Resource Overview