HostOutOfMemory

Meaning

This alert fires when a host node has had less than 10% of its memory available for more than 2 minutes. The node is at risk of running out of memory, which may lead to OOMKilled processes and system instability.
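
The 10% condition can be reproduced by hand on the node. The sketch below assumes the alert is based on the kernel's MemAvailable figure as reported in /proc/meminfo; that is an assumption, not something this runbook states. The function names mem_available_pct and mem_below_threshold are hypothetical helpers.

```shell
# Print the percentage of memory still available, as an integer.
# Reads a meminfo-formatted file (default: /proc/meminfo).
# Assumption: MemAvailable / MemTotal is the ratio the alert evaluates.
mem_available_pct() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total > 0) printf "%d\n", avail * 100 / total
            else exit 1
        }
    ' "${1:-/proc/meminfo}"
}

# Succeed (exit 0) when available memory is below the threshold percentage.
# The default of 10 mirrors the alert description above.
mem_below_threshold() {
    pct=$(mem_available_pct "${1:-/proc/meminfo}") || return 2
    [ "$pct" -lt "${2:-10}" ]
}
```

On a node, running mem_below_threshold && echo "below 10%" would check the live values. A result just above the threshold still deserves attention, because the alert only clears once the ratio recovers.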

Impact

Low memory on a host node can cause:

  • Application pods being OOMKilled
  • System processes failing
  • Node instability or crashes
  • Degraded application performance
  • Kubernetes scheduling failures due to resource constraints

This alert has severity warning, but it can escalate quickly if memory continues to deplete.

Diagnosis

Check node memory usage (run free -m on the node itself):

kubectl top node {{ $labels.instance }}
free -m

Check the top memory-consuming processes on the node:

top
htop
ps aux --sort=-%mem | head -n 20
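
When many small processes share one command name (worker pools, for example), the per-process listing above can hide the real consumer. A minimal sketch that sums resident memory per command name is shown below. The helper name rss_by_command is hypothetical, and it assumes a procps-style ps that accepts -o rss= -o comm=.

```shell
# Sum resident memory (RSS, in KiB) per command name, largest first.
# Expects "rss command" pairs on stdin, e.g. from: ps -e -o rss= -o comm=
# Optional argument: how many rows to print (default 10).
rss_by_command() {
    awk '
        { rss = $1; $1 = ""; sub(/^ +/, ""); sum[$0] += rss }
        END { for (c in sum) printf "%d %s\n", sum[c], c }
    ' | sort -rn | head -n "${1:-10}"
}
```

Typical use on the node would be: ps -e -o rss= -o comm= | rss_by_command 10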

Check pod resource usage on the node:

kubectl top pod --all-namespaces --field-selector spec.nodeName={{ $labels.instance }}
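
To narrow that output to the heaviest pods, the listing can be piped through a small filter. The sketch below is illustrative only. It assumes the four-column layout printed with --all-namespaces --no-headers (namespace, name, CPU, memory) and memory values suffixed with Ki, Mi, or Gi. The helper name pods_over_mi is hypothetical.

```shell
# Keep only pods using at least MIN_MI mebibytes of memory, largest first.
# Reads "kubectl top pod --all-namespaces --no-headers" output on stdin.
# Assumption: column 4 is memory with a Ki, Mi, or Gi suffix.
pods_over_mi() {
    awk -v min="${1:-256}" '
        {
            mem = $4
            val = mem + 0                 # numeric prefix, e.g. 340 from "340Mi"
            if (mem ~ /Gi$/)      val *= 1024
            else if (mem ~ /Ki$/) val /= 1024
            if (val >= min) printf "%d %s/%s\n", val, $1, $2
        }
    ' | sort -rn
}
```

For example, appending --no-headers | pods_over_mi 512 to the command above would list only pods at or above 512 Mi on that node.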

Possible Causes

  • Memory leaks in applications
  • Memory-intensive batch jobs
  • Too many pods scheduled on the node
  • Misconfigured pod resource requests/limits
  • System processes consuming excessive memory

Mitigation

  1. Identify and restart memory-heavy pods or processes
  2. Scale workloads to other nodes
  3. Adjust resource requests/limits for pods
  4. Free up system memory (e.g., clear caches, restart unnecessary processes)
  5. Add more memory to the node if possible
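
One way to carry out step 2 is to cordon the node so no new pods land on it, review what is running there, and drain it only if needed. This is an assumption about procedure, not something the runbook prescribes. The sketch below only prints the commands so they can be reviewed first. The helper name evacuation_plan is hypothetical.

```shell
# Print (do not execute) the kubectl commands for moving workloads off a node.
# Draining evicts every non-DaemonSet pod, so treat the last step as optional.
evacuation_plan() {
    node=$1
    echo "kubectl cordon $node"
    echo "kubectl get pods --all-namespaces --field-selector spec.nodeName=$node -o wide"
    echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
}
```

Calling evacuation_plan with the affected node name prints the three commands in order. Run kubectl uncordon on the node once memory has recovered.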

Escalation

  • Escalate if available memory remains below 10% for an extended period
  • Page on-call engineer if production services are affected
  • Monitor related nodes for similar memory pressure

Related Alerts

  • HighMemoryUsage
  • KubernetesNodeMemoryPressure
  • PodOOMKilled
  • HostCPUHigh

Related Dashboards

  • Grafana → Node Memory Usage
  • Grafana → Node Resource Overview
runbooks/coustom_alerts/hostoutofmemory.txt · Last modified: by admin