HighMemoryUsage

Meaning

This alert is triggered when memory usage on a node exceeds 90% for more than 5 minutes. Memory usage is calculated based on total memory and available memory reported by node-exporter.
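
The exact alerting rule lives in the Prometheus configuration; as a minimal sketch, assuming the standard node-exporter metrics node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes, the expression looks roughly like:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90

combined with a for: 5m clause so the alert only fires after five minutes above the threshold.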

Impact

High memory usage can significantly affect node and application stability.

Possible impacts include out-of-memory (OOM) kills of containers, pod evictions once the node reports MemoryPressure, degraded application performance, and in severe cases an unstable or unresponsive node.

This alert is a warning, but it may escalate to a critical issue if not addressed.

Diagnosis

Check memory usage across nodes:

kubectl top nodes

Identify top memory-consuming pods:

kubectl top pods -A --sort-by=memory

Check node conditions for memory pressure:

kubectl describe node <NODE_NAME>
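
In the Conditions section of the output, a MemoryPressure condition with status True means the kubelet has already detected memory pressure. To check this condition across all nodes at once, a JSONPath query along these lines should work (treat the exact expression as a sketch):

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'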

Look for recent memory-related events:

kubectl get events --field-selector involvedObject.kind=Node
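
To narrow this down to warnings only, or to kubelet out-of-memory events specifically (SystemOOM is the reason string used by recent kubelet versions; treat it as an assumption), filters such as these can help:

kubectl get events -A --field-selector type=Warning,involvedObject.kind=Node
kubectl get events -A --field-selector reason=SystemOOM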

If SSH access is available, inspect memory usage directly:

free -h
top
vmstat 1
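
To list the processes using the most memory (GNU ps assumed):

ps aux --sort=-%mem | head -n 15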

Check for pods being OOM-killed:

kubectl get pods -A | grep OOMKilled
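
Note that grepping pod status can miss containers that were OOM-killed and have since restarted; for a specific suspect pod, the last container state shows the reason directly:

kubectl describe pod <POD_NAME> -n <NAMESPACE> | grep -A5 'Last State'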

Possible Causes

  1. Memory leaks or misbehaving application pods
  2. Missing or undersized memory requests and limits on workloads
  3. Workload growth or traffic spikes exceeding node capacity
  4. Too many memory-intensive pods scheduled onto the same node

Mitigation

  1. Identify and restart leaking or misbehaving pods if it is safe to do so
  2. Set or adjust memory requests and limits for workloads (see the example commands after this list)
  3. Scale the application or add more nodes if required
  4. Evict non-critical workloads if needed
  5. Investigate and fix memory leaks in application code
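
For steps 1 and 2, example commands (deployment name, namespace, and memory values are placeholders to adapt to the workload):

# Restart a suspect deployment so its pods are recreated
kubectl rollout restart deployment/<DEPLOYMENT_NAME> -n <NAMESPACE>

# Set memory requests and limits on a deployment
kubectl set resources deployment/<DEPLOYMENT_NAME> -n <NAMESPACE> --requests=memory=256Mi --limits=memory=512Mi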

If the node is under sustained pressure, drain it temporarily:

kubectl drain <NODE_NAME> --ignore-daemonsets
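
Note that drain refuses to evict pods using emptyDir volumes unless told to; on recent kubectl versions the flag is --delete-emptydir-data (older versions call it --delete-local-data):

kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data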

After recovery:

kubectl uncordon <NODE_NAME>

Escalation