HighCPUUsage

Meaning

This alert is triggered when the average CPU usage on a node exceeds 85% for more than 5 minutes. CPU usage is derived from node-exporter metrics as the share of CPU time that is not spent idle.
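
The alert expression itself is not shown in this runbook. A common form for this kind of rule is 100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100, though the configured rule may differ. As a rough illustration of "excluding idle CPU time", the sketch below computes the non-idle percentage from two samples of the aggregate cpu line in /proc/stat, which holds the kernel counters node-exporter reads. The function name cpu_busy_pct is hypothetical, and counting only the idle field as idle (iowait is treated as busy) is an assumption.

```shell
# Hypothetical helper: percentage of non-idle CPU time between two
# samples of the aggregate "cpu" line from /proc/stat.
# Assumption: only the "idle" counter is treated as idle time.
cpu_busy_pct() {
  # $1 = earlier sample, $2 = later sample,
  # e.g. "cpu 100 0 100 800 0 0 0 0"
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
      # Guest counters (fields 10, 11) are skipped: they are already in user/nice.
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      idle = $5
      if (NR == 1) { t0 = total; i0 = idle }
      else         { dt = total - t0; di = idle - i0 }
    }
    END {
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (dt - di) / dt
    }'
}
```

On a node, take one sample with grep '^cpu ' /proc/stat, wait a few seconds, take a second sample, and pass both strings to the function. The result is a point-in-time figure, not the 5-minute average the alert uses.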

Impact

Sustained high CPU usage can degrade node and application performance.

Possible impacts include:

Increased application latency
Pod CPU throttling
Slow kubelet operations such as pod startup, health probes, and eviction handling
Potential node instability under prolonged load

This alert has warning severity, but the situation can become critical if CPU usage stays above the threshold.

Diagnosis

Identify nodes with high CPU usage:

kubectl top nodes
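
To narrow the list to nodes near the alert threshold, the output can be filtered. The sketch below assumes the usual column layout of kubectl top nodes --no-headers (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); the function name nodes_over_cpu is hypothetical. Note that kubectl top reports usage from the metrics API, so its percentages may not match the node-exporter value the alert evaluates.

```shell
# Hypothetical filter: print nodes whose CPU% column exceeds a threshold.
# Reads `kubectl top nodes --no-headers` output on stdin.
# $1 = threshold in percent (defaults to 85, the alert threshold).
nodes_over_cpu() {
  threshold="${1:-85}"
  awk -v t="$threshold" '
    {
      pct = $3
      sub(/%$/, "", pct)   # strip the trailing percent sign
      # Skip rows without a numeric value (for example "<unknown>").
      if (pct ~ /^[0-9]+$/ && pct + 0 > t + 0) print $1, pct "%"
    }'
}
```

Usage: kubectl top nodes --no-headers | nodes_over_cpu 85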

Identify top CPU-consuming pods:

kubectl top pods -A --sort-by=cpu
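
When many pods are listed, totalling CPU per namespace can show which workload dominates. This is a minimal sketch that assumes the column layout of kubectl top pods -A --no-headers (NAMESPACE, NAME, CPU(cores), MEMORY(bytes)); the function name cpu_by_namespace is hypothetical.

```shell
# Hypothetical helper: sum pod CPU usage per namespace, highest first.
# Reads `kubectl top pods -A --no-headers` output on stdin.
cpu_by_namespace() {
  awk '
    {
      cpu = $3
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); milli = cpu + 0 }  # e.g. "250m"
      else            { milli = cpu * 1000 }                   # whole cores
      sum[$1] += milli
    }
    END { for (ns in sum) printf "%s %dm\n", ns, sum[ns] }' |
    sort -k2,2nr   # numeric sort on the millicore total, descending
}
```

Usage: kubectl top pods -A --no-headers | cpu_by_namespace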

Describe the affected node to review its conditions, allocated CPU requests and limits, and the pods scheduled on it:

kubectl describe node <NODE_NAME>

Check recent events related to resource pressure:

kubectl get events -A --field-selector involvedObject.kind=Node

If SSH access is available, inspect CPU usage directly:

top
htop
mpstat -P ALL 1 5

Possible Causes

Traffic spike or increased workload
Application infinite loop or bug
Pods without CPU limits
Insufficient node CPU capacity
Background system processes consuming CPU

Mitigation

Identify and restart misbehaving pods if safe
Scale the workload horizontally if supported
Apply or adjust CPU limits and requests
Reschedule pods to other nodes if needed
Consider adding more nodes to the cluster
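
The first three steps above map onto standard kubectl subcommands. The wrappers below are a sketch only: the function names are hypothetical, and they assume the affected workload is a Deployment. Namespace, name, replica count, and CPU values are supplied by the caller and should come from the diagnosis, not from this example.

```shell
# Hypothetical wrappers around kubectl subcommands.
# Assumption: the workload is a Deployment.

# Restart all pods of a deployment via a rolling restart.
# $1 = namespace, $2 = deployment name
restart_workload() {
  kubectl -n "$1" rollout restart "deployment/$2"
}

# Scale a deployment horizontally.
# $1 = namespace, $2 = deployment name, $3 = replica count
scale_workload() {
  kubectl -n "$1" scale "deployment/$2" --replicas="$3"
}

# Set the CPU request and limit on a deployment's containers.
# $1 = namespace, $2 = deployment name, $3 = CPU request, $4 = CPU limit
set_cpu() {
  kubectl -n "$1" set resources "deployment/$2" \
    --requests="cpu=$3" --limits="cpu=$4"
}
```

Changing requests or limits this way triggers a rollout, and a change made directly on the cluster may be overwritten if the workload is managed by a deployment pipeline.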

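If necessary, temporarily drain the node. Draining cordons the node and evicts its pods, so confirm that other nodes have capacity for them first. Pods that use emptyDir volumes block the drain unless --delete-emptydir-data is added, which discards that data: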

kubectl drain <NODE_NAME> --ignore-daemonsets

Restore scheduling after mitigation:

kubectl uncordon <NODE_NAME>

Escalation

If CPU usage remains above threshold for more than 15 minutes, notify the platform team
If production workloads are impacted, page the on-call engineer
If multiple nodes are affected, treat as a capacity issue and escalate immediately

Related Alerts

NodeDown
NodeRebootedRecently
NodeNotReady

Related Dashboards

Grafana → Node Overview
Grafana → CPU Usage Dashboard
