
HighCPUUsage

Meaning

This alert fires when the average CPU usage on a node stays above 85% for more than 5 minutes. CPU usage is derived from node-exporter metrics as the share of CPU time not spent idle, i.e. 100% minus the idle percentage.
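
For a quick cross-check from a shell on a Linux node, the sketch below applies the same idea locally. It samples the aggregate cpu line of /proc/stat twice and reports the non-idle share. It is an approximation under stated assumptions. The function name cpu_busy_percent is hypothetical, and the sample window defaults to 1 second rather than the alert's 5 minutes. The PromQL in the comment is a common form of this kind of rule, not the confirmed expression behind this alert.

```shell
# Hypothetical helper (POSIX sh, Linux only): percentage of CPU time NOT spent
# idle on this host over an interval in seconds (default 1), from /proc/stat.
#
# A common Prometheus form of the same calculation (assumed, not confirmed
# as this alert's rule):
#   100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 > 85
cpu_busy_percent() {
  interval="${1:-1}"
  # Aggregate line: cpu user nice system idle iowait irq softirq steal guest guest_nice
  # Guest time is already included in user/nice, so the trailing fields are ignored.
  read -r _ u1 n1 s1 i1 w1 q1 sq1 st1 _ < /proc/stat
  sleep "$interval"
  read -r _ u2 n2 s2 i2 w2 q2 sq2 st2 _ < /proc/stat
  total=$(( (u2 + n2 + s2 + i2 + w2 + q2 + sq2 + st2) - (u1 + n1 + s1 + i1 + w1 + q1 + sq1 + st1) ))
  idle=$(( i2 - i1 ))
  # Only idle is excluded, so iowait counts as busy, as in a "100 - idle" rule.
  if [ "$total" -gt 0 ]; then
    echo $(( 100 * (total - idle) / total ))
  else
    echo 0  # no ticks elapsed
  fi
}
```

Usage: cpu_busy_percent 5 prints an integer from 0 to 100. Repeated readings above 85 correspond to the alert condition.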

Impact

Sustained high CPU usage can degrade node and application performance.

Possible impacts include:

  • Increased application latency
  • Pod CPU throttling
  • Slow scheduling and eviction decisions
  • Potential node instability under prolonged load

The alert has warning severity, but the situation can become critical if CPU usage stays high.

Diagnosis

Identify nodes with high CPU usage:

kubectl top nodes
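
To list only the nodes above the alert threshold, you can filter the output of kubectl top nodes --no-headers on its CPU% column. This is a sketch: hot_nodes is a hypothetical helper, and it assumes the default column order (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%). Also, kubectl top reports a recent metrics-server sample rather than the 5-minute average the alert uses, so the two can disagree.

```shell
# Hypothetical helper: print "NODE CPU%" for nodes whose CPU% exceeds a threshold.
# Input: `kubectl top nodes --no-headers` (NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%).
hot_nodes() {
  threshold="${1:-85}"
  awk -v t="$threshold" '{
    pct = $3
    sub(/%$/, "", pct)   # "90%" -> "90"
    # Nodes without metrics (e.g. "<unknown>") fail the numeric check and are skipped.
    if (pct ~ /^[0-9]+$/ && pct + 0 > t + 0) print $1, $3
  }'
}
```

Usage: kubectl top nodes --no-headers | hot_nodes 85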

Identify top CPU-consuming pods:

kubectl top pods -A --sort-by=cpu
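
kubectl top pods prints CPU in millicores, for example 250m. If the list is long, a filter like the hypothetical pods_over_millicores below keeps only pods above a chosen value. It assumes the --no-headers column order produced with -A (NAMESPACE, NAME, CPU(cores), MEMORY(bytes)). The 500m default is an arbitrary example, not a threshold from this runbook.

```shell
# Hypothetical helper: print "NAMESPACE/POD CPU" for pods above N millicores (default 500).
# Input: `kubectl top pods -A --no-headers` (NAMESPACE NAME CPU(cores) MEMORY(bytes)).
pods_over_millicores() {
  limit="${1:-500}"
  awk -v lim="$limit" '{
    cpu = $3
    if (cpu ~ /^[0-9]+m$/) { sub(/m$/, "", cpu); mc = cpu + 0 }   # "1200m" -> 1200
    else if (cpu ~ /^[0-9.]+$/) { mc = cpu * 1000 }               # whole cores -> millicores
    else next                                                     # unparsable value: skip
    if (mc > lim + 0) print $1 "/" $2, $3
  }'
}
```

Usage: kubectl top pods -A --no-headers --sort-by=cpu | pods_over_millicores 500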

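Describe the affected node to check its conditions and the CPU requests and limits already allocated on it (Kubernetes reports memory, disk and PID pressure, but has no CPU pressure condition):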

kubectl describe node <NODE_NAME>
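
The full describe output is long. The hypothetical node_pressure_summary filter below prints only the Conditions and Allocated resources sections. It assumes the standard kubectl describe node layout, in which Conditions is followed by Addresses and Allocated resources by Events.

```shell
# Hypothetical helper: keep only the "Conditions" and "Allocated resources"
# sections of `kubectl describe node` output read from stdin.
node_pressure_summary() {
  sed -n \
    -e '/^Conditions:/,/^Addresses:/p' \
    -e '/^Allocated resources:/,/^Events:/p'
}
```

Usage: kubectl describe node <NODE_NAME> | node_pressure_summary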

Check recent events related to resource pressure, across all namespaces and sorted by time:

kubectl get events -A --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp

If SSH access is available, inspect CPU usage directly (give mpstat an interval; without one it reports averages since boot):

top
htop
mpstat -P ALL 1 5
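
To see which processes are responsible, a sorted ps listing is a quick starting point. Note that ps computes %CPU as CPU time divided by the process's elapsed lifetime, so it can under-report a recent spike; top and htop show current rates. The helper name top_cpu_processes is hypothetical. The --sort option assumes the procps ps found on most Linux distributions.

```shell
# Hypothetical helper: header plus the N processes with the highest %CPU (default 10).
# procps `ps` reports %CPU as lifetime CPU time / elapsed time, not an instantaneous rate.
top_cpu_processes() {
  n="${1:-10}"
  ps -eo pid,user,comm,%cpu --sort=-%cpu | head -n "$((n + 1))"
}
```

Usage: top_cpu_processes 15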

Possible Causes

  • Traffic spike or increased workload
  • Application infinite loop or bug
  • Pods without CPU limits
  • Insufficient node CPU capacity
  • Background system processes consuming CPU
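
For the "Pods without CPU limits" cause, you can list pods that set no CPU limit with custom columns. The sketch below uses the hypothetical name pods_without_cpu_limit. The [*] expression returns only values that exist, so <none> appears only when no container in the pod sets a CPU limit. Pods that mix limited and unlimited containers are therefore not flagged.

```shell
# Hypothetical helper: print NAMESPACE/NAME for pods whose CPU_LIMIT column is <none>.
# Intended input:
#   kubectl get pods -A --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,CPU_LIMIT:.spec.containers[*].resources.limits.cpu'
# <none> means no container in the pod sets a CPU limit.
pods_without_cpu_limit() {
  awk '$3 == "<none>" { print $1 "/" $2 }'
}
```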

Mitigation

  1. Identify and restart misbehaving pods if safe
  2. Scale the workload horizontally if supported
  3. Apply or adjust CPU limits and requests
  4. Reschedule pods to other nodes if needed
  5. Consider adding more nodes to the cluster
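
Steps 1 to 3 map onto standard kubectl commands. The sketch below wraps them in a print-first helper, so nothing changes until APPLY=1 is set. The function names, the APPLY switch and the example values are hypothetical, and the sketch assumes the workload is a Deployment. kubectl rollout restart recreates every pod in the Deployment. To restart a single misbehaving pod, delete it and let its controller replace it.

```shell
# Hypothetical safety wrapper: always print the command, run it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = 1 ]; then
    "$@"
  fi
}

# 1. Rolling restart of all pods in a Deployment.  Args: NAMESPACE DEPLOYMENT
restart_deployment() { run kubectl rollout restart "deployment/$2" -n "$1"; }

# 2. Scale a Deployment horizontally.  Args: NAMESPACE DEPLOYMENT REPLICAS
scale_deployment() { run kubectl scale "deployment/$2" -n "$1" --replicas="$3"; }

# 3. Set CPU requests and limits.  Args: NAMESPACE DEPLOYMENT REQUEST LIMIT (e.g. 250m 500m)
set_cpu_resources() {
  run kubectl set resources "deployment/$2" -n "$1" --requests="cpu=$3" --limits="cpu=$4"
}
```

Usage: restart_deployment shop web prints "+ kubectl rollout restart deployment/web -n shop" and runs nothing. Setting APPLY=1 executes the printed command as well.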

If necessary, temporarily drain the node:

kubectl drain <NODE_NAME> --ignore-daemonsets

Restore scheduling after mitigation:

kubectl uncordon <NODE_NAME>
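
You can wrap the drain and uncordon steps the same way. In this sketch, the names and the print-first APPLY switch are hypothetical. kubectl cordon is offered as a lighter first step: it stops new pods from landing on the node without evicting anything. The drain adds --delete-emptydir-data, which drain requires for pods using emptyDir volumes and which discards that data. It also adds a --timeout so that a PodDisruptionBudget blocking eviction cannot stall the command indefinitely. The 10-minute timeout is an example value.

```shell
# Hypothetical safety wrapper: always print the command, run it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = 1 ]; then
    "$@"
  fi
}

# Stop new pods from being scheduled on the node; running pods stay put.
cordon_node() { run kubectl cordon "$1"; }

# Cordon and evict pods (DaemonSet pods are left in place).
# --delete-emptydir-data deletes the emptyDir contents of evicted pods.
drain_node() {
  run kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data --timeout=10m
}

# Allow scheduling again once CPU usage has recovered.
restore_node() { run kubectl uncordon "$1"; }
```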

Escalation

  • If CPU usage remains above the threshold for more than 15 minutes, notify the platform team
  • If production workloads are impacted, page the on-call engineer
  • If multiple nodes are affected, treat as a capacity issue and escalate immediately

Related Alerts

  • NodeDown
  • NodeRebootedRecently
  • NodeNotReady

Dashboards

  • Grafana → Node Overview
  • Grafana → CPU Usage Dashboard
runbooks/coustom_alerts/highcpuusage.txt · Last modified: by admin