HighCPUUsage

Meaning

This alert fires when the average CPU usage on a node stays above 85% for more than 5 minutes. CPU usage is derived from node-exporter metrics as the share of CPU time that is not spent in the idle mode.
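To make the "everything except idle" calculation concrete, here is a minimal sketch that reproduces it locally from two samples of the aggregate `cpu` line in `/proc/stat`, the same kernel counters node-exporter reads on Linux. The function name `busy_pct` is hypothetical, and the sketch is not the alert rule itself.

```shell
# busy_pct SAMPLE1 SAMPLE2
# Each argument is one aggregate "cpu ..." line from /proc/stat.
# Fields 2-9 are user, nice, system, idle, iowait, irq, softirq, steal.
busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total = 0; for (i = 2; i <= 9; i++) total += $i; idle = $5 }
    NR == 1 { t1 = total; i1 = idle }
    NR == 2 {
      dt = total - t1          # all CPU time elapsed between the samples
      di = idle - i1           # idle CPU time elapsed between the samples
      if (dt > 0) printf "%.1f\n", 100 * (dt - di) / dt
      else print "0.0"
    }'
}

# Example use on a node (assumes Linux with a readable /proc/stat):
#   s1=$(grep '^cpu ' /proc/stat); sleep 5; s2=$(grep '^cpu ' /proc/stat)
#   busy_pct "$s1" "$s2"
```

A result above 85 over a sustained window corresponds to the condition this alert describes.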

Impact

Sustained high CPU usage can degrade node and application performance.

Possible impacts include:

This alert is a warning but may become critical if CPU usage remains high.

Diagnosis

Identify nodes with high CPU usage:

kubectl top nodes
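On larger clusters it can help to filter that output down to the nodes above the alert threshold. The following is a sketch only: `hot_nodes` is a hypothetical helper, and it assumes the usual `kubectl top nodes` column order (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%).

```shell
# hot_nodes [THRESHOLD]
# Reads `kubectl top nodes --no-headers` output on stdin and prints
# the name and CPU% of every node above THRESHOLD (default 85).
hot_nodes() {
  awk -v t="${1:-85}" '{
    pct = $3
    sub(/%/, "", pct)              # "88%" -> "88"; non-numeric values count as 0
    if (pct + 0 > t) print $1, $3
  }'
}

# Example use:
#   kubectl top nodes --no-headers | hot_nodes 85
```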

Identify top CPU-consuming pods:

kubectl top pods -A --sort-by=cpu
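When many small pods add up to the load, a per-namespace total can point to the responsible workload faster than the per-pod list. This is a rough sketch: `cpu_by_namespace` is a hypothetical helper, and it assumes the `-A` column order (NAMESPACE, NAME, CPU(cores), MEMORY(bytes)) with CPU reported in millicores such as `250m`.

```shell
# cpu_by_namespace
# Reads `kubectl top pods -A --no-headers` output on stdin and prints
# "<namespace> <total millicores>", highest consumer first.
cpu_by_namespace() {
  awk '{
    cpu = $3
    if (cpu ~ /m$/) { sub(/m$/, "", cpu); milli = cpu + 0 }   # "250m" -> 250
    else            { milli = (cpu + 0) * 1000 }              # whole cores -> millicores
    sum[$1] += milli
  }
  END { for (ns in sum) printf "%s %d\n", ns, sum[ns] }' | sort -k2,2nr
}

# Example use:
#   kubectl top pods -A --no-headers | cpu_by_namespace
```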

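Describe the affected node to check its conditions and how much CPU is already allocated to pod requests and limits (Kubernetes reports memory, disk, and PID pressure conditions, but no CPU pressure condition):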

kubectl describe node <NODE_NAME>

Check recent events related to resource pressure:

kubectl get events -A --field-selector involvedObject.kind=Node

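If SSH access is available, inspect CPU usage directly on the node. Pass an interval and count to mpstat; without them it reports averages since boot rather than current usage:

top
htop
mpstat -P ALL 1 5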

Possible Causes

Mitigation

  1. Identify and restart misbehaving pods if safe
  2. Scale the workload horizontally if supported
  3. Apply or adjust CPU limits and requests
  4. Reschedule pods to other nodes if needed
  5. Consider adding more nodes to the cluster
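The first three steps above map onto standard kubectl subcommands. The sketch below wraps them in a dry-run helper so the commands can be reviewed before anything changes. The `run` and `mitigate_*` names and the `my-namespace` and `my-app` defaults are hypothetical placeholders, and the sketch assumes the workload is a Deployment.

```shell
# Hypothetical placeholders: set NS and DEPLOY to the affected workload.
NS="${NS:-my-namespace}"
DEPLOY="${DEPLOY:-my-app}"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Step 1: restart the pods of a misbehaving Deployment.
mitigate_restart() { run kubectl rollout restart "deployment/$DEPLOY" -n "$NS"; }

# Step 2: scale horizontally to the replica count given as $1.
mitigate_scale() { run kubectl scale "deployment/$DEPLOY" -n "$NS" --replicas="$1"; }

# Step 3: set the CPU request ($1) and CPU limit ($2), e.g. 250m 500m.
mitigate_limits() {
  run kubectl set resources "deployment/$DEPLOY" -n "$NS" \
    "--requests=cpu=$1" "--limits=cpu=$2"
}

# Example use:
#   mitigate_scale 4              # prints the command only
#   DRY_RUN=0 mitigate_scale 4    # runs it against the current context
```

Keeping dry-run as the default is a deliberate choice here: during an incident it is easy to target the wrong namespace, and printing the command first gives a chance to catch that.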

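If necessary, temporarily drain the node. This cordons it and evicts its pods so they are rescheduled elsewhere, while DaemonSet pods stay in place: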

kubectl drain <NODE_NAME> --ignore-daemonsets

Restore scheduling after mitigation:

kubectl uncordon <NODE_NAME>

Escalation