runbooks:coustom_alerts:highcpuusage
HighCPUUsage
Meaning
This alert is triggered when the average CPU usage on a node exceeds 85% for more than 5 minutes. CPU usage is calculated using node-exporter metrics by excluding idle CPU time.
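As a rough illustration of the "non-idle" calculation, the sketch below derives a busy percentage from two samples of the aggregate cpu line in /proc/stat, the file node-exporter reads its CPU counters from on Linux. The function name cpu_busy_pct and the sampling interval are assumptions for illustration. The alert itself is evaluated in Prometheus, and its exact expression may differ.

```shell
# cpu_busy_pct PREV CUR
# PREV and CUR are aggregate "cpu ..." lines from /proc/stat, taken a few seconds apart.
# Prints the percentage of CPU time that was NOT idle between the two samples.
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      # Fields 2..9 are user, nice, system, idle, iowait, irq, softirq, steal.
      # Guest counters (fields 10-11) are skipped: they are already included in user/nice.
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      if (NR == 1) { t0 = total; i0 = $5 } else { t1 = total; i1 = $5 }
    }
    END {
      dt = t1 - t0
      if (dt <= 0) { print "0.0"; exit }
      # Busy share = 1 - (idle delta / total delta)
      printf "%.1f\n", 100 * (1 - (i1 - i0) / dt)
    }'
}

# Assumed usage on a Linux node (5-second window chosen arbitrarily):
#   prev=$(grep '^cpu ' /proc/stat); sleep 5; cur=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$prev" "$cur"
```

Only idle time is excluded here, so iowait counts as busy, which matches the description above. A value that stays over 85 for several minutes corresponds to the condition this alert describes.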
Impact
Sustained high CPU usage can degrade node and application performance.
Possible impacts include:
- Increased application latency
- Pod CPU throttling
- Slow scheduling and eviction decisions
- Potential node instability under prolonged load
This alert fires at warning severity. Treat it as critical if CPU usage stays above the threshold; the Escalation section lists when to notify or page.
Diagnosis
Identify nodes with high CPU usage:
kubectl top nodes
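To narrow the output to nodes near the alert threshold, a small filter can help. This is a minimal sketch that assumes the usual column layout of kubectl top nodes (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); hot_nodes is a hypothetical helper name. Note that kubectl top shows a recent snapshot from the metrics API, not the 5-minute average the alert uses.

```shell
# hot_nodes [THRESHOLD]
# Reads `kubectl top nodes --no-headers` output on stdin and prints every node
# whose CPU% column exceeds THRESHOLD (default 85, matching the alert).
hot_nodes() {
  awk -v limit="${1:-85}" '
    {
      pct = $3
      sub(/%$/, "", pct)                 # strip the trailing percent sign
      # Skip rows without a numeric CPU% (e.g. nodes with no metrics yet).
      if (pct ~ /^[0-9]+$/ && pct + 0 > limit + 0) print $1, pct "%"
    }'
}

# Assumed usage against a live cluster:
#   kubectl top nodes --no-headers | hot_nodes 85
```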
Identify top CPU-consuming pods:
kubectl top pods -A --sort-by=cpu
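Describe the affected node to review its conditions, allocated CPU requests and limits, and the pods running on it (Kubernetes reports memory, disk, and PID pressure conditions, but no CPU pressure condition):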
kubectl describe node <NODE_NAME>
Check recent events related to resource pressure:
kubectl get events -A --field-selector involvedObject.kind=Node
If SSH access is available, inspect CPU usage directly:
top
htop
mpstat -P ALL
Possible Causes
- Traffic spike or increased workload
- Application infinite loop or bug
- Pods without CPU limits
- Insufficient node CPU capacity
- Background system processes consuming CPU
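To check the "pods without CPU limits" cause, one option is to print each pod's CPU limit and filter for missing values. The sketch below is an assumption-laden helper (pods_without_cpu_limit is a hypothetical name) that expects three whitespace-separated columns: namespace, pod name, CPU limit. It only flags pods where no container sets a limit; pods in which some containers have limits and others do not may be missed.

```shell
# pods_without_cpu_limit
# Reads rows of "NAMESPACE NAME CPU_LIMIT" on stdin and prints namespace/name
# for every row whose limit column is <none>.
pods_without_cpu_limit() {
  awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Assumed usage against a live cluster:
#   kubectl get pods -A --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,CPU_LIMIT:.spec.containers[*].resources.limits.cpu' \
#     | pods_without_cpu_limit
```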
Mitigation
- Identify and restart misbehaving pods if safe
- Scale the workload horizontally if supported
- Apply or adjust CPU limits and requests
- Reschedule pods to other nodes if needed
- Consider adding more nodes to the cluster
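The scaling and limit adjustments above could be scripted roughly as follows. This is a sketch, not a prescribed procedure: the helper names run and mitigate_deployment, and the example namespace, deployment, and CPU values, are all hypothetical. Commands are only printed unless DRY_RUN=0 is set, so the output can be reviewed before anything changes.

```shell
# run CMD...
# Prints the command when DRY_RUN is 1 (the default here); executes it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# mitigate_deployment NAMESPACE NAME REPLICAS CPU_REQUEST CPU_LIMIT
# Scales a Deployment horizontally, then sets its CPU request and limit.
mitigate_deployment() {
  run kubectl -n "$1" scale "deployment/$2" --replicas="$3"
  run kubectl -n "$1" set resources "deployment/$2" --requests="cpu=$4" --limits="cpu=$5"
}

# Example (values are placeholders):
#   mitigate_deployment shop checkout 4 250m 500m
#   DRY_RUN=0 mitigate_deployment shop checkout 4 250m 500m   # actually apply
```

Changing resources on a Deployment triggers a rolling restart of its pods, so apply it only when a restart is acceptable.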
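If necessary, temporarily drain the node. This cordons it and evicts its pods, except those managed by DaemonSets; add --delete-emptydir-data if pods using emptyDir volumes block the drain: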
kubectl drain <NODE_NAME> --ignore-daemonsets
Restore scheduling after mitigation:
kubectl uncordon <NODE_NAME>
Escalation
- If CPU usage remains above the 85% threshold for more than 15 minutes, notify the platform team
- If production workloads are impacted, page the on-call engineer
- If multiple nodes are affected, treat as a capacity issue and escalate immediately
Related Alerts
- NodeDown
- NodeRebootedRecently
- NodeNotReady
Related Dashboards
- Grafana → Node Overview
- Grafana → CPU Usage Dashboard
runbooks/coustom_alerts/highcpuusage.txt · Last modified: by admin
