runbooks:coustom_alerts:HighCPUUsage
This alert fires when the average CPU usage on a node exceeds 85% for more than 5 minutes. CPU usage is calculated from node-exporter metrics as the share of CPU time not spent idle.
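The "everything except idle" calculation can be reproduced by hand on a Linux node. The sketch below is not node-exporter's code and not the alert rule itself; it only illustrates the same arithmetic on two samples of the aggregate cpu line from /proc/stat. The function name busy_pct is hypothetical, and counting the user through steal columns as total time is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# busy_pct PREV CUR
# PREV and CUR are two samples of the aggregate "cpu ..." line from /proc/stat.
# Prints the percentage of CPU time between the samples that was not idle.
busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      # Columns 2..9: user nice system idle iowait irq softirq steal.
      # Guest time is already included in user, so it is not added again.
      for (i = 2; i <= 9; i++) total += $i
      idle = $5
      if (NR == 1) { t0 = total; i0 = idle }
      else         { dt = total - t0; di = idle - i0 }
    }
    END {
      if (dt <= 0) { print "0.0"; exit }   # no elapsed time between samples
      printf "%.1f\n", 100 * (1 - di / dt)
    }'
}
```

On a node, the two samples could be taken a few seconds apart with grep '^cpu ' /proc/stat and passed to busy_pct as the first and second argument. A single short window like this is noisier than the 5-minute average the alert uses.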
Sustained high CPU usage can degrade node and application performance.
Possible impacts include:
This alert is a warning but may become critical if CPU usage remains high.
Identify nodes with high CPU usage:
kubectl top nodes
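On larger clusters it can help to filter that output down to the nodes above the alert threshold. The following is a minimal sketch, assuming the usual column order of kubectl top nodes (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); the function name hot_nodes is hypothetical. Note that kubectl top shows a point-in-time reading from the metrics API, so it may not match the alert's 5-minute node-exporter average.

```shell
#!/usr/bin/env bash
# hot_nodes [THRESHOLD]
# Reads `kubectl top nodes --no-headers` output on stdin and prints the name
# and CPU% of every node above THRESHOLD (default 85).
hot_nodes() {
  awk -v limit="${1:-85}" '
    {
      pct = $3
      sub(/%$/, "", pct)                 # "91%" -> "91"
      # Rows without metrics (e.g. "<unknown>") evaluate to 0 and are skipped.
      if (pct + 0 > limit + 0) print $1, $3
    }'
}
```

Usage, assuming metrics-server is installed: kubectl top nodes --no-headers | hot_nodes 85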
Identify top CPU-consuming pods:
kubectl top pods -A --sort-by=cpu
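When many pods each use a little CPU, a per-namespace total can be easier to read than the per-pod list. This is a rough sketch, assuming the column order NAMESPACE, NAME, CPU(cores), MEMORY(bytes) and CPU values in millicores such as 250m; the function name cpu_by_namespace is hypothetical. The totals are cluster-wide, not limited to the affected node.

```shell
#!/usr/bin/env bash
# cpu_by_namespace
# Reads `kubectl top pods -A --no-headers` output on stdin and prints
# "<millicores> <namespace>" per namespace, highest consumer first.
cpu_by_namespace() {
  awk '
    {
      v = $3
      if (v ~ /m$/) { sub(/m$/, "", v); milli = v + 0 }   # "250m" -> 250
      else          { milli = (v + 0) * 1000 }            # whole cores -> millicores
      sum[$1] += milli
    }
    END { for (ns in sum) printf "%d %s\n", sum[ns], ns }
  ' | sort -rn
}
```

Usage: kubectl top pods -A --no-headers | cpu_by_namespace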
Describe the affected node for pressure conditions:
kubectl describe node <NODE_NAME>
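The conditions section of that output can also be pulled out on its own. The sketch below prints each node condition as TYPE=STATUS and filters for pressure conditions that are True; the function names and the KUBECTL override variable are hypothetical helpers for this example. Keep in mind that the standard node conditions cover memory, disk, and PID pressure but have no CPU-specific entry, so an empty result does not rule out CPU saturation.

```shell
#!/usr/bin/env bash
KUBECTL="${KUBECTL:-kubectl}"   # overridable, e.g. to substitute a stub in tests

# node_conditions NODE_NAME
# Prints every condition of the node as TYPE=STATUS, one per line.
node_conditions() {
  "$KUBECTL" get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# active_pressure
# Reads TYPE=STATUS lines on stdin and keeps only pressure conditions
# that are currently True. Prints nothing when there are none.
active_pressure() {
  grep -E 'Pressure=True$' || true
}
```

Usage: node_conditions <NODE_NAME> | active_pressure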
Check recent events related to resource pressure:
kubectl get events -A --field-selector involvedObject.kind=Node
If SSH access is available, inspect CPU usage directly:
top
htop
mpstat -P ALL
If necessary, temporarily drain the node:
kubectl drain <NODE_NAME> --ignore-daemonsets
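Draining evicts every non-DaemonSet pod on the node, so it is worth previewing the effect first. The following is one possible wrapper, not a prescribed procedure: it lists the pods on the node, runs the drain as a client-side dry run, and only performs the real drain when CONFIRM=yes is set. The function name safe_drain, the CONFIRM and KUBECTL variables, and the 120s timeout are all assumptions of this sketch.

```shell
#!/usr/bin/env bash
KUBECTL="${KUBECTL:-kubectl}"   # overridable, e.g. to substitute a stub in tests

# safe_drain NODE_NAME
# Previews a drain; performs it only when CONFIRM=yes is set in the environment.
safe_drain() {
  node="$1"
  # Show what is currently scheduled on the node.
  "$KUBECTL" get pods -A --field-selector "spec.nodeName=${node}" -o wide || return 1
  # Preview which pods would be evicted, without changing the cluster.
  "$KUBECTL" drain "$node" --ignore-daemonsets --dry-run=client || return 1
  if [ "${CONFIRM:-no}" = "yes" ]; then
    # Real drain; the timeout value is an arbitrary example.
    "$KUBECTL" drain "$node" --ignore-daemonsets --timeout=120s
  fi
}
```

Usage: run safe_drain <NODE_NAME> to preview, then CONFIRM=yes safe_drain <NODE_NAME> to drain. If the drain refuses to proceed because of pods using emptyDir volumes or pods without a controller, review those pods before adding any force flags.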
Restore scheduling after mitigation:
kubectl uncordon <NODE_NAME>