====== HighDiskIOWait ======
===== Meaning =====
This alert is triggered when the CPU spends an unusually high amount of time waiting for disk I/O operations to complete.
High I/O wait typically indicates disk performance bottlenecks.
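To make the metric concrete, the following is a minimal sketch of how an I/O wait percentage can be derived from two samples of the aggregate ''cpu'' line in /proc/stat on a Linux node. The function names ''iowait_pct'' and ''sample_iowait'' are illustrative only, and the alert's actual expression and threshold are not shown in this runbook, so treat this as an approximation of what the alert measures rather than its definition.

```shell
# iowait_pct "<first cpu line>" "<second cpu line>"
# Prints the share of CPU time spent in iowait between two samples of the
# aggregate "cpu" line from /proc/stat, as a percentage with one decimal.
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      # Fields 2..9: user nice system idle iowait irq softirq steal.
      # guest and guest_nice (fields 10, 11) are already counted in user/nice.
      total = 0
      for (i = 2; i <= 9; i++) total += $i
      t[NR] = total
      w[NR] = $6          # iowait is the fifth counter, i.e. field 6
    }
    END {
      dt = t[2] - t[1]
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (w[2] - w[1]) / dt
    }'
}

# sample_iowait [seconds]
# Samples the live counters on this host (Linux only), default 5 s apart.
sample_iowait() {
  first=$(grep '^cpu ' /proc/stat)
  sleep "${1:-5}"
  second=$(grep '^cpu ' /proc/stat)
  iowait_pct "$first" "$second"
}
```

Running ''sample_iowait 10'' on the affected node gives a figure comparable to the %iowait column reported by iostat over the same interval.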
===== Impact =====
Sustained high disk I/O wait can significantly degrade system and application performance.
Possible impacts include:
* Increased application latency
* Slow database queries and file operations
* Pod startup delays
* Reduced overall node throughput
This alert is a **warning**, but may escalate if the condition persists.
===== Diagnosis =====
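Check overall CPU and memory usage per node. This output does not break out I/O wait, so use it only to spot heavily loaded nodes:
  kubectl top nodes
If SSH access is available, inspect disk I/O metrics directly on the node (watch %iowait, %util, and r_await/w_await in iostat, and the wa column in vmstat):
  iostat -xz 1
  vmstat 1
Identify the processes causing high disk I/O (iotop usually requires root; -o lists only processes that are actively doing I/O):
  sudo iotop -o
Check disk usage and pressure conditions on the affected node:
  kubectl describe node <node-name>
Check whether disk-related node events have been recorded:
  kubectl get events --field-selector involvedObject.kind=Node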
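When several nodes are involved, reading the describe output node by node gets slow. The sketch below lists the DiskPressure condition for every node in one call and filters for the nodes where it is set. The function names are hypothetical helpers, not part of kubectl. Keep in mind that DiskPressure reflects low free space or inodes rather than I/O latency, so a node can show high I/O wait while DiskPressure is still False.

```shell
# list_disk_pressure
# Prints "<node-name><TAB><DiskPressure status>" for every node in the cluster.
list_disk_pressure() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'
}

# nodes_under_pressure
# Reads the output of list_disk_pressure on stdin and prints only the names
# of nodes whose DiskPressure status is "True".
nodes_under_pressure() {
  awk -F '\t' '$2 == "True" { print $1 }'
}
```

Typical use is ''list_disk_pressure | nodes_under_pressure''. An empty result means no node currently reports the condition.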
===== Possible Causes =====
* Disk saturation due to heavy read/write operations
* Slow or degraded storage (network-attached or cloud disks)
* Log flooding or excessive file writes
* Database or batch jobs performing intensive I/O
* Disk nearing full capacity
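The last cause in the list above can be ruled in or out quickly from the node itself. This is a small sketch, assuming a POSIX ''df'' and an 85% default threshold chosen purely for illustration. The function name ''fs_over_threshold'' is hypothetical, and mount points containing spaces are not handled.

```shell
# fs_over_threshold [percent]
# Reads `df -P` output on stdin and prints "<mount point> <use%>" for every
# filesystem whose usage is at or above the threshold (default: 85).
fs_over_threshold() {
  awk -v limit="${1:-85}" '
    NR > 1 {                 # skip the header line
      use = $5               # Capacity column, e.g. "90%"
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

On the node, run ''df -P | fs_over_threshold 85''. For any filesystem it reports, ''du -xh /var/log | sort -h | tail'' (GNU sort) is one way to check whether log growth is responsible.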
===== Mitigation =====
- Identify and throttle or stop I/O-heavy workloads
- Move high I/O workloads to faster storage
- Enable or tune log rotation
- Scale out workloads to reduce per-node I/O pressure
- Increase disk performance (IOPS / throughput) if supported
If the node is severely impacted, drain it temporarily:
  kubectl drain <node-name> --ignore-daemonsets
After mitigation, make the node schedulable again:
  kubectl uncordon <node-name>
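The drain and uncordon steps can be wrapped so that the node name is always supplied and the commands can be previewed first. This is a sketch only: ''drain_node'', ''restore_node'', the ''KUBECTL'' override variable, and the 300s default timeout are assumptions made for illustration, not part of any existing tooling.

```shell
# KUBECTL can be overridden (for example with "echo") to preview the
# commands without touching the cluster.
KUBECTL="${KUBECTL:-kubectl}"

# drain_node <node-name> [timeout]
# Cordons the node and evicts its pods; DaemonSet-managed pods are left alone.
drain_node() {
  node="$1"
  [ -n "$node" ] || { echo "usage: drain_node <node-name> [timeout]" >&2; return 1; }
  "$KUBECTL" drain "$node" --ignore-daemonsets --timeout="${2:-300s}"
}

# restore_node <node-name>
# Marks the node schedulable again once I/O wait has recovered.
restore_node() {
  node="$1"
  [ -n "$node" ] || { echo "usage: restore_node <node-name>" >&2; return 1; }
  "$KUBECTL" uncordon "$node"
}
```

Setting ''KUBECTL=echo'' before calling ''drain_node worker-1'' prints the command instead of running it. If the drain is refused because pods use emptyDir volumes, kubectl requires the additional ''--delete-emptydir-data'' flag, which discards that local data, so add it only deliberately.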
===== Escalation =====
* If high I/O wait persists beyond 10 minutes, escalate to the platform team
* If multiple nodes are affected, treat as a storage-level incident
* If production services are impacted, page the on-call engineer
===== Related Alerts =====
* HighDiskUsage
* NodeNotReady
* HighCPUUsage
===== Related Dashboards =====
* Grafana → Node Exporter / Disk I/O
* Grafana → Storage Performance Overview