====== HostUnusualDiskReadRate ======
===== Meaning =====
This alert fires when a host node experiences **unusually high disk read activity**, with IO wait exceeding 80% over a 5-minute window.
It indicates that the disk may be a bottleneck or under heavy load.
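The exact alert rule lives in the Prometheus configuration; as a rough, ad-hoc approximation of the IO-wait condition described above (assuming node_exporter metrics and a Prometheus API reachable at ''localhost:9090'', both assumptions), something like the following can be run during triage:
<code bash>
# Hedged sketch: approximate "IO wait above 80% over 5 minutes" from node_exporter metrics.
# Assumes node_cpu_seconds_total is scraped and Prometheus listens on localhost:9090.
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m])) > 0.8'
</code>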
===== Impact =====
High disk read rates can lead to:
* Application slowdowns or latency
* Increased pod response times
* Potential cascading failures if services rely on disk-intensive operations
* Node-level resource contention
This alert has **warning** severity; prolonged high IO can degrade performance or trigger further alerts.
===== Diagnosis =====
Check disk IO statistics:
<code bash>
iostat -x 1 5
iotop -o
</code>
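If ''iotop'' is not installed, ''pidstat'' (shipped in the same sysstat package as ''iostat'') gives a per-process read/write breakdown; a minimal sketch:
<code bash>
# Per-process disk read/write rates (kB_rd/s, kB_wr/s): five one-second samples.
pidstat -d 1 5
</code>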
Check system-wide IO wait:
<code bash>
top
vmstat 1 5
</code>
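For a per-CPU view that isolates time spent waiting on IO (also from sysstat), ''mpstat'' can confirm the host is IO-bound rather than CPU-bound:
<code bash>
# Per-CPU utilisation breakdown; watch the %iowait column.
mpstat -P ALL 1 5
</code>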
Check disk usage and filesystem health:
<code bash>
df -h
lsblk
smartctl -a /dev/sdX
</code>
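If ''smartctl'' reports reallocated or pending sectors, or a failing disk is otherwise suspected, check the kernel log as well (a hedged sketch; device names and error strings vary by driver):
<code bash>
# Recent kernel messages hinting at IO or ATA/SCSI errors (patterns are indicative, not exhaustive).
dmesg -T | grep -iE 'i/o error|ata[0-9]|blk_update_request' | tail -n 50
# Same via journald, limited to kernel messages from the current boot.
journalctl -k -b | grep -iE 'i/o error' | tail -n 50
</code>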
Check pods consuming disk on the node:
<code bash>
kubectl top pod --all-namespaces --field-selector spec.nodeName={{ $labels.instance }}
</code>
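Note that ''{{ $labels.instance }}'' often carries the node_exporter port (e.g. '':9100''), which will not match ''spec.nodeName''; strip the port or use the node name directly. If ''kubectl top pod'' on your version does not accept ''--field-selector'', listing the pods scheduled on the node is a reasonable fallback (replace ''<node-name>''):
<code bash>
# List every pod scheduled on the affected node, with IP and node columns.
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> -o wide
</code>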
===== Possible Causes =====
* Disk-intensive workloads or batch jobs
* Logging or database writes causing high IO
* Slow or failing disks
* Misconfigured storage (e.g., small volumes)
* Backup jobs or heavy monitoring metrics writes
===== Mitigation =====
- Identify and reduce disk-intensive workloads
- Move high IO workloads to other nodes or storage (see the sketch after this list)
- Monitor disk health and replace failing disks
- Tune filesystem or storage configuration if needed
- Scale out storage for critical workloads
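A hedged sketch for shifting load off the node while investigating (replace ''<node>'' and ''<pid>'' with real values; ''drain'' respects PodDisruptionBudgets but will still evict pods, so use with care in production):
<code bash>
# Stop new pods landing on the saturated node.
kubectl cordon <node>

# Optionally evict reschedulable workloads so they move to other nodes.
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data

# For a node-local batch process, deprioritise its disk IO instead of killing it (idle IO class).
ionice -c 3 -p <pid>

# Re-enable scheduling once IO returns to normal.
kubectl uncordon <node>
</code>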
===== Escalation =====
* Escalate if high IO persists for extended periods
* Page on-call engineer if production services are impacted
* Investigate related alerts (DiskPressure, HighDiskUsage)
===== Related Alerts =====
* HighDiskUsage
* HighDiskIOWait
* KubernetesNodeDiskPressure
* HostUnusualDiskWriteRate
===== Related Dashboards =====
* Grafana → Node Disk IO
* Grafana → Node Exporter Disk Metrics