====== HostUnusualDiskReadRate ======
===== Meaning =====
This alert fires when a host node experiences **unusually high disk read activity**, with IO wait exceeding 80% over a 5-minute window.
It indicates that the disk may be a bottleneck or under heavy load.
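The exact alert rule lives in the Prometheus configuration; as a rough, ad-hoc approximation of the IO-wait condition described above (assuming node_exporter metrics and a Prometheus API reachable at ''localhost:9090'', both assumptions), something like the following can be run during triage:
<code bash>
# Hedged sketch: approximate "IO wait above 80% over 5 minutes" from node_exporter metrics.
# Assumes node_cpu_seconds_total is scraped and Prometheus listens on localhost:9090.
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m])) > 0.8'
</code>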
===== Impact =====
High disk read rates can lead to:
* Application slowdowns or latency
* Increased pod response times
* Potential cascading failures if services rely on disk-intensive operations
* Node-level resource contention
This alert has **warning** severity; prolonged high IO can degrade performance or trigger further alerts.
===== Diagnosis =====
Check disk IO statistics:
<code bash>
iostat -x 1 5
iotop -o
</code>
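If ''iotop'' is not installed, ''pidstat'' (shipped in the same sysstat package as ''iostat'') gives a per-process read/write breakdown; a minimal sketch:
<code bash>
# Per-process disk read/write rates (kB_rd/s, kB_wr/s): five one-second samples.
pidstat -d 1 5
</code>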
Check system-wide IO wait:
<code bash>
top
vmstat 1 5
</code>
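For a per-CPU view that isolates time spent waiting on IO (also from sysstat), ''mpstat'' can confirm the host is IO-bound rather than CPU-bound:
<code bash>
# Per-CPU utilisation breakdown; watch the %iowait column.
mpstat -P ALL 1 5
</code>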
Check disk usage and filesystem health:
<code bash>
df -h
lsblk
smartctl -a /dev/sdX
</code>
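If ''smartctl'' reports reallocated or pending sectors, or a failing disk is otherwise suspected, check the kernel log as well (a hedged sketch; device names and error strings vary by driver):
<code bash>
# Recent kernel messages hinting at IO or ATA/SCSI errors (patterns are indicative, not exhaustive).
dmesg -T | grep -iE 'i/o error|ata[0-9]|blk_update_request' | tail -n 50
# Same via journald, limited to kernel messages from the current boot.
journalctl -k -b | grep -iE 'i/o error' | tail -n 50
</code>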
Check pods consuming disk on the node:
<code bash>
kubectl top pod --all-namespaces --field-selector spec.nodeName={{ $labels.instance }}
</code>
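Note that ''{{ $labels.instance }}'' often carries the node_exporter port (e.g. '':9100''), which will not match ''spec.nodeName''; strip the port or use the node name directly. If ''kubectl top pod'' on your version does not accept ''--field-selector'', listing the pods scheduled on the node is a reasonable fallback (replace ''<node-name>''):
<code bash>
# List every pod scheduled on the affected node, with IP and node columns.
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> -o wide
</code>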
===== Possible Causes =====
* Disk-intensive workloads or batch jobs
* Logging or database writes causing high IO
* Slow or failing disks
* Misconfigured storage (e.g., small volumes)
* Backup jobs or heavy monitoring metrics writes
===== Mitigation =====
- Identify and reduce disk-intensive workloads
- Move high IO workloads to other nodes or storage (see the sketch after this list)
- Monitor disk health and replace failing disks
- Tune filesystem or storage configuration if needed
- Scale out storage for critical workloads
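A hedged sketch for shifting load off the node while investigating (replace ''<node>'' and ''<pid>'' with real values; ''drain'' respects PodDisruptionBudgets but will still evict pods, so use with care in production):
<code bash>
# Stop new pods landing on the saturated node.
kubectl cordon <node>

# Optionally evict reschedulable workloads so they move to other nodes.
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data

# For a node-local batch process, deprioritise its disk IO instead of killing it (idle IO class).
ionice -c 3 -p <pid>

# Re-enable scheduling once IO returns to normal.
kubectl uncordon <node>
</code>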
===== Escalation =====
* Escalate if high IO persists for extended periods
* Page on-call engineer if production services are impacted
* Investigate related alerts (DiskPressure, HighDiskUsage)
===== Related Alerts =====
* HighDiskUsage
* HighDiskIOWait
* KubernetesNodeDiskPressure
* HostUnusualDiskWriteRate
===== Related Dashboards =====
* Grafana → Node Disk IO
* Grafana → Node Exporter Disk Metrics