====== HostUnusualDiskReadRate ======

===== Meaning =====

This alert fires when a host node shows **high disk read activity**, with IO wait above 80% over a 5-minute window (see the example query at the end of this page). It indicates that the disk may be a bottleneck or under heavy load.

===== Impact =====

High disk read rates can lead to:

  * Application slowdowns or latency
  * Increased pod response times
  * Potential cascading failures if services rely on disk-intensive operations
  * Node-level resource contention

This alert has **warning** severity; prolonged high IO can degrade performance or trigger other alerts.

===== Diagnosis =====

Check disk IO statistics:

<code bash>
iostat -x 1 5
iotop -o
</code>

Check system-wide IO wait:

<code bash>
top
vmstat 1 5
</code>

Check disk usage and filesystem health:

<code bash>
df -h
lsblk
smartctl -a /dev/sdX
</code>

Check pods running on the node:

<code bash>
kubectl top pod --all-namespaces --field-selector spec.nodeName={{ $labels.instance }}
</code>

===== Possible Causes =====

  * Disk-intensive workloads or batch jobs
  * Logging or database writes causing high IO
  * Slow or failing disks
  * Misconfigured storage (e.g., volumes that are too small)
  * Backup jobs or heavy monitoring metric writes

===== Mitigation =====

  - Identify and reduce disk-intensive workloads
  - Move high-IO workloads to other nodes or storage (see the cordon/drain sketch at the end of this page)
  - Monitor disk health and replace failing disks
  - Tune filesystem or storage configuration if needed
  - Scale out storage for critical workloads

===== Escalation =====

  * Escalate if high IO persists for extended periods
  * Page the on-call engineer if production services are impacted
  * Investigate related alerts (DiskPressure, HighDiskUsage)

===== Related Alerts =====

  * HighDiskUsage
  * HighDiskIOWait
  * KubernetesNodeDiskPressure
  * HostUnusualDiskWriteRate

===== Related Dashboards =====

  * Grafana → Node Disk IO
  * Grafana → Node Exporter Disk Metrics
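===== Example: Querying IO Wait in Prometheus =====

The exact alerting rule lives in the Prometheus configuration and may differ from this sketch. Assuming standard node_exporter metrics and a Prometheus server reachable at ''$PROM_URL'' (a placeholder, e.g. ''http://prometheus:9090''), the condition described under Meaning roughly corresponds to:

<code bash>
# Current 5-minute average IO wait percentage per instance, filtered to those above 80%.
# $PROM_URL is a placeholder for your Prometheus server URL.
curl -sG "$PROM_URL/api/v1/query" \
  --data-urlencode 'query=avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m])) * 100 > 80'
</code>

Any instance returned by this query is currently above the 80% IO wait threshold described under Meaning.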
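===== Example: Moving Workloads Off the Node =====

A minimal sketch of relocating workloads from an affected node, assuming the node name has been identified from the alert (''worker-1'' below is a placeholder):

<code bash>
# Stop new pods from being scheduled onto the busy node
kubectl cordon worker-1

# Evict existing pods so they are rescheduled elsewhere
# (daemonsets are skipped; emptyDir data on the node is discarded)
kubectl drain worker-1 --ignore-daemonsets --delete-emptydir-data

# Once disk IO has recovered, allow scheduling again
kubectl uncordon worker-1
</code>

Draining evicts every pod on the node and is disruptive; where possible, prefer rescheduling only the specific high-IO workloads identified during diagnosis.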