====== HighDiskIOWait ======

===== Meaning =====

This alert is triggered when the CPU spends an unusually high share of its time waiting for disk I/O operations to complete. High I/O wait typically indicates a disk performance bottleneck.

===== Impact =====

Sustained high disk I/O wait can significantly degrade system and application performance. Possible impacts include:

  * Increased application latency
  * Slow database queries and file operations
  * Pod startup delays
  * Reduced overall node throughput

This alert is a **warning**, but may escalate if the condition persists.

===== Diagnosis =====

Check overall node CPU and memory usage. Note that ''kubectl top'' does not break out I/O wait, so use it only to see which nodes are busy:

  kubectl top nodes

If SSH access is available, inspect disk I/O metrics directly on the node. Look at ''%iowait'' and ''%util'' in ''iostat'', and at the ''wa'' column in ''vmstat'':

  iostat -xz 1
  vmstat 1

Identify the processes causing high disk I/O (''iotop'' usually requires root):

  iotop

Check disk usage and pressure conditions such as ''DiskPressure'':

  kubectl describe node <node-name>

Check for disk-related node events:

  kubectl get events --field-selector involvedObject.kind=Node

===== Possible Causes =====

  * Disk saturation due to heavy read/write operations
  * Slow or degraded storage (network-attached or cloud disks)
  * Log flooding or excessive file writes
  * Database or batch jobs performing intensive I/O
  * Disk nearing full capacity

===== Mitigation =====

  - Identify and throttle or stop I/O-heavy workloads
  - Move high I/O workloads to faster storage
  - Enable or tune log rotation
  - Scale out workloads to reduce per-node I/O pressure
  - Increase disk performance (IOPS / throughput) if supported

If the node is severely impacted, drain it temporarily:

  kubectl drain <node-name> --ignore-daemonsets

After mitigation, make the node schedulable again:

  kubectl uncordon <node-name>

===== Escalation =====

  * If high I/O wait persists beyond 10 minutes, escalate to the platform team
  * If multiple nodes are affected, treat as a storage-level incident
  * If production services are impacted, page the on-call engineer

===== Related Alerts =====

  * HighDiskUsage
  * NodeNotReady
  * HighCPUUsage

===== Related Dashboards =====

  * Grafana → Node Exporter / Disk I/O
  * Grafana → Storage Performance Overview
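
===== Appendix: Estimating I/O Wait from /proc/stat =====

For the Diagnosis step, ''iostat'' and ''vmstat'' may not be installed on every node. In that case I/O wait can be approximated from two samples of the aggregate ''cpu'' line in ''/proc/stat''. The following is a minimal sketch, assuming a Linux node with a POSIX shell and ''awk''. The function names ''iowait_percent'' and ''sample_iowait'' are illustrative and not part of any existing tooling, and the result may differ slightly from what the alert's own metric reports.

```shell
#!/bin/sh
# Sketch: estimate I/O wait from two samples of the "cpu" line in /proc/stat.
# Fields after the "cpu" label are, in order:
#   user nice system idle iowait irq softirq steal guest guest_nice
# Guest time is already counted in user/nice, so only the first eight
# counters are summed for the total.

# iowait_percent SAMPLE1 SAMPLE2
# Prints the share of CPU time spent in iowait between the two samples.
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total[NR] = 0
            for (i = 2; i <= 9; i++) total[NR] += $i
            iowait[NR] = $6
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (iowait[2] - iowait[1]) / dt
        }'
}

# sample_iowait [INTERVAL_SECONDS]
# Reads /proc/stat twice, INTERVAL_SECONDS apart (default 5), and prints
# the I/O wait percentage over that window.
sample_iowait() {
    interval="${1:-5}"
    first=$(grep '^cpu ' /proc/stat)
    sleep "$interval"
    second=$(grep '^cpu ' /proc/stat)
    iowait_percent "$first" "$second"
}
```

After sourcing the file on the node, ''sample_iowait 5'' prints a single percentage for a five-second window. A short window reacts quickly but is noisy, so repeat the measurement a few times before comparing it against the alert.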