HighDiskIOWait

Meaning

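This alert fires when the CPUs spend an unusually high share of time in I/O wait (iowait), that is, idle time during which at least one disk I/O request is still outstanding. High I/O wait typically indicates a disk performance bottleneck. Because iowait is idle time rather than CPU work, it can be high even when CPU usage is otherwise low.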
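To see this signal directly on a node, the percentage can be derived from the cumulative CPU tick counters in /proc/stat. The sketch below samples the aggregate cpu line twice and divides the iowait delta by the total delta. It is an illustration, not the alert's actual expression; the function names cpu_ticks and iowait_pct and the one-second default interval are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: system-wide I/O wait percentage over an interval, read from /proc/stat.
# Fields on the aggregate "cpu" line (cumulative clock ticks):
#   $2 user  $3 nice  $4 system  $5 idle  $6 iowait  $7 irq  $8 softirq  $9 steal
# guest/guest_nice ($10, $11) are already counted in user/nice, so they are left out.
cpu_ticks() {
  # Prints "<total ticks> <iowait ticks>"; printf avoids awk's %.6g for big numbers
  awk '$1 == "cpu" { printf "%.0f %.0f\n", $2+$3+$4+$5+$6+$7+$8+$9, $6; exit }' /proc/stat
}

# iowait_pct [interval_seconds]: percentage of CPU time spent in iowait
iowait_pct() {
  local interval=${1:-1} before after
  before=$(cpu_ticks)
  sleep "$interval"
  after=$(cpu_ticks)
  awk -v a="$before" -v b="$after" 'BEGIN {
    split(a, x, " "); split(b, y, " ")
    total = y[1] - x[1]
    io    = y[2] - x[2]
    if (io < 0) io = 0          # proc(5): the iowait counter can occasionally decrease
    printf "%.1f\n", (total > 0 ? 100 * io / total : 0)
  }'
}
```

For example, iowait_pct 5 averages over five seconds; longer intervals smooth out short bursts.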

Impact

Sustained high disk I/O wait can significantly degrade system and application performance.

Possible impacts include:

Increased application latency
Slow database queries and file operations
Pod startup delays
Reduced overall node throughput

This alert is a warning, but may escalate if the condition persists.

Diagnosis

Check overall node CPU and memory usage (kubectl top requires metrics-server and does not report I/O wait):

kubectl top nodes
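Since kubectl top nodes does not show I/O wait, per-node values can instead be pulled from node_exporter's node_cpu_seconds_total metric, assuming Prometheus scrapes node_exporter. In this sketch, PROM_URL, the function name query_iowait, and the 5m rate window are assumptions; the alert's own rule may use a different expression or window.

```shell
#!/usr/bin/env bash
# Sketch: per-node I/O wait (%) from node_exporter metrics via the Prometheus HTTP API.
# rate() gives iowait seconds per second per core; averaging over cores and
# multiplying by 100 yields a percentage of CPU time per instance.
IOWAIT_QUERY='100 * avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m]))'

query_iowait() {
  # PROM_URL is an assumption; ${PROM_URL:?} aborts if it is unset or empty
  curl -fsSG "${PROM_URL:?set PROM_URL to your Prometheus base URL}/api/v1/query" \
    --data-urlencode "query=${IOWAIT_QUERY}"
}
```

If jq is available, the JSON response can be flattened to one line per node: PROM_URL=http://prometheus.example:9090 query_iowait | jq -r '.data.result[] | "\(.metric.instance) \(.value[1])"'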

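If SSH access is available, inspect disk I/O metrics directly. In iostat, watch %util and the await columns for each device; in vmstat, watch the wa column (percentage of time spent waiting for I/O):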

iostat -xz 1
vmstat 1
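To pick out saturated devices from iostat output, the sketch below reads iostat -dx text on stdin and prints devices whose %util exceeds a threshold. The function name busy_devices and the default threshold of 80 are assumptions. It locates the %util column from the header row because column order differs between sysstat versions.

```shell
#!/usr/bin/env bash
# Sketch: print "<device> <%util>" for devices above a %util threshold.
# Input: `iostat -dx` output on stdin. Header rows start with "Device" or "Device:".
busy_devices() {
  local threshold=${1:-80}
  awk -v t="$threshold" '
    # On each header row, find which column holds %util
    $1 ~ /^Device:?$/ { col = 0; for (i = 1; i <= NF; i++) if ($i == "%util") col = i; next }
    # Data rows: compare that column against the threshold
    col && NF >= col && $col + 0 > t + 0 { printf "%s %s\n", $1, $col }
  '
}
```

A possible invocation is LC_ALL=C iostat -dxy 5 1 | busy_devices 80, where LC_ALL=C keeps decimal points and -y skips the since-boot report. As the iostat manual notes, %util understates headroom on devices that serve requests in parallel, such as RAID arrays and modern SSDs, so the await columns matter more there.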

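Identify processes causing high disk I/O (iotop needs root; -o shows only processes doing I/O, -P shows processes instead of threads, -a shows accumulated totals):

sudo iotop -oPa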
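Where iotop is not installed, a rough equivalent can be built from the per-process counters in /proc/<pid>/io. Its read_bytes and write_bytes fields cumulatively count bytes a process caused to be fetched from or sent to the storage layer, so this sketch takes two snapshots and ranks processes by the difference. The function names snapshot_io and top_io are hypothetical. Run it as root to see every process; write_bytes is counted when pages are dirtied, so it can lead the actual disk writeback.

```shell
#!/usr/bin/env bash
# Sketch: rank processes by storage I/O rate using /proc/<pid>/io.

# Print "<pid> <read_bytes+write_bytes>" for each readable /proc/<pid>/io.
# Unreadable files (other users' processes, exited PIDs) produce no output.
snapshot_io() {
  local f
  for f in /proc/[0-9]*/io; do
    awk -v f="$f" '
      /^read_bytes:/  { r = $2 }
      /^write_bytes:/ { w = $2 }
      END { if (NR) { split(f, p, "/"); printf "%s %.0f\n", p[3], r + w } }
    ' "$f" 2>/dev/null
  done
}

# top_io [interval_seconds] [count]: processes with the highest byte rate.
top_io() {
  local interval=${1:-5} count=${2:-10} before
  before=$(snapshot_io)
  sleep "$interval"
  snapshot_io |
    awk -v iv="$interval" '
      NR == FNR { prev[$1] = $2; next }      # first input: the earlier snapshot
      ($1 in prev) && $2 + 0 > prev[$1] + 0 { printf "%s %.0f\n", $1, ($2 - prev[$1]) / iv }
    ' <(printf '%s\n' "$before") - |
    sort -k2,2nr | head -n "$count" |
    while read -r pid rate; do
      # Add the command name; it may be gone if the process has exited
      printf '%8s %14s B/s  %s\n' "$pid" "$rate" "$(cat "/proc/$pid/comm" 2>/dev/null)"
    done
}
```

After sourcing the file in a root shell, top_io 5 10 would list the ten busiest processes over a five-second window.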

Check node conditions (such as DiskPressure) and ephemeral-storage capacity:

kubectl describe node <NODE_NAME>
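The describe output is long; the sketch below filters it to pressure conditions that are currently True. The function name pressure_conditions is an assumption. It parses the human-readable Conditions table, which is not a stable interface, so kubectl get node <NODE_NAME> -o json is the sturdier source for automation.

```shell
#!/usr/bin/env bash
# Sketch: read `kubectl describe node` output on stdin and print every
# *Pressure condition (DiskPressure, MemoryPressure, PIDPressure) whose status is True.
pressure_conditions() {
  awk '
    /^Conditions:/ { in_cond = 1; next }
    in_cond && /^[^ \t]/ { in_cond = 0 }        # an unindented line starts the next section
    in_cond && $1 ~ /Pressure$/ && $2 == "True" { print $1 }
  '
}
```

Typical use would be kubectl describe node <NODE_NAME> | pressure_conditions; empty output means no pressure condition is set.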

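Check recent node events for disk-related warnings (for example, evictions or image garbage-collection failures):

kubectl get events -A --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp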

Possible Causes

Disk saturation due to heavy read/write operations
Slow or degraded storage, such as network-attached or cloud volumes reaching their provisioned IOPS or throughput limits
Log flooding or excessive file writes
Database or batch jobs performing intensive I/O
Disk nearing full capacity
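To check the last cause from a node shell, the sketch below reads POSIX df -P output and lists mount points at or above a usage threshold. The function name full_filesystems and the 85% default are assumptions. The same function works on df -Pi output, since running out of inodes also prevents new files from being created and that report uses the same fifth column. Mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Sketch: print "<mount point> <use%>" for filesystems at or above a threshold.
# Input: `df -P` or `df -Pi` output on stdin (column 5 = Capacity / IUse%).
full_filesystems() {
  local threshold=${1:-85}
  awk -v t="$threshold" '
    NR == 1 { next }                        # skip the header row
    { pct = $5; sub(/%$/, "", pct) }        # "93%" -> "93"
    pct ~ /^[0-9]+$/ && pct + 0 >= t + 0 { print $6, $5 }   # "-" (no inode data) is skipped
  '
}
```

For example, df -P | full_filesystems 85 checks space and df -Pi | full_filesystems 85 checks inodes.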

Mitigation

Identify and throttle or stop I/O-heavy workloads
Move high I/O workloads to faster storage
Enable or tune log rotation
Scale out workloads to reduce per-node I/O pressure
Increase disk performance (IOPS / throughput) if supported
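On cgroup v2 hosts, one way to throttle a specific workload is the io controller's io.max file, which accepts lines of the form MAJ:MIN rbps=... wbps=... with limits in bytes per second or max. The helper below only builds such a line; its names and the MiB/s input units are assumptions. Locating a pod's cgroup directory depends on the container runtime, cgroup driver, and QoS class, the io controller must be enabled for that cgroup, and a limit written by hand is lost when the cgroup is recreated.

```shell
#!/usr/bin/env bash
# Sketch: build a cgroup v2 io.max entry capping read/write bandwidth on one device.

# Convert a MiB/s limit to bytes/s; "max" (no limit) passes through unchanged.
mib_to_bps() {
  if [[ $1 == max ]]; then echo max
  elif [[ $1 =~ ^[0-9]+$ ]]; then echo $(( $1 * 1024 * 1024 ))
  else echo "invalid limit: $1" >&2; return 1
  fi
}

# io_max_line MAJ:MIN READ_MIB WRITE_MIB  ->  "MAJ:MIN rbps=<n> wbps=<n>"
io_max_line() {
  local majmin=$1 rbps wbps
  [[ $majmin =~ ^[0-9]+:[0-9]+$ ]] || { echo "invalid MAJ:MIN: $majmin" >&2; return 1; }
  rbps=$(mib_to_bps "$2") || return 1
  wbps=$(mib_to_bps "$3") || return 1
  echo "$majmin rbps=$rbps wbps=$wbps"
}
```

A hypothetical use, capping writes on the whole disk sda at 20 MiB/s: io_max_line "$(cat /sys/class/block/sda/dev)" max 20 | sudo tee /sys/fs/cgroup/<POD_CGROUP>/io.max, where <POD_CGROUP> is a placeholder for the workload's cgroup path.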

If the node is severely impacted, drain it temporarily (pods with emptyDir volumes also require --delete-emptydir-data, which discards that data):

kubectl drain <NODE_NAME> --ignore-daemonsets
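While the node is drained, it can help to confirm that I/O wait stays low for several consecutive checks rather than one lucky sample. This generic helper takes a threshold, the number of consecutive good samples required, the interval between checks, a maximum number of checks, and a command that prints a single number. The function name, its argument order, and the example values are assumptions to adapt.

```shell
#!/usr/bin/env bash
# Sketch: wait_until_calm THRESHOLD SAMPLES INTERVAL MAX_CHECKS CMD [ARGS...]
# Runs CMD every INTERVAL seconds; succeeds once CMD prints a value below
# THRESHOLD on SAMPLES consecutive checks.
# Returns 0 when calm, 1 after MAX_CHECKS without success, 2 if CMD fails.
wait_until_calm() {
  local threshold=$1 samples=$2 interval=$3 max_checks=$4
  shift 4
  local ok=0 i value
  for (( i = 1; i <= max_checks; i++ )); do
    value=$("$@") || return 2
    if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v + 0 < t + 0) }'; then
      (( ++ok >= samples )) && return 0
    else
      ok=0                                  # any high sample resets the streak
    fi
    (( i < max_checks )) && sleep "$interval"
  done
  return 1
}
```

For instance, wait_until_calm 10 5 60 30 my_iowait_check && kubectl uncordon <NODE_NAME> would uncordon only after five consecutive one-minute checks below 10%, where my_iowait_check is a hypothetical command printing the node's current I/O wait percentage.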

Once I/O wait has returned to normal, return the node to service:

kubectl uncordon <NODE_NAME>

Escalation

If high I/O wait persists beyond 10 minutes, escalate to the platform team
If multiple nodes are affected, treat as a storage-level incident
If production services are impacted, page the on-call engineer

Related Alerts

HighDiskUsage
NodeNotReady
HighCPUUsage

Related Dashboards

Grafana → Node Exporter / Disk I/O
Grafana → Storage Performance Overview
