HighDiskIOWait

Meaning

This alert fires when the CPU spends an unusually high share of its time idle while waiting for outstanding disk I/O operations to complete (the iowait CPU state). High I/O wait typically indicates a disk performance bottleneck.
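To make the metric concrete, here is a minimal sketch of how an I/O wait percentage can be derived on a Linux host from two samples of the aggregate cpu line in /proc/stat. The function name iowait_pct is hypothetical, and the alert's actual expression and threshold are not given in this runbook, so treat this as an illustration rather than the rule itself.

```shell
#!/usr/bin/env bash
# iowait_pct (hypothetical helper): percentage of CPU time spent in iowait
# between two samples of the aggregate "cpu" line from /proc/stat.
# Fields after the "cpu" label: user nice system idle iowait irq softirq steal ...
iowait_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, s1, " "); split(b, s2, " ")
    total = 0
    # Sum the deltas of fields 2..9 (user through steal).
    # Guest time is already counted inside user/nice, so it is not added again.
    for (i = 2; i <= 9; i++) total += s2[i] - s1[i]
    if (total <= 0) { print "0.0"; exit }
    # Field 6 is iowait.
    printf "%.1f\n", 100 * (s2[6] - s1[6]) / total
  }'
}

# Live usage on the node: take two samples a few seconds apart.
# a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
# iowait_pct "$a" "$b"
```

The result is an average over all CPUs for the sampling interval, so a single saturated disk on a many-core node can produce a lower figure than expected.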

Impact

Sustained high disk I/O wait can significantly degrade system and application performance.

Possible impacts include:

This alert is a warning, but may escalate if the condition persists.

Diagnosis

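Check overall CPU and memory usage per node. Note that kubectl top does not report I/O wait, so use it only to spot which nodes are under load: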

kubectl top nodes

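If SSH access to the node is available, inspect disk I/O metrics directly. In the iostat output, watch %iowait, the await columns, and %util per device; in the vmstat output, watch the wa and b columns: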

iostat -xz 1
vmstat 1
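When a single number is easier to compare against the alert, the wa column can be averaged over a short window. The sketch below assumes the standard procps vmstat layout with a header row that starts with r; the function name avg_wa is hypothetical.

```shell
#!/usr/bin/env bash
# avg_wa (hypothetical helper): read vmstat output on stdin and print the
# average of the "wa" (I/O wait) column.
avg_wa() {
  awk '
    # Header row: locate the column labelled "wa" rather than hard-coding its position.
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
    # Data rows start with a number. The first one is the since-boot average, so skip it.
    col && $1 ~ /^[0-9]+$/ { if (seen++) { sum += $col; n++ } }
    END { if (n) printf "%.1f\n", sum / n; else print "n/a" }
  '
}

# Live usage on the node: six samples, one second apart (five are averaged).
# vmstat 1 6 | avg_wa
```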

Identify the processes causing high disk I/O (iotop usually requires root privileges):

iotop
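If iotop is not installed on the node, a rough equivalent can be built from the per-process counters in /proc/<pid>/io. This is a sketch under the assumption that the kernel exposes those files and that you run it as root, since other users' io files are otherwise unreadable. The function name top_io is hypothetical, and the counters are cumulative since each process started, not a current rate.

```shell
#!/usr/bin/env bash
# top_io [PROC_ROOT] [N] (hypothetical helper): list the N processes with the
# most cumulative disk I/O as "<bytes><TAB><pid><TAB><command>".
top_io() {
  local root="${1:-/proc}" n="${2:-5}" d pid rb wb
  for d in "$root"/[0-9]*; do
    # Skip processes whose io file is missing or unreadable.
    [ -r "$d/io" ] || continue
    pid=${d##*/}
    rb=$(awk '$1 == "read_bytes:" {print $2}' "$d/io")
    wb=$(awk '$1 == "write_bytes:" {print $2}' "$d/io")
    printf '%s\t%s\t%s\n' "$(( ${rb:-0} + ${wb:-0} ))" "$pid" "$(cat "$d/comm" 2>/dev/null)"
  done | sort -rn | head -n "$n"
}

# Live usage on the node:
# sudo bash -c "$(declare -f top_io); top_io /proc 10"
```

Running it twice a few seconds apart and comparing the byte counts gives a better picture of which process is active right now.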

Check the node's conditions, in particular DiskPressure, and its recent events:

kubectl describe node <NODE_NAME>
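To check the DiskPressure condition across all nodes at once instead of reading each describe output, the condition can be pulled out with a jsonpath query and filtered. This is a sketch; the helper name disk_pressure_nodes is hypothetical, while the jsonpath expression uses only fields of the standard Node status.

```shell
#!/usr/bin/env bash
# disk_pressure_nodes (hypothetical helper): print the names of nodes whose
# DiskPressure condition is "True".
# Input on stdin: lines of "<node name><TAB><condition status>".
disk_pressure_nodes() {
  awk -F '\t' '$2 == "True" { print $1 }'
}

# Live usage (requires kubectl access to the cluster):
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}' | disk_pressure_nodes
```

An empty result means no node currently reports DiskPressure. That condition reflects low free disk space or inodes, so high I/O wait can still occur without it.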

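Check whether disk-related events have been recorded for any node. Node events are stored in the default namespace, so query all namespaces rather than relying on the current context:

kubectl get events -A --field-selector involvedObject.kind=Node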

Possible Causes

Mitigation

  1. Identify and throttle or stop I/O-heavy workloads
  2. Move high I/O workloads to faster storage
  3. Enable or tune log rotation
  4. Scale out workloads to reduce per-node I/O pressure
  5. Increase disk performance (IOPS / throughput) if supported

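If the node is severely impacted, drain it temporarily. This cordons the node and evicts its pods; if any pod uses an emptyDir volume, kubectl refuses to proceed unless --delete-emptydir-data is also passed: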

kubectl drain <NODE_NAME> --ignore-daemonsets

After mitigation, make the node schedulable again:

kubectl uncordon <NODE_NAME>

Escalation