HighDiskIOWait

Meaning

This alert fires when the node's CPUs spend an unusually high share of their time idle while waiting for outstanding disk I/O to complete (the iowait state). High I/O wait typically indicates a disk performance bottleneck.
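
As a rough illustration of what I/O wait measures, the sketch below computes the iowait share between two samples of the aggregate cpu line in /proc/stat. The function name iowait_pct is hypothetical, and the exact expression and threshold used by the alert rule are not stated in this runbook, so treat this as an approximation rather than the rule itself.

```shell
# Compute the iowait share (percent) between two aggregate "cpu" lines
# taken from /proc/stat. Field order on Linux:
#   cpu user nice system idle iowait irq softirq steal guest guest_nice
iowait_pct() {
  # $1 = earlier sample, $2 = later sample
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      # Sum user..steal (fields 2-9); guest time is already counted in user.
      total[NR] = 0
      for (i = 2; i <= 9; i++) total[NR] += $i
      iowait[NR] = $6
    }
    END {
      dt = total[2] - total[1]
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (iowait[2] - iowait[1]) / dt
    }'
}

# Usage on a live node (samples 5 seconds apart):
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$a" "$b"
```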

Impact

Sustained high disk I/O wait can significantly degrade system and application performance.

Possible impacts include:

  • Increased application latency
  • Slow database queries and file operations
  • Pod startup delays
  • Reduced overall node throughput

This alert has warning severity, but it may be escalated if the condition persists.

Diagnosis

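Check overall CPU and memory usage per node. Note that kubectl top does not report I/O wait itself, so use it only to rule out plain CPU saturation: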

kubectl top nodes

If SSH access is available, inspect disk I/O metrics directly:

iostat -xz 1
vmstat 1
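
To narrow the iostat output down to the devices that look saturated, a small filter like the following may help. It is a sketch, not part of any standard tooling: busy_devices is a hypothetical helper, the default limit of 80 is an assumed value, and the %util column is located by its header name because its position differs between sysstat versions.

```shell
# Read `iostat -x` output on stdin and print "<device> <%util>" for every
# device whose %util exceeds the threshold (default 80, an assumed value).
busy_devices() {
  awk -v limit="${1:-80}" '
    $1 ~ /^Device/ {
      # Locate %util by header name; its position differs across sysstat versions.
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    NF == 0 { col = 0; next }            # a blank line ends the device table
    col && ($col + 0) > (limit + 0) { print $1, $col }
  '
}

# Usage: two reports one second apart; the first report shows averages since boot.
#   iostat -xz 1 2 | busy_devices 80
```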

Identify processes causing high disk I/O:

iotop
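
If iotop is not installed on the node, the per-process counters in /proc/<pid>/io give a coarser view. The sketch below ranks processes by read_bytes plus write_bytes; top_io_procs is a hypothetical helper, and its second argument (an alternative proc root) exists only to make it easy to try against test data. Unlike iotop, these counters are cumulative since each process started, so a long-running process can rank high without being busy right now.

```shell
# Print the top N processes by cumulative disk bytes (read_bytes + write_bytes)
# using /proc/<pid>/io. Reading other users' processes requires root.
# $1 = how many rows to print (default 5); $2 = proc root (default /proc).
top_io_procs() {
  n="${1:-5}"
  root="${2:-/proc}"
  for f in "$root"/[0-9]*/io; do
    [ -r "$f" ] || continue              # skip unreadable or vanished entries
    pid="${f%/io}"; pid="${pid##*/}"
    name=$(cat "$root/$pid/comm" 2>/dev/null)
    awk -v pid="$pid" -v name="$name" '
      $1 == "read_bytes:"  { r = $2 }
      $1 == "write_bytes:" { w = $2 }
      END { printf "%.0f %s %s\n", r + w, pid, name }' "$f" 2>/dev/null
  done | sort -rn | head -n "$n"
}

# Usage (as root on the node): top_io_procs 10
```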

Check the node's conditions (for example DiskPressure) and its recent events:

kubectl describe node <NODE_NAME>
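
When only the DiskPressure condition is of interest, it can be read directly with a jsonpath query instead of scanning the full describe output. This is a minimal sketch; disk_pressure_status is a hypothetical wrapper name.

```shell
# Print the DiskPressure condition status ("True", "False" or "Unknown")
# for one node. $1 = node name.
disk_pressure_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'
}

# Usage: disk_pressure_status <NODE_NAME>
```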

Check for recent disk-related events on nodes (this lists Kubernetes events, not alerts):

kubectl get events --field-selector involvedObject.kind=Node

Possible Causes

  • Disk saturation due to heavy read/write operations
  • Slow or degraded storage (network-attached or cloud disks)
  • Log flooding or excessive file writes
  • Database or batch jobs performing intensive I/O
  • Disk nearing full capacity
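
To check the last cause in the list above (a disk nearing full capacity) on the node itself, the output of df -P can be filtered for filesystems above a usage limit. The helper name full_filesystems and the default limit of 85 percent are assumptions for illustration; mount points containing spaces are not handled.

```shell
# Read `df -P` output on stdin and print "<mount> <use%>" for filesystems
# at or above the threshold (default 85, an assumed value).
full_filesystems() {
  awk -v limit="${1:-85}" '
    NR == 1 { next }                 # skip the header row
    {
      use = $5; sub(/%$/, "", use)   # "91%" -> "91"
      if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage: df -P | full_filesystems 85
```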

Mitigation

  1. Identify and throttle or stop I/O-heavy workloads
  2. Move high I/O workloads to faster storage
  3. Enable or tune log rotation
  4. Scale out workloads to reduce per-node I/O pressure
  5. Increase disk performance (IOPS / throughput) if supported

If the node is severely impacted, drain it temporarily. This cordons the node and evicts its pods, except those managed by DaemonSets:

kubectl drain <NODE_NAME> --ignore-daemonsets

After mitigation, make the node schedulable again:

kubectl uncordon <NODE_NAME>

Escalation

  • If high I/O wait persists beyond 10 minutes, escalate to the platform team
  • If multiple nodes are affected, treat as a storage-level incident
  • If production services are impacted, page the on-call engineer

Related Alerts

  • HighDiskUsage
  • NodeNotReady
  • HighCPUUsage

Related Dashboards

  • Grafana → Node Exporter / Disk I/O
  • Grafana → Storage Performance Overview