HostUnusualDiskReadRate

Meaning

This alert fires when a host node shows sustained heavy disk read activity, with IO wait above 80% over a 5-minute window. It indicates that the disk is under heavy load and may be a bottleneck.

Impact

High disk read rates can lead to:

  • Application slowdowns or latency
  • Increased pod response times
  • Potential cascading failures if services rely on disk-intensive operations
  • Node-level resource contention

This alert has severity warning: prolonged high IO can degrade performance or trigger other alerts.

Diagnosis

Check disk IO statistics:

iostat -x 1 5
iotop -o
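To pick the saturated devices out of the iostat output, the %util column can be filtered against the 80% threshold. The following is a minimal sketch, not part of sysstat: busy_devices is a hypothetical helper name, and it assumes the header row starts with "Device" and contains a %util column, which should be verified against the sysstat version on the host.

```shell
# busy_devices [LIMIT]: read `iostat -x` output on stdin and print
# "<device> <%util>" for every device whose %util exceeds LIMIT (default 80).
# Hypothetical helper; assumes a header row beginning with "Device".
busy_devices() {
    awk -v limit="${1:-80}" '
        NF == 0        { col = 0; next }    # blank line ends a report
        $1 ~ /^Device/ {                    # locate the %util column by name
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && ($col + 0) > (limit + 0) { print $1, $col }
    '
}

# Example (LC_ALL=C keeps the decimal separator as a dot):
#   LC_ALL=C iostat -x 1 5 | busy_devices 80
```

The first report printed by iostat covers the time since boot, so a device listed only in that report is not necessarily busy right now.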

Check system-wide IO wait:

top
vmstat 1 5
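To compare the host against the 80% threshold without reading the "wa" column by eye, the CPU IO wait share can be computed from two samples of the aggregate "cpu" line in /proc/stat. This is a rough sketch under assumptions: iowait_pct is a hypothetical helper name, and this runbook does not show the alert expression, which may measure device busy time rather than CPU iowait.

```shell
# iowait_pct PREV CUR: given two aggregate "cpu" lines from /proc/stat,
# print the integer percentage of CPU time spent in iowait between them.
# Fields after the "cpu" label: user nice system idle iowait irq softirq steal
# (guest time is already counted in user, so it is not summed again).
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0; for (i = 2; i <= 9; i++) total += $i; iow = $6 }
        NR == 1 { t0 = total; w0 = iow }
        NR == 2 {
            dt = total - t0
            if (dt <= 0) print 0
            else printf "%d\n", (iow - w0) * 100 / dt
        }
    '
}

# Example on a Linux host:
#   prev=$(grep '^cpu ' /proc/stat); sleep 5; cur=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$prev" "$cur"
```

A short sampling interval is noisy. The alert evaluates a 5-minute window, so several samples are more representative than one.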

Check disk usage and filesystem health:

df -h
lsblk
smartctl -a /dev/sdX

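Check resource usage of pods on the node (kubectl top reports CPU and memory only, so use the iotop output above to attribute disk IO to processes):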

kubectl top pod --all-namespaces --field-selector spec.nodeName={{ $labels.instance }}
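The instance label often has the form host:port (for example a node exporter target), which does not match spec.nodeName as written. The sketch below strips the port before querying. instance_to_host is a hypothetical helper, and it assumes the host part equals the Kubernetes node name. If the label holds an IP address, it has to be mapped to a node name first (for example via kubectl get nodes -o wide).

```shell
# instance_to_host INSTANCE: drop a trailing ":port" from an instance label.
# Hypothetical helper; does not handle bare IPv6 addresses without a port.
instance_to_host() {
    printf '%s\n' "${1%:*}"
}

# Example: list every pod scheduled on the affected node.
#   node=$(instance_to_host "node-1:9100")
#   kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName="$node"
```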

Possible Causes

  • Disk-intensive workloads or batch jobs
  • Logging or database writes causing high IO
  • Slow or failing disks
  • Misconfigured storage (e.g., small volumes)
  • Backup jobs or heavy monitoring metrics writes

Mitigation

  1. Identify and reduce disk-intensive workloads
  2. Move high IO workloads to other nodes or storage
  3. Monitor disk health and replace failing disks
  4. Tune filesystem or storage configuration if needed
  5. Scale out storage for critical workloads
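Steps 1 and 2 above can be sketched as a dry-run wrapper: lower the IO priority of an offending process and stop new pods from landing on the node. The function names, the PID, and the node name are illustrative assumptions. ionice class 3 (idle) only takes effect with IO schedulers that honour priorities (such as BFQ), and kubectl cordon does not move pods that are already running.

```shell
# run CMD...: print the command when DRY_RUN=1 (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

# throttle_and_cordon PID NODE: hypothetical helper combining both steps.
throttle_and_cordon() {
    pid=$1
    node=$2
    run ionice -c 3 -p "$pid"      # move the process to the idle IO class
    run kubectl cordon "$node"     # mark the node unschedulable
}

# Example (prints the commands; set DRY_RUN=0 to execute them):
#   throttle_and_cordon 1234 node-1
```

Defaulting to a dry run keeps the sketch safe to paste during an incident. Review the printed commands before executing them.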

Escalation

  • Escalate if high IO persists for extended periods
  • Page on-call engineer if production services are impacted
  • Investigate related alerts (DiskPressure, HighDiskUsage)

Related Alerts

  • HighDiskUsage
  • HighDiskIOWait
  • KubernetesNodeDiskPressure
  • HostUnusualDiskWriteRate

Related Dashboards

  • Grafana → Node Disk IO
  • Grafana → Node Exporter Disk Metrics