HighDiskUsage

Meaning

This alert is triggered when disk usage on a node exceeds 90% for more than 5 minutes. Disk usage is calculated using filesystem size and free space metrics reported by node-exporter.

The alert is scoped to a specific mount point.
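
The usage figure can be reproduced by hand from the two values node-exporter reports. A minimal sketch, assuming the alert is built on the usual node_filesystem_size_bytes and node_filesystem_avail_bytes metrics (the exact alert expression is not shown in this runbook); disk_usage_pct is a hypothetical helper name:

```shell
# disk_usage_pct SIZE_BYTES AVAIL_BYTES
# Prints the used percentage: 100 * (1 - avail / size).
# Hypothetical helper; the metric names in the lead-in are assumptions.
disk_usage_pct() {
  awk -v size="$1" -v avail="$2" 'BEGIN {
    if (size <= 0) { print "size must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", 100 * (1 - avail / size)
  }'
}
```

For example, disk_usage_pct 107374182400 8589934592 (100 GiB total, 8 GiB available) prints 92.0, which is above the 90% threshold.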

Impact

High disk usage can cause serious stability and availability issues.

Possible impacts include:

  • Applications failing to write data or logs
  • Pods crashing or entering error states
  • Node instability and kubelet failures
  • Potential data loss if the disk becomes full

This alert has warning severity, but the situation can become critical if disk usage keeps growing.

Diagnosis

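Identify the affected node and mount point. The alert labels typically carry both (instance and mountpoint); to map the instance address from the alert to a node name:

kubectl get nodes -o wide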

If SSH access is available, check disk usage on the node:

df -h
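
To pick out only the mount points above the alert threshold, the df output can be filtered. A minimal sketch, assuming POSIX-format output (df -P) and mount points without spaces; mounts_over_threshold is a hypothetical helper name:

```shell
# mounts_over_threshold [LIMIT_PCT]
# Reads `df -P` output on stdin and prints "MOUNTPOINT USE%" for every
# filesystem whose usage is above LIMIT_PCT (default 90).
# Assumes mount points contain no spaces.
mounts_over_threshold() {
  awk -v limit="${1:-90}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)          # strip the trailing percent sign
    if (use + 0 > limit) print $6, use "%"
  }'
}

# Usage: df -P | mounts_over_threshold 90
```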

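Check disk usage per directory to find the largest consumers. Replace / with the mount point from the alert; -x keeps du on that one filesystem:

sudo du -xh / 2>/dev/null | sort -h | tail -20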

List the pods scheduled on the node and review its conditions and allocated resources (Kubernetes):

kubectl describe node <NODE_NAME>

Check recent events related to disk pressure:

kubectl get events --field-selector involvedObject.kind=Node

Check if the node is under DiskPressure (plain kubectl get nodes does not show this condition):

kubectl describe node <NODE_NAME> | grep DiskPressure
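
To check every node at once, the condition can be pulled with a jsonpath query and filtered. A minimal sketch; nodes_under_disk_pressure is a hypothetical helper name, and the kubectl call in the usage comment needs cluster access:

```shell
# nodes_under_disk_pressure
# Reads "NODE<TAB>STATUS" lines on stdin and prints the names of nodes
# whose DiskPressure status is True.
nodes_under_disk_pressure() {
  awk '$2 == "True" { print $1 }'
}

# Usage (requires cluster access):
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}' | nodes_under_disk_pressure
```

No output means no node currently reports DiskPressure.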

Possible Causes

  • Log files growing uncontrollably
  • Application writing excessive data to disk
  • Container images and layers not cleaned up
  • Old files or backups consuming disk space
  • Insufficient disk capacity on the node

Mitigation

  1. Remove or rotate large log files
  2. Clean up unused container images and volumes
  3. Delete temporary or obsolete files
  4. Resize the disk if supported by the platform
  5. Move data to external storage if applicable
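
Before removing anything, it helps to list the large files first. A minimal sketch, assuming GNU or BSD find; large_files is a hypothetical helper name and /var/log is only an example path:

```shell
# large_files DIR [MIN_MIB]
# Lists regular files under DIR larger than MIN_MIB MiB (default 100),
# staying on DIR's filesystem (-xdev). Read-only: nothing is deleted.
large_files() {
  find "$1" -xdev -type f -size +"${2:-100}M" 2>/dev/null
}

# Usage: large_files /var/log 100
```

Review the list before deleting or rotating; a file that a process still holds open keeps its space allocated until that process closes it.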

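If the node is under DiskPressure, consider draining it. This evicts its pods; pods that use emptyDir volumes block the drain unless --delete-emptydir-data is added, which discards that data: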

kubectl drain <NODE_NAME> --ignore-daemonsets

After resolving the issue, make the node schedulable again:

kubectl uncordon <NODE_NAME>

Escalation

  • If disk usage continues to increase or exceeds 95%, escalate immediately
  • If production workloads are impacted, page the on-call engineer
  • If the cause of the disk growth is unclear, escalate to the application owner or infrastructure team

Related Alerts

  • HighMemoryUsage
  • NodeDown
  • NodeNotReady

Dashboards

  • Grafana → Node Overview
  • Grafana → Disk Usage Dashboard

runbooks/coustom_alerts/highdiskusage.txt · Last modified: by admin