HighDiskUsage
Meaning
This alert is triggered when disk usage on a node exceeds 90% for more than 5 minutes. Disk usage is calculated using filesystem size and free space metrics reported by node-exporter.
The alert is scoped to a specific mount point.
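The usage figure can be reproduced by hand from the two values. The sketch below assumes the rule computes used space as size minus available space; the actual alert expression is not shown here and may instead use the free-space metric, which also counts blocks reserved for root. The helper name usage_pct is hypothetical.

```shell
# usage_pct SIZE_BYTES AVAIL_BYTES
# Prints the percentage of the filesystem in use (integer, rounded down).
# Assumption: used = size - available; the real alert rule may differ.
usage_pct() {
  if [ "$1" -le 0 ]; then
    echo "size must be positive" >&2
    return 1
  fi
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Example: a 100 GiB filesystem with 8 GiB available is 92% used,
# which is above the 90% alert threshold.
usage_pct 107374182400 8589934592
```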
Impact
High disk usage can cause serious stability and availability issues.
Possible impacts include:
- Applications failing to write data or logs
- Pods crashing or entering error states
- Node instability and kubelet failures
- Potential data loss if the disk becomes full
This alert fires at warning severity, but the condition can become critical if disk usage continues to grow.
Diagnosis
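Identify the affected node and mount point. The alert labels normally carry both (label names depend on the alert rule). To match the alert's instance address to a node name, list nodes with their internal IPs:
kubectl get nodes -o wide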
If SSH access is available, check disk usage on the node:
df -h
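To narrow the df output down to filesystems above the alert threshold, it can be filtered. This is a minimal sketch that assumes the POSIX output format of df -P (capacity in column 5, mount point in column 6) and mount points without spaces; over_threshold is a hypothetical helper name.

```shell
# over_threshold [PERCENT]
# Reads `df -P` output on stdin and prints every mount point whose usage
# exceeds PERCENT (default 90), followed by its usage.
over_threshold() {
  awk -v t="${1:-90}" '
    NR > 1 {                      # skip the header line
      use = $5
      sub(/%/, "", use)           # "95%" -> "95"
      if (use + 0 > t) print $6, use "%"
    }'
}

# Usage on the node:
#   df -P | over_threshold 90
```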
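Check disk usage per directory to find the largest consumers. Replace / with the affected mount point if it differs; -x keeps du on a single filesystem, and errors from unreadable directories are discarded:
du -xh / 2>/dev/null | sort -h | tail -20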
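Review the pods scheduled on the node, the node's conditions, and its allocated ephemeral storage. Note that this shows requests and limits, not actual per-pod disk usage:
kubectl describe node <NODE_NAME>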
Check recent events related to disk pressure:
kubectl get events --field-selector involvedObject.kind=Node
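Check if the node is under DiskPressure (prints True, False, or Unknown):
kubectl get node <NODE_NAME> -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'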
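The DiskPressure condition can also be read for every node at once, since the default node listing does not include it. The sketch below assumes kubectl is configured for the affected cluster; disk_pressure_nodes is a hypothetical helper name.

```shell
# disk_pressure_nodes
# Prints the name of every node whose DiskPressure condition is "True".
disk_pressure_nodes() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}' |
    awk -F '\t' '$2 == "True" { print $1 }'
}

# Usage:
#   disk_pressure_nodes
```

An empty result means no node currently reports DiskPressure, which does not rule out high usage on a mount point the kubelet does not monitor.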
Possible Causes
- Log files growing uncontrollably
- Application writing excessive data to disk
- Container images and layers not cleaned up
- Old files or backups consuming disk space
- Insufficient disk capacity on the node
Mitigation
- Remove or rotate large log files
- Clean up unused container images and volumes
- Delete temporary or obsolete files
- Resize the disk if supported by the platform
- Move data to external storage if applicable
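Before removing anything, it helps to list the candidates. The sketch below looks for large files under a directory without crossing filesystem boundaries. It assumes GNU find (for the M size suffix); the default directory and size are illustrative, and large_files is a hypothetical helper name.

```shell
# large_files [DIR] [MIN_SIZE]
# Lists regular files under DIR (default /var/log) larger than MIN_SIZE
# (default 100M), staying on the same filesystem as DIR.
large_files() {
  find "${1:-/var/log}" -xdev -type f -size +"${2:-100M}" 2>/dev/null
}

# Usage on the node:
#   large_files /var/log 100M
```

The function only lists files; review the output before deleting or rotating anything.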
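If the node is under DiskPressure, consider draining it so that workloads are rescheduled onto other nodes. Pods using emptyDir volumes block the drain unless --delete-emptydir-data is added, which discards that data: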
kubectl drain <NODE_NAME> --ignore-daemonsets
After resolving the issue, make the node schedulable again:
kubectl uncordon <NODE_NAME>
Escalation
- If disk usage continues to increase or exceeds 95%, escalate immediately
- If production workloads are impacted, page the on-call engineer
- If the cause of the disk growth is unclear, escalate to the application owner or infrastructure team
Related Alerts
- HighMemoryUsage
- NodeDown
- NodeNotReady
Related Dashboards
- Grafana → Node Overview
- Grafana → Disk Usage Dashboard
