HostOutOfDiskSpace
Meaning
This alert fires when any filesystem on a host node (excluding tmpfs, fuse, cifs, and nfs) has had less than 10% free space for more than 2 minutes. It indicates that the host is running low on disk space, which may cause system or application failures.
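The condition can be approximated directly on a host. The following is a minimal sketch, not the alert rule itself: the helper name `low_space_filesystems` is hypothetical, it assumes `df -PT` column order (GNU `df`), and it treats the excluded types as prefixes (so `nfs4` and `fuse.*` are also skipped), which may not match the monitoring rule exactly.

```shell
# Hypothetical helper: reads `df -PT` style output on stdin and prints
# "<mountpoint> <free%>" for each filesystem below the threshold (default 10).
# Columns assumed: Filesystem Type 1024-blocks Used Available Capacity Mounted-on
# Note: mountpoints containing spaces are not handled by this sketch.
low_space_filesystems() {
    threshold="${1:-10}"
    awk -v t="$threshold" '
        NR == 1 { next }                          # skip the header row
        $2 ~ /^(tmpfs|fuse|cifs|nfs)/ { next }    # skip excluded filesystem types
        $3 > 0 {
            free = $5 * 100 / $3                  # available blocks / total blocks
            if (free < t) printf "%s %.1f\n", $7, free
        }'
}

# Usage on a live host:
#   df -PT | low_space_filesystems 10
```

An empty result suggests no local filesystem is under the threshold by this measure; the alert itself remains the authoritative signal.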
Impact
Low disk space can cause:
- Pod evictions due to inability to write logs or data
- Application failures
- Node instability or crashes
- Kubernetes scheduling failures for pods with persistent volume requirements
- Increased latency or I/O errors
This alert is critical, as disk space exhaustion can immediately impact production workloads.
Diagnosis
Check disk usage:
df -h
df -i
lsblk
Check disk usage per directory:
du -sh /var/lib/kubelet/*
du -sh /home/*
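When a directory has many entries, sorting the `du` output makes the largest consumers easier to spot. A minimal sketch, assuming GNU or BSD `find` and `du`; the helper name `largest_entries` is hypothetical:

```shell
# Hypothetical helper: list the N largest entries directly under a directory.
# Sizes are printed in KiB. -x keeps du on one filesystem so that other
# mounts below the directory are not counted.
largest_entries() {
    dir="$1"
    count="${2:-10}"
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sxk {} + 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}

# Usage:
#   largest_entries /var/lib/kubelet 10
#   largest_entries /home 5
```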
Check pods consuming disk:
kubectl get pvc --all-namespaces
kubectl describe pod <POD_NAME> -n <NAMESPACE>
Check node events:
kubectl get events --sort-by=.lastTimestamp
Possible Causes
- Large log files or temporary files
- Full persistent volumes
- Backup jobs or batch jobs filling disks
- Container images not cleaned up
- Disk size too small for workload
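Several of these causes show up as a small number of very large files. A minimal sketch for locating them, assuming `find` with `-xdev` and `-size` support; the helper name `large_files` and the 100 MiB default are illustrative assumptions, not values from this runbook:

```shell
# Hypothetical helper: print regular files under DIR that are larger than
# MIN_KB KiB, without crossing into other filesystems (-xdev).
large_files() {
    dir="$1"
    min_kb="${2:-102400}"    # default 100 MiB; illustrative threshold only
    find "$dir" -xdev -type f -size +"${min_kb}"k 2>/dev/null
}

# Usage:
#   large_files /var/log
#   large_files /var/lib/kubelet 1048576    # files larger than 1 GiB
```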
Mitigation
- Clean up unused files, logs, or images
- Rotate and compress logs
- Move non-critical data to other storage
- Evict or reschedule non-critical pods
- Increase disk capacity if possible
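The cleanup steps above can be sketched as a dry-run-first script. This is an illustration under stated assumptions rather than a prescribed procedure: it assumes a systemd host, a CRI runtime managed through `crictl`, and the default `logrotate` config path. The helper names, the `APPLY` variable, and the 500M journal size are hypothetical choices.

```shell
# Hypothetical helper: print the command by default; run it only when APPLY=1.
run_or_print() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf 'DRY-RUN:'
        printf ' %s' "$@"
        printf '\n'
    fi
}

# Hypothetical cleanup sequence; adjust or remove steps to fit the host.
cleanup_host() {
    # Shrink the systemd journal to roughly 500 MiB (size is illustrative).
    run_or_print journalctl --vacuum-size=500M
    # Remove container images not used by any container (assumes crictl).
    run_or_print crictl rmi --prune
    # Force a log rotation using the default config path (assumption).
    run_or_print logrotate --force /etc/logrotate.conf
}

# Usage:
#   cleanup_host            # prints what would run
#   APPLY=1 cleanup_host    # actually runs the commands (requires root)
```

Defaulting to a dry run is deliberate: on a node that is already unstable, reviewing the commands first is cheaper than recovering from an overly aggressive cleanup.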
Drain the node if needed, then uncordon it once disk space has been reclaimed:
kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data
kubectl uncordon <NODE_NAME>
Escalation
- Escalate if disk space remains below 10% for extended periods
- Page on-call engineer if production services are impacted
- Treat multiple nodes with low disk space as a cluster-level incident
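To judge whether more than one node is affected, one option is to count nodes that report the kubelet `DiskPressure` condition. This is a related but separate signal (its thresholds are set by the kubelet, not by this alert), so treat the count as a rough indicator. The helper name `count_disk_pressure_nodes` is hypothetical.

```shell
# Hypothetical helper: reads "<node><TAB><DiskPressure status>" lines on stdin
# and prints how many nodes have status "True".
count_disk_pressure_nodes() {
    awk -F '\t' '$2 == "True" { n++ } END { print n + 0 }'
}

# Usage against a cluster:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}' \
#     | count_disk_pressure_nodes
```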
Related Alerts
- HighDiskUsage
- HighDiskIOWait
- KubernetesNodeDiskPressure
- HostUnusualDiskReadRate
Related Dashboards
- Grafana → Node Disk Usage
- Grafana → Node Exporter Disk Metrics
runbooks/coustom_alerts/hostoutofdiskspace.txt · Last modified: by admin
