====== HostOutOfDiskSpace ======

===== Meaning =====

This alert fires when any filesystem on a host node (excluding tmpfs, fuse, cifs, and nfs) has **less than 10% free space** for more than 2 minutes. It indicates that the host is running low on disk space, which may cause system or application failures.

===== Impact =====

Low disk space can cause:

  * Pod evictions due to inability to write logs or data
  * Application failures
  * Node instability or crashes
  * Kubernetes scheduling failures for pods with persistent volume requirements
  * Increased latency or I/O errors

This alert is **critical**, because disk space exhaustion can immediately impact production workloads.

===== Diagnosis =====

Check disk usage, inode usage, and block devices:

  df -h
  df -i
  lsblk

Find the directories that consume the most space:

  du -sh /var/lib/kubelet/*
  du -sh /home/*

Check persistent volume claims and inspect suspect pods:

  kubectl get pvc --all-namespaces
  kubectl describe pod <pod-name> -n <namespace>

Check node events:

  kubectl get events --sort-by=.lastTimestamp

===== Possible Causes =====

  * Large log files or temporary files
  * Full persistent volumes
  * Backup jobs or batch jobs filling disks
  * Container images not cleaned up
  * Disk size too small for the workload

===== Mitigation =====

  - Clean up unused files, logs, or images
  - Rotate and compress logs
  - Move non-critical data to other storage
  - Evict or reschedule non-critical pods
  - Increase disk capacity if possible

Drain the node if needed, then uncordon it once space has been freed:

  kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
  kubectl uncordon <node-name>

===== Escalation =====

  * Escalate if free disk space remains below 10% for an extended period
  * Page the on-call engineer if production services are impacted
  * Treat multiple nodes with low disk space as a cluster-level incident

===== Related Alerts =====

  * HighDiskUsage
  * HighDiskIOWait
  * KubernetesNodeDiskPressure
  * HostUnusualDiskReadRate

===== Related Dashboards =====

  * Grafana → Node Disk Usage
  * Grafana → Node Exporter Disk Metrics
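
===== Example: Local Threshold Check =====

When diagnosing on the host itself, it can help to reproduce the condition described under Meaning (less than 10% free on a filesystem, ignoring tmpfs, fuse, cifs, and nfs) directly from ''df'' output. The following is a minimal sketch, not the alert's actual rule: the function name ''low_disk_filter'' is hypothetical, the filesystem-type patterns are an assumption about how the exclusions are matched, and free space is approximated as available blocks divided by total blocks. It also assumes mount points do not contain spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the alert definition).
# Reads `df -PT`-style output on stdin and prints every filesystem whose
# free space is below the given percentage (default 10).
# Columns of `df -PT`: Filesystem Type 1024-blocks Used Available Capacity Mounted-on
low_disk_filter() {
  awk -v t="${1:-10}" '
    NR == 1 { next }                                          # skip the header row
    $2 ~ /^(tmpfs|cifs|nfs[0-9]*)$/ || $2 ~ /^fuse/ { next }  # assumed exclusion patterns
    $3 > 0 {
      free = 100 * $5 / $3                                    # available / total, in percent
      if (free < t) printf "%s %s %.1f\n", $7, $2, free       # mountpoint, type, free %
    }
  '
}

# Usage on a live host:
#   df -PT | low_disk_filter 10
```

Each output line gives the mount point, filesystem type, and free percentage, which makes it quick to see which mount the ''du'' checks in the Diagnosis section should start from.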