KubernetesNodeDiskPressure
Meaning
This alert is triggered when a Kubernetes node reports the DiskPressure condition for more than 2 minutes. DiskPressure indicates that the node is running low on available disk space, and Kubernetes may evict pods to free space.
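For example, to list the DiskPressure condition status for every node (this uses the standard Node conditions fields):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'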
Impact
Disk pressure on a node can cause:
- Pod evictions or restarts
- Application failures due to insufficient storage
- Node instability
- Scheduling failures for new pods
This alert is critical, as sustained disk pressure can affect cluster stability and production workloads.
Diagnosis
Check node status:
kubectl get nodes
kubectl describe node <NODE_NAME>
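The kubelet's eviction manager also logs when it detects disk pressure; a quick check on the node (assuming the kubelet runs as a systemd service) is:
journalctl -u kubelet --since "30 min ago" | grep -i -E "disk pressure|evict"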
Check disk usage:
df -h
du -sh /var/lib/kubelet/*
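On the node itself, the usual space consumers are container images, container logs, and per-pod ephemeral storage; example checks (paths vary by container runtime and distribution):
du -sh /var/log/pods /var/lib/containerd 2>/dev/null
du -sh /var/lib/kubelet/pods/* 2>/dev/null | sort -rh | head -20
crictl imagefsinfo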
Check pods consuming disk space:
kubectl get pvc --all-namespaces
kubectl describe pod <POD_NAME> -n <NAMESPACE>
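Directories under /var/lib/kubelet/pods are named after pod UIDs; to map a large directory back to the owning pod, substitute the UID from the du output above for the <POD_UID> placeholder:
kubectl get pods --all-namespaces -o jsonpath='{range .items[?(@.metadata.uid=="<POD_UID>")]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}'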
Check recent events:
kubectl get events --sort-by=.lastTimestamp
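To narrow events down to evictions, or to events on the affected node, field selectors help, for example:
kubectl get events --all-namespaces --field-selector reason=Evicted
kubectl get events --field-selector involvedObject.kind=Node,involvedObject.name=<NODE_NAME>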
Possible Causes
- Full disks due to logs, images, or temporary files
- Large persistent volumes filling up
- Containers writing excessive data
- Old or unused container images not cleaned up
- Disk size too small for workload requirements
Mitigation
- Clean up unused images and temporary files (example commands after this list)
- Rotate and compress logs
- Move non-critical data to other storage
- Increase node disk capacity if possible
- Evict non-critical pods or scale workloads to other nodes
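Example cleanup commands; which ones apply depends on the container runtime and logging setup:
crictl rmi --prune                 # CRI runtimes (containerd, CRI-O): remove unused images
docker system prune -af            # Docker nodes: remove stopped containers, unused images, build cache
journalctl --vacuum-size=500M      # shrink systemd journal logs to 500M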
Drain the node if immediate relief is needed:
kubectl drain <NODE_NAME> --ignore-daemonsets --delete-emptydir-data
After mitigation, return the node to service:
kubectl uncordon <NODE_NAME>
Escalation
- Escalate if DiskPressure persists beyond 10 minutes
- Page on-call engineer if production workloads are impacted
- Treat multiple affected nodes as a cluster-level incident
Related Alerts
- HighDiskUsage
- HighDiskIOWait
- KubernetesNodeNotReady
- PodCrashLoopBackOff
Related Dashboards
- Grafana → Kubernetes / Node Disk
- Grafana → Node Exporter Disk Overview
