HighDiskUsage

Meaning

This alert is triggered when disk usage on a node exceeds 90% for more than 5 minutes. Disk usage is calculated using filesystem size and free space metrics reported by node-exporter.

The alert is scoped to a specific mount point.
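
As a rough illustration of the calculation, usage can be derived from the size and free-space values as 100 * (1 - free / size). The sketch below assumes byte counts like those node-exporter exposes (for example node_filesystem_size_bytes and node_filesystem_avail_bytes). The function name is hypothetical, and the actual alert expression may differ.

```shell
# Hypothetical helper: compute disk usage percent from size and free bytes.
disk_usage_percent() {
  size_bytes=$1
  free_bytes=$2
  awk -v size="$size_bytes" -v free="$free_bytes" 'BEGIN {
    if (size <= 0) exit 1          # avoid division by zero
    printf "%.1f\n", 100 * (1 - free / size)
  }'
}

# Example: a 100 GB filesystem with 8 GB free.
disk_usage_percent 100000000000 8000000000
```

A result above 90 sustained for more than 5 minutes corresponds to the condition described above.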

Impact

High disk usage can cause serious stability and availability issues.

Possible impacts include:

Applications failing to write data or logs
Pods crashing or entering error states
Node instability and kubelet failures
Potential data loss if the disk becomes full

This alert is a warning, but may escalate to a critical issue if disk usage continues to grow.

Diagnosis

Identify the affected node and mount point from the alert details, then check the node's status:

kubectl get nodes

If SSH access is available, check disk usage on the node:

df -h
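
To narrow the df output down to the mount points that matter, a small filter along these lines may help. It is a minimal sketch: the function name is hypothetical, it assumes POSIX-format output from df -P, and it will misreport mount points whose paths contain spaces.

```shell
# Hypothetical helper: read `df -P` output on stdin and print mount points
# at or above a usage threshold (default 90, matching the alert).
mounts_over_threshold() {
  threshold=${1:-90}
  awk -v limit="$threshold" 'NR > 1 {
    use = $5                 # Capacity column, e.g. "95%"
    sub(/%/, "", use)
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

df -P | mounts_over_threshold 90
```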

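Check disk usage per directory to find the largest consumers (-x keeps du on a single filesystem; replace / with the affected mount point if it is not the root filesystem):

du -xh / 2>/dev/null | sort -h | tail -20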

Review the pods scheduled on the node, along with the node's conditions and ephemeral-storage allocation (Kubernetes):

kubectl describe node <NODE_NAME>

Check recent events related to disk pressure:

kubectl get events --field-selector involvedObject.kind=Node

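Check if the node is under DiskPressure (True means the kubelet is reporting disk pressure):

kubectl get node <NODE_NAME> -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'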
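
To check the DiskPressure condition across every node at once, something like the following could be used. This is a sketch under assumptions: both function names are hypothetical, and the first function requires kubectl access to the cluster.

```shell
# Requires cluster access; prints one "name<TAB>status" line per node,
# where status is the DiskPressure condition (True, False, or Unknown).
list_disk_pressure_status() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'
}

# Hypothetical helper: read "name<TAB>status" lines on stdin and print only
# the nodes whose DiskPressure condition is True.
nodes_with_disk_pressure() {
  awk -F '\t' '$2 == "True" { print $1 }'
}

# Usage: list_disk_pressure_status | nodes_with_disk_pressure
```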

Possible Causes

Log files growing uncontrollably
Application writing excessive data to disk
Container images and layers not cleaned up
Old files or backups consuming disk space
Insufficient disk capacity on the node

Mitigation

Remove or rotate large log files
Clean up unused container images and volumes
Delete temporary or obsolete files
Resize the disk if supported by the platform
Move data to external storage if applicable
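
Before deleting anything, it can help to list candidates first. The sketch below only reports large files and removes nothing. The function name and the 100 MiB default are hypothetical, and the commented cleanup commands assume the node runs systemd-journald and has crictl installed, which may not hold on every platform.

```shell
# Hypothetical helper: list regular files larger than a size (in MiB) under
# a directory, staying on one filesystem. Nothing is deleted.
list_large_files() {
  dir=$1
  min_mib=${2:-100}
  find "$dir" -xdev -type f -size +"${min_mib}M" 2>/dev/null
}

# Usage: list_large_files /var/log 100

# Cleanup commands to consider (verify on a non-production node first):
#   journalctl --vacuum-size=500M   # shrink archived journal files
#   crictl rmi --prune              # remove unused container images
```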

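If the node is under DiskPressure, consider draining it so workloads are rescheduled onto other nodes. Pods that use emptyDir volumes block the drain unless --delete-emptydir-data is added, which deletes that data:

kubectl drain <NODE_NAME> --ignore-daemonsets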

After resolving the issue, mark the node schedulable again:

kubectl uncordon <NODE_NAME>

Escalation

If disk usage continues to increase or exceeds 95%, escalate immediately
If production workloads are impacted, page the on-call engineer
If the cause of disk growth is unclear, escalate to the application owner or infrastructure team
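
The usage thresholds in this runbook (alert above 90%, immediate escalation above 95%) can be expressed as a small check. This is an illustrative sketch only: the function name and the level names are hypothetical, and it does not cover the workload-impact or unclear-cause criteria above.

```shell
# Hypothetical helper: classify a usage percentage using the thresholds in
# this runbook (alert above 90%, immediate escalation above 95%).
escalation_level() {
  awk -v use="$1" 'BEGIN {
    if (use + 0 > 95) print "escalate"
    else if (use + 0 > 90) print "warning"
    else print "ok"
  }'
}

# Example: 96.2% usage falls in the "escalate" range.
escalation_level 96.2
```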

Related Alerts

HighMemoryUsage
NodeDown
NodeNotReady

Related Dashboards

Grafana → Node Overview
Grafana → Disk Usage Dashboard
