HighDiskUsage

Meaning

This alert is triggered when disk usage on a node exceeds 90% for more than 5 minutes. Disk usage is calculated using filesystem size and free space metrics reported by node-exporter.

The alert is scoped to a specific mount point.
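As a rough illustration of the calculation described above, the sketch below derives a usage percentage from a filesystem's total size and its available space. The helper name `disk_usage_percent` is hypothetical, and it is an assumption that the alert rule uses node-exporter's `node_filesystem_size_bytes` and `node_filesystem_avail_bytes`; check the actual rule for the exact metrics.

```shell
# Hypothetical helper: percentage of a filesystem in use.
#   $1 = filesystem size in bytes  (assumed: node_filesystem_size_bytes)
#   $2 = available space in bytes  (assumed: node_filesystem_avail_bytes)
disk_usage_percent() {
  awk -v size="$1" -v avail="$2" 'BEGIN {
    if (size + 0 <= 0) exit 1              # avoid dividing by zero
    printf "%.1f\n", (1 - avail / size) * 100
  }'
}

# Example: a 100 GiB filesystem with 8 GiB available prints 92.0,
# which is above the 90% threshold.
disk_usage_percent 107374182400 8589934592
```

The alert fires only if the value stays above 90 for more than 5 minutes, so a single reading over the threshold is not enough on its own.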

Impact

High disk usage can cause serious stability and availability issues.

Possible impacts include:

This alert is a warning, but may escalate to a critical issue if disk usage continues to grow.

Diagnosis

Identify affected nodes and mount points:

kubectl get nodes

If SSH access is available, check disk usage on the node:

df -h
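On nodes with many mounts, it can help to show only the filesystems above the alert threshold. The following is a minimal sketch that filters `df -P` output. The function name `mounts_over_threshold` is hypothetical, and the filter assumes mount points do not contain spaces.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the mount
# points whose usage is above a threshold (default 90, matching the alert).
mounts_over_threshold() {
  awk -v limit="${1:-90}" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                      # "93%" -> "93"
    if (use + 0 > limit + 0) print $6, use "%"
  }'
}

# Usage on the node itself:
#   df -P | mounts_over_threshold 90
```

The `-P` flag selects the POSIX output format, which keeps each filesystem on a single line and makes the columns safe to parse.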

Check disk usage per directory to find large consumers:

du -xh / | sort -h | tail -20

List the pods running on the node and review its conditions and allocated ephemeral storage (Kubernetes):

kubectl describe node <NODE_NAME>

Check recent events related to disk pressure:

kubectl get events --field-selector involvedObject.kind=Node

Check if the node is under DiskPressure:

kubectl describe node <NODE_NAME> | grep DiskPressure
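To check every node at once rather than one node at a time, the DiskPressure condition can be read with a JSONPath query. This is a sketch, not a definitive implementation: the function names `list_disk_pressure` and `only_under_pressure` are hypothetical, and the first one requires `kubectl` access to the cluster.

```shell
# Print NAME<TAB>STATUS for the DiskPressure condition of every node.
# Requires kubectl access to the cluster.
list_disk_pressure() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'
}

# Hypothetical filter: keep only the names of nodes whose status is "True".
only_under_pressure() {
  awk -F '\t' '$2 == "True" { print $1 }'
}

# Usage:
#   list_disk_pressure | only_under_pressure
```

An empty result means no node currently reports DiskPressure, even if the usage alert is firing.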

Possible Causes

Mitigation

  1. Remove or rotate large log files
  2. Clean up unused container images and volumes
  3. Delete temporary or obsolete files
  4. Resize the disk if supported by the platform
  5. Move data to external storage if applicable
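Before removing anything under steps 1 and 3, it is safer to list the largest candidates first. The sketch below only lists files and deletes nothing. The function name `find_large_files` and the 100 MiB default are assumptions chosen for illustration.

```shell
# Hypothetical helper: list regular files larger than a given size under a
# directory, without crossing into other filesystems. Lists only.
#   $1 = directory to search
#   $2 = minimum size in MiB (default 100)
find_large_files() {
  find "$1" -xdev -type f -size +"${2:-100}"M 2>/dev/null
}

# Usage on the node:
#   find_large_files /var/log 100
```

Review the output before deleting or rotating anything, since a file that a running process still holds open keeps its space allocated until that process closes it.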

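If the node is under DiskPressure, consider draining it so that workloads reschedule onto other nodes. Pods that use emptyDir volumes block the drain unless --delete-emptydir-data is also passed, which deletes that data: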

kubectl drain <NODE_NAME> --ignore-daemonsets

After resolving the issue, make the node schedulable again:

kubectl uncordon <NODE_NAME>

Escalation