NodeDown

Meaning

This alert fires when Prometheus cannot scrape metrics from the node-exporter running on a node. The condition holds while the `up` metric for the node-exporter target is `0`, meaning the most recent scrape failed and the exporter is unreachable from Prometheus.
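To see which targets currently match the condition, you can query Prometheus directly. The following is a minimal sketch, not the alert's actual rule definition. It assumes the scrape job is labelled `node-exporter` and that a hypothetical `PROM_URL` variable points at your Prometheus server; adjust both to your setup.

```shell
#!/usr/bin/env bash
# Build the PromQL expression for exporters whose last scrape failed.
# The job label value is an assumption; pass your own as the first argument.
node_down_query() {
  local job="${1:-node-exporter}"
  printf 'up{job="%s"} == 0\n' "$job"
}

# Run the expression against the Prometheus HTTP API (/api/v1/query).
# PROM_URL is a placeholder, e.g. http://prometheus.example:9090.
list_down_targets() {
  curl -sS -G --max-time 10 \
    --data-urlencode "query=$(node_down_query "$1")" \
    "${PROM_URL:?set PROM_URL first}/api/v1/query"
}
```

Calling `list_down_targets` with no argument uses the default job label. An empty `result` array in the JSON response means no target currently matches.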

Impact

This alert indicates a potentially critical node-level issue.

Possible impacts include:

Loss of node-level metrics (CPU, memory, disk, network)
Reduced observability for workloads running on the node
Node may be powered off, unreachable, or network-isolated
If the node is actually down, workloads may be rescheduled or unavailable

This alert does not always mean the node is completely down, but it does mean monitoring visibility is lost.

Diagnosis

Check whether the node is visible and ready in the cluster:

kubectl get nodes
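On larger clusters it can help to list only the nodes that are not ready. A small sketch, assuming the default column layout of `kubectl get nodes` (name first, status second); the helper name `not_ready_nodes` is illustrative:

```shell
#!/usr/bin/env bash
# Read `kubectl get nodes --no-headers` output on stdin and print the names
# of nodes whose STATUS column is not "Ready".
# Cordoned nodes show as "Ready,SchedulingDisabled" and count as Ready here.
not_ready_nodes() {
  awk '$2 !~ /^Ready(,|$)/ { print $1 }'
}
```

Usage: `kubectl get nodes --no-headers | not_ready_nodes`. No output means every node reports Ready.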

Check detailed node status and recent conditions:

kubectl describe node <NODE_NAME>

Verify that the node-exporter pod is running (Kubernetes setup; the `monitoring` namespace is assumed):

kubectl get pods -n monitoring -o wide | grep node-exporter

Check events related to the node:

kubectl get events -A --field-selector involvedObject.kind=Node,involvedObject.name=<NODE_NAME>

If you have SSH access to the node, verify node-exporter and node health (the unit name may differ on your system, for example `node_exporter`):

systemctl status node-exporter
uptime
df -h

Test connectivity to the node, running the command from the Prometheus host or pod so that it follows the same network path as the scrape:

curl http://<NODE_IP>:9100/metrics
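The way the request fails often hints at the cause. The sketch below wraps the same request with a timeout and maps curl's documented exit codes to a likely explanation. The function names and the 5-second timeout are illustrative choices, and the interpretations are heuristics rather than a definitive diagnosis.

```shell
#!/usr/bin/env bash
# Translate a curl exit code into a likely cause for this alert.
# Codes are from the curl manual: 6 = could not resolve host,
# 7 = failed to connect, 22 = HTTP error (with -f), 28 = timed out.
explain_curl_exit() {
  case "$1" in
    0)  echo "exporter reachable" ;;
    6)  echo "DNS resolution failed" ;;
    7)  echo "connection failed: exporter stopped or node unreachable" ;;
    22) echo "exporter responded with an HTTP error" ;;
    28) echo "timed out: traffic may be dropped by a firewall, or the node is hung" ;;
    *)  echo "curl failed with exit code $1" ;;
  esac
}

# Probe the exporter with a bounded timeout and explain the result.
probe_exporter() {
  local node_ip="$1" rc=0
  curl -sS -f -o /dev/null --max-time 5 "http://${node_ip}:9100/metrics" || rc=$?
  explain_curl_exit "$rc"
}
```

Usage: `probe_exporter <NODE_IP>`. A timeout points toward the firewall or network causes listed below, while a failed connection is more consistent with a stopped exporter or a node that is down.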

Possible Causes

Node is powered off or crashed
Network connectivity issue between Prometheus and the node
node-exporter service is stopped or crashed
Firewall or security group blocking port 9100
High resource pressure causing exporter to fail

Mitigation

If the node is down, restore or reboot the node
Restart node-exporter if the service is not running
Fix networking or firewall issues blocking metrics access
If the node is unhealthy, consider draining it:

kubectl drain <NODE_NAME> --ignore-daemonsets

Once resolved, uncordon the node:

kubectl uncordon <NODE_NAME>
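A drain can block for a long time when pods cannot be evicted. The following sketch wraps the drain in a helper with a bounded wait and a dry-run switch for previewing the command. The helper name, the `DRY_RUN` variable, and the 120-second timeout are assumptions for illustration, not recommended values.

```shell
#!/usr/bin/env bash
# Drain a node with a bounded wait, or only print the command when DRY_RUN=1.
# The 120s timeout is an illustrative value; tune it for your workloads.
drain_node() {
  local node="$1"
  set -- kubectl drain "$node" --ignore-daemonsets --timeout=120s
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # preview only, nothing is executed
  else
    "$@"
  fi
}
```

Usage: `DRY_RUN=1 drain_node <NODE_NAME>` to preview, then `drain_node <NODE_NAME>` to run. If the drain refuses to evict pods that use emptyDir volumes, `kubectl drain` also accepts `--delete-emptydir-data`, which discards that local data.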

Escalation

If multiple nodes are affected, escalate to the platform or infrastructure team immediately
If production workloads are impacted for more than 10 minutes, page the on-call engineer
If cloud provider issues are suspected, open a support ticket

Related Alerts

KubeletDown
NodeNotReady
DiskFull

Related Dashboards

Grafana → Node Exporter / Node Overview
