NodeRebootedRecently

Meaning

This alert fires when a node has rebooted within the last 5 minutes. The condition is evaluated by comparing the current time with the node's boot time as reported by node-exporter.
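The comparison can be illustrated with a small shell sketch. The exact alert rule is not shown in this runbook, so the expression in the comment is an assumption based on node-exporter's `node_boot_time_seconds` metric; the function names are hypothetical helpers, not part of any tool.

```shell
# Assumed shape of the alert expression (not taken from the actual rule):
#   time() - node_boot_time_seconds < 300

# rebooted_recently BOOT_EPOCH NOW_EPOCH [WINDOW_SECONDS]
# Succeeds (exit 0) if the boot happened less than WINDOW_SECONDS ago.
# WINDOW_SECONDS defaults to 300, i.e. the 5-minute window described above.
rebooted_recently() {
  [ $(( $2 - $1 )) -lt "${3:-300}" ]
}

# boot_epoch_from_proc
# Prints the kernel boot time (Unix epoch seconds) from the "btime"
# line of /proc/stat on a Linux host.
boot_epoch_from_proc() {
  awk '$1 == "btime" { print $2 }' /proc/stat
}
```

On a node, this could be used as `rebooted_recently "$(boot_epoch_from_proc)" "$(date +%s)" && echo "booted less than 5 minutes ago"`.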

Impact

This alert indicates a recent node restart and may affect workloads running on the node.

Possible impacts include:

This alert is typically informational or warning-level, but it requires attention if reboots are frequent or unexpected.

Diagnosis

Verify node status and readiness:

kubectl get nodes

Check detailed node information and recent events:

kubectl describe node <NODE_NAME>

Check events related to node reboot or pressure conditions:

kubectl get events --field-selector involvedObject.kind=Node
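
On a busy cluster the event list can be long. As a sketch, the same query can be narrowed to a single node and sorted by time; `node_events` is a hypothetical helper name, and the node name is passed as its only argument.

```shell
# node_events NODE
# Lists events whose involved object is the given node, oldest first.
node_events() {
  kubectl get events \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

Example: `node_events <NODE_NAME>`.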

Check system uptime from node-exporter metrics (Grafana) or via SSH:

uptime

If SSH access is available, check the system logs for the cause of the reboot:

journalctl --list-boots
journalctl -b -1
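
The previous boot's journal is often large, so it may help to look only at its final lines and at kernel errors. This is a minimal sketch that assumes the node runs systemd-journald with persistent storage (without it, `-b -1` has no earlier boot to show); the function names are hypothetical.

```shell
# previous_boot_tail [LINES]
# Prints the last LINES entries (default 50) logged before the reboot.
previous_boot_tail() {
  journalctl -b -1 -n "${1:-50}" --no-pager
}

# previous_boot_kernel_errors
# Prints kernel messages of priority "err" or worse from the previous boot.
previous_boot_kernel_errors() {
  journalctl -k -b -1 -p err --no-pager
}
```

An abrupt end to the log with no shutdown messages usually points to a crash or power loss rather than a clean reboot.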

Possible Causes

Mitigation

  1. Confirm whether the reboot was planned or expected
  2. Ensure the node is in `Ready` state
  3. Verify that all critical pods have been rescheduled successfully
  4. Check workloads for crash loops or degraded performance
  5. If reboots are frequent, investigate system and kernel logs
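
Steps 2 to 4 above can be sketched as two small helpers. The function names and the 120-second timeout are illustrative assumptions, not values from the alert definition.

```shell
# node_ready NODE
# Blocks until the node reports the Ready condition, or fails after 120s.
node_ready() {
  kubectl wait --for=condition=Ready "node/$1" --timeout=120s
}

# pods_on_node NODE
# Lists every pod scheduled on the node across all namespaces. The STATUS
# and RESTARTS columns show crash loops and pods that have not recovered.
pods_on_node() {
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}
```

Example: `node_ready <NODE_NAME> && pods_on_node <NODE_NAME>`.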

If needed, temporarily cordon the node for investigation:

kubectl cordon <NODE_NAME>

Uncordon once verified healthy:

kubectl uncordon <NODE_NAME>

Escalation