runbooks:coustom_alerts:NodeRebootedRecently
This alert is triggered when a node has rebooted within the last 5 minutes. It is detected by comparing the current time with the node's boot time as reported by node-exporter.
This alert indicates a recent node restart and may affect workloads running on the node.
Possible impacts include:
This alert is typically informational or warning-level, but may require attention if frequent or unexpected.
Verify node status and readiness:
kubectl get nodes
Check detailed node information and recent events:
kubectl describe node <NODE_NAME>
Check events related to node reboot or pressure conditions:
kubectl get events --field-selector involvedObject.kind=Node
Check system uptime from node-exporter metrics (Grafana) or via SSH:
uptime
If SSH access is available, check system logs for reboot cause:
journalctl --list-boots journalctl -b -1
If needed, temporarily cordon the node for investigation:
kubectl cordon <NODE_NAME>
Uncordon once verified healthy:
kubectl uncordon <NODE_NAME>