====== NodeRebootedRecently ======

===== Meaning =====

This alert fires when a node has rebooted within the last 5 minutes. It is detected by comparing the current time with the node's boot time as reported by node-exporter.

===== Impact =====

This alert indicates a **recent node restart**, which may affect workloads running on the node.

Possible impacts include:

  * Temporary disruption of pods scheduled on the node
  * Pod restarts or rescheduling to other nodes
  * Short-lived service degradation
  * Loss of in-memory application state

This alert is typically **informational or warning-level**, but it may require attention if reboots are frequent or unexpected.

===== Diagnosis =====

Verify node status and readiness:

<code bash>
kubectl get nodes
</code>

Check detailed node information and recent events:

<code bash>
kubectl describe node <node-name>
</code>

Check events related to node reboots or pressure conditions:

<code bash>
kubectl get events --field-selector involvedObject.kind=Node
</code>

Check system uptime from node-exporter metrics (Grafana) or over SSH:

<code bash>
uptime
</code>

If SSH access is available, check the system logs for the cause of the reboot:

<code bash>
# List recorded boots, then show the log of the boot before the current one
journalctl --list-boots
journalctl -b -1
</code>

===== Possible Causes =====

  * Planned maintenance or OS patching
  * Kernel panic or hardware issue
  * Cloud provider host restart
  * Manual reboot by an operator
  * Power or resource pressure issues

===== Mitigation =====

  - Confirm whether the reboot was planned or expected
  - Ensure the node is in the ''Ready'' state
  - Verify that all critical pods have been rescheduled successfully
  - Check workloads for crash loops or degraded performance
  - If reboots are frequent, investigate system and kernel logs

If needed, temporarily cordon the node so that no new pods are scheduled on it during the investigation:

<code bash>
kubectl cordon <node-name>
</code>

Uncordon the node once it has been verified healthy:

<code bash>
kubectl uncordon <node-name>
</code>

===== Escalation =====

  * If the reboot was unplanned, notify the platform or infrastructure team
  * If the same node reboots multiple times within 24 hours, escalate immediately
  * If production services are impacted, page the on-call engineer

===== Related Alerts =====

  * NodeDown
  * NodeNotReady
  * KubeletDown

===== Related Dashboards =====

  * Grafana → Node Overview
  * Grafana → Node Exporter
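
===== Example: Manual Reboot Check =====

The comparison described under Meaning (current time versus boot time) can be reproduced by hand during an SSH session. The following is a minimal sketch, not the alert rule itself. The function names ''rebooted_recently'' and ''local_boot_epoch'' are hypothetical, the 300-second default is an assumption chosen to mirror the 5-minute threshold, and the boot time is read from ''/proc/stat'' on Linux rather than from node-exporter.

```shell
# Hypothetical helper (not part of any tool): succeeds if the boot time
# lies within the last WINDOW seconds.
#   $1 = boot time, $2 = current time (both Unix epoch seconds)
#   $3 = window in seconds (assumed default: 300, i.e. 5 minutes)
rebooted_recently() {
  uptime_s=$(( $2 - $1 ))
  [ "$uptime_s" -ge 0 ] && [ "$uptime_s" -lt "${3:-300}" ]
}

# Boot time of the local node: the "btime" line in /proc/stat (Linux only).
local_boot_epoch() {
  awk '$1 == "btime" { print $2 }' /proc/stat
}

# Example, run on the node itself:
#   rebooted_recently "$(local_boot_epoch)" "$(date +%s)" && echo "rebooted recently"
```

Passing a larger third argument, for example 86400, widens the window to 24 hours, which may help when checking whether a node falls under the repeated-reboot escalation rule.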