runbooks:coustom_alerts:KubeAPIDown
This alert is triggered when Prometheus is unable to scrape the Kubernetes API server metrics. It usually indicates that the API server is unreachable, unresponsive, or not running.
This alert represents a critical control-plane failure.
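Possible impacts include failed `kubectl` and API client requests, stalled scheduling and controller reconciliation, and blocked deployments or scaling. Workloads that are already running usually keep running.
If this alert is firing, the cluster is likely partially or completely unusable.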
Check if the Kubernetes API server is reachable:
kubectl get nodes
If `kubectl` is unresponsive, check API server health endpoints (if accessible):
curl -k https://<API_SERVER_ENDPOINT>/healthz
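The check above can be extended to the `/livez` and `/readyz` endpoints, which Kubernetes has recommended over `/healthz` since v1.16. The following is a minimal sketch, not an established procedure: `classify_health` and `probe` are hypothetical helper names, and the 5-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Map the HTTP status code from a health endpoint to a short verdict.
# curl reports 000 when it received no HTTP response at all.
classify_health() {
  case "$1" in
    200)     echo "healthy" ;;
    000)     echo "unreachable" ;;
    401|403) echo "reachable-auth-required" ;;
    *)       echo "unhealthy" ;;
  esac
}

# Hypothetical helper: probe one endpoint. $1 = base URL, $2 = path.
probe() {
  local code
  code=$(curl -k -s -o /dev/null -m 5 -w '%{http_code}' "$1$2")
  echo "$2: $(classify_health "$code")"
}

# Example (replace the placeholder before running):
# for p in /livez /readyz; do probe "https://<API_SERVER_ENDPOINT>" "$p"; done
```

A `reachable-auth-required` result means the API server is answering but rejects anonymous requests to its health endpoints, so the outage is more likely on the scrape or network path than in the API server process itself.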
Check control-plane pod status (for self-managed clusters; this only works while the API server still answers at least intermittently):
kubectl get pods -n kube-system | grep kube-apiserver
Describe the API server pod for recent failures:
kubectl describe pod kube-apiserver-<node> -n kube-system
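When the API server is fully down, the `kubectl` commands above will fail as well. On a self-managed control-plane node, the container runtime can be queried directly instead. This is a sketch under several assumptions: `crictl` is installed and configured for the node's runtime, the container is named `kube-apiserver` (the kubeadm default), and the commands run as root on the node. `apiserver_logs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the last N log lines (default 50) of the most recent
# kube-apiserver container, including exited ones (-a).
apiserver_logs() {
  local id
  id=$(crictl ps -a --name kube-apiserver -q | head -n 1)
  if [ -z "$id" ]; then
    echo "no kube-apiserver container found" >&2
    return 1
  fi
  crictl logs --tail "${1:-50}" "$id"
}

# Example, run on the control-plane node:
# apiserver_logs 100
# journalctl -u kubelet --since "15 min ago"   # kubelet manages the static pod
```

If no container is found at all, the kubelet logs are usually the next place to look, since the kubelet is responsible for starting the API server static pod.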
Check recent cluster-wide events:
kubectl get events -A --sort-by=.lastTimestamp
If running on managed Kubernetes, check cloud provider control-plane status dashboards.
If the issue is transient and the alert resolves on its own, keep monitoring the API server after recovery in case it recurs.
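Post-recovery monitoring can be as simple as polling the readiness endpoint until it answers. The loop below is one possible sketch: `wait_for_api` is a hypothetical helper, and the defaults of 30 attempts with a 10-second delay are arbitrary values to adjust as needed.

```shell
#!/usr/bin/env bash
# Poll the API server readiness endpoint until it succeeds.
# $1 = max attempts (default 30), $2 = delay in seconds (default 10).
wait_for_api() {
  local attempts="${1:-30}"
  local delay="${2:-10}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl get --raw='/readyz' --request-timeout=5s >/dev/null 2>&1; then
      echo "API server ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "API server still not ready after $attempts attempt(s)" >&2
  return 1
}

# Example:
# wait_for_api 30 10
```

The function returns a non-zero status when the API server never becomes ready, so it can gate follow-up steps in a script.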