PodCrashLoopBackOff

Meaning

This alert is triggered when a pod's container has remained in the CrashLoopBackOff state for more than 6 hours. It means the container is crashing repeatedly and Kubernetes is applying an increasing back-off delay between restart attempts.
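
A quick way to confirm the state, and to see whether other pods in the cluster are affected (a general check, not the alert's own query):

kubectl get pods --all-namespaces | grep CrashLoopBackOff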

Impact

This alert represents a critical application-level failure.

Possible impacts include:

  • Application or service outage
  • Repeated pod restarts causing instability
  • Increased load on other replicas or services
  • Failed background jobs or controllers

If the affected pod is part of a critical service, production impact is likely.

Diagnosis

Check the status of the affected pod:

kubectl get pod {{ $labels.pod }} -n {{ $labels.namespace }}

Describe the pod to view events and failure reasons:

kubectl describe pod {{ $labels.pod }} -n {{ $labels.namespace }}

Check container logs for crash details:

kubectl logs {{ $labels.pod }} -n {{ $labels.namespace }} --previous
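
If there is no previous terminated container to read from, fetch the current logs instead:

kubectl logs {{ $labels.pod }} -n {{ $labels.namespace }} --tail=100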

If multiple containers exist in the pod:

kubectl logs {{ $labels.pod }} -n {{ $labels.namespace }} -c <container_name> --previous
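
To list the container names available for the -c flag:

kubectl get pod {{ $labels.pod }} -n {{ $labels.namespace }} -o jsonpath='{.spec.containers[*].name}'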

Check recent events in the namespace:

kubectl get events -n {{ $labels.namespace }} --sort-by=.lastTimestamp
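
The pod status also records each container's last termination reason and exit code (for example OOMKilled), which often points directly at the cause; the fields are empty if the container has not terminated before:

kubectl get pod {{ $labels.pod }} -n {{ $labels.namespace }} -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'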

Possible Causes

  • Application crash due to bug or misconfiguration
  • Missing or invalid environment variables
  • Dependency services unavailable
  • Memory limits set too low, causing the container to be OOMKilled, or severe CPU throttling
  • Failing liveness or readiness probes
  • Incorrect container image or startup command (see the spec check after this list)
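
The configured image, startup command, and liveness probe can be read directly from the pod spec (shown here for the first container; adjust the index for multi-container pods):

kubectl get pod {{ $labels.pod }} -n {{ $labels.namespace }} -o jsonpath='{.spec.containers[0].image}{"\n"}{.spec.containers[0].command}{"\n"}{.spec.containers[0].livenessProbe}{"\n"}'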

Mitigation

  1. Review application logs to identify the crash reason
  2. Fix configuration issues (env vars, secrets, config maps)
  3. Increase resource limits if the container is being OOMKilled (example after this list)
  4. Fix failing probes or startup commands
  5. Redeploy the pod after applying fixes
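
For step 3, limits can be raised on the owning workload; an illustrative example for a Deployment (the deployment name and values are placeholders and should be tuned to the application's actual usage):

kubectl set resources deployment <deployment_name> -n {{ $labels.namespace }} --limits=memory=1Gi --requests=memory=512Mi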

Restart the pod if safe:

kubectl delete pod {{ $labels.pod }} -n {{ $labels.namespace }}

If the issue persists, scale down the workload temporarily:

kubectl scale deployment <deployment_name> -n {{ $labels.namespace }} --replicas=0
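
Once the underlying issue is fixed, scale the workload back up to its normal replica count (placeholder shown):

kubectl scale deployment <deployment_name> -n {{ $labels.namespace }} --replicas=<original_replica_count>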

Escalation

  • Immediately notify the application owner
  • If production services are impacted, page the on-call engineer
  • If unresolved after 30 minutes, escalate to the platform team

Related Alerts

  • PodNotReady
  • HighMemoryUsage
  • NodeDown

Dashboards

  • Grafana → Kubernetes / Pods
  • Grafana → Container Resource Usage