
Manage stability

The Stability section on Kubernetes Overview catches workloads that haven’t stopped serving traffic yet but are showing signs of stress that will likely cause outages if ignored.

A stability check detects containers and Pods that are crashing, restarting, or stuck.

Stability panels on **Kubernetes Overview** home page

Click View detail on any tile to see the affected items listed under Detail view at the bottom of the page.

Restarting containers

Containers that have restarted more than twice in the last hour, sorted by highest restart count.

A high restart count typically signals a crash loop.

Common causes include OOM kills, failed liveness probes, and application errors.
Inspect Pod logs and events to determine why the container is crashing. Adjust resource limits, correct the liveness probe configuration, or fix the underlying application error.
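The "more than twice in the last hour, sorted by highest restart count" rule can be sketched in Python. This is a minimal illustration, not how Grafana Cloud computes the panel. It assumes you have two snapshots of each container's cumulative `restartCount` taken about an hour apart; the function name, the key layout, and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical key layout: (namespace, pod, container)
ContainerKey = Tuple[str, str, str]


def restarting_containers(
    counts_hour_ago: Dict[ContainerKey, int],
    counts_now: Dict[ContainerKey, int],
    threshold: int = 2,
) -> List[Tuple[ContainerKey, int]]:
    """Return (container, restarts_in_window) pairs above the threshold,
    sorted by highest restart count first."""
    flagged = []
    for key, now in counts_now.items():
        # A container missing from the earlier snapshot is treated as starting at 0.
        before = counts_hour_ago.get(key, 0)
        # restartCount is cumulative; a lower value than before suggests the Pod
        # was recreated, so count from zero instead of reporting a negative delta.
        delta = now - before if now >= before else now
        if delta > threshold:
            flagged.append((key, delta))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Illustrative sample data, not real cluster output.
hour_ago = {("shop", "cart-7d9f", "cart"): 3, ("shop", "web-5c2a", "web"): 0}
now = {
    ("shop", "cart-7d9f", "cart"): 9,
    ("shop", "web-5c2a", "web"): 2,
    ("shop", "mail-1b3e", "mail"): 4,
}
for key, restarts in restarting_containers(hour_ago, now):
    print(key, restarts)
```

With this sample data, `cart` (6 restarts) and `mail` (4 restarts) are flagged, while `web` (2 restarts) stays below the threshold.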

OOMKilled containers

Containers whose most recent termination was caused by the kernel OOM killer (termination reason OOMKilled).

Typically, the container exceeded its memory limit or the Node ran out of memory.
Increase memory limits or requests for the affected container. If the Node is under memory pressure, consider scaling up or redistributing workloads.
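To confirm an OOM kill outside the dashboard, you can look at the termination reason recorded in the Pod status. The sketch below assumes the input is a parsed Pod list in the shape produced by `kubectl get pods -o json`; the function name and the sample data are hypothetical, and this is not the query Grafana Cloud runs.

```python
from typing import Any, Dict, List


def oomkilled_containers(pod_list: Dict[str, Any]) -> List[str]:
    """Return "namespace/pod/container" for each container whose most recent
    termination has the reason OOMKilled."""
    hits: List[str] = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # Prefer the current state if the container is terminated right now;
            # otherwise fall back to the previous termination in lastState.
            terminated = (
                cs.get("state", {}).get("terminated")
                or cs.get("lastState", {}).get("terminated")
                or {}
            )
            if terminated.get("reason") == "OOMKilled":
                hits.append(f"{meta.get('namespace')}/{meta.get('name')}/{cs.get('name')}")
    return hits


# Illustrative sample data, not real cluster output.
sample = {
    "items": [
        {
            "metadata": {"namespace": "shop", "name": "cart-7d9f"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "cart",
                        "restartCount": 4,
                        "state": {"running": {}},
                        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
                    },
                    {
                        "name": "sidecar",
                        "restartCount": 1,
                        "state": {"running": {}},
                        "lastState": {"terminated": {"reason": "Error", "exitCode": 1}},
                    },
                ]
            },
        }
    ]
}
print(oomkilled_containers(sample))
```

Only the `cart` container is reported here, because the `sidecar` container last terminated with a generic error rather than an OOM kill.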

Pending Pods

Pods stuck in the Pending phase that cannot be scheduled onto a Node.

Common causes include insufficient cluster resources, Node affinity rules that no Node satisfies, taints that the Pod does not tolerate, missing PersistentVolumes, and image pull failures.
Scale up the cluster or free resources, adjust Node affinity rules or taints, provision the required PersistentVolumes, or fix image references.
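The scheduler usually records why a Pod is pending in the Pod's `PodScheduled` condition. The following sketch lists Pending Pods together with that reason and message. It assumes a parsed Pod list in the shape produced by `kubectl get pods -o json`; the function name and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def pending_pods(pod_list: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return one entry per Pending Pod, with the scheduler's reason and
    message when the PodScheduled condition is False."""
    results: List[Dict[str, str]] = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        reason, message = "", ""
        for cond in status.get("conditions", []):
            # PodScheduled=False indicates the Pod has not been placed on a Node.
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                reason = cond.get("reason", "")
                message = cond.get("message", "")
        meta = pod.get("metadata", {})
        results.append(
            {
                "pod": f"{meta.get('namespace')}/{meta.get('name')}",
                "reason": reason,
                "message": message,
            }
        )
    return results


# Illustrative sample data, not real cluster output.
sample = {
    "items": [
        {
            "metadata": {"namespace": "shop", "name": "report-6f4b"},
            "status": {
                "phase": "Pending",
                "conditions": [
                    {
                        "type": "PodScheduled",
                        "status": "False",
                        "reason": "Unschedulable",
                        "message": "0/3 nodes are available: 3 Insufficient cpu.",
                    }
                ],
            },
        },
        {
            "metadata": {"namespace": "shop", "name": "web-5c2a"},
            "status": {"phase": "Running", "conditions": []},
        },
    ]
}
for entry in pending_pods(sample):
    print(entry["pod"], entry["reason"], entry["message"])
```

An empty reason for a Pending Pod suggests it was scheduled but is waiting on something else, such as an image pull.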

Image pull errors

Containers in a waiting state because their image cannot be pulled. ErrImagePull is the initial failure; ImagePullBackOff means Kubernetes is retrying with exponential backoff.

Common causes include incorrect image names or tags, missing or expired registry credentials, and rate limiting by the registry.
Verify the image name and tag, and update or create registry pull secrets. If the registry is rate limiting you, wait for the limit to reset and consider authenticating to raise it.
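Both waiting reasons appear in the container status, so you can list the affected images directly. The sketch below assumes a parsed Pod list in the shape produced by `kubectl get pods -o json`; the function name, the registry host, and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Waiting reasons that indicate the image could not be pulled.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_errors(pod_list: Dict[str, Any]) -> List[Tuple[str, str, str, str]]:
    """Return (pod, container, image, reason) for each container that is
    waiting on a failed image pull."""
    hits: List[Tuple[str, str, str, str]] = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        # Init containers can fail to pull images too, so check both lists.
        statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in IMAGE_PULL_REASONS:
                hits.append(
                    (
                        f"{meta.get('namespace')}/{meta.get('name')}",
                        cs.get("name", ""),
                        cs.get("image", ""),
                        waiting["reason"],
                    )
                )
    return hits


# Illustrative sample data, not real cluster output.
sample = {
    "items": [
        {
            "metadata": {"namespace": "shop", "name": "mail-1b3e"},
            "status": {
                "phase": "Pending",
                "containerStatuses": [
                    {
                        "name": "mail",
                        "image": "registry.example.com/shop/mail:v9",
                        "state": {"waiting": {"reason": "ImagePullBackOff"}},
                    },
                    {
                        "name": "proxy",
                        "image": "registry.example.com/shop/proxy:v2",
                        "state": {"waiting": {"reason": "ContainerCreating"}},
                    },
                ],
            },
        }
    ]
}
for pod, container, image, reason in image_pull_errors(sample):
    print(pod, container, image, reason)
```

Printing the image reference next to the reason makes a mistyped name or tag easier to spot before you check credentials or rate limits.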