Manage stability
The Stability section of the Kubernetes Overview page catches workloads that are still serving traffic but show signs of stress that are likely to cause outages if left unaddressed.
Each stability check detects containers or Pods that are crashing, restarting, or stuck.

Click View detail on any tile to see the affected items listed under Detail view at the bottom of the page.
Restarting containers
Containers that have restarted more than twice in the last hour, sorted by highest restart count.
A high restart count typically signals a crash loop.
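The restart rule can be sketched in Python. The sketch assumes you already have samples of a cumulative per-container restart counter, such as the kube-state-metrics `kube_pod_container_status_restarts_total` metric, keyed by container name. The function names, the input shape, and the simple reset handling are illustrative assumptions. This is not the query Kubernetes Monitoring actually runs.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: container name -> list of (unix_timestamp, cumulative_restart_count)
Samples = List[Tuple[float, float]]

WINDOW_SECONDS = 3600  # "in the last hour"
THRESHOLD = 2          # "more than twice"


def restarts_in_window(samples: Samples, now: float, window: float = WINDOW_SECONDS) -> float:
    """Sum the counter increases between samples inside the window."""
    in_window = sorted(s for s in samples if now - window <= s[0] <= now)
    total = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        # A counter that goes down was reset (for example, the Pod was recreated),
        # so the new value itself is counted as the increase since the reset.
        total += cur - prev if cur >= prev else cur
    return total


def restarting_containers(series: Dict[str, Samples], now: float) -> List[Tuple[str, float]]:
    """Containers that restarted more than THRESHOLD times in the window, highest count first."""
    counts = [(name, restarts_in_window(s, now)) for name, s in series.items()]
    flagged = [(name, n) for name, n in counts if n > THRESHOLD]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Only samples inside the one-hour window count. A container with a long history of restarts but no recent ones is therefore not flagged.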
OOMKilled containers
Containers whose most recent termination was caused by the kernel out-of-memory (OOM) killer, reported with the reason OOMKilled.
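The OOMKilled check can be sketched against Pod objects, such as those in the `items` list returned by `kubectl get pods -A -o json`. The sketch relies on how the Kubernetes Pod status API records terminations:

- A container that is currently terminated reports that termination under `state.terminated`.
- A restarted container keeps its previous termination under `lastState.terminated`.

The function names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Pod = Dict[str, Any]


def most_recent_termination_reason(container_status: Dict[str, Any]) -> Optional[str]:
    """Return the reason of the latest termination for one containerStatuses entry."""
    # A currently terminated container reports it under `state`; otherwise the
    # previous termination (if any) is under `lastState`.
    for key in ("state", "lastState"):
        terminated = container_status.get(key, {}).get("terminated")
        if terminated:
            return terminated.get("reason")
    return None


def oomkilled_containers(pods: List[Pod]) -> List[Tuple[str, str, str]]:
    """(namespace, pod, container) for containers whose latest termination was OOMKilled."""
    hits = []
    for pod in pods:
        meta = pod.get("metadata", {})
        for cs in pod.get("status", {}).get("containerStatuses", []):
            if most_recent_termination_reason(cs) == "OOMKilled":
                hits.append((meta.get("namespace", ""), meta.get("name", ""), cs["name"]))
    return hits
```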
Pending Pods
Pods stuck in the Pending phase that cannot be scheduled onto a Node.
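The Pending check can be sketched over the same kind of Pod objects. A Pod is treated as unscheduled when its phase is Pending and its `PodScheduled` condition is `"False"`; the scheduler typically sets the reason `Unschedulable` in that case. If no such condition has been recorded yet, the sketch falls back to an empty `spec.nodeName`. It inspects only current state. How long a Pod must stay Pending before the tile counts it as stuck is not specified here, and the helper names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Pod = Dict[str, Any]


def is_unscheduled_pending(pod: Pod) -> bool:
    """True if the Pod is in the Pending phase and has not been placed on a Node."""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return False
    for cond in status.get("conditions", []):
        if cond.get("type") == "PodScheduled":
            # "False" (typically with reason "Unschedulable") means no Node fits.
            return cond.get("status") == "False"
    # No PodScheduled condition recorded yet: fall back to the Node assignment.
    return not pod.get("spec", {}).get("nodeName")


def pending_pods(pods: List[Pod]) -> List[Tuple[str, str]]:
    """(namespace, name) for every Pending Pod that has not been scheduled."""
    return [
        (p.get("metadata", {}).get("namespace", ""), p.get("metadata", {}).get("name", ""))
        for p in pods
        if is_unscheduled_pending(p)
    ]
```

A Pod that is Pending but already scheduled (for example, still pulling its image) is deliberately excluded. That matches the description above: this tile covers Pods that cannot be placed on a Node.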
Image pull errors
Containers waiting because their image cannot be pulled. ErrImagePull is the initial pull failure; ImagePullBackOff means Kubernetes is retrying the pull with exponential backoff.
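Both waiting reasons can be read directly from the `state.waiting.reason` field of each container status. This sketch also scans `initContainerStatuses`, because init containers can fail to pull their images too. Whether the tile includes init containers is an assumption, and the function name is hypothetical.

```python
from typing import Any, Dict, List, Tuple

IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_errors(pods: List[Dict[str, Any]]) -> List[Tuple[str, str, str, str]]:
    """(namespace, pod, container, reason) for containers waiting on an image pull failure."""
    hits = []
    for pod in pods:
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        # Assumption: init containers are checked as well as regular containers.
        statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting") or {}
            reason = waiting.get("reason")
            if reason in IMAGE_PULL_REASONS:
                hits.append((meta.get("namespace", ""), meta.get("name", ""), cs["name"], reason))
    return hits
```

Reporting the reason alongside each container tells you where it is in the cycle: a first failed pull (ErrImagePull) or repeated retries (ImagePullBackOff).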