Manage stability
The Stability section on Kubernetes Overview surfaces workloads that are still serving traffic but are showing signs of stress that will likely cause outages if ignored.
A stability check detects containers and Pods that are crashing, restarting, or stuck.

Click View detail on any tile to see the affected items listed under Detail view at the bottom of the page.
Restarting containers
Containers that have restarted more than twice in the last hour, sorted by highest restart count.
A high restart count typically signals a crash loop.
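The rule above can be sketched as a comparison of two restart-counter snapshots taken an hour apart. This is only an illustration of the logic, not the query Kubernetes Monitoring runs. The function name, the `namespace/pod/container` key format, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Threshold from the description above: "more than twice".
RESTART_THRESHOLD = 2


def restarting_containers(
    counts_hour_ago: Dict[str, int],
    counts_now: Dict[str, int],
    threshold: int = RESTART_THRESHOLD,
) -> List[Tuple[str, int]]:
    """Return (container, restarts_in_window) pairs above the threshold,
    sorted by highest restart count first."""
    flagged = []
    for name, now in counts_now.items():
        # A container missing from the earlier snapshot is treated as starting at 0.
        before = counts_hour_ago.get(name, 0)
        delta = now - before
        if delta > threshold:
            flagged.append((name, delta))
    # Highest restart count first; ties broken by name for stable output.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical cumulative restart counters, keyed by namespace/pod/container.
    earlier = {"shop/cart/app": 4, "shop/api/app": 10, "shop/db/postgres": 0}
    current = {"shop/cart/app": 11, "shop/api/app": 12, "shop/db/postgres": 3}
    for name, restarts in restarting_containers(earlier, current):
        print(f"{name}: {restarts} restarts in the last hour")
```

Comparing deltas rather than the raw counter matters because a container's restart count is cumulative: a container that restarted ten times last week and has been stable since should not be flagged.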
OOMKilled containers
Containers whose most recent termination was an OOMKilled event, meaning the kernel's out-of-memory killer ended the process, typically because the container exceeded its memory limit.
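To check for this condition outside the UI, one option is to inspect a Pod object as returned by `kubectl get pod <name> -o json`. The sketch below assumes that JSON has already been parsed into a dictionary. It reads the standard `status.containerStatuses[].lastState.terminated.reason` field; the function name and sample Pod are hypothetical.

```python
from typing import Any, Dict, List


def oom_killed_containers(pod: Dict[str, Any]) -> List[str]:
    """Return names of containers whose last termination reason is OOMKilled."""
    names = []
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for status in statuses:
        # lastState.terminated describes the previous termination, if there was one.
        terminated = (status.get("lastState") or {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


if __name__ == "__main__":
    # Hypothetical, trimmed-down Pod object.
    sample_pod = {
        "status": {
            "containerStatuses": [
                {"name": "app", "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
                {"name": "sidecar", "lastState": {}},
            ]
        }
    }
    print(oom_killed_containers(sample_pod))
```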
Pending Pods
Pods stuck in the Pending phase that cannot be scheduled onto a Node.
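A Pod in this state usually carries a `PodScheduled` condition with status `False` and reason `Unschedulable`, along with a scheduler message explaining why. The sketch below, which assumes a Pod object parsed from `kubectl get pod <name> -o json`, extracts that message. The function name and sample data are hypothetical.

```python
from typing import Any, Dict, Optional


def unschedulable_reason(pod: Dict[str, Any]) -> Optional[str]:
    """Return the scheduler's message if the Pod is Pending and unschedulable,
    otherwise None."""
    status = pod.get("status") or {}
    if status.get("phase") != "Pending":
        return None
    for condition in status.get("conditions") or []:
        if (
            condition.get("type") == "PodScheduled"
            and condition.get("status") == "False"
            and condition.get("reason") == "Unschedulable"
        ):
            # The message field is free text written by the scheduler.
            return condition.get("message", "")
    return None


if __name__ == "__main__":
    # Hypothetical, trimmed-down Pod object.
    sample_pod = {
        "status": {
            "phase": "Pending",
            "conditions": [
                {
                    "type": "PodScheduled",
                    "status": "False",
                    "reason": "Unschedulable",
                    "message": "0/3 nodes are available: 3 Insufficient memory.",
                }
            ],
        }
    }
    print(unschedulable_reason(sample_pod))
```

Note that a Pod can also be Pending after it has been scheduled, for example while its images are being pulled, which is why the sketch checks the condition and not only the phase.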
Image pull errors
Containers waiting because their image cannot be pulled. ErrImagePull is the initial failure. ImagePullBackOff means Kubernetes is retrying the pull with exponential backoff.
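Both reasons appear in the standard `status.containerStatuses[].state.waiting.reason` field of a Pod object. As a minimal sketch, assuming a Pod parsed from `kubectl get pod <name> -o json`, the following lists containers blocked on an image pull. The function name and sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Waiting reasons that indicate an image pull problem.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_errors(pod: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (container, image, reason) for containers waiting on an image pull."""
    results = []
    statuses = (pod.get("status") or {}).get("containerStatuses") or []
    for status in statuses:
        waiting = (status.get("state") or {}).get("waiting")
        if waiting and waiting.get("reason") in IMAGE_PULL_REASONS:
            results.append((status["name"], status.get("image", ""), waiting["reason"]))
    return results


if __name__ == "__main__":
    # Hypothetical, trimmed-down Pod object.
    sample_pod = {
        "status": {
            "containerStatuses": [
                {
                    "name": "app",
                    "image": "registry.example.com/app:v9",
                    "state": {"waiting": {"reason": "ImagePullBackOff"}},
                },
                {"name": "sidecar", "image": "busybox:1.36", "state": {"running": {}}},
            ]
        }
    }
    for name, image, reason in image_pull_errors(sample_pod):
        print(f"{name} ({image}): {reason}")
```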