Manage availability

The Availability section on Kubernetes Overview answers one question: is your infrastructure currently able to serve user traffic? Its checks flag workloads and Nodes that exist on paper but are down or otherwise unable to serve traffic.

Availability panels on **Kubernetes Overview** home page

Click View detail on any tile to see the affected items listed under Detail view at the bottom of the page.

Zero replica deployments

These are deployments that are configured to run at least one replica but have zero available replicas running. The workload is fully down. This excludes deployments intentionally scaled to zero.

Common causes: failed rollouts, image pull errors, insufficient cluster resources, or misconfigured probes.
Resolution: check rollout status, Pod events, and container logs. Roll back to a previous revision, fix the image reference, free up cluster resources, or correct probe settings.
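
The rule described above can be sketched as a small check over a Deployment object. This is a minimal Python illustration, assuming the Deployment is available as a dictionary shaped like the Kubernetes API's JSON (for example, the output of `kubectl get deployment <name> -o json`). The function name `is_zero_replica_deployment` is hypothetical, and this is not necessarily how the Availability panel computes its result.

```python
def is_zero_replica_deployment(deployment: dict) -> bool:
    """Return True if the Deployment wants at least one replica but has none available."""
    # spec.replicas defaults to 1 when it is not set in the manifest.
    desired = deployment.get("spec", {}).get("replicas", 1)
    # status.availableReplicas may be absent from the JSON when it is zero.
    available = deployment.get("status", {}).get("availableReplicas", 0)
    # Deployments intentionally scaled to zero (desired == 0) are excluded.
    return desired >= 1 and available == 0


# Illustrative sample data only, not real cluster output.
down = {"spec": {"replicas": 3}, "status": {"replicas": 3}}
scaled_to_zero = {"spec": {"replicas": 0}, "status": {}}
healthy = {"spec": {"replicas": 2}, "status": {"availableReplicas": 2}}
```

With these samples, only `down` would be flagged: it asks for three replicas and reports none as available.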

Deployment rollout issues

These are deployments whose rollout has one of these conditions:

  • Not Progressing means the deployment controller has not made progress on the rollout within the progress deadline (progressDeadlineSeconds).
  • Replica Failure means at least one replica Pod could not be created or deleted.

Common causes: insufficient cluster resources (CPU or memory), image pull errors (wrong image name, wrong tag, or expired credentials), failing readiness or liveness probes, exceeded resource quotas, volume mount failures, or Pod security policy violations.
Resolution: inspect deployment events and Pod status. Scale up node resources, fix image references or registry credentials, adjust probe settings, increase resource quotas, correct volume configurations, or update security policies.
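
The two conditions above correspond to entries in a Deployment's `status.conditions` list. The following Python sketch shows one way to detect them, assuming the Deployment is a dictionary shaped like the Kubernetes API's JSON. The function name `rollout_issues` and the returned labels are hypothetical; the sketch treats a `Progressing` condition with status `"False"` as Not Progressing and a `ReplicaFailure` condition with status `"True"` as Replica Failure.

```python
def rollout_issues(deployment: dict) -> list[str]:
    """Return the rollout problems reported in the Deployment's status conditions."""
    issues = []
    conditions = deployment.get("status", {}).get("conditions") or []
    for cond in conditions:
        cond_type = cond.get("type")
        cond_status = cond.get("status")
        # Progressing=False is set when the progress deadline is exceeded.
        if cond_type == "Progressing" and cond_status == "False":
            issues.append("Not Progressing")
        # ReplicaFailure=True is set when a replica Pod could not be created or deleted.
        elif cond_type == "ReplicaFailure" and cond_status == "True":
            issues.append("Replica Failure")
    return issues


# Illustrative sample data only, not real cluster output.
stalled = {"status": {"conditions": [
    {"type": "Available", "status": "True"},
    {"type": "Progressing", "status": "False", "reason": "ProgressDeadlineExceeded"},
]}}
```

Each condition also carries `reason` and `message` fields, which are usually the fastest pointer to the underlying cause.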

Nodes not ready

These are Nodes where the Ready condition is False or Unknown. New Pods cannot be scheduled onto a NotReady Node, and workloads already running on it may be disrupted. The Status column distinguishes a confirmed NotReady state (False) from an Unknown state, which means the Node is unreachable and may be transient.

Common causes: kubelet crash or failure to report status; Node running out of memory, disk, or PIDs; network connectivity loss between the Node and the control plane; underlying VM or hardware failure; expired Node certificates; or a kernel or OS-level crash.
Resolution: check kubelet logs and Node events. Restart the kubelet, free up Node resources, restore network connectivity, renew certificates, or replace the failed Node.
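
The False and Unknown states described above come from the `Ready` entry in a Node's `status.conditions`. Here is a minimal Python sketch of that distinction, assuming the Node is a dictionary shaped like the Kubernetes API's JSON. The function names are hypothetical, and treating a missing `Ready` condition as Unknown is an assumption of this sketch.

```python
def node_ready_status(node: dict) -> str:
    """Return the Ready condition's status: "True", "False", or "Unknown"."""
    conditions = node.get("status", {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == "Ready":
            return cond.get("status", "Unknown")
    # Assumption: a Node without a Ready condition is treated as Unknown.
    return "Unknown"


def is_node_not_ready(node: dict) -> bool:
    """Return True when the Ready condition is False or Unknown."""
    return node_ready_status(node) in ("False", "Unknown")


# Illustrative sample data only, not real cluster output.
unreachable = {"status": {"conditions": [
    {"type": "MemoryPressure", "status": "Unknown"},
    {"type": "Ready", "status": "Unknown"},
]}}
```

Keeping the raw status separate from the boolean check preserves the confirmed-versus-unreachable distinction that the Status column shows.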

Pods not ready

These are Pods in the Running phase that are failing their readiness probe. They are excluded from Service endpoints and are not receiving traffic.

Common causes: misconfigured readiness probes, application startup delays, or missing dependencies.
Resolution: review the readiness probe configuration and adjust thresholds or timeouts. Check that dependent services are available and that the application starts within the expected window.
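
The definition above can be approximated from a Pod's status fields. The following Python sketch assumes the Pod is a dictionary shaped like the Kubernetes API's JSON and uses the Pod's `Ready` condition as a proxy for readiness probe results; a `Ready` condition can also be false for other reasons, such as readiness gates. The function name `is_pod_not_ready` is hypothetical, and treating a missing `Ready` condition as not ready is an assumption of this sketch.

```python
def is_pod_not_ready(pod: dict) -> bool:
    """Return True for a Pod in the Running phase whose Ready condition is not True."""
    status = pod.get("status", {})
    # Only Pods in the Running phase are considered; Pending or Succeeded Pods are skipped.
    if status.get("phase") != "Running":
        return False
    for cond in status.get("conditions") or []:
        if cond.get("type") == "Ready":
            return cond.get("status") != "True"
    # Assumption: a Running Pod that reports no Ready condition is treated as not ready.
    return True


# Illustrative sample data only, not real cluster output.
failing_probe = {"status": {"phase": "Running", "conditions": [
    {"type": "ContainersReady", "status": "False"},
    {"type": "Ready", "status": "False"},
]}}
```

For a Pod flagged this way, the Pod's events typically record the readiness probe failures along with the probe's response or error.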