---
title: "Explore your infrastructure with Kubernetes Monitoring | Grafana Cloud documentation"
description: "How to explore your infrastructure with the features available in Grafana Kubernetes Monitoring"
---

# Explore your infrastructure with Kubernetes Monitoring

Kubernetes Monitoring offers visualization and analysis tools for you to:

- Evaluate the health, efficiency, and cost of Kubernetes infrastructure components.
- Analyze historical data as well as forecasts.
- View predictions created with machine learning.
- Manage alerts.

## Navigate to Kubernetes Monitoring

1. Navigate to your [Grafana Cloud portal](/docs/grafana-cloud/account-management/cloud-portal/).
2. In the menu, select the [stack](/docs/grafana-cloud/account-management/cloud-stacks/) you want to work with.
3. Click the Grafana logo icon.
4. In the main menu, expand **Infrastructure**.
   
   [Main menu with Infrastructure selected](/media/docs/grafana-cloud/k8s/side-menu.png)
5. Click **Kubernetes**.
   
   [Kubernetes menu item showing on the main menu](/media/docs/grafana-cloud/k8s/k8s-menu-item.png)

## Search for a Kubernetes object

Click **Search** on the main menu or enter a term in the search box on the main page to navigate to the **Search** page. Here you can find any Kubernetes resource. Enter the name or a partial name into the search box and press Enter. The search results display.

To narrow your search, you can:

- Enter a time range in the time range selector
- Click on any of the filter buttons:
  
  - Clusters
  - Nodes
  - Namespaces
  - Workloads
  - Pods

You can select more than one filter.

[Home page search field and search results page](/media/docs/grafana-cloud/k8s/searchresults.png)

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on [the search results page](https://play.grafana.org/a/grafana-k8s-app/search?from=now-2d&to=now).

[Try it](https://play.grafana.org/a/grafana-k8s-app/search?from=now-2d&to=now)

## Explore using the Kubernetes structure

Kubernetes Monitoring pages reflect the hierarchy of Kubernetes objects, so you can begin at any level above containers. Main pages include lists of Clusters, namespaces, workloads, and Nodes.

For example, the Cluster main page shows the list of your Clusters. When you click on a Cluster in the list, it opens the Cluster detail page. That page shows the details for the Cluster along with a list of Nodes within that Cluster.

You can continue to drill into a Node and see the list of Pods for that Node, all the way to the container level.

[Navigating from lists to detail pages](/media/docs/grafana-cloud/k8s/k8s-structure.png)

There are also main pages for Cluster configuration as well as managing alerts, cost, and efficiency.

You can navigate from the Cluster detail page to the list of workloads or namespaces in that Cluster.

[Navigating from Cluster detail page to namespace and workload lists](/media/docs/grafana-cloud/k8s/ClusterWorkloadNamespace_buttons.png)

For additional navigation tips, refer to [Navigation tips for Kubernetes Monitoring](#navigation-tips).

## Start with high-level snapshot

The **Kubernetes Overview** page gives you a high-level view of your Clusters, usage, and alerts. This page brings to the forefront key data about your infrastructure.

### Refine counts of Kubernetes objects and navigate to them

Adjust the time range and filter by Cluster and namespace to narrow and include historical data for:

- Clusters, Nodes, namespaces, workloads, Pods, and containers
- Deployed container images

[Filtering by Cluster for object count](/media/docs/grafana-cloud/k8s/cluster-selector.png)

After filtering, click the Clusters, Nodes, namespaces, or workloads you want to navigate to.

[Jumping from counter on overview page to Node list](/media/docs/grafana-cloud/k8s/jump-to-node-list-from-count.png)

> Note
> 
> The Overview page calculation uses the most recent data point within your selected time range. The rest of Kubernetes Monitoring also includes objects which are no longer active. For example, a Node can be active and then not active many times throughout a given time range. Therefore, you may see a discrepancy between the count on the Overview page and the count on the list page.
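
The following Python sketch illustrates why the two counts in the note above can differ. It is only a model of the behavior described in the note, not the product's implementation; the `Sample` type, the function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical data point: which Nodes reported at a given time."""
    timestamp: int          # seconds, arbitrary epoch
    active_nodes: frozenset  # names of Nodes active at this timestamp


def overview_count(samples, start, end):
    """Count Nodes at the most recent data point inside [start, end]."""
    in_range = [s for s in samples if start <= s.timestamp <= end]
    if not in_range:
        return 0
    latest = max(in_range, key=lambda s: s.timestamp)
    return len(latest.active_nodes)


def list_page_count(samples, start, end):
    """Count every Node seen at any point inside [start, end]."""
    seen = set()
    for s in samples:
        if start <= s.timestamp <= end:
            seen |= s.active_nodes
    return len(seen)


# Hypothetical data: node-b and node-c stop reporting, node-d appears later.
samples = [
    Sample(100, frozenset({"node-a", "node-b", "node-c"})),
    Sample(200, frozenset({"node-a", "node-c"})),
    Sample(300, frozenset({"node-a", "node-d"})),
]

print(overview_count(samples, 0, 300))   # 2
print(list_page_count(samples, 0, 300))  # 4
```

In this example the most recent data point contains two Nodes, while four distinct Nodes were active at some point in the range.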

### Find usage spikes

Use the time range selector to focus on a time period while looking for patterns or spikes in CPU and memory usage in your Clusters. When spikes occur:

1. Zoom in on the graph to narrow the time selection.
   
   [Zooming in on graph to change time range](/media/docs/grafana-cloud/k8s/zoom-graph.png)
2. Hover over and click the peak of the spike to see the percentage of use compared to capacity. In the following example, the spike shows 46.5% of CPU usage compared to capacity.
   
   [Hovering over spike to show Cluster link](/media/docs/grafana-cloud/k8s/spike-cluster-link.png)
3. Click the **View** link to view the Cluster. The Cluster detail page shows the time range you set when zooming in on the graph.
   
   [Cluster detail page](/media/docs/grafana-cloud/k8s/detail-cluster-page.png)
   
   You can continue by sorting the list of Nodes in this Cluster by highest CPU usage to investigate the issue causing the spike.

### Review and drill into alerts

1. Sort the **Firing Since** column of alerts to focus on either the most current or the oldest alerts that are firing.
2. Click the container or Pod name related to the alert to jump directly to the detail page.
   
   [List of container alerts](/media/docs/grafana-cloud/k8s/select-container.png)
   
   [Container detail page](/media/docs/grafana-cloud/k8s/detail-container-alert.png)

## Use Grafana Assistant

Click on the Grafana Assistant icon in any panel that contains it to learn more about the panel. This feature opens the Assistant window where you can ask additional questions about the panel itself or your data displayed in the panel.

[Grafana Assistant icon being hovered over for more information](/media/docs/grafana-cloud/k8s/assistant-icon.png)

On any detail page, click **Investigate with Assistant** to open the Assistant window and learn about the Kubernetes object you are currently looking at. You can ask additional questions about your objects and kick off a deep investigation if necessary.

[Investigate button](/media/docs/grafana-cloud/k8s/investigate-button.png)

## Manage alerts

View and [respond to all Kubernetes-related alerts](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/respond-to-alerts/) from the **Alerts** page and the [Kubernetes Overview](#review-and-drill-into-alerts) page.

You can also:

- Manage preconfigured [alerting rules](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/respond-to-alerts/#manage-alerts)
- [Copy a preconfigured alert](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/respond-to-alerts/#copy-a-preconfigured-alert)
- [Create a new alert](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/respond-to-alerts/#create-an-alert)

## Analyze costs

On the [**Cost** page](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/manage-costs/k8s-costs/#cost-overview), use the **Overview** and **Savings** tabs to understand what Kubernetes is costing you and where you can save. You can see the cost of each item in any list view as well as on the detail pages.

## Understand efficiency and resource use

Kubernetes Monitoring gives you several ways to understand and [optimize resource usage and efficiency](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/optimize-resource-usage/) across your fleet. You can surface urgent issues with the **Health** page, compare average and maximum resource usage on detail pages, and identify stranded or unused resources.

### Cluster health

Start with the [**Health** page](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/optimize-resource-usage/cluster-health/) to get a live snapshot of active risks and efficiency issues across all your Clusters. The [**Risks** tab](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/optimize-resource-usage/cluster-health/risks/) surfaces availability, stability, and infrastructure problems. The [**Efficiency** tab](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/optimize-resource-usage/cluster-health/efficiency/) highlights missing resource requests or limits, and flags containers where CPU or memory requests far exceed actual usage.

### Resource usage across the app

Optimize resource usage by:

- Correlating average and maximum resource usage to understand performance and troubleshoot stability issues.
- Observing resource usage for each Kubernetes object.
- Discovering any stranded resources in your fleet.

[List of namespaces showing CPU and memory average and maximum data](/media/docs/grafana-cloud/k8s/namespace-list-mar-2025.png)

Throughout Kubernetes Monitoring, resource usage statistics are available for Kubernetes objects.

### CPU and memory tabs

On any detail page you can view an overview of CPU and memory usage. You can also click the **CPU** tab or the **Memory** tab to view more correlated usage information. For example, the **CPU** and **Memory** tabs on the Cluster detail page show:

- Requests compared to capacity
- Usage compared to capacity
- Usage compared to requests for Nodes and namespaces in the Cluster
  
  [CPU tab of Cluster detail page](/media/docs/grafana-cloud/k8s/cpu-tab-cluster.png)
  
  [Memory tab of Cluster detail page](/media/docs/grafana-cloud/k8s/memory-tab.png)
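
As a rough illustration of the three comparisons listed above, the following Python sketch computes each ratio as a percentage. The function names and the input values are hypothetical; the tabs in Kubernetes Monitoring may compute and aggregate these values differently.

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage, or None if undefined."""
    if denominator <= 0:
        return None
    return round(100.0 * numerator / denominator, 1)


def cluster_cpu_ratios(usage_cores, requested_cores, capacity_cores):
    """Compute the three comparisons for a set of CPU figures (in cores)."""
    return {
        "requests_vs_capacity": percent(requested_cores, capacity_cores),
        "usage_vs_capacity": percent(usage_cores, capacity_cores),
        "usage_vs_requests": percent(usage_cores, requested_cores),
    }


# Hypothetical Cluster: 16 cores of capacity, 12 requested, 6 in use.
ratios = cluster_cpu_ratios(usage_cores=6.0, requested_cores=12.0, capacity_cores=16.0)
print(ratios)
# {'requests_vs_capacity': 75.0, 'usage_vs_capacity': 37.5, 'usage_vs_requests': 50.0}
```

Here requests reserve most of the capacity while actual usage is only half of what was requested, which is the kind of gap these tabs help you spot.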

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on [the CPU tab for a namespace](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/play-db-cluster/monitoring/cpu?var-datasource=grafanacloud-play-prom&from=now-2d&to=now&timezone=utc&var-loki=grafanacloud-play-logs&refresh=1m).

[Try it](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/play-db-cluster/monitoring/cpu?var-datasource=grafanacloud-play-prom&from=now-2d&to=now&timezone=utc&var-loki=grafanacloud-play-logs&refresh=1m)

### GPU tabs

View GPU utilization panels on the GPU tabs of Cluster and Node detail pages to answer questions like:

- Are the NVIDIA GPUs inside my Cluster appropriately utilized in relation to tensor cores, encoders, and decoders?
- Are workloads getting and using the GPU resources that have been made available to them?

[GPU tab of Cluster detail page](/media/docs/grafana-cloud/k8s/gpu-panels.png)

## Track persistent storage metrics

Graphs in the storage tab on the Cluster, Namespace, Workload, Node, and Pod detail pages show how [persistent volume (PV) storage](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) changes over a specific time range. You can gain insight into:

- [Storage classes](https://kubernetes.io/docs/concepts/storage/storage-classes/) of persistent volume claims (PVC)
- Volume bytes of the requested PVC, which compares requests, data capacity, and usage
- Volume [inodes](https://www.redhat.com/en/blog/inodes-linux-filesystem), comparing capacity with usage
- The [status phase](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#phase) of the PV and PVC, including the [binding](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#binding) of the PVC request
- Throughput to understand how much data is being read and written per second
- IOPS (Input/Output Operations per Second) to understand how many read and write operations are being performed per second

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on [the **Storage** tab for a Pod](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/play-db-cluster/play-backends/deployment/cert-manager/cert-manager-9647b459d-6dk64/storage?var-datasource=grafanacloud-play-prom&from=now-12h&to=now&timezone=utc&refresh=1m&var-loki=grafanacloud-play-logs).

[Try it](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/play-db-cluster/play-backends/deployment/cert-manager/cert-manager-9647b459d-6dk64/storage?var-datasource=grafanacloud-play-prom&from=now-12h&to=now&timezone=utc&refresh=1m&var-loki=grafanacloud-play-logs)

The PV status on the Pod details page indicates the relationship between persistent volumes and Pods, and also shows the name of the volume, which can change over time.

[Graph of PVC storage classes for a Cluster](/media/docs/grafana-cloud/k8s/pvd-storage-class-cluster.png)

[Graph of PVC volume bytes for namespaces in a Cluster](/media/docs/grafana-cloud/k8s/pvc-volume-bytes-namespace.png)

[Graphs of throughput and IOPS for a namespace and by workloads within the namespace](/media/docs/grafana-cloud/k8s/wkload-storage-thruput-iops.png)

## Learn what’s predicted

CPU and memory prediction can help you ensure resources are available during spikes in usage, as well as help you decrease the amount of unused resources due to over provisioning. To use prediction tools, first enable [the Machine Learning plugin](/docs/grafana-cloud/alerting-and-irm/machine-learning/).

The following buttons are available in various views. Click them to show a prediction for Clusters, namespaces, workloads, Nodes, Pods, and containers. The time range you select must be at least two hours to use these prediction tools:

- **Predict Mem Usage**: Shows a predictive graph for memory usage one week in the future. Calculations are based on metrics from the previous week.
- **Predict CPU**: Shows a predictive graph for CPU usage one week in the future. Calculations are based on metrics from the previous week.
  
  [Predict CPU button](/media/docs/grafana-cloud/k8s/preduct-cpu-button.png)
  
  [Predictions for Pod CPU Usage](/media/docs/grafana-cloud/k8s/predict-cpu-use.png)

## Review Pod count

On every Workload detail page, you can use the **Pod count** panel to examine the Pod count over a time range you select. Pod count shows how many instances of a workload are running at any point in time. This matters because scaling events directly affect availability, performance, and cost.

To learn more, refer to [Gain insight with Pod count](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/optimize-resource-usage/pod-count/).

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on [the Workload detail page](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/appenv-grafana-play-cluster/ditl-demo-prod/daemonset/alloy-alloy-logs?var-datasource=grafanacloud-play-prom&from=now-24h&to=now&timezone=utc&refresh=1m).

[Try it](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/appenv-grafana-play-cluster/ditl-demo-prod/daemonset/alloy-alloy-logs?var-datasource=grafanacloud-play-prom&from=now-24h&to=now&timezone=utc&refresh=1m)

## Detect outlier Pod CPU usage

You can identify any Pods that have CPU usage different from other Pods. For any multi-Pod workload, go to the workload detail page, and review the information in the Overview tab. If there is a Pod in the workload that is an outlier for CPU usage, it is indicated in the **outliers by CPU** field. Click the link to open Explore and discover the outlier Pod.

[Clickable message on workload detail page showing a CPU outlier Pod](/media/docs/grafana-cloud/k8s/outlier-pod.png)
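
To make the idea of a CPU outlier concrete, the following Python sketch flags Pods whose usage is far from the workload's median. This document does not describe how Kubernetes Monitoring detects outliers, so the technique shown here (a modified z-score based on the median absolute deviation), the threshold, and all names and values are assumptions for illustration only.

```python
from statistics import median


def cpu_outliers(usage_by_pod, threshold=3.5):
    """Return Pod names whose CPU usage deviates strongly from the median.

    Uses a modified z-score: 0.6745 * |value - median| / MAD.
    This is an illustrative technique, not the product's algorithm.
    """
    values = list(usage_by_pod.values())
    if len(values) < 3:
        return []  # too few Pods to compare meaningfully
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # all Pods are (nearly) identical
    return sorted(
        pod
        for pod, value in usage_by_pod.items()
        if 0.6745 * abs(value - med) / mad > threshold
    )


# Hypothetical CPU usage in cores for the Pods of one workload.
usage = {"api-1": 0.21, "api-2": 0.19, "api-3": 0.20, "api-4": 0.95}
print(cpu_outliers(usage))  # ['api-4']
```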

## Uncover energy usage

On any detail page, click the **Energy** tab to view the energy usage of:

- Workloads and namespaces
- Clusters
- Nodes
- Pods
- Containers

[Energy usage for workloads in a namespace for 24 hours](/media/docs/grafana-cloud/k8s/energy-for-namespace.png)

When you [configure Kubernetes Monitoring](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/helm-chart-config/#select-features-and-enter-cluster-information) to gather energy metrics, [Kepler](https://sustainable-computing.io/) gathers and exposes the metrics, and Alloy collects them.

Energy metrics are separated into these categories:

- Package, [including CPU cores](https://sustainable-computing.io/design/metrics/)
- DRAM (memory)
- GPU
- Other
- Total (the sum of all categories)

## Analyze historical data

Select a time range to see your historical data for any time frame you choose. As you navigate from page to page, the time range you set stays in effect until you change it again.

As an example, the **Pod optimization** section of the Pod detail page shows a time range over several hours. You can use this to understand the historical pattern of CPU usage and memory usage.

[Pod optimization view on Pod detail page](/media/docs/grafana-cloud/k8s/screenshot-pod-optimization.png)

Zoom into an area of any graph on the detail pages to narrow the time range selector even further. The time range remains selected until you click **Back to default**.

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on [this workload details page set for the last 2 days](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/appenv-grafana-play-cluster/ditl-demo-prod/daemonset/alloy-alloy-receiver?var-datasource=grafanacloud-play-prom&from=now-2d&to=now&timezone=utc&refresh=1m).

[Try it](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/appenv-grafana-play-cluster/ditl-demo-prod/daemonset/alloy-alloy-receiver?var-datasource=grafanacloud-play-prom&from=now-2d&to=now&timezone=utc&refresh=1m)

## Monitor cron jobs and other job types

You can monitor manual jobs and scheduled (cron) jobs. Use the main menu to find and select **All jobs**. Use the **Cronjobs** and **Jobs** lists to view jobs across all Clusters and Namespaces, based on the time range you choose in the time range selector. You can view:

- A color-coded status indicator for each job
- How jobs are distributed and where jobs are placed across the infrastructure
- For cron jobs:
  
  - Last succeeded, to verify jobs are completing successfully
  - Last scheduled compared to succeeded, to view any gaps that reveal failed or skipped executions
- For manual jobs, Pods/completions to track when the job was run

[Cronjobs list](/media/docs/grafana-cloud/k8s/all-jobs-page.png)
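
The comparison of last scheduled and last succeeded times can be expressed as a simple check. The following Python sketch is an assumption about how you might reason over those two timestamps yourself; the function name and sample values are hypothetical. The Kubernetes CronJob status exposes similar fields (`lastScheduleTime` and `lastSuccessfulTime`), but this document does not state how the list columns are derived.

```python
from datetime import datetime, timezone


def has_success_gap(last_scheduled, last_succeeded):
    """Return True when the latest scheduled run has no success at or after it.

    A gap can mean a failed or skipped run, or simply a run still in progress.
    """
    if last_scheduled is None:
        return False  # never scheduled, so nothing to compare
    if last_succeeded is None:
        return True   # scheduled at least once but never succeeded
    return last_succeeded < last_scheduled


# Hypothetical timestamps for two cron jobs.
scheduled = datetime(2025, 3, 10, 2, 0, tzinfo=timezone.utc)
ok_success = datetime(2025, 3, 10, 2, 4, tzinfo=timezone.utc)
stale_success = datetime(2025, 3, 9, 2, 5, tzinfo=timezone.utc)

print(has_success_gap(scheduled, ok_success))     # False
print(has_success_gap(scheduled, stale_success))  # True
```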

To further investigate a job, click the job name to open its detail page.

[Detail page of a job, open to the **Overview** tab](/media/docs/grafana-cloud/k8s/job-detail-page.png)

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on [the Jobs list](https://play.grafana.org/a/grafana-k8s-app/navigation/all-jobs?from=now-7d&to=now&timezone=utc&refresh=1m&var-cluster=appenv-grafana-play-cluster&var-cluster=play-db-cluster).

[Try it](https://play.grafana.org/a/grafana-k8s-app/navigation/all-jobs?from=now-7d&to=now&timezone=utc&refresh=1m&var-cluster=appenv-grafana-play-cluster&var-cluster=play-db-cluster)

On the job detail page, the **Overview** tab contains:

- Status, start time, end time, Pod status phase, logs, and events
- CPU and memory usage, to identify any over or under provisioning as well as spot any gradual increases that indicate memory leaks
- Container logs for debugging failed runs
- Events for identifying error messages or unexpected behavior
- Runs table to track success/failure patterns over time, and understand duration and completion

You can further explore each job’s **CPU** and **Memory** tabs for greater insight.

## Find deleted Kubernetes objects

You can find deleted Clusters, namespaces, workloads, Nodes, Pods, and containers to understand what occurred in the past. To do so, set the time range selector to a past time period.

The following example shows a time range of the previous 30 days with some Nodes that show no data (these also appear in white text). When you click on a Node with no data, you can learn when the Node expired.

[Node details page showing Node expiration](/media/docs/grafana-cloud/k8s/node-not-running.png)

> Note
> 
> Grafana Cloud has a default 30-day limit for queries. If your Kubernetes object was deleted more than 30 days ago, use the time range selector to choose a specific 30-day time frame in the past.
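
If you know roughly when an object was deleted, the following Python sketch shows one way to work out a time frame of at most 30 days that covers the period just before the deletion. The function name, the one-day padding, and the dates are hypothetical; you still enter the resulting range in the time range selector.

```python
from datetime import datetime, timedelta, timezone

MAX_QUERY_WINDOW = timedelta(days=30)  # default query limit described in the note


def window_for_deleted_object(deleted_at, now, padding=timedelta(days=1)):
    """Return (start, end) spanning 30 days and ending shortly after deletion.

    The end is padded slightly past the deletion time and never exceeds `now`.
    """
    end = min(deleted_at + padding, now)
    start = end - MAX_QUERY_WINDOW
    return start, end


# Hypothetical example: a Node deleted on 10 March, looked up on 30 June.
now = datetime(2025, 6, 30, tzinfo=timezone.utc)
deleted_at = datetime(2025, 3, 10, tzinfo=timezone.utc)

start, end = window_for_deleted_object(deleted_at, now)
print(start.date(), end.date())  # 2025-02-09 2025-03-11
```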

## Access Nodes in Cloud provider accounts

You can navigate to the EC2 dashboard for Nodes managed by AWS from Kubernetes Monitoring. For example:

1. Find the EC2 Node by going to **Search** and searching for the Node name.
2. In the search results, click the Node name to open the Node detail page.
3. On the far right-hand side of the screen, open the AWS drop-down to see the link to the EC2 instance.
   
   [AWS drop-down menu](/media/docs/grafana-cloud/k8s/aws-dropdown.png)
4. Click the instance link to open the [EC2 overview](/docs/grafana-cloud/monitor-infrastructure/monitor-cloud-provider/aws/cloudwatch-metrics/metric-dashboards/aws-ec2-dashboard/#drill-into-instance-detail). Here you can find AWS-specific metadata and other data provided by CloudWatch metrics.
   
   [Dashboard for AWS EC2 instance](/media/docs/grafana-cloud/k8s/aws-ec2-dash.png)
5. To return to the Kubernetes Monitoring view of the Node, click **Back to Kubernetes Node**.

## Discover non-standard workloads

You can find non-standard workloads, including:

- Argo Rollouts
- Strimzi Pod sets
- Unmanaged (or [static](https://kubernetes.io/docs/tasks/configure-pod-container/static-pod/)) Pods
- CronJobs
- Bare Pods

Navigate to the Workloads main page, and filter the Type column.

[Filtering for workload type](/media/docs/grafana-cloud/k8s/workload-type-sort.png)

## View network bandwidth and saturation

Use the network panels to understand when bandwidth limits are causing network saturation, which can lead to dropped packets. On any detail page for Cluster, namespace, workload, Node, or Pod, click the **Network** tab to view:

- Network Bandwidth Rx/Tx: Shows the rate of received and transmitted bytes
- Network Saturation Rx/Tx dropped packets: Shows rate of received and transmitted packets dropped
- Network Bandwidth and Network Saturation by Node, workload, or Pod: Shows the bandwidth and saturation by object
  
  [Network bandwidth and saturation panels for a Cluster](/media/docs/grafana-cloud/k8s/network-panels-for-cluster.png)

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on [the **Network tab** of this namespace details page](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/play-db-cluster/kube-system/network?var-datasource=grafanacloud-play-prom&from=now-7d&to=now&timezone=utc&var-loki=grafanacloud-play-logs&refresh=1m).

[Try it](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/play-db-cluster/kube-system/network?var-datasource=grafanacloud-play-prom&from=now-7d&to=now&timezone=utc&var-loki=grafanacloud-play-logs&refresh=1m)

## View logs and events

[Logs Drilldown](/docs/grafana-cloud/visualizations/simplified-exploration/logs/) is embedded within Kubernetes Monitoring. Instead of going through logs for every Pod, you start with a service (the logical unit your team owns). Logs Drilldown aggregates logs for that service so you can immediately spot whether the problem is one Pod or the whole service. That tells you whether to fix a Pod or the infrastructure.

From Cluster to container, you can click on the **Logs** or **Events** tab of any detail page to analyze logs without writing complex queries. For more information, refer to [Labels and Fields](/docs/grafana-cloud/visualizations/simplified-exploration/logs/labels-and-fields/) and [Log Patterns](/docs/grafana-cloud/visualizations/simplified-exploration/logs/patterns/).

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on [the **Logs** tab of this namespace details page](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/play-db-cluster/monitoring/logs-drilldown?var-datasource=grafanacloud-play-prom&from=now-2d&to=now&refresh=1m&var-loki=grafanacloud-play-logs&timezone=utc&logs-k8s-var-ds=grafanacloud-logs&logs-k8s-patterns=%5B%5D&pageSlug=logs&logs-k8s-var-all-fields=&logs-k8s-userDisplayedFields=false&logs-k8s-displayedFields=%5B%22error%22%5D&logs-k8s-urlColumns=%5B%5D&logs-k8s-visualizationType=%22logs%22&logs-k8s-prettifyLogMessage=false&logs-k8s-sortOrder=%22Descending%22&logs-k8s-wrapLogMessage=false).

[Try it](https://play.grafana.org/a/grafana-k8s-app/navigation/namespace/play-db-cluster/monitoring/logs-drilldown?var-datasource=grafanacloud-play-prom&from=now-2d&to=now&refresh=1m&var-loki=grafanacloud-play-logs&timezone=utc&logs-k8s-var-ds=grafanacloud-logs&logs-k8s-patterns=%5B%5D&pageSlug=logs&logs-k8s-var-all-fields=&logs-k8s-userDisplayedFields=false&logs-k8s-displayedFields=%5B%22error%22%5D&logs-k8s-urlColumns=%5B%5D&logs-k8s-visualizationType=%22logs%22&logs-k8s-prettifyLogMessage=false&logs-k8s-sortOrder=%22Descending%22&logs-k8s-wrapLogMessage=false)

## Resolve issues with built-in tools

Navigate easily from Kubernetes Monitoring to other capabilities in Grafana Cloud to analyze, troubleshoot, and solve issues.

### Start an automated diagnostic

From a Pod, Cluster, namespace, or workload detail page, you can begin an automated investigation by clicking [**Run Sift investigation**](/blog/2024/02/21/ai-powered-diagnostics-for-incident-response-new-sift-features-in-grafana-irm/). Sift performs a set of [automated system checks](/docs/grafana-cloud/alerting-and-irm/machine-learning/manage/sift/#sift-checks), and surfaces potential issues in your Kubernetes environment. It then works to [identify the root cause](/blog/2023/09/14/announcing-sift-automated-system-checks-for-faster-incident-response-times-in-grafana-cloud/#how-sift-works-to-identify-the-root-cause-of-an-incident) of an incident.

[Opening a Sift investigation for a namespace](/media/docs/grafana-cloud/k8s/sift-investigate.png)

### Access root cause analysis tool

> Note
> 
> To access root cause analysis tools, enable [Knowledge Graph](/docs/grafana-cloud/knowledge-graph/) on your stack. If Knowledge Graph is not enabled, Kubernetes Monitoring displays a banner with a link to enable it.

You can take troubleshooting deeper by understanding relationships between components and what is occurring between them. Within Kubernetes Monitoring, access [RCA Workbench](/docs/grafana-cloud/knowledge-graph/troubleshoot-infra-apps/workbench/) to perform [root cause analysis](/docs/grafana-cloud/knowledge-graph/troubleshoot-infra-apps/workbench/#use-the-timeline-to-perform-root-cause-analysis).

Access the RCA Workbench by any of these methods:

- Select the box to the left of the list item, and click **Compare clusters in RCA workbench**.
  
  [Selecting a Cluster to compare for root cause analysis](/media/docs/grafana-cloud/k8s/compare-cluster.png)
- Hover over the ring icon beside the component, and click **Add to RCA Workbench**.
  
  [Hovering over the ring icon](/media/docs/grafana-cloud/k8s/ring-workbench.png)
- Click the checkbox in the row to view all entities connected to a cluster in the Knowledge Graph, then click **Add to RCA Workbench**.

You can also go directly to the connections view in Knowledge Graph to [view connections between entities](/docs/grafana-cloud/knowledge-graph/troubleshoot-infra-apps/workbench/#view-an-entity-graph). Click the **Insights** button, then click **Connected entities**.

[Hovering over the Insights button](/media/docs/grafana-cloud/k8s/insights2.png)

### Jump to the application layer

On the detail page for a Pod or workload, click **View application layer**, then **Go to Application Observability** to navigate directly to more data, such as the [service health](/docs/grafana-cloud/monitor-applications/application-observability/manual/service/#service-health).

[Navigating directly to the Application Observability app](/media/docs/grafana-cloud/k8s/app-observability.png)

To return to Kubernetes Monitoring, click the browser back button.

### View queries to troubleshoot with Explore

To further query data, use any of the **Explore** buttons available throughout the interface (such as **Explore namespaces** or **Explore alerts**). You see a view that provides additional query tools for [troubleshooting](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/troubleshooting/#view-the-query-with-explore).

[Raw metrics](/media/docs/grafana-cloud/k8s/explore-query-view.png)

### Use debug metrics

For any panel, you can open [Debug Metrics](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/troubleshooting/#debug-metrics-tool) to see the metrics used for the panel.

## Navigate to traces

If you enabled traces when you configured Kubernetes Monitoring, you can view them in Explore.

1. Click the main menu icon.
2. Click **Explore**.
3. Choose the Tempo data source.
4. With the **TraceQL** tab selected, enter your search query.
5. Click **Run query**.
   
   A table of traces appears.
6. Click a trace to see the detail.

[View traces](/media/docs/grafana-cloud/k8s/screenshot-k8smon-traces.png)

> Note
> 
> If you use Istio service mesh and traces don’t appear, verify that the Kubernetes Service port name for the OpenTelemetry Collector gRPC endpoint (the value of `spec.ports[].name`) is either `grpc` or starts with `grpc-`. For more information, refer to [Traces missing with Istio service mesh](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/troubleshooting/#traces-missing-with-istio-service-mesh).
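
The naming rule in the note above can be checked programmatically. The following Python sketch assumes you have the Service manifest loaded as a dictionary; the function names, the example Service, and the use of port `4317` to identify the gRPC endpoint are assumptions for illustration, so substitute the port your collector actually uses.

```python
def is_valid_grpc_port_name(name):
    """Return True if the port name is `grpc` or starts with `grpc-`."""
    return name == "grpc" or name.startswith("grpc-")


def grpc_port_name_ok(service, grpc_port=4317):
    """Check the name of the Service port that exposes the gRPC endpoint.

    `service` is a Service manifest as a dict. `grpc_port` is an assumed
    port number. Returns False if the port is missing or misnamed.
    """
    for port in service.get("spec", {}).get("ports", []):
        if port.get("port") == grpc_port:
            return is_valid_grpc_port_name(port.get("name", ""))
    return False


# Hypothetical Service manifest with a misnamed gRPC port.
service = {
    "kind": "Service",
    "metadata": {"name": "example-collector"},
    "spec": {
        "ports": [
            {"name": "otlp", "port": 4317},
            {"name": "http-metrics", "port": 8888},
        ]
    },
}

print(grpc_port_name_ok(service))  # False, because "otlp" is not grpc or grpc-*

service["spec"]["ports"][0]["name"] = "grpc-otlp"
print(grpc_port_name_ok(service))  # True
```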

## Manage configuration

If you have the `admin` role, you can [manage the configuration](/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/configuration/manage-configuration/) of Kubernetes Monitoring by working with:

- Data source choices
- Alerts
- Integration installations
- Optional custom log queries
- Configuration instructions for deploying, configuring, and updating the Grafana Kubernetes Monitoring Helm chart

## Access more information

Click the documentation links on a page to find more information about what you’re viewing.

## Navigation tips

Here are some tips and shortcuts for getting around in Kubernetes Monitoring.

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on [the **Kubernetes Monitoring Overview** page](https://play.grafana.org/a/grafana-k8s-app/home?from=now-2d&to=now&refresh=1m&var-cluster=%24__all&var-namespace=%24__all).

[Try it](https://play.grafana.org/a/grafana-k8s-app/home?from=now-2d&to=now&refresh=1m&var-cluster=%24__all&var-namespace=%24__all)

### Jump between main pages

From any main page, click the icon beside the page title to see the menu of all main pages. Then click the page you want to open.

[Clicking next to the page title to reveal navigation menu](/media/docs/grafana-cloud/k8s/jump-navigation.png)

### Dock the main menu

To keep the main navigation open:

1. Click the Grafana logo menu icon.
2. Click the dock menu icon to keep the main menu open.

[Hovering over the dock menu icon](/media/docs/grafana-cloud/k8s/dock-menu-icon.png)

### Filter, sort, and set the time range

Use filters and sorting, along with the time range selector, to target the data you want.

[Filtering for a namespace](/media/docs/grafana-cloud/k8s/namespace-filter.png)

### Jump to main lists

From the counts on the Kubernetes Overview home page, click **All** to see that component’s list of items in your Kubernetes fleet.

[Clicking the **All** link from the **Kubernetes Overview** page to see a list of all Clusters](/media/docs/grafana-cloud/k8s/home-page-jump.png)

### Control app refresh

You can set the automatic refresh interval of the app or turn off automatic refresh.

[Menu for controlling automatic refresh and refresh interval](/media/docs/grafana-cloud/k8s/app-refresh-control.png)

### Use color cues

Throughout the views in Kubernetes Monitoring, you see color used as an additional means of indicating status or condition. For example, sometimes text is a different color for Pod status:

[Color coding](/media/docs/grafana-cloud/k8s/pod-color-code.png)


| Text      | Color  | Comments                                                                                |
|-----------|--------|-----------------------------------------------------------------------------------------|
| Failed    | Red    | Failed Pod                                                                              |
| Running   | Green  | Healthy Pod                                                                             |
| Running   | Red    | Pod is failing to start                                                                 |
| Succeeded | Green  | Job Pod successfully run                                                                |
| Unknown   | White  | Pod status is unknown                                                                   |
| Waiting   | Yellow | Pod is waiting because of startup, such as Pod initializing or container creating       |
| Waiting   | Red    | Pod is waiting because of a problem, such as crash loop back off or image pull back off |

For more information on Pod status, refer to the [Kubernetes documentation on Pod lifecycle](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase).

The following table describes the color indicators for resource capacity and the state of resource usage:


| Usage Colors | Usage             | Comments                                                                |
|--------------|-------------------|-------------------------------------------------------------------------|
| Green        | 60-90% of maximum | This is the ideal state of resource usage.                              |
| Yellow       | Below 60%         | Low usage percentages indicate that the item might be over provisioned. |
| Red          | 90%+              | Your resource usage is close to or above its configured capacity.       |
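
The thresholds in the preceding table can be read as a simple classification. The following Python sketch is one interpretation of the table; the function name is hypothetical, and treating exactly 90% as red is an assumption, because the table lists 90% in both the green and red ranges.

```python
def usage_color(usage_percent):
    """Map a resource usage percentage to the color described in the table.

    Assumption: exactly 90% counts as red; exactly 60% counts as green.
    """
    if usage_percent < 0:
        raise ValueError("usage_percent must not be negative")
    if usage_percent >= 90:
        return "red"     # close to or above configured capacity
    if usage_percent >= 60:
        return "green"   # ideal state of resource usage
    return "yellow"      # possibly over provisioned


for value in (46.5, 75, 90, 120):
    print(value, usage_color(value))
# 46.5 yellow
# 75 green
# 90 red
# 120 red
```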
