Explore your infrastructure with Kubernetes Monitoring
Kubernetes Monitoring visualizes your infrastructure data so you can examine and analyze it and make informed decisions. You can use the views in Kubernetes Monitoring to:
- Examine your data to evaluate the health of your Kubernetes infrastructure components.
- Use controls to narrow your focus to specific data.
- View color indicators to understand status and condition at a glance.
- Discover issues with resource usage to make informed decisions about efficiency and costs.
See the issues at a glance
The main Kubernetes page displays a “crow’s nest” snapshot of issues for the data source chosen in the drop-down menu.
In this view, you can see graphed counts of Clusters, Nodes, Pods, and containers, as well as:
- Pods that have been in a non-running state for 15 minutes or more
- Node issues: CPU or memory usage above 90% for more than 5 minutes, and disk usage above 90% of capacity
- Persistent volumes using more than 90% of their capacity
You can sort the columns and, with one click, go to the Pod, Cluster, Node, and namespace views for more detail.
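The thresholds in this list can be expressed as simple rules. The following Python sketch is illustrative only: it is not how Kubernetes Monitoring evaluates these conditions, and the function names and input shapes (time-ordered `(unix_seconds, fraction)` samples and a Pod phase string) are assumptions.

```python
# Thresholds from the issue list above.
USAGE_THRESHOLD = 0.90             # 90% of capacity
NODE_SUSTAINED_SECONDS = 5 * 60    # Node CPU or memory: more than 5 minutes
POD_NOT_RUNNING_SECONDS = 15 * 60  # Pod: 15 minutes or more


def sustained_above(samples, threshold, min_seconds):
    """True if the most recent unbroken run of samples above `threshold`
    lasts longer than `min_seconds`. `samples` is a time-ordered list of
    (unix_seconds, value) pairs."""
    run_start = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts
        else:
            run_start = None  # the run was broken; start over
    if run_start is None:
        return False
    return samples[-1][0] - run_start > min_seconds


def node_issues(cpu_samples, memory_samples, disk_fraction):
    """List the Node conditions from the issue list that currently apply."""
    issues = []
    if sustained_above(cpu_samples, USAGE_THRESHOLD, NODE_SUSTAINED_SECONDS):
        issues.append("cpu")
    if sustained_above(memory_samples, USAGE_THRESHOLD, NODE_SUSTAINED_SECONDS):
        issues.append("memory")
    if disk_fraction > USAGE_THRESHOLD:
        issues.append("disk")
    return issues


def pod_not_running_issue(phase, seconds_in_phase):
    """A Pod counts as an issue after 15 minutes or more in a non-Running phase."""
    return phase != "Running" and seconds_in_phase >= POD_NOT_RUNNING_SECONDS


def volume_issue(used_bytes, capacity_bytes):
    """A persistent volume counts as an issue above 90% of its capacity."""
    return used_bytes / capacity_bytes > USAGE_THRESHOLD


cpu = [(0, 0.95), (120, 0.97), (240, 0.93), (360, 0.96)]
memory = [(0, 0.50), (360, 0.60)]
print(node_issues(cpu, memory, disk_fraction=0.92))  # ['cpu', 'disk']
```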
Drill down into data
As you delve into your data, you can navigate down to the Pod level by clicking the Cluster navigation menu item and choosing any of the following tabs:
- Clusters
- Namespaces
- Workloads
- Nodes
Navigate to traces
If you enabled traces when you configured Kubernetes Monitoring, you can view them in Explore:
1. Click the main menu icon.
2. Click Explore.
3. Choose the Tempo data source.
4. With the TraceQL tab selected, enter your search query.
5. Click Run query. A table of traces appears.
6. Click a trace to see its details.
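As an example of a search query, the following Python sketch builds a TraceQL query that selects spans from one namespace, optionally slower than a minimum duration, plus the matching URL for Tempo's HTTP search endpoint (`/api/search` with the TraceQL query in `q`). It assumes your instrumentation sets the OpenTelemetry `k8s.namespace.name` resource attribute; the helper names and the Tempo address are hypothetical. In Explore, you would enter only the query string.

```python
from urllib.parse import urlencode


def traceql_for_namespace(namespace, min_duration_ms=None):
    """Build a TraceQL query for spans whose resource has the given
    k8s.namespace.name, optionally slower than min_duration_ms.
    Namespace names are DNS labels, so no quote escaping is needed."""
    conditions = [f'resource.k8s.namespace.name = "{namespace}"']
    if min_duration_ms is not None:
        conditions.append(f"duration > {min_duration_ms}ms")
    return "{ " + " && ".join(conditions) + " }"


def tempo_search_url(base_url, query, limit=20):
    """URL for Tempo's HTTP search API with a TraceQL query in `q`."""
    params = urlencode({"q": query, "limit": limit})
    return f"{base_url.rstrip('/')}/api/search?{params}"


query = traceql_for_namespace("checkout", min_duration_ms=500)
print(query)  # { resource.k8s.namespace.name = "checkout" && duration > 500ms }
print(tempo_search_url("http://tempo.example:3200", query))
```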
Filter for data
Use the controls on each page to further specify the data you want to view and examine. For example, choose a data source and use filters to refine what you want to analyze. Click the heading of a list column to sort it. Click underlined items within lists to further explore details about the item.
Use color cues
Throughout the views in Kubernetes Monitoring, color serves as an additional indicator of status or condition. For example, Pod status text is colored as follows:
Status | Color | Comments |
---|---|---|
Running | Green | Healthy Pod |
Running | Red | Pod failing to start |
Failed | Red | Failed Pod |
Unknown | Grey | Pod status unknown |
Succeeded | Green | Job Pod ran successfully |
For more information on pod statuses, see the Kubernetes documentation on pod lifecycle.
The following table describes the color indicators for resource capacity and the state of resource usage:
Usage Bar Color | Usage | Comments |
---|---|---|
Green | 60-90% of maximum | This is the ideal state of resource usage. |
Yellow | Below 60% of maximum | Low usage percentages indicate that the Node might be over-provisioned. |
Red | 90-100% of maximum | Your Node resource is dangerously close to maximum capacity. |
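The usage bar colors map directly to percentage ranges. The following minimal Python sketch assumes that usage of exactly 90% counts as red (the table's ranges overlap at that value) and that `used` and `maximum` share a unit; the function name is hypothetical.

```python
def usage_bar_color(used, maximum):
    """Map resource usage to a usage bar color from the table above.
    Assumes exactly 90% is red, since the ranges overlap at 90%."""
    percent = 100 * used / maximum
    if percent >= 90:
        return "red"     # dangerously close to maximum capacity
    if percent >= 60:
        return "green"   # ideal range
    return "yellow"      # the Node might be over-provisioned


print(usage_bar_color(3, 8))    # yellow (37.5%)
print(usage_bar_color(6, 8))    # green (75%)
print(usage_bar_color(15, 16))  # red (93.75%)
```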
Understand efficiency and resource use
The Efficiency view correlates CPU, memory, and storage usage for Clusters, Nodes, and namespaces. The list of Clusters shows each Cluster’s resource usage. You can use this data to:
- Understand performance and troubleshoot stability issues by correlating average and maximum resource usage.
- Observe resource usage per Cluster and per cloud provider.
- Discover any stranded resources in your fleet.
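To illustrate correlating average and maximum usage, the following Python sketch summarizes usage samples against capacity. The function name and the `peak_to_avg` ratio are illustrative assumptions, not necessarily the Efficiency view's calculation. A high ratio suggests spiky usage that needs headroom; a low maximum suggests capacity that is rarely used.

```python
from statistics import mean


def utilization_summary(samples, capacity):
    """Average and maximum utilization of `capacity`, plus their ratio.
    `samples` and `capacity` must use the same unit (for example, CPU cores)."""
    if not samples:
        raise ValueError("need at least one sample")
    avg = mean(samples) / capacity
    peak = max(samples) / capacity
    ratio = peak / avg if avg else float("inf")
    return {"avg": avg, "max": peak, "peak_to_avg": ratio}


# CPU cores used over time on a hypothetical 4-core Node.
print(utilization_summary([1.0, 1.5, 2.0, 3.5], capacity=4))
# {'avg': 0.5, 'max': 0.875, 'peak_to_avg': 1.75}
```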
Learn what’s predicted
CPU and memory predictions can help you ensure that resources are available during usage spikes, and reduce unused resources caused by over-provisioning.
To use prediction tools, first enable the Machine Learning plugin.
The following buttons are available in various views. Click them to show a prediction for either a Node or a workload:
- Predict Mem Usage: Shows a predictive graph for memory usage one week in the future. Calculations are based on metrics from the previous week.
- Predict CPU: Shows a predictive graph for CPU usage one week in the future. Calculations are based on metrics from the previous week.
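The Machine Learning plugin uses its own forecasting models. As a much simpler stand-in that shows the idea of predicting next week from last week, the following Python sketch uses a seasonal-naive forecast: it repeats the most recent week of samples and checks whether the predicted peak exceeds a capacity you supply. The function names, hourly sampling interval, and example data are assumptions.

```python
def seasonal_naive_forecast(history, season_length):
    """Forecast the next `season_length` samples by repeating the most recent
    full season. With hourly samples and a weekly season, season_length = 168."""
    if season_length <= 0 or len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    return list(history[-season_length:])


def peak_exceeds_capacity(history, season_length, capacity):
    """True if the forecast peak is above `capacity` (same unit as history)."""
    return max(seasonal_naive_forecast(history, season_length)) > capacity


HOURS_PER_WEEK = 7 * 24
# Hypothetical memory usage in GiB: two weeks of hourly samples with a daily peak.
history = [4.0 + (2.5 if hour % 24 == 14 else 0.0) for hour in range(2 * HOURS_PER_WEEK)]
print(peak_exceeds_capacity(history, HOURS_PER_WEEK, capacity=6.0))  # True
```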
Within a workload view, you can click the Detect Outlier CPU Usage amongst Pods button to identify Pods whose CPU usage differs from that of the other Pods.
Click Explore this query in the Machine Learning plugin to view the raw data. There, you can adjust parameters and see a more detailed graph of the findings.
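One common way to flag a Pod whose CPU usage differs from its peers is the median absolute deviation (MAD). The following Python sketch uses MAD as a swapped-in technique; the plugin's own outlier algorithm and parameters may differ, and the threshold of 3 and the Pod names are assumptions.

```python
from statistics import median


def mad_outliers(cpu_by_pod, threshold=3.0):
    """Return the Pods whose CPU usage is more than `threshold` scaled MADs
    away from the median CPU usage of all Pods in the workload."""
    values = list(cpu_by_pod.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the Pods sit exactly at the median, so MAD cannot
        # scale deviations; fall back to flagging any Pod that differs.
        return sorted(pod for pod, v in cpu_by_pod.items() if v != med)
    # 1.4826 scales MAD to estimate the standard deviation for normal data.
    scale = 1.4826 * mad
    return sorted(pod for pod, v in cpu_by_pod.items() if abs(v - med) / scale > threshold)


cpu_cores = {"web-1": 0.20, "web-2": 0.22, "web-3": 0.21, "web-4": 0.95, "web-5": 0.19}
print(mad_outliers(cpu_cores))  # ['web-4']
```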
Analyze costs
Use the Cost view to understand the cost of the resources your Kubernetes infrastructure consumes and to identify potential savings.
There are two tabs within this view that present infrastructure costs:
- Overview: Costs segmented by the different cloud service providers
- Savings: Costs segmented by each resource, and the costs associated with unused resources
Hover over the circled i icon for details about each calculation.
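The cost of unused resources shown on the Savings tab can be approximated as unused capacity multiplied by price and time. In this Python sketch, the per-unit hourly prices, resource names, and function names are hypothetical, and the tab's actual calculation may account for more factors.

```python
def unused_cost(allocated, used, price_per_unit_hour, hours):
    """Cost of capacity that was paid for but not used over `hours`."""
    return max(allocated - used, 0) * price_per_unit_hour * hours


def savings_report(resources, hours):
    """Unused cost per resource. Each entry holds allocated and used amounts
    in one unit (for example, cores or GiB) and a hypothetical hourly price."""
    return {
        name: unused_cost(r["allocated"], r["used"], r["price_per_unit_hour"], hours)
        for name, r in resources.items()
    }


# Hypothetical Node: 8 cores and 32 GiB allocated, priced per unit-hour.
node = {
    "cpu": {"allocated": 8, "used": 3, "price_per_unit_hour": 0.03},
    "memory": {"allocated": 32, "used": 30, "price_per_unit_hour": 0.004},
}
report = savings_report(node, hours=730)  # roughly one month
print({name: round(cost, 2) for name, cost in report.items()})
# {'cpu': 109.5, 'memory': 5.84}
```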
View raw metrics
To query data further, use any of the Explore buttons throughout the views (such as Explore namespaces or Explore alerts). These open a view with additional query tools.
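When the data source is Prometheus, Explore accepts PromQL. As an illustration, the following Python sketch builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`) that sums CPU usage per namespace from the cAdvisor metric `container_cpu_usage_seconds_total`. The Prometheus address and helper name are hypothetical; in Grafana, you would typically run the query string itself in Explore.

```python
from urllib.parse import urlencode

# Per-namespace CPU usage in cores, from the cAdvisor metric exposed by the kubelet.
# container!="" drops the Pod-level aggregate series so usage is not double-counted.
CPU_BY_NAMESPACE = (
    'sum by (namespace) '
    '(rate(container_cpu_usage_seconds_total{container!=""}[5m]))'
)


def instant_query_url(base_url, promql):
    """URL for a Prometheus HTTP API instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urlencode({"query": promql})


print(instant_query_url("http://prometheus.example:9090", CPU_BY_NAMESPACE))
```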
Start out with ready-to-use dashboards
Click Dashboards to view and access out-of-the-box dashboards, including:
Resource consumption dashboards, to help identify when consumption is higher than requests or to understand consumption over time:
- Multi-Cluster
- Cluster
- Namespace (by Pods)
- Namespace (Workloads)
- Node (Pods)
- Pod
- Workload
Cluster dashboards, to gain insight into Cluster operation:
- (Home) Kubernetes Integration, the primary dashboard that displays high-level Cluster resource usage and configuration status
- Efficiency, for understanding resource utilization in your Kubernetes fleet, and to help you reduce cost and optimize performance
- Kubernetes / Kubelet, for understanding kubelet performance on your Nodes. This dashboard provides useful summary metrics, such as number of running Pods, containers, and volumes on a given Node.
- Kubernetes / Persistent Volumes, for understanding usage of your configured PersistentVolumes
Manage alerts
Kubernetes Monitoring includes pre-configured alerting rules that trigger alerts. The Alerts view shows alert rules by namespace or group and the status of any alerts that have been triggered by that rule. For more information on alerts, see Configure alerting.
When you are investigating alerts, you can temporarily silence some of the default alerts.
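If you prefer to create a silence programmatically, Alertmanager-compatible APIs accept a JSON body at `/api/v2/silences`. The following Python sketch builds such a request without sending it. The alert name `KubePodCrashLooping`, the `namespace` label, the helper names, and the Alertmanager URL are example assumptions; in Grafana Cloud, the endpoint path and authentication differ, so check your Alertmanager's API documentation.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def silence_payload(alertname, namespace, hours, created_by, comment, now=None):
    """Body for POST /api/v2/silences that silences one alert in one namespace."""
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            {"name": "alertname", "value": alertname, "isRegex": False, "isEqual": True},
            {"name": "namespace", "value": namespace, "isRegex": False, "isEqual": True},
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


def silence_request(base_url, payload):
    """Prepare (but do not send) the HTTP request that creates the silence."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/silences",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = silence_payload(
    "KubePodCrashLooping", "checkout", hours=2,
    created_by="oncall@example.com", comment="Investigating crash loop",
)
req = silence_request("http://alertmanager.example:9093", payload)
# urllib.request.urlopen(req) would send it; add authentication as your setup requires.
```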
Manage configuration
If you have the admin role, you can manage the configuration of Kubernetes Monitoring by working with:
- Data source choices
- Pre-configured dashboards and alerts
- Integration installations
- Optional custom log queries
- Configuration instructions for deploying Grafana Agent in Flow mode, configuring it, and keeping it up to date
For more information, refer to Configure Kubernetes Monitoring.