Explore your infrastructure with Kubernetes Monitoring
Kubernetes Monitoring offers visualization and analysis tools for you to:
- Carefully examine your data to evaluate the health, efficiency, and cost of Kubernetes infrastructure components.
- Analyze historical data as well as predictions created with machine learning.
- Discover issues with resource usage to make informed decisions about efficiency and costs.
Navigate to Kubernetes Monitoring
- Navigate to your Grafana Cloud portal.
- In the upper left, click the main menu icon.
- Click the menu item Kubernetes to view the main page.
See the issues at a glance
The main Kubernetes page displays a snapshot of issues that exceed specific thresholds for the data source chosen in the drop-down menu.
In this view, you can see the graphed counts for Clusters, Nodes, Pods, and containers, as well as:
- Pods that have been in a non-running state for 15 minutes or more
- Nodes with CPU and memory usage above 90% for more than 5 minutes, and disks above 90% of capacity
- Persistent Volumes that have been using over 90% of their capacity
Sort the columns, and with one click, go to Pod, Cluster, Node, and namespace views for greater detail.
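The thresholds listed above can be expressed as simple checks. The following is a minimal sketch for illustration only, not how Kubernetes Monitoring evaluates issues internally. All function and parameter names are hypothetical, and treating every phase other than Running as "non-running" is an assumption.

```python
# Illustrative only: thresholds are taken from the list above,
# but every name in this sketch is hypothetical.
POD_NOT_RUNNING_MINUTES = 15
USAGE_THRESHOLD_PERCENT = 90
USAGE_DURATION_MINUTES = 5


def pod_is_flagged(phase: str, minutes_in_phase: float) -> bool:
    """Pod in a non-running state for 15 minutes or more.

    Assumption: any phase other than "Running" counts as non-running.
    """
    return phase != "Running" and minutes_in_phase >= POD_NOT_RUNNING_MINUTES


def node_usage_is_flagged(usage_percent: float, minutes_over: float) -> bool:
    """CPU or memory usage over 90% for over 5 minutes."""
    return (
        usage_percent > USAGE_THRESHOLD_PERCENT
        and minutes_over > USAGE_DURATION_MINUTES
    )


def capacity_is_flagged(used_percent: float) -> bool:
    """Disk or Persistent Volume using over 90% of its capacity."""
    return used_percent > USAGE_THRESHOLD_PERCENT
```

For example, `pod_is_flagged("Pending", 20)` is true under these assumptions, while a Node at 95% CPU for only 3 minutes is not yet flagged.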
Drill down into data
As you delve into your data, navigate from Nodes through to Pods by clicking the Cluster navigation menu item and choosing one of the available tabs.
Analyze historical data
Select a time range to see historical data for any period you choose. As you navigate from page to page, the time range you set stays in effect until you change it.
As an example, the Pod optimization section of the Pod detail page shows a time range over several hours. You can use this to understand the historical pattern of CPU usage and memory usage.
Learn what’s predicted
CPU and memory prediction can help you ensure that resources are available during usage spikes, and help you reduce the resources left unused because of over-provisioning. To use the prediction tools, first enable the Machine Learning plugin.
The following buttons are available in various views. Click them to show a prediction for either a Node or a workload:
- Predict Mem Usage: Shows a predictive graph for memory usage one week in the future. Calculations are based on metrics from the previous week.
- Predict CPU: Shows a predictive graph for CPU usage one week in the future. Calculations are based on metrics from the previous week.
Within a workload view, click the Detect Outlier CPU Usage amongst Pods button to identify a Pod that has CPU usage different from the other Pods.
Click Explore this query in the Machine Learning plugin to view the raw data. Here you can adjust parameters and see a more detailed graph of the findings.
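To give a sense of what "CPU usage different from the other Pods" can mean, here is a minimal sketch that flags Pods whose usage is far from the workload median. It uses a modified z-score based on the median absolute deviation, which is a swapped-in technique chosen for illustration. The source does not describe the algorithm the Machine Learning plugin uses, and the function name, input shape, and 3.5 cutoff are all assumptions.

```python
from statistics import median


def find_cpu_outliers(cpu_by_pod: dict[str, float], cutoff: float = 3.5) -> list[str]:
    """Return names of Pods whose CPU usage deviates strongly from the median.

    Hypothetical helper: not the Machine Learning plugin's method.
    """
    values = list(cpu_by_pod.values())
    if len(values) < 3:
        # Too few Pods to say which one is "different".
        return []

    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in values)

    if mad == 0:
        # More than half the Pods share the same value; flag any that differ.
        return [name for name, v in cpu_by_pod.items() if v != med]

    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [
        name
        for name, v in cpu_by_pod.items()
        if abs(0.6745 * (v - med) / mad) > cutoff
    ]
```

With four Pods using about 0.2 cores each and a fifth using 0.95 cores, only the fifth Pod is returned.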
Understand efficiency and resource use
The Efficiency page shows a correlation between CPU, memory, and storage use for Clusters, Nodes, and namespaces. The list of Clusters indicates each Cluster’s resource usage. Use this data to:
- Understand performance and troubleshoot stability issues by correlating average and maximum resource usage.
- Observe resource usage per Cluster and per cloud provider.
- Discover any stranded resources in your fleet.
You can also explore resource usage at a Pod and container level.
Use the Cost page to help you understand the costs of resources consumed by your Kubernetes infrastructure, and identify areas of potential savings.
There are two tabs that present infrastructure costs:
- Overview: Costs segmented by the different cloud service providers
- Savings: Costs segmented by each resource, and the costs associated with unused resources
Hover over the circled i icon for more information on each calculation.
View out-of-the-box dashboards
Kubernetes Monitoring includes preconfigured dashboards. For more details, refer to Use dashboards.
Filter for data
Use the controls on each page to further specify the data you want to view and examine. For example, choose a data source and use filters to refine what you want to analyze. Click the heading of a list column to sort it. Click underlined items within lists to further explore details about the item.
Use color cues
Throughout the views in Kubernetes Monitoring, you see color used as an additional means of indicating status or condition. For example, sometimes text is a different color for Pod status:
|Pod Status||Text Color||Comments|
|Running||Red||Pod failing to start|
|Unknown||Grey||Pod status unknown|
|Succeeded||Green||Job Pod successfully run|
For more information on Pod status, refer to the Kubernetes documentation on Pod lifecycle.
The following table describes the color indicators for resource capacity and the state of resource usage:
|Usage Bar Color||Usage||Comments|
|Green||60-90% of maximum||This is the ideal state of resource usage.|
|Yellow||Below 60%||Low usage percentages indicate that the Node might be over-provisioned.|
|Red||90-100%||Your Node resource is dangerously close to maximum capacity.|
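The usage ranges in the table above map to colors as in the following sketch. The function name is hypothetical, and because the table lists 90% in both the green and red ranges, assigning exactly 90% to red is an assumption.

```python
def usage_bar_color(usage_percent: float) -> str:
    """Map a resource usage percentage to the usage bar color in the table.

    Hypothetical helper. Assumption: exactly 90% is treated as red.
    """
    if usage_percent < 60:
        return "yellow"  # low usage: the Node might be over-provisioned
    if usage_percent < 90:
        return "green"   # ideal state of resource usage
    return "red"         # dangerously close to maximum capacity
```

For example, a Node at 45% memory usage maps to yellow, which suggests it may have more capacity than its workloads need.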
View raw metrics
To query data further, use any of the Explore buttons available throughout the interface (such as Explore namespaces or Explore alerts). Each button opens a view that provides additional query tools.
Kubernetes Monitoring includes pre-configured alerting rules that trigger alerts. The Alerts view shows alert rules by namespace or group and the status of any alerts that have been triggered by that rule. For more information on alerts, see Configure alerting.
While you investigate an alert, you can temporarily silence some of the default alerts.
Navigate to traces
If you enabled traces when you configured Kubernetes Monitoring, you can view them as follows:
- Click the main menu icon.
- Choose the Tempo data source.
- With the TraceQL tab selected, enter your search query.
- Click Run query. A table of traces appears.
- Click a trace to see the detail.
If you have the admin role, you can manage the configuration of Kubernetes Monitoring by working with:
- Data source choices
- Prebuilt dashboards and alerts
- Integration installations
- Optional custom log queries
- Configuration instructions for deploying, configuring, and updating the Grafana Kubernetes Monitoring Helm chart
For more information, refer to Configure Kubernetes Monitoring.