Explore your infrastructure with Kubernetes Monitoring

Kubernetes Monitoring offers visualization and analysis tools for you to:

  • Carefully examine your data to evaluate the health, efficiency, and cost of Kubernetes infrastructure components.
  • Analyze historical data as well as predictions created with machine learning.
  • Discover issues with resource usage to make informed decisions about efficiency and costs.

To open Kubernetes Monitoring:

  1. Navigate to your Grafana Cloud portal.
  2. In the upper left, click the main menu icon.
  3. Click the menu item Kubernetes to view the main page.

See the issues at a glance

The main Kubernetes page displays a snapshot of issues that exceed specific thresholds for the data source chosen in the drop-down menu.

Kubernetes Monitoring main page

In this view, you can see the graphed counts for Clusters, Nodes, Pods, and containers, as well as:

  • Pods that have been in a non-running state for 15 minutes or more
  • Nodes with CPU and memory usage over 90% for more than 5 minutes, and disks using over 90% of their capacity
  • Persistent Volumes using over 90% of their capacity
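
The thresholds in this list can be read as simple checks. The following Python sketch is illustrative only: the function names and inputs are hypothetical and are not part of Kubernetes Monitoring, and whether each boundary is inclusive is an assumption based on the wording of the list.

```python
# Thresholds taken from the list above.
POD_NOT_RUNNING_MINUTES = 15  # "15 minutes or more"
USAGE_PERCENT = 90            # "over 90%"
NODE_SUSTAINED_MINUTES = 5    # "for over 5 minutes"


def pod_has_issue(minutes_not_running: float) -> bool:
    """A Pod is flagged once it has been non-running for 15 minutes or more."""
    return minutes_not_running >= POD_NOT_RUNNING_MINUTES


def node_resource_has_issue(usage_percent: float, minutes_sustained: float) -> bool:
    """Node CPU or memory is flagged when usage stays over 90% for over 5 minutes."""
    return usage_percent > USAGE_PERCENT and minutes_sustained > NODE_SUSTAINED_MINUTES


def capacity_has_issue(used_percent: float) -> bool:
    """A Node disk or Persistent Volume is flagged when over 90% of capacity is used."""
    return used_percent > USAGE_PERCENT
```

For example, `node_resource_has_issue(95, 3)` is `False` under these assumptions because the high usage has not yet lasted longer than 5 minutes.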

Sort the columns, and with one click, go to Pod, Cluster, Node, and namespace views for greater detail.

Drill down into data

As you delve into your data, navigate from Nodes through to Pods by clicking the Cluster navigation menu item and choosing any of the following tabs:

  • Clusters
  • Namespaces
  • Workloads
  • Nodes
Cluster view

Analyze historical data

Select a time range to see historical data for any time frame you choose. As you navigate from page to page, the time range you set stays in effect until you change it.

Time picker

As an example, the Pod optimization section of the Pod detail page shows a time range over several hours. You can use this to understand the historical pattern of CPU usage and memory usage.

Pod optimization view on Pod detail page

Learn what’s predicted

CPU and memory prediction can help you ensure resources are available during spikes in resource usage and reduce the amount of unused resources caused by over-provisioning. To use prediction tools, first enable the Machine Learning plugin.

The following buttons are available in various views. Click them to show a prediction for either a Node or a workload:

  • Predict Mem Usage: Shows a predictive graph for memory usage one week in the future. Calculations are based on metrics from the previous week.
  • Predict CPU: Shows a predictive graph for CPU usage one week in the future. Calculations are based on metrics from the previous week.
Predictions for Node CPU Usage

Within a workload view, click the Detect Outlier CPU Usage amongst Pods button to identify a Pod whose CPU usage differs from that of the other Pods.

Outlier message

Click Explore this query in the Machine Learning plugin to view the raw data. Here you can adjust parameters and see a more detailed graph of the findings.

Outlier raw data
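
To build intuition for what it means for one Pod's CPU usage to differ from the others, here is a minimal Python sketch based on the median absolute deviation (MAD). This is not the Machine Learning plugin's algorithm, which this page does not describe. The function name, the Pod names, and the 3.5 cutoff are illustrative assumptions.

```python
from statistics import median


def find_cpu_outliers(cpu_by_pod: dict[str, float], cutoff: float = 3.5) -> list[str]:
    """Return names of Pods whose CPU usage is far from the group's median.

    Uses the modified z-score, 0.6745 * |x - median| / MAD, where the constant
    0.6745 makes the score comparable to a standard z-score for normal data.
    """
    values = list(cpu_by_pod.values())
    if len(values) < 3:
        return []  # too few Pods to say what "typical" usage is
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half of the Pods sit exactly on the median,
        # so any Pod with a different value stands out.
        return sorted(pod for pod, v in cpu_by_pod.items() if v != center)
    return sorted(
        pod
        for pod, v in cpu_by_pod.items()
        if 0.6745 * abs(v - center) / mad > cutoff
    )


# Hypothetical CPU usage (in cores) for four Pods of one workload.
usage = {"web-1": 0.21, "web-2": 0.19, "web-3": 0.20, "web-4": 0.95}
outliers = find_cpu_outliers(usage)
```

With these sample values, only `web-4` is returned. A median-based score is used here because a single extreme Pod distorts the median far less than it would distort a mean.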

Understand efficiency and resource use

The Efficiency page shows a correlation between CPU, memory, and storage use for Clusters, Nodes, and namespaces. The list of Clusters indicates each Cluster’s resource usage. Use this data to:

  • Understand performance and troubleshoot stability issues by correlating between average and maximum resource usage.
  • Observe resource usage per Cluster and per Cloud provider.
  • Discover any stranded resources in your fleet.
Efficiency page

You can also explore resource usage at a Pod and container level.

Analyze costs

Use the Cost page to help you understand the costs of resources consumed by your Kubernetes infrastructure, and identify areas of potential savings.

Costs page

Two tabs present infrastructure costs:

  • Overview: Costs segmented by the different cloud service providers
  • Savings: Costs segmented by each resource, and the costs associated with unused resources

Hover over the circled i icon for more information on each calculation.

i icon

View out-of-the-box dashboards

Kubernetes Monitoring includes preconfigured dashboards. For more details, refer to Use dashboards.

Filter for data

Use the controls on each page to further specify the data you want to view and examine. For example, choose a data source and use filters to refine what you want to analyze. Click the heading of a list column to sort it. Click underlined items within lists to further explore details about the item.

Use color cues

Throughout the views in Kubernetes Monitoring, color is an additional indicator of status or condition. For example, the text for Pod status can appear in different colors:

Color coding

Text      | Color | Comments
Running   | Green | Healthy Pod
Running   | Red   | Pod failing to start
Failed    | Red   | Failed Pod
Unknown   | Grey  | Pod status unknown
Succeeded | Green | Job Pod successfully run

For more information on Pod status, refer to the Kubernetes documentation on Pod lifecycle.

The following table describes the color indicators for resource capacity and the state of resource usage:

Usage Bar Color | Usage             | Comments
Green           | 60-90% of maximum | This is the ideal state of resource usage.
Yellow          | Below 60%         | Low usage percentages indicate that the Node might be over-provisioned.
Red             | 90-100%           | Your Node resource is dangerously close to maximum capacity.

Node color coding
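
The usage ranges in the table above can be written as a small lookup. This Python sketch is only an illustration with a hypothetical function name. The table lists 90% in both the green and red ranges, so treating exactly 90% as red is an assumption.

```python
def usage_bar_color(usage_percent: float) -> str:
    """Map a resource usage percentage to the bar color described in the table."""
    if not 0 <= usage_percent <= 100:
        raise ValueError("usage_percent must be between 0 and 100")
    if usage_percent < 60:
        return "yellow"  # low usage: the Node might be over-provisioned
    if usage_percent < 90:
        return "green"   # ideal state of resource usage
    return "red"         # assumption: exactly 90% counts as close to maximum capacity


colors = [usage_bar_color(p) for p in (35, 75, 96)]
```

Here `colors` is `["yellow", "green", "red"]`.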

View raw metrics

To further query data, use any of the Explore buttons available throughout the interface (such as Explore namespaces or Explore alerts). Each button opens a view that provides additional query tools.

Raw metrics

Manage alerts

Kubernetes Monitoring includes preconfigured alerting rules that trigger alerts. The Alerts view shows alert rules by namespace or group, along with the status of any alerts triggered by each rule. For more information on alerts, refer to Configure alerting.

Alert rule detail

Temporarily silencing some default alerts can be a useful strategy while you investigate alerts.

If you enabled traces when you configured Kubernetes Monitoring, you can view them in Explore:

  1. Click the main menu icon.

  2. Click Explore.

  3. Choose the Tempo data source.

  4. With the TraceQL tab selected, enter your search query.

  5. Click Run query.

    A table of traces appears.

  6. Click a trace to see the detail.

View traces
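
As a rough idea of what a search query for the TraceQL tab can look like, the following Python sketch assembles a query string that filters traces by Kubernetes namespace and, optionally, by minimum duration. The helper name is hypothetical, and the `k8s.namespace.name` resource attribute is an assumption: it is only present if your instrumentation attaches it to spans.

```python
from typing import Optional


def traceql_for_namespace(namespace: str, min_duration: Optional[str] = None) -> str:
    """Build a TraceQL span selector for one namespace.

    min_duration is a TraceQL duration literal such as "500ms" or "2s".
    """
    # Assumption: spans carry the k8s.namespace.name resource attribute.
    conditions = [f'resource.k8s.namespace.name = "{namespace}"']
    if min_duration:
        conditions.append(f"duration > {min_duration}")
    return "{ " + " && ".join(conditions) + " }"


# "checkout" is a made-up namespace used only for illustration.
query = traceql_for_namespace("checkout", "500ms")
```

The resulting string, `{ resource.k8s.namespace.name = "checkout" && duration > 500ms }`, can be pasted into the query field before you click Run query.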

Manage configuration

If you have the admin role, you can manage the configuration of Kubernetes Monitoring by working with:

  • Data source choices
  • Prebuilt dashboards and alerts
  • Integration installations
  • Optional custom log queries
  • Configuration instructions for deploying, configuring, and updating the Grafana Kubernetes Monitoring Helm chart

For more information, refer to Configure Kubernetes Monitoring.