Grafana Cloud

Explore your infrastructure with Kubernetes Monitoring

Kubernetes Monitoring visualizes your infrastructure data so you can examine and analyze it to make informed decisions. You can use the views in Kubernetes Monitoring to:

  • Carefully examine your data to evaluate the health of Kubernetes infrastructure components.
  • Use controls to narrow your focus on specific data.
  • View color indicators to understand status and condition at a glance.
  • Discover issues with resource usage to make informed decisions about efficiency and costs.

See the issues at a glance

The main Kubernetes page displays a “crow’s nest” snapshot of issues for the data source chosen in the drop-down menu.

Kubernetes main page

In this view, you can see graphed counts of Clusters, Nodes, Pods, and containers, as well as:

  • Pods that have been in a non-running state for 15 minutes or more
  • Node issues: CPU and memory usage over 90% for more than 5 minutes, and disk usage over 90% of capacity
  • Persistent volumes using over 90% of their capacity
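
The thresholds in the list above can be written out as simple checks. The following is a minimal sketch, not the product's implementation: the function and constant names are hypothetical, and only the numeric thresholds come from the list.

```python
# Thresholds taken from the list above. All names here are hypothetical
# and exist only for illustration.
POD_NOT_RUNNING_MINUTES = 15   # "15 minutes or more"
NODE_USAGE_PERCENT = 90        # "over 90%"
NODE_USAGE_MINUTES = 5         # "for over 5 minutes"
CAPACITY_PERCENT = 90          # disks and persistent volumes, "over 90%"


def pod_is_flagged(minutes_not_running: float) -> bool:
    """True when a Pod has been in a non-running state for 15 minutes or more."""
    return minutes_not_running >= POD_NOT_RUNNING_MINUTES


def node_usage_is_flagged(usage_percent: float, minutes_at_this_level: float) -> bool:
    """True when CPU or memory usage is over 90% and has been for over 5 minutes."""
    return (usage_percent > NODE_USAGE_PERCENT
            and minutes_at_this_level > NODE_USAGE_MINUTES)


def capacity_is_flagged(used_percent: float) -> bool:
    """True when a disk or persistent volume uses over 90% of its capacity."""
    return used_percent > CAPACITY_PERCENT


if __name__ == "__main__":
    print(pod_is_flagged(20))             # Pod stuck for 20 minutes
    print(node_usage_is_flagged(95, 10))  # 95% usage sustained for 10 minutes
    print(capacity_is_flagged(85))        # volume at 85% of capacity
```

Note the difference between the inclusive Pod threshold ("15 minutes or more") and the strict usage thresholds ("over 90%").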

You can sort the columns and, with one click, go to the Pod, Cluster, Node, and namespace views for greater detail.

Drill down into data

As you delve into your data, you can navigate from Nodes through to Pods by clicking the Cluster navigation menu item and choosing any of the following tabs:

  • Clusters
  • Namespaces
  • Workloads
  • Nodes

Cluster view

If you enable traces when you configure Kubernetes Monitoring, you can view them from Explore:

  1. Click the main menu icon.

  2. Click Explore.

  3. Choose the Tempo data source.

  4. With the TraceQL tab selected, enter your search query.

  5. Click Run query.

    A table of traces appears.

  6. Click a trace to see the detail.
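
For step 4, a search query might filter traces by namespace and duration. The sketch below assembles such a TraceQL string. It assumes your traces carry the OpenTelemetry resource attribute `k8s.namespace.name`, which depends on how your instrumentation is set up. The helper name `build_traceql` is hypothetical.

```python
def build_traceql(namespace: str, min_duration_ms: int) -> str:
    """Build a TraceQL query for slow traces in one Kubernetes namespace.

    Assumption: spans carry the resource attribute `k8s.namespace.name`.
    """
    if '"' in namespace:
        # Keep the sketch simple: reject input that would break the quoting.
        raise ValueError("namespace must not contain double quotes")
    if min_duration_ms < 0:
        raise ValueError("min_duration_ms must be non-negative")
    return (
        f'{{ resource.k8s.namespace.name = "{namespace}" '
        f"&& duration > {min_duration_ms}ms }}"
    )


if __name__ == "__main__":
    # Paste the printed string into the TraceQL tab, then click Run query.
    print(build_traceql("production", 500))
```

The namespace `production` and the 500 ms threshold are placeholder values. Replace them with ones that match your environment.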

View traces

Filter for data

Use the controls on each page to further specify the data you want to view and examine. For example, choose a data source and use filters to refine what you want to analyze. Click the heading of a list column to sort it. Click underlined items within lists to further explore details about the item.

Use color cues

Throughout the views in Kubernetes Monitoring, color is an additional indicator of status or condition. For example, the text color of a Pod status varies with the status:

Text colors

Status    | Text color | Description
Running   | Green      | Healthy pod
Running   | Red        | Pod failing to start
Failed    | Red        | Failed pod
Unknown   | Grey       | Pod status unknown
Succeeded | Green      | Job pod successfully run

For more information on pod statuses, see the Kubernetes documentation on pod lifecycle.

The following table describes the color indicators for resource capacity and the state of resource usage:

Usage bar color | Usage             | Comments
Green           | 60-90% of maximum | This is the ideal state of resource usage.
Yellow          | Below 60%         | Low usage percentages indicate that the Node might be over-provisioned.
Red             | 90-100%           | Your Node resource is dangerously close to maximum capacity.
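
The usage ranges in the table above map to colors as in the following minimal sketch. The table lists 90% in both the green and red ranges. This sketch assumes that exactly 90% counts as red, which the table does not confirm. The function name is hypothetical.

```python
def usage_bar_color(usage_percent: float) -> str:
    """Map a resource usage percentage to the usage bar color in the table.

    Assumption: exactly 90% is treated as red, because the table lists it
    in both the green (60-90%) and red (90-100%) ranges.
    """
    if not 0 <= usage_percent <= 100:
        raise ValueError("usage_percent must be between 0 and 100")
    if usage_percent < 60:
        return "yellow"  # low usage: the Node might be over-provisioned
    if usage_percent < 90:
        return "green"   # ideal state of resource usage
    return "red"         # dangerously close to maximum capacity


if __name__ == "__main__":
    for pct in (35, 75, 96):
        print(pct, usage_bar_color(pct))
```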

Node color coding

Understand efficiency and resource use

The Efficiency view gives you a correlation between CPU, memory, and storage use for Clusters, Nodes, and namespaces. The list of Clusters indicates each Cluster’s resource usage. You can use this data to:

  • Understand performance and troubleshoot stability issues by correlating between average and maximum resource usage.
  • Observe resource usage per Cluster and per cloud provider.
  • Discover any stranded resources in your fleet.

Efficiency view

Learn what’s predicted

CPU and memory prediction can help you ensure resources are available during spikes in resource usage and reduce the amount of resources left unused due to over-provisioning.

To use prediction tools, first enable the Machine Learning plugin.

The following buttons are available in various views. Click them to show a prediction for either a Node or a workload:

  • Predict Mem Usage: Shows a predictive graph for memory usage one week in the future. Calculations are based on metrics from the previous week.
  • Predict CPU: Shows a predictive graph for CPU usage one week in the future. Calculations are based on metrics from the previous week.

Predictive graph

Within a workload view, you can click the Detect Outlier CPU Usage amongst Pods button to identify a Pod whose CPU usage differs from that of the other Pods.
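
To illustrate what an outlier among Pods means, the sketch below flags Pods whose CPU usage is far from the median of the workload, using the median absolute deviation (MAD). This is a generic statistical technique chosen for illustration, not the algorithm the Machine Learning plugin uses. The function name, the sample data, and the 3.5 cutoff are assumptions.

```python
from statistics import median
from typing import Dict, List


def find_cpu_outliers(cpu_by_pod: Dict[str, float], cutoff: float = 3.5) -> List[str]:
    """Return the names of Pods whose CPU usage deviates strongly from the median.

    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    This is an illustrative technique, not the plugin's method.
    """
    values = list(cpu_by_pod.values())
    if len(values) < 3:
        return []  # too few Pods to say what "typical" usage is
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the Pods sit exactly on the median, so any
        # Pod that differs from it stands out.
        return sorted(pod for pod, v in cpu_by_pod.items() if v != med)
    return sorted(
        pod for pod, v in cpu_by_pod.items()
        if 0.6745 * abs(v - med) / mad > cutoff
    )


if __name__ == "__main__":
    # Hypothetical CPU usage in cores for the Pods of one workload.
    usage = {"api-0": 0.20, "api-1": 0.22, "api-2": 0.21, "api-3": 0.95}
    print(find_cpu_outliers(usage))
```

With this sample data, only `api-3` exceeds the cutoff.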

Outlier message

Click Explore this query in the Machine Learning plugin to view the raw data. Here you can adjust parameters and see a more detailed graph of the findings.

Outlier raw data

Analyze costs

You can use the Cost view to help you understand the costs of resources that are consumed by your Kubernetes infrastructure, and identify areas of potential savings.

Cost view

There are two tabs within this view that present infrastructure costs:

  • Overview: Costs segmented by the different cloud service providers
  • Savings: Costs segmented by each resource, and the costs associated with unused resources

Hover over the circled i icon for more information on each calculation.

i icon

View raw metrics

To further query data, use any of the Explore buttons available throughout the views (such as Explore namespaces or Explore alerts). You will see a view that provides additional query tools.

Raw metrics

Start out with ready-to-use dashboards

Click Dashboards to view and access out-of-the-box dashboards, including:

  • Resource consumption dashboards, to help identify when consumption is higher than requests or to understand consumption over time:

    • Multi-Cluster
    • Cluster
    • Namespace (by Pods)
    • Namespace (Workloads)
    • Node (Pods)
    • Pod
    • Workload
  • Cluster dashboards, to gain insight into Cluster operation:

    • (Home) Kubernetes Integration, the primary dashboard that displays high-level Cluster resource usage and configuration status
    • Efficiency, for understanding resource utilization in your Kubernetes fleet, and to help you reduce cost and optimize performance
    • Kubernetes / Kubelet, for understanding kubelet performance on your Nodes. This dashboard provides useful summary metrics, such as number of running Pods, containers, and volumes on a given Node.
    • Kubernetes / Persistent Volumes, for understanding usage of your configured PersistentVolumes

Manage alerts

Kubernetes Monitoring includes pre-configured alerting rules that trigger alerts. The Alerts view shows alert rules by namespace or group, along with the status of any alerts triggered by each rule. For more information on alerts, see Configure alerting.

Alert rule detail

Temporarily silencing some default alerts can be a useful strategy while you investigate alerts.
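
If you script silences instead of using the UI, a silence is typically described by a set of label matchers and a time window. The sketch below builds such a payload in the shape accepted by the Alertmanager v2 silences API. It assumes your stack exposes an Alertmanager-compatible endpoint, which this page does not state. The function name and the alert name `KubePodCrashLooping` are illustrative only. The code builds the payload but does not send it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_silence(alert_name: str, duration_minutes: int, created_by: str,
                  comment: str, now: Optional[datetime] = None) -> dict:
    """Build a silence payload in the Alertmanager v2 shape (assumed endpoint)."""
    if duration_minutes <= 0:
        raise ValueError("duration_minutes must be positive")
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(minutes=duration_minutes)
    return {
        # Match alerts whose `alertname` label equals alert_name exactly.
        "matchers": [{
            "name": "alertname",
            "value": alert_name,
            "isRegex": False,
            "isEqual": True,
        }],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


if __name__ == "__main__":
    import json
    payload = build_silence(
        "KubePodCrashLooping",        # example alert name, an assumption
        duration_minutes=30,
        created_by="on-call engineer",
        comment="Investigating restarts; silence expires automatically.",
    )
    print(json.dumps(payload, indent=2))
```

Because the payload includes an end time, the silence expires on its own, which suits short investigations.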

Manage configuration

If you have the admin role, you can manage the configuration of Kubernetes Monitoring by working with:

  • Data source choices
  • Pre-configured dashboards and alerts
  • Integration installations
  • Optional custom log queries
  • Configuration instructions for deploying Grafana Agent in Flow mode, configuring it, and keeping it up to date

For more information, refer to Configure Kubernetes Monitoring.