Grafana Cloud

Monitor resource usage

Kubernetes Monitoring reveals CPU, memory, GPU, storage, and network usage across list views and detail pages. You can view resource usage at any level, from a Cluster down to a container, and compare it against the configured requests and capacity.
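As a rough illustration of what comparing usage against requests and capacity means, the following minimal sketch computes both ratios for one reading. The `ResourceSample` type, its field names, and the sample values are hypothetical and are not part of Kubernetes Monitoring.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ResourceSample:
    """One hypothetical CPU reading for a container, in cores."""
    usage: float
    request: float
    capacity: float


def utilization(sample: ResourceSample) -> Dict[str, Optional[float]]:
    """Return usage as a fraction of the request and of the capacity."""
    def ratio(denominator: float) -> Optional[float]:
        # A zero (unset) request or capacity has no meaningful ratio.
        return sample.usage / denominator if denominator > 0 else None

    return {
        "of_request": ratio(sample.request),
        "of_capacity": ratio(sample.capacity),
    }


# Hypothetical values: 0.75 cores used, 0.5 requested, 2.0 available.
sample = ResourceSample(usage=0.75, request=0.5, capacity=2.0)
result = utilization(sample)
```

A ratio above 1.0 against the request, as in this example, suggests the request may be set too low, while the ratio against capacity shows how much headroom remains.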

Find resource usage in list views

In any list view, you can find, filter, and sort maximum values for CPU and memory usage.

List of namespaces with CPU and memory statistics for each namespace
Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on this workloads list page.

CPU and memory tabs

On any detail page, you can view an overview of CPU and memory usage. You can also click the CPU tab or the Memory tab to view more correlated usage information. For example, the CPU and Memory tabs on the Cluster detail page show:

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on the CPU tab for a namespace.

GPU tabs

View GPU utilization panels on the GPU tabs of Cluster and Node detail pages to answer questions like:

  • Are the NVIDIA GPUs inside your Cluster appropriately utilized in relation to tensor cores, encoders, and decoders?
  • Are workloads getting and using the GPU resources that have been made available to them?
GPU tab of Cluster detail page

Track persistent storage metrics

Graphs in the Storage tab on the Cluster, namespace, workload, Node, and Pod detail pages show how persistent volume (PV) storage changes over a specific time range. You can gain insight into:

  • Storage classes of persistent volume claims (PVC)
  • Volume bytes of the requested PVC, which compares requests, data capacity, and usage
  • Volume inodes, comparing capacity with usage
  • The status phase of the PV and PVC, including the binding of the PVC request
  • Throughput to understand how much data is being read and written per second
  • IOPS (Input/Output Operations per Second) to understand how many read and write operations are being performed per second
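Throughput and IOPS are both per-second rates. The following minimal sketch shows how such a rate can be derived from two readings of a cumulative counter. It assumes the underlying metrics are monotonically increasing counters. The `per_second_rate` helper and the sample values are hypothetical, and the sketch does not describe how Kubernetes Monitoring computes its panels.

```python
from typing import Tuple

# (timestamp in seconds, cumulative counter value)
Sample = Tuple[float, float]


def per_second_rate(first: Sample, last: Sample) -> float:
    """Average per-second increase of a counter between two samples."""
    t0, v0 = first
    t1, v1 = last
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # Assumption: a decrease means the counter restarted from zero.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / elapsed


# Hypothetical readings taken 60 seconds apart.
throughput = per_second_rate((0, 1_000_000), (60, 7_000_000))  # bytes per second
iops = per_second_rate((0, 500), (60, 3_500))  # operations per second
```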
Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on the Storage tab for a Pod.

The PV status on the Pod details page indicates the relationship between persistent volumes and Pods, and also shows the name of the volume, which can change over time.

Graph of PVC storage classes for a Cluster
Graph of PVC volume bytes for namespaces in a Cluster

Graphs of throughput and IOPS for a namespace and by workloads within the namespace

Review Pod count

On every Workload detail page, you can use the Pod count panel to examine the Pod count over a time range you select. Pod count shows how many instances of a workload are running at any point in time. This matters because scaling events directly affect availability, performance, and cost.

To learn more, refer to Gain insight with Pod count.
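To make the idea of scaling events concrete, the following minimal sketch extracts them from a series of Pod count readings. The `scaling_events` helper and the sample data are hypothetical. The sketch only illustrates the concept and is not how the Pod count panel is built.

```python
from typing import List, Tuple


def scaling_events(samples: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (timestamp, change) for each point where the Pod count changed.

    samples is a time-ordered list of (timestamp in seconds, Pod count).
    """
    events = []
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        if current != previous:
            # Positive change is a scale-up, negative is a scale-down.
            events.append((timestamp, current - previous))
    return events


# Hypothetical readings: scale up from 3 to 5 Pods, then down to 2.
samples = [(0, 3), (60, 3), (120, 5), (180, 5), (240, 2)]
events = scaling_events(samples)
```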

Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on the Workload detail page.

Detect outlier Pod CPU usage

You can identify Pods whose CPU usage differs from that of the other Pods in the same workload. For any multi-Pod workload, go to the workload detail page, and review the information in the Overview tab. If a Pod in the workload is an outlier for CPU usage, it is indicated in the outliers by CPU field. Click the link to open Explore and discover the outlier Pod.
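This page does not state how outliers are determined. As one possible illustration, the following minimal sketch flags Pods using a modified z-score based on the median absolute deviation (MAD). The technique, the `cpu_outliers` helper, the threshold of 3.5, and the Pod names are all assumptions made for this example.

```python
from statistics import median
from typing import Dict, List


def cpu_outliers(usage_by_pod: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return the names of Pods whose CPU usage is far from the workload median."""
    values = list(usage_by_pod.values())
    if len(values) < 3:
        return []  # Too few Pods to say what is typical.
    center = median(values)
    mad = median([abs(value - center) for value in values])
    if mad == 0:
        return []  # At least half the Pods sit exactly on the median.
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return sorted(
        pod
        for pod, value in usage_by_pod.items()
        if 0.6745 * abs(value - center) / mad > threshold
    )


# Hypothetical CPU usage in cores for the Pods of one workload.
usage = {"api-1": 0.20, "api-2": 0.22, "api-3": 0.21, "api-4": 0.95}
outliers = cpu_outliers(usage)
```

The median-based score is used here because a single extreme Pod shifts a mean and standard deviation far more than it shifts a median.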

Clickable message on workload detail page showing a CPU outlier Pod

View network bandwidth and saturation

Use the network panels to understand when bandwidth limits are causing network saturation, which can lead to dropped packets. On any detail page for Cluster, namespace, workload, Node, or Pod, click the Network tab to view:

  • Network Bandwidth Rx/Tx: Shows the rate of received and transmitted bytes
  • Network Saturation Rx/Tx dropped packets: Shows the rate of received and transmitted packets dropped
  • Network Bandwidth and Network Saturation by Node, workload, or Pod: Shows the bandwidth and saturation by object
    Network bandwidth and saturation panels for a Cluster
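To show how bandwidth and dropped-packet rates can be read together, the following minimal sketch walks a series of readings and reports the intervals in which packets were dropped. It assumes cumulative counters for received bytes and dropped packets. The `network_intervals` helper, the field names, and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# (timestamp in seconds, received bytes total, received packets dropped total)
Reading = Tuple[float, float, float]


def network_intervals(readings: List[Reading]) -> List[Dict[str, float]]:
    """Compute per-interval receive bandwidth and dropped-packet rates."""
    intervals = []
    for (t0, b0, d0), (t1, b1, d1) in zip(readings, readings[1:]):
        elapsed = t1 - t0
        # Skip out-of-order readings and intervals where a counter reset.
        if elapsed <= 0 or b1 < b0 or d1 < d0:
            continue
        intervals.append({
            "end": t1,
            "rx_bytes_per_s": (b1 - b0) / elapsed,
            "rx_dropped_per_s": (d1 - d0) / elapsed,
        })
    return intervals


# Hypothetical readings taken 30 seconds apart.
readings = [(0, 0, 0), (30, 3_000_000, 0), (60, 9_000_000, 60)]
intervals = network_intervals(readings)
# End times of the intervals in which packets were dropped.
with_drops = [i["end"] for i in intervals if i["rx_dropped_per_s"] > 0]
```

In this example, drops appear only in the second interval, where bandwidth doubled, which is the kind of correlation the Network tab panels are meant to make visible.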
Give it a try using Grafana Play

With Grafana Play, you can explore and see how it works, learning from practical examples to accelerate your development. This feature can be seen on the Network tab of this namespace details page.