Introduction to Kubernetes Monitoring

The multiple and dynamic components of Kubernetes pose significant challenges to any team’s ability to problem solve and proactively manage a fleet.

Reactive problem solving means:

Quick identification of issues
Prioritization of problem severity
Streamlined root cause analysis
An efficent workflow

Proactive management requires:

Preventing issues that cause performance problems
Efficient use of resources and managing their costs
Insight into future usage and cost

Kubernetes Monitoring provides the tools for both reactive and proactive strategies.

Reactive response benefits

Quick issue identification, alerts, data correlation, and other features are built into Kubernetes Monitoring to streamline troubleshooting.

Priority issues at forefront

The Kubernetes Overview page provides a high-level look at counts for Kubernetes objects, CPU and memory usage by Cluster, and firing alerts for containers and Pods. You can filter this view by Clusters and namespaces, then identify issues that require attention to begin your problem solving.

Snapshot of counts, Cluster CPU and memory usage, deployed container images, and firing container alerts

Real-time alerts

Real-time alerts inform you as soon as problems begin. You can jump from alert to runbook for a quick solution, create your own alerts, and copy a built-in alert to customize it.

Logs and metrics correlation

While Kubernetes doesn’t provide a native storage solution for logs, Kubernetes Monitoring uses Grafana Loki as its log aggregator. Since Loki and Prometheus share labels, you can correlate metrics and logs to identify root causes faster without configuring and using multiple technologies.

Proactive management benefits

The features available in Kubernetes Monitoring enable you to create and implement a proactive strategy with a data-driven approach.

Early error detection

You can use built-in alerting for anomolies such as CPU throttling to learn which settings need fine tuning. Network bandwidth and saturation is available by object. The time range selector in Kubernetes Monitoring provides a look into the history of an object, which reveals patterns such as spikes. Outlier Pod detection can uncover Pods with CPU usage differences that may lead to issues.

Cost visibility and management

Nodes, load balancers, and Persistent Volumes usually incur a separate cost from your provider, making it important to keep track of them. Auto-scaling architectures let you adapt in real-time to changing demand, but can lead to rapidly spiraling costs. Kubernetes Monitoring provides visibility into these costs to identify where cost can be reduced. With cost prediction, you can view potential, future costs.

Resource efficiency management

You can mitigate the threat of an unstable infrastructure by monitoring resource usage to:

Ensure that there are enough allocated resources and decrease the risk of Pod eviction, as well as prevent performance degradation of your microservices and applications.
Eliminate unused or stranded resources.

Then you can make scheduling adjustments, such as setting affinities and anti-affinities, to enhance performance and reliability.

Resource usage forecasts

By looking at a prediction of resource usage, you can better forecast for a project or activity.

What is out of the box

The out-of-the-box features that are part of Kubernetes Monitoring include:

The capability to explore and troubleshoot your Kubernetes infrastructure in a single interface
A built-in, easy deployment process using Helm chart that includes the option to use eBPF technology to automatically instrument applications with Grafana Beyla. This retrieves telemetry data for applications on the Cluster without adding SDKs.
Kubernetes Overview, showing a snapshot of Cluster, Node, Pod, and container counts, as well as any issues that need attention and the alerts associated with them
Efficiency data throughout the app for examining and refining resource usage
Cost data globally available for analyzing and managing your infrastructure costs and potential savings
Alerts page where you can respond to and troubleshoot alerts, and select embedded alerting rules to view and duplicate for customization
Energy usage data provided by Kepler
Immediate, in-context access to other Grafana Cloud applications for quicker analysis and resolution, including:
Recording rules to increase the speed of queries and the evaluation of alerting rules
Predictions for CPU and memory usage, powered by machine learning
Curated set of metrics to assist in providing only the metrics you need

Get started

Get started by using a streamlined configuration process with Grafana Kubernetes Monitoring Helm chart. When you configure with the Helm chart, there’s no manual set up, and the chart includes automatic updates for all components that it installs.