Grafana Kubernetes Monitoring

Grafana Kubernetes Monitoring provides:

  • A cohesive set of tools to monitor your Kubernetes fleet, both proactively, to optimize resource utilization, and reactively, to detect and troubleshoot issues early. You can collect and store the following in Grafana Cloud:
    • Metrics
    • Pod logs
    • Cluster events
    • Traces
    • Cost metrics
  • Out-of-the-box visualizations of your data in a centralized toolkit. Other components that come out of the box include cost monitoring insight, preconfigured alerts, alert rules, recording rules, and machine-learning predictions.
  • One platform for comprehensive monitoring and visibility. As you analyze the health of your Clusters, Pods, and containers, and perform troubleshooting, you remain within the same Kubernetes Monitoring app. This makes analysis and troubleshooting more efficient and effective, reducing mean time to resolution.

Kubernetes challenges

Kubernetes presents challenges for both of the following:

  • Reactive problem solving: When you react to issues without a monitoring system, you must guess the probable sources, then use trial and error to test fixes. This increases the workload, especially for newcomers who are unfamiliar with the system. The more difficult it is to troubleshoot, the more downtime increases and the more burden is placed on experienced staff.
  • Proactive management: Resources that are not optimized can significantly impact both budget and performance. If a fleet is underprovisioned, the performance and availability of applications and services are at serious risk. Underprovisioning leads to applications that lag, underperform, are unstable, or do not function. Fleets that are overprovisioned waste money and resources.

Reactive response benefits

Quick issue identification, alerts, data correlation, and other features are built into Kubernetes Monitoring to streamline troubleshooting.

Priority issues at forefront

The home page of Kubernetes Monitoring provides a snapshot of all items, and any associated alerts, that are beyond set thresholds for:

  • Node CPU and memory usage
  • Node disk and Persistent Volume capacity
  • Pods in a non-running state and the cause for this state

You can immediately drill into issues that require attention and start problem solving right away.

Snapshot of issues over threshold limits and associated alerts

Real-time alerts

Real-time alerts inform you as soon as problems begin, so your users are not the first to find an issue. Alerts and alert rules are available out of the box, and you can customize them.

Logs and metrics correlation

As with metrics, Kubernetes doesn’t provide a native storage solution for logs. Troubleshooting without logs is incomplete, because logs help you identify the root cause of an issue more quickly. Logs from your application and Kubernetes components are often the best way to find reproduction steps and work towards a root cause.

Kubernetes Monitoring uses Grafana Loki as its log aggregator, built to be compatible with Prometheus. Since Loki and Prometheus share labels, you can correlate metrics and logs to identify root causes faster. This also removes the burden of setting up and configuring multiple technologies.
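As an illustration of how shared labels support correlation, the following minimal Python sketch renders one label set into both a PromQL query and a LogQL query. The label names and values (`cluster`, `namespace`, `pod`) and the `selector` helper are assumptions made for this example, not part of Kubernetes Monitoring. The labels actually attached to your data depend on your configuration.

```python
def selector(labels: dict[str, str]) -> str:
    """Render a label set as a {key="value", ...} matcher.

    PromQL and LogQL both accept this matcher syntax, which is what lets
    one label set address metrics and logs alike. Values are not escaped
    here, so this sketch assumes they contain no quotes or backslashes.
    """
    parts = [f'{key}="{value}"' for key, value in sorted(labels.items())]
    return "{" + ", ".join(parts) + "}"


# Hypothetical label values for a single Pod.
labels = {"cluster": "prod-eu", "namespace": "checkout", "pod": "cart-7d9f"}

# PromQL: per-second CPU usage of the Pod's containers over 5 minutes.
metric_query = f"rate(container_cpu_usage_seconds_total{selector(labels)}[5m])"

# LogQL: log lines from the same Pod that contain the text "error".
log_query = f'{selector(labels)} |= "error"'

print(metric_query)
print(log_query)
```

Because both queries are built from the same matcher, a spike seen in the metric query can be followed up with the log query for the same Pod and time range.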

Proactive management benefits

The features of Kubernetes Monitoring enable you to create and implement a strategy for proactive management.

Early error detection

Log files, traces, and performance metrics provide visibility into what’s happening in your Cluster. When you proactively monitor your Kubernetes Clusters, you have advance warning of usage spikes and increasing error rates. With early error detection, you can solve issues before they affect your users.

Cost visibility and management

Nodes, load balancers, and Persistent Volumes usually each incur a separate cost from your provider. Kubernetes Monitoring provides visibility into these costs so you can manage and reduce them.

Resource efficiency management

The insight you gain into real-world Cluster usage means you can monitor your Kubernetes Cluster for resource contention or uneven application Pod distribution across your Nodes. Then you can make simple scheduling adjustments, such as setting affinities and anti-affinities, to significantly enhance performance and reliability.
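As one example of such a scheduling adjustment, the sketch below builds a Pod `affinity` stanza with a required anti-affinity rule, which asks the scheduler to keep replicas that carry the same label on different Nodes. The `app` label key, the `checkout` value, and the `spread_across_nodes` helper are hypothetical. The field names follow the Kubernetes Pod spec.

```python
import json


def spread_across_nodes(app_label: str) -> dict:
    """Return a Pod `affinity` stanza that keeps Pods with the same
    `app` label (an assumed label key) on different Nodes."""
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    # Match other Pods of the same application.
                    "labelSelector": {"matchLabels": {"app": app_label}},
                    # One matching Pod per Node hostname.
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


# Prints JSON that could be placed under `spec.affinity` of a Pod template.
print(json.dumps(spread_across_nodes("checkout"), indent=2))
```

A required rule can leave Pods unschedulable when there are fewer Nodes than replicas, so a preferred rule may be a better fit in small Clusters.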

You can mitigate the threat of an unstable infrastructure by monitoring resource usage of CPU, RAM, and storage:

  • Ensure that there are enough allocated resources. This decreases the risk of Pod or container eviction as well as undesired performance of your microservices and applications.
  • Eliminate unused or stranded resources.
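The two checks above can be sketched as a comparison of observed usage against requested resources. This is a simplified illustration, not how Kubernetes Monitoring computes efficiency. The `Workload` type, the sample figures, and the 30% and 90% thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_request_cores: float  # CPU requested in the Pod spec
    cpu_used_cores: float     # observed average CPU usage


def classify(w: Workload, low: float = 0.3, high: float = 0.9) -> str:
    """Label a workload by its usage-to-request ratio.

    The `low` and `high` thresholds are illustrative defaults.
    """
    if w.cpu_request_cores <= 0:
        return "no-request"  # nothing requested, so no ratio to compute
    ratio = w.cpu_used_cores / w.cpu_request_cores
    if ratio < low:
        return "overprovisioned"   # requested capacity is mostly unused
    if ratio > high:
        return "underprovisioned"  # usage is close to or above the request
    return "ok"


# Hypothetical workloads and measurements.
fleet = [
    Workload("cart", cpu_request_cores=2.0, cpu_used_cores=0.2),
    Workload("search", cpu_request_cores=1.0, cpu_used_cores=0.95),
    Workload("auth", cpu_request_cores=0.5, cpu_used_cores=0.3),
]

for w in fleet:
    print(f"{w.name}: {classify(w)}")
```

The same comparison applies to memory and storage by swapping in the corresponding request and usage figures.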

Node health and resource management

Kubernetes Nodes are the machines in a Cluster that run your applications and store your data. Unhealthy Nodes can cause cascading errors, unhealthy Deployments, or other events that may be frequent or infrequent. There are two types of Nodes in a Kubernetes Cluster:

  • Worker Nodes: To host your application containers, grouped as Pods
  • Control plane Nodes: To run the services that are required to control the Kubernetes Cluster

While Clusters act as the spine of your Kubernetes architecture, Nodes form the vertebrae. A healthy backbone of efficient Nodes is required for your Clusters to stay up and your applications to run fast. One way to keep Nodes healthy is to rely on expensive autoscalers that purchase ever more cloud resources and span more Nodes. That gives you seemingly endless resources, but doesn’t pinpoint where the actual issues are. With Kubernetes Monitoring, you can take a data-driven approach for better capacity utilization, resource management, and Pod placement.

Resource usage forecasts

You need to know the number of Nodes, load balancers, and Persistent Volumes that are currently deployed in your cloud account. Each of these objects usually incurs a separate cost from your provider. Auto-scaling architectures let you adapt in real time to changing demand, but they can also create rapidly spiraling costs.

By looking at a prediction of resource usage, you have more information to forecast how much of a particular resource will be required for a given project or activity. This insight allows for better planning, budgeting, and cost estimations.
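To make the idea of a usage forecast concrete, here is a minimal sketch that fits a straight line to evenly spaced usage samples and extrapolates it. This is a plain least-squares trend, shown only for illustration. It is not the prediction method that Kubernetes Monitoring uses, and the `linear_forecast` function and sample values are hypothetical.

```python
def linear_forecast(samples: list[float], steps_ahead: int) -> float:
    """Extrapolate evenly spaced samples with a least-squares line.

    `samples[i]` is the usage at time step i. The return value is the
    predicted usage `steps_ahead` steps after the last sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical daily disk usage in GiB for four consecutive days.
disk_gib = [10.0, 12.0, 14.0, 16.0]

# Predicted usage two days after the last sample.
print(linear_forecast(disk_gib, steps_ahead=2))
```

A straight line ignores seasonality and bursts, so treat a result like this as a rough planning input rather than a capacity guarantee.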

Get started

Get started easily by using a quick configuration process with the Grafana Kubernetes Monitoring Helm chart. When you configure with the Helm chart, there’s no manual setup, and the chart includes automatic updates for all components that it installs.

Other configuration methods

There are many available methods you can use to configure Kubernetes Monitoring for your infrastructure data. Refer to Configure manually for infrastructure.

To configure data about an application running in Kubernetes, refer to Configure manually for applications.

What is out of the box

Kubernetes Monitoring out-of-the-box features include:

  • Drilling into your data using a single interface
  • A snapshot view of Cluster, Node, Pod, and container counts, as well as any issues that need attention and the alerts associated with them
  • Efficiency data throughout the app for examining and refining resource usage
  • Cost data globally available for analyzing and managing your infrastructure costs and potential savings
  • Embedded Alerts page where you can respond to and troubleshoot alerts, and select alerting rules to view and customize them
  • Curated set of metrics to assist in preventing cardinality issues
  • Recording rules to increase the speed of dashboard queries and the evaluation of alerting rules
  • Kubernetes-focused, prebuilt Cardinality dashboard to assist in identifying Clusters and namespaces that produce high cardinality metrics
  • Direct access to the Application Observability app from workload and Pod detail pages