Grafana Cloud

Configuration steps for Kubernetes Monitoring with Helm chart

Complete the configuration process on the Cluster configuration tab.

Give it a try using Grafana Play

With Grafana Play, you can explore Kubernetes Monitoring in a working environment and learn from practical examples before you configure your own Cluster.

Stack information

Make sure you have met the requirements and enter your stack and platform information.

Before you begin

Make sure you have met the prerequisites required for these configuration steps.

Note

Ensure that you are familiar with the components installed by the Helm chart and how they relate to switching on or off the configuration choices available.

To deploy Kubernetes Monitoring with the Helm chart, you need:

  • The Admin role to install alerts
  • A Kubernetes Cluster, environment, or fleet you want to monitor
  • The kubectl and Helm command-line tools
  • Appropriate versions of items related to:
    • kube-state-metrics: Uses client-go to communicate with Clusters. For Kubernetes client-go version compatibility and any other related details, refer to kube-state-metrics.
    • OpenCost: Requires Kubernetes 1.8+ clusters.
    • Storage visualizations: Require Helm chart release v1.5.1 or greater
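Before proceeding, you can confirm that the required command-line tools are available on your workstation; a minimal sketch:

```shell
# Check that the required CLI tools are installed; prints "found" or "missing"
# for each tool without failing, so you can see both results at once.
for tool in kubectl helm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```

If either tool is missing, install it before continuing; `kubectl version --client` and `helm version` also report the installed versions for the compatibility checks above.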

Cluster and platform

  1. Complete the following:

    • Cluster name: In the Cluster name box, enter the name of your Cluster.

    • Namespace: In the Namespace box, replace default with the namespace where you want to deploy your monitoring infrastructure. This is the namespace for Grafana Alloy and other dependencies such as kube-state-metrics.

  2. Select the platform you are using.

  3. Click Next.
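If the namespace you entered does not exist yet, you can create it ahead of the deployment; a sketch, assuming a hypothetical namespace named `monitoring`:

```shell
# "monitoring" is a placeholder; use the namespace you entered above.
kubectl create namespace monitoring
```

Alternatively, a Helm deployment can create the namespace for you when run with the `--create-namespace` flag.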

Platform and features

  1. Choose the type of monitoring you want to do by selecting:

    • Kubernetes Monitoring
    • Application Observability
  2. Optionally, select Enable Remote Configuration to use Fleet Management to monitor, configure, and manage your Alloy collectors. You can update your collectors without having to update the Cluster.

  3. To see and customize your configuration details, click Show config details.

  4. Switch on or off the following options for Kubernetes Monitoring.

    Note

    These options are independent of each other. For example, disabling cost or energy metrics does not disable any other option. Refer to additional information for each option by following the links and review Manage your Kubernetes configuration.

    • Cluster metrics on the infrastructure

      Node resource usage; metrics about Pod health; Persistent volume usage; and Deployment, StatefulSet, and DaemonSet status.
      Essential for Cluster health monitoring; powers the Kubernetes Overview dashboard; tracks resource utilization and capacity planning; and detects Pod crashes, OOM kills, and so on.

    • Cost metrics, which use OpenCost

      Resource cost attribution by namespace, workload, and Pod; CPU and memory cost breakdowns; Cloud provider pricing data integration; and cost efficiency metrics.
      Shows cost data; helps identify expensive workloads; enables FinOps and cost optimization; and tracks spending trends over time.

    • Energy metrics, which use Kepler to obtain energy data

    • Pod and service metrics by annotation, to collect Prometheus metrics from Pods and Services using annotations that define their scrape target

      Automatically discovers running Pods and Services; dynamically scrapes Prometheus metrics from annotated Pods; detects new workloads without manual configuration; and uses Kubernetes annotations to find metrics endpoints.
      Enable for: zero-configuration metrics collection; automatically monitor new applications as they deploy; support microservices architectures with dynamic scaling; and find application-specific metrics beyond system metrics.
      Disable when: You want to explicitly define every scrape target; you’re concerned about discovering unintended metrics endpoints; your cluster has very strict network policies.

    • Prometheus Operator objects, for collecting metrics from PodMonitors, Probes, and ServiceMonitors

      Discovers and monitors Prometheus Operator CRDs: ServiceMonitor, which defines how to scrape metrics from Kubernetes services; PodMonitor, which defines how to scrape metrics from Pods; and Probe, which defines blackbox probing of endpoints.
      Enable when: you’re already using Prometheus Operator in your Cluster; you have existing ServiceMonitor/PodMonitor definitions; you want to leverage existing Prometheus configurations; you want to enable migration from Prometheus Operator to Grafana Alloy.
      Disable when: You’re not using Prometheus Operator Objects; you prefer using Grafana Alloy’s native configuration.

    • Cluster events

      Generated from: the scheduler (assigning Pods to Nodes); kubelet (managing Pods on Nodes); the controller manager (handling scaling, deployments, and so on); the API server (processing requests).
      Troubleshooting scheduling & deployment issues; see why a Pod isn’t starting (no Nodes with enough memory); tracking resource lifecycle changes (Pod created → scheduled → pulled → started → ready → terminated); detecting transient or recurring failures (repeated image pull errors, failed probes, or node taints); auditing Cluster activity (to identify which controller or user triggered changes).

    • Node logs, to capture journald logs from the Nodes

      Logs from: the operating system (kernel logs, systemd, network drivers); Kubernetes node agents (like kubelet); container runtimes (Docker, containerd); system daemons (journald, syslog).
      Diagnosing node instability (memory exhaustion, CPU throttling, or disk space issues); debugging scheduling or startup failures (when Pods can’t start, the issue may be at the Node level); investigating network or storage problems; determining driver or volume mount failures; auditing system changes (Node reboots, kubelet restarts, or OS updates).
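As a sketch of the annotation-based discovery described above, a Pod that exposes Prometheus metrics might be annotated as follows. The `k8s.grafana.com/*` annotation keys are an assumption based on recent versions of the Helm chart; verify the exact keys against your chart version's documentation.

```yaml
# Hypothetical Pod manifest; the annotation keys are an assumption and may
# differ in your chart version.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
  annotations:
    k8s.grafana.com/scrape: "true"              # opt this Pod in to scraping
    k8s.grafana.com/metrics.portNumber: "8080"  # port serving metrics
    k8s.grafana.com/metrics.path: "/metrics"    # metrics endpoint path
spec:
  containers:
    - name: app
      image: example/app:latest
      ports:
        - containerPort: 8080
```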

  5. Switch on or off the following options for Application Observability.

    • Pod logs

      Standard output (stdout) and standard error (stderr) Pod logs from the processes in the container, such as initialization messages, API request logs, warnings or errors, and application-specific information.
      Enable for: debugging issues, such as when an application crashes or behaves unexpectedly, logs reveal what went wrong; monitoring behavior, such as tracking normal operational messages (startup confirmation, API requests, or job completion); auditing events, to view logs that show what actions were taken by your app or scripts running inside containers; gaining performance insight, such as tracing slow operations or bottlenecks using timestamps and log levels.

    • OpenTelemetry receivers

      Metrics, logs, and traces from apps sending OTel data.
      Opens Alloy receiver ports to accept telemetry that your apps (or OTel SDK/collector) push to Grafana Cloud.
      Enable for: services using OTel SDKs so you can push traces/metrics to an endpoint; Application Observability (RED metrics, service map, trace correlation) in Grafana Cloud; existing Zipkin-instrumented apps when you want to receive traces from them; ensuring host-hours telemetry, which is required for Application Observability billing.
      Disable when: your apps have no instrumentation at all.

    • Application profiling, to collect profiles from applications on the Cluster

      Enables continuous profiling using eBPF; collects CPU flamegraphs from running applications; captures function-level performance data; identifies code issues and performance bottlenecks.
      Enable when you want to: find expensive functions in your code; optimize application performance; debug CPU-intensive operations; identify memory allocation patterns.
      Disable when: You don’t need code-level profiling; you’re concerned about profiling overhead (~1-5% CPU); your applications are already well-optimized.

    • Zero-code instrumentation to deploy Grafana Beyla for zero-code instrumentation of applications on the Cluster

      Caution

      Enabling instrumentation with Beyla may increase your bill due to additional telemetry ingestion.

      Correlates Pod metrics with application traces; links infrastructure metrics to application performance; enables unified views of resource usage and request patterns; powers the correlation features in Pod detail views.
      Enable when you want to: See how Pod resource constraints affect application latency; correlate OOM kills with specific requests; understand resource consumption per endpoint; and connect infrastructure issues to user impact.
      Disable when: You only need basic infrastructure monitoring.

    • Forward traces to application receivers, to send traces captured by Beyla to the application receiver OTLP endpoints.
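When the OpenTelemetry receivers option is enabled, applications push telemetry to the Alloy receiver Service in the Cluster. A sketch of pointing an OTel SDK at it through standard environment variables; the Service DNS name below is an assumption based on the default deployment name, so check the endpoint addresses shown on the configuration page:

```yaml
# Container env vars for an app instrumented with an OTel SDK.
# The Service name assumes the default "grafana-k8s-monitoring" deployment
# name in a "monitoring" namespace -- adjust to match your deployment.
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://grafana-k8s-monitoring-alloy-receiver.monitoring.svc.cluster.local:4318"
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: "http/protobuf"
  - name: OTEL_SERVICE_NAME
    value: "example-app"
```

Port 4318 is the standard OTLP HTTP port; gRPC exporters typically use 4317 instead.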

Backend and token

Note

The backend installation installs the rules required for Kubernetes Monitoring to function properly. Recording rules are the source of the workload data in Kubernetes Monitoring. If you aren’t seeing workload data, the most likely cause is that the recording rules and alert rules haven’t been installed.

  1. Click Install to install the required, preconfigured alert rules and recording rules.

  2. You can create a new access policy token or use an existing token. Refer to Grafana Cloud Access Policies for more information.

    To use an existing token:

    1. Click Use an existing token.

    2. Paste the token into the Access policy token name box.

    To create a new token:

    1. Click Create a new token.

    2. In the Access policy token name box, enter the name of your token.

    3. In the Expiration date box, select an option for the expiration date. The permission scope for the token appears.

    4. Click Next.

    5. Click Create token.

    The token generates and appears in the token box. This token is automatically copied into the ConfigMap file.

    6. Click the copy icon in the token box to copy the token. Make sure to save it in a secure place. It is not shown again.

    7. Click Next.

Deployment

  1. Select the method of deployment.

  2. Use the code or files for deployment, following the on-screen instructions.

Helm client

To use the Helm client to deploy the Kubernetes Monitoring Helm chart to the Cluster:

  1. Copy the command.

  2. Paste and run it in your terminal.
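The exact command is generated for you and already includes your access policy token. Its general shape looks like the following sketch, in which the release name, namespace, and values file are placeholders rather than generated values:

```shell
# Add the Grafana Helm repository (one time) and deploy the chart.
# Release name, namespace, and values.yaml are placeholders.
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install grafana-k8s-monitoring grafana/k8s-monitoring \
  --namespace monitoring --create-namespace \
  --values values.yaml
```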

Terraform

To use Terraform to deploy the Kubernetes Monitoring Helm chart to the Cluster:

  1. Copy, modify, and save the following files to the system where you run Terraform:

    • provider.tf
    • grafana-k8s-monitoring.tf
    • vars.tf
  2. Deploy by running the commands terraform init and terraform apply.
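As a rough sketch of what the downloaded files wire together (the resource name, variable, and file paths here are assumptions; use the generated files as-is):

```terraform
# Minimal shape of a helm_release that deploys the chart via the Helm provider.
# Provider configuration, variable values, and the full values map come from
# the generated provider.tf, vars.tf, and grafana-k8s-monitoring.tf files.
resource "helm_release" "grafana_k8s_monitoring" {
  name             = "grafana-k8s-monitoring"
  repository       = "https://grafana.github.io/helm-charts"
  chart            = "k8s-monitoring"
  namespace        = var.namespace
  create_namespace = true

  values = [file("${path.module}/values.yaml")]
}
```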

Configure application instrumentation

If you chose to forward traces to application receivers, a list of endpoints appears. In the application that generates metrics, logs, or traces, enter the appropriate OTLP or Zipkin address.

Note

If you change the deployment name to something other than grafana-k8s-monitoring, the endpoint address is updated as well. Be sure to update your applications to point to the correct endpoint.

Click See cluster status to view the status of data collection. The view populates as the system components begin scraping and sending data to Grafana Cloud, and shows the health of the different sources of metrics, Pod logs, and Cluster events, as well as any applicable version numbers.

Troubleshoot

Refer to Troubleshooting for any issues that occur after configuration.

Install any integrations

You can use Grafana integrations to monitor the health and status of services and applications running in your Kubernetes clusters.

To install a Kubernetes integration to begin scraping metrics:

  1. From the main menu, navigate to Connections, and filter for Kubernetes.

  2. Select the integration for the service you want to monitor.

  3. Follow the instructions on the screen to copy and use the configuration snippet and install the integration.

  4. After installing an integration, redeploy the configuration using the method you originally used.

Retrieve Helm values

If you installed Kubernetes Monitoring with the Helm CLI, you can retrieve the values for your configuration by using the helm get values command.
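For example, assuming the default release name and a monitoring namespace (both assumptions; substitute your own):

```shell
# Print the values you supplied at install time and save them for reuse.
# Release name and namespace are placeholders.
helm get values grafana-k8s-monitoring --namespace monitoring > values.yaml
```

Adding the `--all` flag includes the chart's computed default values along with your overrides.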