The Kubernetes integration allows you to monitor and alert on resource usage and operations in a Kubernetes cluster. Kubernetes is an open-source container orchestration system that automates software container deployment, scaling, and management.
The Kubernetes integration provides the following:
- Preconfigured manifests for deploying Grafana Agent and kube-state-metrics to your clusters.
- 10 Grafana dashboards to drill into resource usage and cluster operations, from the multi-cluster level down to individual containers and Pods.
- A set of recording rules to speed up dashboard queries.
- A set of alerting rules to alert on conditions. For example: Pods crash looping and Pods getting stuck in “not ready” status.
- A preconfigured (optional) allowlist of metrics referenced in the above dashboards, recording rules, and alerting rules to reduce your active series usage while still giving you visibility into core cluster metrics.
- Kubernetes Events in Grafana Cloud Loki (beta). To learn more about this feature, please see Kubernetes Events.
We are also heavily indebted to the open source kubernetes-mixin project, from which the dashboards, recording rules, and alerting rules have been derived. We will continue to contribute bug fixes and new features upstream.
Installing the Kubernetes Integration
Navigate to your Hosted Grafana instance. You can find this in the Cloud Portal.
From here, click on Integrations and Connections (lightning bolt icon) in the menu on the left, and then Walkthrough.
Click on Kubernetes and then Install Integration.
You’ll see a series of instructions for deploying the following:
- Grafana Agent single-replica StatefulSet that will collect Prometheus metrics & Kubernetes events from objects in your K8s cluster
- Kube-state-metrics Helm chart (which deploys a KSM Deployment and Service, along with some other access control objects)
- Grafana Agent DaemonSet that will collect logs from Pods in your K8s cluster
Reinstall or upgrade the Integration
The engineering team regularly pushes updates to Grafana Cloud’s Kubernetes integration and the Grafana Agent. You must update these components manually to take advantage of any updates to Grafana Dashboards, alerting & recording rules, and new Grafana Agent features. To learn how to do this, please see Updating the Kubernetes Integration.
Scraping Application Pod Metrics
By default, the Kubernetes integration only scrapes the cAdvisor (one per Node), kubelet (one per Node), and kube-state-metrics (one replica by default) endpoints. You can also configure Grafana Agent to scrape application Prometheus metrics, like those available at the standard `/metrics` endpoint on Pods.
For example, to add a scrape job targeting all `/metrics` endpoints on your cluster's Pods, add the following to the bottom of your Agent scrape config:
```yaml
. . .
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Example relabel to scrape only pods that have
      # "example.io/should_be_scraped = true" annotation.
      # - source_labels: [__meta_kubernetes_pod_annotation_example_io_should_be_scraped]
      #   action: keep
      #   regex: true
      #
      # Example relabel to customize metric path based on pod
      # "example.io/metric_path = <metric path>" annotation.
      # - source_labels: [__meta_kubernetes_pod_annotation_example_io_metric_path]
      #   action: replace
      #   target_label: __metrics_path__
      #   regex: (.+)
      #
      # Example relabel to scrape only single, desired port for the pod
      # based on pod "example.io/scrape_port = <port>" annotation.
      # - source_labels: [__address__, __meta_kubernetes_pod_annotation_example_io_scrape_port]
      #   action: replace
      #   regex: ([^:]+)(?::\d+)?;(\d+)
      #   replacement: $1:$2
      #   target_label: __address__
      # Expose Pod labels as metric labels
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Expose Pod namespace as metric namespace label
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      # Expose Pod name as metric pod label
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
```
This config adds every defined Pod container port to Agent's scrape targets, discovered using Agent's Kubernetes service discovery mechanism. You can optionally uncomment the relevant sections to customize the metrics path (the default is `/metrics`), specify a scrape port, or use Pod annotations to declaratively specify which targets Agent should scrape in your Pod manifests. To learn more, please see the examples in the official Prometheus project repo.
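If you adopt the annotation-based approach from the commented examples above, each Pod opts in to scraping through its own manifest. The following sketch shows a hypothetical Pod using the `example.io/*` annotations those relabel rules consume (the app name, image, and port are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app  # placeholder name
  annotations:
    example.io/should_be_scraped: "true"       # matched by the "keep" relabel rule
    example.io/metric_path: "/custom-metrics"  # rewrites __metrics_path__
    example.io/scrape_port: "8080"             # rewrites the port in __address__
spec:
  containers:
    - name: my-app
      image: registry.example.com/my-app:latest  # placeholder image
      ports:
        - containerPort: 8080
```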
To learn more about configuring the Agent, please see Configure Grafana Agent from the Agent docs. To learn more about the available `kubernetes_sd_configs` roles (we used the `pod` role here) and labels, please see kubernetes_sd_config from the Prometheus docs.
You can update your Agent configuration by modifying the ConfigMap and redeploying it. After editing the above ConfigMap, deploy it into your cluster using `kubectl apply -f`:
```bash
kubectl apply -f your_configmap.yaml
```
Next, restart the Agent to pick up the config changes:
```bash
kubectl rollout restart deployment/grafana-agent
```
Configured Scrape Targets
By default, Agent scrapes the following targets:
- cAdvisor, which is present on each node in your cluster and emits container resource usage metrics like CPU usage, memory usage, and disk usage
- kubelet, which is present on each Node and emits metrics specific to the kubelet process, like the number of running Pods and Pod start durations
- kube-state-metrics, which runs as a Deployment and Service in your cluster and emits Prometheus metrics that track the state of objects in your cluster, like Pods, Deployments, DaemonSets, and more
The default ConfigMap configures an allowlist to drop all metrics not referenced in the Kubernetes integration dashboards, alerts, and recording rules. You can optionally modify this allowlist, replace it with a denylist (by using the `drop` directive), omit it entirely, or move it to the `remote_write` level so that it applies globally to all configured scrape jobs. To learn more, please see Reducing Prometheus metrics usage with relabeling.
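For example, a denylist that drops a couple of hypothetical high-cardinality metrics (the metric names below are placeholders) while keeping everything else would look roughly like:

```yaml
metric_relabel_configs:
  - source_labels: [__name__]
    regex: "noisy_histogram_bucket|debug_only_metric"  # placeholder metric names
    action: drop  # drop matching series; all other series are kept
```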
Dashboards
The Kubernetes integration includes 10 dashboards out of the box to help you get started with observing and monitoring your Kubernetes clusters and their workloads. This set includes the following:
(Home) Kubernetes Integration, the principal integration dashboard that displays high-level cluster resource usage and integration configuration status.
Kubernetes / Compute Resources (7 dashboards), a set of dashboards to drill down into resource usage at the following levels:
- Namespace (by Pods)
- Namespace (by workloads, like Deployments or DaemonSets)
- Pods and containers
- Workloads (Deployments, DaemonSets, StatefulSets, etc.)
These dashboards contain links to sub-objects, so you can jump from cluster, to Namespace, to Pod, etc.
Kubernetes / Kubelet, a dashboard that helps you understand Kubelet performance on your Nodes and provides useful summary metrics, like the number of running Pods, containers, and Volumes on a given Node.
Kubernetes / Persistent Volumes, a dashboard that helps you understand usage of your configured PersistentVolumes.
Alerting Rules
The Kubernetes integration includes the following alerting rules to help you get up and running with Grafana Cloud alerts and get notified when issues arise with your clusters and their workloads:
Kubernetes system alerts:
Kubernetes resource usage alerts:
Kubernetes app alerts:
To learn more, see the upstream kubernetes-mixin's Kubernetes Alert Runbooks page. You can programmatically update alerting rule links to point to your own runbooks in these preconfigured alerts, using a tool like cortex-tools or grizzly. To learn more, see Prometheus and Loki rules with mimirtool and Alerts.
Recording Rules
The Kubernetes integration includes the following recording rules to speed up dashboard queries and alerting rule evaluation:
Note that recording rules may emit time series with the same metric name, but different labels.
To learn how to modify these programmatically, please see Prometheus and Loki rules with mimirtool.
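As a rough illustration of the shape of such rules (this example is illustrative, not one of the integration's preconfigured rules), a Prometheus recording rule precomputes an expensive query under a new metric name:

```yaml
groups:
  - name: example.rules
    rules:
      # Precompute per-namespace container CPU usage so dashboards can
      # query the recorded series instead of re-evaluating the rate().
      - record: namespace:container_cpu_usage_seconds_total:sum_rate
        expr: sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))
```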
Metrics and Usage
By default, the Kubernetes integration configures allowlists using Prometheus `relabel_config` blocks. To learn more about `write_relabel_configs`, please see Reducing Prometheus metrics usage with relabeling.
These allowlists drop any metrics not referenced in the integration's dashboards, rules, and alerts. To omit or modify the allowlists, modify the corresponding `metric_relabel_configs` blocks in your Agent configuration. To learn more about analyzing and controlling active series usage, please consult Control Prometheus metrics usage.
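As a sketch, moving the allowlist to the `remote_write` level means attaching a `keep` rule under `write_relabel_configs` (the endpoint URL and metric names below are placeholders):

```yaml
remote_write:
  - url: https://prometheus-REGION.grafana.net/api/prom/push  # placeholder endpoint
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "up|kube_pod_info|container_cpu_usage_seconds_total"  # placeholder allowlist
        action: keep  # drop every series whose metric name does not match
```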
Grafana Cloud billing is based on billable series. To learn more about the pricing model, please consult Active series and DPM.
Default active series usage varies depending on your Kubernetes cluster size (number of Nodes) and running workloads (number of Pods, containers, Deployments, etc.).
When testing on a cloud provider’s Kubernetes offering, the following active series usage was observed:
- 3-Node cluster, 17 running Pods, 31 running containers: 3.8k active series
  - The only Pods deployed into the cluster were Grafana Agent and kube-state-metrics; the rest were running in the `kube-system` Namespace and managed by the cloud provider.
- From this baseline, active series usage increased roughly by:
  - 1,000 active series per additional Node
  - 75 active series per additional Pod (vanilla Nginx Pods were deployed into the cluster)
These are very rough guidelines, and results may vary depending on your cloud provider or Kubernetes version. Note also that these figures reflect only the scrape targets configured above, not additional targets like application, API server, or scheduler metrics.
The default setup instructions will roll out a Grafana Agent DaemonSet to collect logs from all pods running in your cluster and ship these to Grafana Cloud Loki.
The Kubernetes integration will soon support out-of-the-box configuration for shipping traces to your hosted Tempo endpoint. In the meantime, you can get started shipping traces to Grafana Cloud by following the Agent Traces Quickstart. This will roll out a single-replica Agent Deployment that receives traces and ships them to Grafana Cloud via `remote_write`.
Grafana Cloud Integrations
Grafana Cloud will soon support integrations on Kubernetes as a platform, like the Linux Node Integration (node-exporter), Redis integration, MySQL integration, and many more. In the meantime, to use embedded Agent exporters/integrations, you must configure them manually. To learn how to do this, please see integrations_config from the Agent docs.
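As a rough sketch (assuming Agent's static mode; check integrations_config for the exact fields), enabling the embedded node_exporter integration looks like the following. Note that running node-exporter inside a container typically also requires mounting host filesystems:

```yaml
integrations:
  prometheus_remote_write:
    - url: https://prometheus-REGION.grafana.net/api/prom/push  # placeholder endpoint
      basic_auth:
        username: YOUR_CLOUD_USERNAME  # placeholder credentials
        password: YOUR_CLOUD_API_KEY
  node_exporter:
    enabled: true  # turns on the embedded node-exporter integration
```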
Alternatively, the following scrape job targets node-exporter Pods deployed with the prometheus-node-exporter Helm chart:
```yaml
. . .
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: integrations/node-exporter
  kubernetes_sd_configs:
    - namespaces:
        names:
          - NODE_EXPORTER_NAMESPACE_HERE
      role: pod
  relabel_configs:
    - action: keep
      regex: prometheus-node-exporter.*
      source_labels:
        - __meta_kubernetes_pod_label_app
    - action: replace
      source_labels:
        - __meta_kubernetes_pod_node_name
      target_label: instance
    - action: replace
      source_labels:
        - __meta_kubernetes_namespace
      target_label: namespace
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
```
This will instruct Agent to scrape any Pod with the label `app=prometheus-node-exporter.*` (the value is a regular expression). The Helm chart configures this label by default, but if you modify the chart's `values.yaml` file or any of its other defaults, you may have to adjust this scrape job accordingly. To learn more, please see this helpful set of examples.
Correlating data across Metrics, Logs, and Traces
Documentation for configuring correlation across metrics, logs, and traces, specifically for Kubernetes workloads, is forthcoming.
In the interim, please consult Intro to monitoring Kubernetes with Grafana Cloud. Note that this video was published prior to the release of the current version of the Kubernetes integration, so some concepts may differ slightly.
Kubernetes events (beta)
Kubernetes events provide helpful logging information emitted by K8s cluster controllers. Grafana Agent contains an embedded integration that watches for event objects in your clusters, and ships them to Grafana Cloud Loki for long-term storage and analysis. To enable this feature and ship K8s events to Cloud Loki, please see Kubernetes Events. The integration setup instructions will enable this feature by default in the Grafana Agent StatefulSet.
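As a minimal sketch (assuming the Agent's embedded eventhandler integration; see Kubernetes Events for the authoritative configuration), enabling event collection looks roughly like:

```yaml
integrations:
  eventhandler:
    # Path where the integration caches the last-shipped event timestamp
    # so events are not re-shipped after an Agent restart.
    cache_path: /etc/eventhandler/eventhandler.cache
```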
Related Grafana Cloud resources
How to set up and visualize synthetic monitoring at scale with Grafana Cloud
Learn how to use Kubernetes, Grafana Loki, and Grafana Cloud’s synthetic monitoring feature to set up your infrastructure's checks in this GrafanaCONline session.
Using Grafana Cloud to drive manufacturing plant efficiency
This GrafanaCONline session tells how Grafana helps a 75-year-old manufacturing company with product quality and equipment maintenance.