
Reducing Kubernetes metrics usage

This guide describes some specific methods you can use to control your usage when shipping Prometheus metrics from a Kubernetes cluster.

Default deployments of preconfigured Prometheus-Grafana-Alertmanager stacks like kube-prometheus scrape and store tens of thousands of active series when launched into your cluster. A vanilla deployment of kube-prometheus configured to remote_write to Grafana Cloud counts toward roughly 50,000 active series of your metrics usage. Using the methods in this guide, you can reduce this to a baseline of roughly 4,000 core active series, and build up needed metrics from there.

Prerequisites

This guide assumes some familiarity with Kubernetes concepts and assumes that you have a Prometheus deployment running inside of your cluster, configured to remote_write to Grafana Cloud. To learn how to configure remote_write to ship Prometheus metrics to Cloud, please see Metrics — Prometheus.

Steps to modify Prometheus’s configuration vary depending on how you deployed Prometheus into your cluster. This guide will use a default kube-prometheus installation with Prometheus Operator to demonstrate the metrics reduction methods. The steps in this guide can be modified to work with Helm installations of Prometheus, vanilla Prometheus Operator deployments, and other custom Prometheus deployments.

Deduplicating metrics data sent from high-availability Prometheus pairs

This section shows you how to deduplicate samples sent from high-availability Prometheus deployments.

By default, kube-prometheus deploys 2 replicas of Prometheus for high-availability, shipping duplicates of scraped metrics to remote storage. Grafana Cloud can deduplicate metrics, reducing your metrics usage and active series by 50% with a small configuration change. To learn more about Grafana Cloud deduplication, see Sending data from multiple high-availability Prometheus instances. This section will implement this configuration change with the kube-prometheus stack. Steps are similar for any Prometheus Operator-based deployment.

Begin by navigating into the manifests directory of the kube-prometheus code repository.

Locate the manifest file for the Prometheus Custom Resource, prometheus-prometheus.yaml. Prometheus Custom Resources are created and defined by Prometheus Operator, a sub-component of the kube-prometheus stack. To learn more about Prometheus Operator, please see the prometheus-operator GitHub repository.

Scroll to the bottom of prometheus-prometheus.yaml and append the following three lines:

replicaExternalLabelName: "__replica__"
externalLabels:
  cluster: "your_cluster_identifier"

The replicaExternalLabelName parameter changes the default prometheus_replica external label name to __replica__. Grafana Cloud uses the __replica__ and cluster external labels to identify replicated series to deduplicate. The value for __replica__ corresponds to a unique Pod name for the Prometheus replica.

To learn more about external labels and deduplication, please see Sending data from multiple high-availability Prometheus instances. To learn more about these parameters and the Prometheus Operator API, consult API Docs from the Prometheus Operator GitHub repository.

For a Prometheus HA deployment without Prometheus Operator, it’s sufficient to create a unique __replica__ label for each HA Prometheus instance, and a cluster label shared across both HA instances in your Prometheus configuration.
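For example, in a vanilla prometheus.yml this might look like the following sketch. The __replica__ value and cluster name here are placeholders; each HA instance must set a different __replica__ value (such as its Pod name or hostname), while both share the same cluster value:

```yaml
global:
  external_labels:
    # Unique per HA replica, e.g. this instance's Pod name or hostname
    __replica__: prometheus-replica-0
    # Shared across all HA replicas scraping this cluster
    cluster: your_cluster_identifier
```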

After saving and rolling out these changes, you should see your active series usage decrease by roughly 50%. It may take some time for data to propagate into your Billing and Usage Grafana dashboards, but you should see results fairly quickly in the Ingestion Rate (DPM) panel.

You can also drastically reduce metrics usage by keeping a limited set of metrics to ship to Grafana Cloud, instead of all metrics scraped by kube-prometheus in its default configuration.

Filtering and keeping kubernetes-mixin metrics (allowlisting)

This section shows you how to keep a limited set of core metrics to ship to Grafana Cloud, storing the rest locally.

The Prometheus Monitoring Mixin for Kubernetes contains a curated set of Grafana dashboards and Prometheus alerts to gain visibility into and alert on your cluster’s operations. The Mixin dashboards and alerts are designed by DevOps practitioners who’ve distilled their experience and knowledge managing Kubernetes clusters into a set of reusable core dashboards and alerts.

By default, kube-prometheus deploys Grafana into your cluster, and populates it with a core set of kubernetes-mixin dashboards. It also sets up the alerts and recording rules defined in the Kubernetes Mixin. To reduce your Grafana Cloud metric usage, you can selectively ship metrics essential for populating kubernetes-mixin dashboards to Grafana Cloud. These metrics will then be available for long-term storage and analysis, with all other metrics stored locally in your cluster Prometheus instances.

In this guide, we’ve extracted the metrics found in kubernetes-mixin dashboards. You may want to include other metrics, such as those referenced in the Mixin alerts (which are the same metrics referenced in the Mixin alerting Runbook). A forthcoming guide will show you how to programmatically set up the entire kubernetes-mixin stack on Grafana Cloud.

To begin allowlisting metrics, navigate into the manifests directory of the kube-prometheus code repository.

Locate the manifest file for the Prometheus Custom Resource, prometheus-prometheus.yaml. Prometheus Custom Resources are created and defined by Prometheus Operator, a sub-component of the kube-prometheus stack. To learn more about Prometheus Operator, please see the prometheus-operator GitHub repository.

Scroll to the bottom of prometheus-prometheus.yaml and append the following to your existing remoteWrite configuration:

remoteWrite:
- url: "https://prometheus-us-central1.grafana.net/api/prom/push"
  basicAuth:
    username:
      name: your_grafanacloud_secret
      key: your_grafanacloud_secret_username_key
    password:
      name: your_grafanacloud_secret
      key: your_grafanacloud_secret_password_key
  writeRelabelConfigs:
  - sourceLabels:
    - "__name__"
    regex: "apiserver_request_total|kubelet_node_config_error|kubelet_runtime_operations_errors_total|kubeproxy_network_programming_duration_seconds_bucket|container_cpu_usage_seconds_total|kube_statefulset_status_replicas|kube_statefulset_status_replicas_ready|node_namespace_pod_container:container_memory_swap|kubelet_runtime_operations_total|kube_statefulset_metadata_generation|node_cpu_seconds_total|kube_pod_container_resource_limits_cpu_cores|node_namespace_pod_container:container_memory_cache|kubelet_pleg_relist_duration_seconds_bucket|scheduler_binding_duration_seconds_bucket|container_network_transmit_bytes_total|kube_pod_container_resource_requests_memory_bytes|namespace_workload_pod:kube_pod_owner:relabel|kube_statefulset_status_observed_generation|process_resident_memory_bytes|container_network_receive_packets_dropped_total|kubelet_running_containers|kubelet_pod_worker_duration_seconds_bucket|scheduler_binding_duration_seconds_count|scheduler_volume_scheduling_duration_seconds_bucket|workqueue_queue_duration_seconds_bucket|container_network_transmit_packets_total|rest_client_request_duration_seconds_bucket|node_namespace_pod_container:container_memory_rss|container_cpu_cfs_throttled_periods_total|kubelet_volume_stats_capacity_bytes|kubelet_volume_stats_inodes_used|cluster_quantile:apiserver_request_duration_seconds:histogram_quantile|kube_node_status_allocatable_memory_bytes|container_memory_cache|go_goroutines|kubelet_runtime_operations_duration_seconds_bucket|kube_statefulset_replicas|kube_pod_owner|rest_client_requests_total|container_memory_swap|node_namespace_pod_container:container_memory_working_set_bytes|storage_operation_errors_total|scheduler_e2e_scheduling_duration_seconds_bucket|container_network_transmit_packets_dropped_total|kube_pod_container_resource_limits_memory_bytes|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate|storage_operation_duration_seconds_count|node_netstat_TcpExt_TCPSynRetrans|node_netstat_Tcp_OutSegs|container_cpu_cfs_periods_total|kubelet_pod_start_duration_seconds_count|kubeproxy_network_programming_duration_seconds_count|container_network_receive_bytes_total|node_netstat_Tcp_RetransSegs|up|storage_operation_duration_seconds_bucket|kubelet_cgroup_manager_duration_seconds_count|kubelet_volume_stats_available_bytes|scheduler_scheduling_algorithm_duration_seconds_bucket|kube_statefulset_status_replicas_current|code_resource:apiserver_request_total:rate5m|kube_statefulset_status_replicas_updated|process_cpu_seconds_total|kube_pod_container_resource_requests_cpu_cores|kubelet_pod_worker_duration_seconds_count|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubeproxy_sync_proxy_rules_duration_seconds_bucket|container_memory_usage_bytes|workqueue_adds_total|container_network_receive_packets_total|container_memory_working_set_bytes|kube_resourcequota|kubelet_running_pods|kubelet_volume_stats_inodes|kubeproxy_sync_proxy_rules_duration_seconds_count|scheduler_scheduling_algorithm_duration_seconds_count|apiserver_request:availability30d|container_memory_rss|kubelet_pleg_relist_interval_seconds_bucket|scheduler_e2e_scheduling_duration_seconds_count|scheduler_volume_scheduling_duration_seconds_count|workqueue_depth|:node_memory_MemAvailable_bytes:sum|volume_manager_total_volumes|kube_node_status_allocatable_cpu_cores"
    action: "keep"

The first chunk of this configuration defines remote_write parameters like authentication and the Cloud Metrics Prometheus endpoint URL to which Prometheus ships scraped metrics. To learn more about remote_write, please see the Prometheus docs. To learn about the API implemented by Prometheus Operator, please see the API Docs from the Prometheus Operator GitHub repository.

The writeRelabelConfigs section instructs Prometheus to check the __name__ meta-label (the metric name) of a scraped time series, and match it against the regex defined by the regex parameter. This regex contains a list of all metrics found in the kubernetes-mixin dashboards.

The keep action instructs Prometheus to “keep” these metrics for shipping to Grafana Cloud, and drop all others. Note that this configuration applies only to the remote_write section of your Prometheus configuration, so Prometheus will continue to store all scraped metrics locally.
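To build intuition for these keep semantics, here is a minimal Python sketch. It is illustrative only, not Prometheus’s actual implementation; note that Prometheus fully anchors relabel regexes, which re.fullmatch reproduces here:

```python
import re

def apply_keep(series_names, keep_regex):
    # Prometheus anchors relabel regexes (^...$), so use fullmatch:
    # a series survives only if its entire name matches the regex
    pattern = re.compile(keep_regex)
    return [name for name in series_names if pattern.fullmatch(name)]

# A few scraped metric names checked against a short allowlist
scraped = ["up", "node_cpu_seconds_total", "alertmanager_build_info"]
allowlist = "up|node_cpu_seconds_total|kube_pod_owner"
print(apply_keep(scraped, allowlist))  # → ['up', 'node_cpu_seconds_total']
```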

If you have additional metrics you’d like to keep, append them to the regex parameter’s alternation. Note that write relabel rules apply in sequence, so adding a second keep rule would further restrict (rather than extend) the set of shipped metrics.
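For example, to also ship a hypothetical metric named my_custom_metric_total, prepend it to the existing alternation. The regex below is abbreviated for readability; in practice the full list from the configuration above follows the ellipsis:

```yaml
  writeRelabelConfigs:
  - sourceLabels:
    - "__name__"
    # my_custom_metric_total is a placeholder for your own metric name;
    # "..." stands for the rest of the existing allowlist alternation
    regex: "my_custom_metric_total|apiserver_request_total|..."
    action: "keep"
```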

When you’re done modifying prometheus-prometheus.yaml, save and close the file. Deploy the changes in your cluster using kubectl apply -f or your preferred Kubernetes management tool. You may need to restart or bring up new Prometheus instances to pick up the modified configuration.
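For a kube-prometheus deployment, rolling out the change might look like the following (the manifests path and monitoring namespace assume the defaults used in this guide):

```shell
# Apply the modified Prometheus custom resource; Prometheus Operator
# reconciles the change and restarts the Prometheus Pods as needed
kubectl apply -f manifests/prometheus-prometheus.yaml

# Watch the Pods in the monitoring namespace roll over
kubectl --namespace monitoring get pods -w
```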

After saving and rolling out these changes, you should only be pushing roughly 4,000 active series. It may take some time for data to propagate into your Billing and Usage Grafana dashboards, but you should see results fairly quickly in the Ingestion Rate (DPM) panel. Any kubernetes-mixin dashboards imported into Grafana Cloud should continue to function correctly.

To test this, you can import a kubernetes-mixin dashboard into Grafana Cloud manually.

Importing a kubernetes-mixin dashboard into Grafana Cloud

Run the following command to get access to the Grafana instance running in your cluster:

kubectl --namespace monitoring port-forward svc/grafana 3000

In your web browser, navigate to http://localhost:3000 and locate the API Server dashboard, which contains panels to help you understand the behavior of the Kubernetes API server.

Click on Share Dashboard.

Share Dashboard

Next, click on Export, then View JSON. Copy the Dashboard JSON to your clipboard.

On Grafana Cloud, log in to Grafana, then navigate to Manage Dashboards. Click on Import and in the Import via panel JSON field, paste in the dashboard JSON you just copied. Then, click Load. Optionally name and organize your dashboard, then click Import to import it.

apiserver Dashboard

You should see your allowlisted metrics populating the dashboard panels. These metrics and this dashboard will be available in Grafana Cloud for long-term storage and efficient querying across all of your Kubernetes clusters.

You can also reduce metric usage by explicitly dropping high-cardinality metrics in your relabel_config.

Filtering and dropping high-cardinality metrics (denylisting)

You can also selectively drop high-cardinality metrics and labels that you don’t anticipate needing to warehouse in Grafana Cloud.

To analyze your metrics usage and learn how to identify potential high-cardinality metrics and labels to drop, please see Analyzing Prometheus metric usage.
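For example, one common way to surface candidates is to ask your local Prometheus for the metric names with the most active series. This query can be expensive on large instances, so run it against your in-cluster Prometheus rather than Grafana Cloud:

```promql
# Top 10 metric names by active series count
topk(10, count by (__name__)({__name__=~".+"}))
```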

The following sample write_relabel_configs drops a metric called alertmanager_build_info. This is not a high-cardinality metric, and is only used here for demonstration purposes.

write_relabel_configs:
  - source_labels: [__name__]
    regex: "alertmanager_build_info"
    action: drop

This config looks at the __name__ series meta-label, corresponding to a metric’s name, and checks that it matches the regex set in the regex field. If it does, all matched series are dropped. Note that if you add this snippet to the remote_write section of your Prometheus configuration, you will continue to store the metric locally, but prevent it from being shipped to Grafana Cloud.

You can expand this snippet to capture other high-cardinality metrics that you do not wish to ship to Grafana Cloud for long-term storage. Note that this example does not use the Kubernetes Prometheus Operator API and is standard Prometheus configuration.
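For example, to drop several metrics at once, extend the alternation. The additional metric names below are illustrative placeholders for whatever your own usage analysis surfaces:

```yaml
write_relabel_configs:
  - source_labels: [__name__]
    # Replace these names with the high-cardinality metrics
    # identified by your own usage analysis
    regex: "alertmanager_build_info|go_gc_duration_seconds|go_gc_duration_seconds_count"
    action: drop
```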

To learn more about write_relabel_configs, please see relabel_config from the Prometheus docs.

Conclusion

This guide describes three methods for reducing Grafana Cloud metrics usage when shipping metrics from Kubernetes clusters:

  • Deduplicating metrics sent from HA Prometheus deployments
  • Keeping “important” metrics
  • Dropping high-cardinality “unimportant” metrics

This guide has purposefully avoided making statements about which metrics are “important” or “unimportant” — this will depend on your use case and production monitoring needs. To learn more about some metrics you may wish to visualize and alert on, please see the Kubernetes Mixin, created by experienced DevOps practitioners and contributors to the Prometheus and Grafana ecosystems.

From here, you may wish to execute the Kubernetes Mixin Prometheus recording rules and alerts on Grafana Cloud. To do this, please see Alerts. You will have to expand the scope of allowlisted metrics to include those referenced in recording rules and alerts defined in the Kubernetes Mixin.