Reducing Prometheus metrics usage with relabeling
This guide describes several techniques you can use to reduce your Prometheus metrics usage on Grafana Cloud.
Before applying these techniques, ensure that you’re deduplicating any samples sent from high-availability Prometheus clusters. This will cut your active series count in half. To learn how to do this, please see Sending data from multiple high-availability Prometheus instances.
You can reduce the number of active series sent to Grafana Cloud in two ways:
Allowlisting: This involves keeping a set of “important” metrics and labels that you explicitly define, and dropping everything else. To allowlist metrics and labels, first identify the core set of metrics and labels that you’d like to keep. To enable allowlisting in Prometheus, use the keep and labelkeep actions with any relabeling configuration.
Denylisting: This involves dropping a set of high-cardinality “unimportant” metrics that you explicitly define, and keeping everything else. Denylisting becomes possible once you’ve identified a list of high-cardinality metrics and labels that you’d like to drop. To learn how to discover high-cardinality metrics, please see Analyzing Prometheus metric usage. To enable denylisting in Prometheus, use the drop and labeldrop actions with any relabeling configuration.
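As a rough sketch of the two approaches at the label level, the fragment below shows one job that allowlists labels with labelkeep and one that denylists labels with labeldrop. The job names, targets, and label names are assumptions chosen for illustration, not recommendations.

```yaml
scrape_configs:
  # Allowlisting labels: keep only label names that match the regex.
  - job_name: example_allowlist            # hypothetical job name
    static_configs:
      - targets: ["localhost:9100"]        # hypothetical target
    metric_relabel_configs:
      # __name__ must stay in the list, or the metric name itself is removed.
      # le and quantile are included so histograms and summaries stay intact.
      - action: labelkeep
        regex: __name__|job|instance|le|quantile|namespace|pod

  # Denylisting labels: drop label names that match the regex, keep the rest.
  - job_name: example_denylist             # hypothetical job name
    static_configs:
      - targets: ["localhost:9101"]        # hypothetical target
    metric_relabel_configs:
      - action: labeldrop
        regex: pod_template_hash|controller_revision_hash
```

Whichever action you use, check that series remain uniquely labeled once labels are removed. Otherwise, distinct series can collapse into duplicates.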
Both of these methods are implemented through Prometheus’s metric filtering and relabeling feature, relabel_config. This feature allows you to filter through series labels using regular expressions and keep or drop those that match. You can also manipulate, transform, and rename series labels using relabel_config.
Prom Labs’s Relabeler tool may be helpful when debugging relabel configs. Relabeler allows you to visually confirm the rules implemented by a relabel config.
You can filter series using Prometheus’s relabel_config configuration object. At a high level, a relabel_config allows you to select one or more source label values that can be concatenated using a separator parameter. The result can then be matched against a regex, and an action operation can be performed if a match occurs.
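To make the concatenation step concrete, here is a minimal sketch that matches on two labels at once. The metric name and the namespace value are hypothetical.

```yaml
metric_relabel_configs:
  # The values of __name__ and namespace are joined with ";" before matching,
  # so the series http_requests_total{namespace="dev"} is evaluated as the
  # string "http_requests_total;dev".
  - source_labels: [__name__, namespace]
    separator: ";"
    regex: http_requests_total;dev
    action: drop
```

Prometheus anchors the regex at both ends, so this rule only drops series whose concatenated value matches the whole expression. The same metric in other namespaces is kept.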
You can perform the following common action operations:
- keep: Keep a matched target or series, drop all others
- drop: Drop a matched target or series, keep all others
- replace: Replace or rename a matched label with a new one defined by the target_label and replacement parameters
- labelkeep: Match the regex against all label names, drop all labels that don’t match (ignores source_labels and applies to all label names)
- labeldrop: Match the regex against all label names, drop all labels that match (ignores source_labels and applies to all label names)
For a full list of available actions, please see relabel_config from the Prometheus documentation.
Any relabel_config must have the same general structure:
- source_labels: [source_label_1, source_label_2, ...]
  separator: ;
  action: replace
  regex: (.*)
  replacement: $1
These default values should be modified to suit your relabeling use case.
- source_labels: Select one or more labels from the available set
- separator: Concatenate selected label values using this character
- regex: Match this regular expression on concatenated data
- action: Execute the specified relabel action
- replacement: If using one of replace or labelmap, define the replacement value. You can use regex match groups to access data captured by the regex. To learn more about regex match groups, please see this StackOverflow answer.
- target_label: Assign the extracted and modified label value defined by replacement to this label name.
Parameters that aren’t explicitly set will be filled in using default values. For readability it’s usually best to explicitly define a relabel_config. To learn more about the general format for a relabel_config block, please see relabel_config from the Prometheus docs.
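The parameters above can be combined as in the following sketch, which uses a regex match group in replacement. The host label name is an assumption for illustration, and the labelmap rule assumes Kubernetes service discovery with role: pod.

```yaml
relabel_configs:
  # __address__ holds the target's host:port. Capture the host part in
  # group 1 and write it to a new label ("host" is a hypothetical name).
  - source_labels: [__address__]
    regex: '([^:]+):\d+'
    target_label: host
    replacement: $1
    action: replace

  # labelmap copies every label whose name matches the regex to a new label
  # named by the replacement; here $1 is the part after the prefix.
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
    replacement: $1
```

If the regex in the first rule does not match (for example, an address without a port), the replace action leaves the label set unchanged.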
Here’s an example:
- source_labels: [ instance_ip ]
  separator: ;
  action: replace
  regex: (.*)
  replacement: $1
  target_label: host_ip
This minimal relabeling snippet searches across the set of scraped labels for the instance_ip label. If it finds the instance_ip label, it copies this label’s value to a new host_ip label. Note that the replace action does not remove the original instance_ip label; to fully rename it, you would drop instance_ip in a separate rule. Since the (.*) regex captures the entire label value, replacement references this capture group, $1, when setting the new target_label. Since we’ve used default regex, replacement, action, and separator values here, they can be omitted for brevity. However, it’s usually best to explicitly define these for readability.
To drop a specific label, select it using source_labels, set target_label to the same label name, and use a replacement value of "". To bulk drop or keep labels, use the labeldrop and labelkeep actions.
You can use a relabel_config to filter through and relabel:
- Scrape targets
- Samples and labels to ingest into Prometheus storage
- Samples and labels to ship to remote storage
You’ll learn how to do this in the next section.
Relabel_config in a Prometheus configuration file
You can apply a relabel_config to filter and manipulate labels at the following stages of metric collection:
- Target selection in the relabel_configs section of a scrape_configs job. This allows you to use a relabel_config object to select targets to scrape and relabel metadata created by any service discovery mechanism.
- Metric selection in the metric_relabel_configs section of a scrape_configs job. This allows you to use a relabel_config object to select labels and series that should be ingested into Prometheus storage.
- Remote Write in the write_relabel_configs section of a remote_write configuration. This allows you to use a relabel_config to control which labels and series Prometheus ships to remote storage.
This sample configuration file skeleton demonstrates where each of these sections lives in a Prometheus config:
global:
  . . .
rule_files:
  . . .
scrape_configs:
  - job_name: sample_job_1
    kubernetes_sd_configs:
      - . . .
    relabel_configs:
      - source_labels: [. . .]
        . . .
      - source_labels: [. . .]
        . . .
    metric_relabel_configs:
      - source_labels: [. . .]
        . . .
      - source_labels: [. . .]
        . . .
  - job_name: sample_job_2
    static_configs:
      - targets: [. . .]
    metric_relabel_configs:
      - source_labels: [. . .]
        . . .
  . . .
remote_write:
  - url: . . .
    write_relabel_configs:
      - source_labels: [. . .]
        . . .
      - source_labels: [. . .]
        . . .
Use relabel_configs in a given scrape job to select which targets to scrape. This is often useful when fetching sets of targets using a service discovery mechanism like kubernetes_sd_configs, or Kubernetes service discovery. To learn more about Prometheus service discovery features, please see Configuration from the Prometheus docs.
Use metric_relabel_configs in a given scrape job to select which series and labels to keep, and to perform any label replacement operations. This occurs after target selection using relabel_configs.
Use write_relabel_configs in a remote_write configuration to select which series and labels to ship to remote storage. This configuration does not impact any configuration set in metric_relabel_configs or relabel_configs. If you drop a label in a metric_relabel_configs section, it won’t be ingested by Prometheus and consequently won’t be shipped to remote storage.
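The difference between dropping at ingestion and dropping at remote write might look like the following sketch. The job name, target, and metric prefixes are hypothetical.

```yaml
scrape_configs:
  - job_name: example_job                  # hypothetical job name
    static_configs:
      - targets: ["localhost:9100"]        # hypothetical target
    metric_relabel_configs:
      # Dropped before ingestion: never stored locally, never shipped.
      - source_labels: [__name__]
        regex: example_debug_.*            # hypothetical metric prefix
        action: drop

remote_write:
  - url: <Your Metrics instance remote_write endpoint>
    write_relabel_configs:
      # Dropped only at remote write: still stored and queryable locally.
      - source_labels: [__name__]
        regex: example_local_only_.*       # hypothetical metric prefix
        action: drop
```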
Scrape target selection using relabel_configs
A relabel_configs configuration allows you to keep or drop targets returned by a service discovery mechanism like Kubernetes service discovery or AWS EC2 instance service discovery. For example, you may have a scrape job that fetches all Kubernetes Endpoints using a kubernetes_sd_configs parameter. By using the following relabel_configs snippet, you can limit scrape targets for this job to those whose Service label corresponds to app=nginx and port name to web:
scrape_configs:
  - job_name: kubernetes_nginx
    honor_timestamps: true
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names:
            - default
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: nginx
        action: keep
      - source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: web
        action: keep
The initial set of endpoints fetched by kubernetes_sd_configs in the default namespace can be very large depending on the apps you’re running in your cluster. Using the __meta_kubernetes_service_label_app label filter, endpoints whose corresponding services do not have the app=nginx label will be dropped by this scrape job.
Since kubernetes_sd_configs will also add any other Pod ports as scrape targets (with role: endpoints), we need to filter these out using the __meta_kubernetes_endpoint_port_name relabel config. For example, if a Pod backing the Nginx service has two ports, we only scrape the port named web and drop the other.
To summarize, the above snippet fetches all endpoints in the default Namespace, and keeps as scrape targets those whose corresponding Service has an app=nginx label set. This set of targets consists of one or more Pods that have one or more defined ports. We drop all ports that aren’t named web.
Using relabeling at the target selection stage, you can selectively choose which targets and endpoints you want to scrape (or drop) to tune your metric usage.
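As another sketch of target selection, the fragment below combines a keep rule with a drop rule. It assumes your Pods follow the common (but not built-in) convention of setting a prometheus.io/scrape annotation; the job name and the staging namespace are hypothetical.

```yaml
scrape_configs:
  - job_name: kubernetes_pods_optin        # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only Pods annotated with prometheus.io/scrape: "true".
      # Service discovery exposes annotations as
      # __meta_kubernetes_pod_annotation_<name>, with "." and "/" in the
      # annotation name converted to "_".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        regex: "true"
        action: keep
      # Of the remaining targets, drop everything in one namespace.
      - source_labels: [__meta_kubernetes_namespace]
        regex: staging
        action: drop
```

Rules are applied in order, so the drop rule only sees targets that survived the keep rule.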
Metric and label selection using metric_relabel_configs
Relabeling and filtering at this stage modifies or drops samples before Prometheus ingests them locally and ships them to remote storage. This relabeling occurs after target selection. Once Prometheus scrapes a target, metric_relabel_configs allows you to define keep, drop, and replace actions to perform on scraped samples:
- job_name: monitoring/kubelet/1
  honor_labels: true
  honor_timestamps: false
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - kube-system
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_k8s_app]
      regex: kubelet
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      regex: https-metrics
      action: keep
    . . .
  metric_relabel_configs:
    - source_labels: [__name__]
      regex: container_(network_tcp_usage_total|network_udp_usage_total|tasks_state|cpu_load_average_10s)
      action: drop
This sample piece of configuration instructs Prometheus to first fetch a list of endpoints to scrape using Kubernetes service discovery (kubernetes_sd_configs). Endpoints are limited to the kube-system namespace. Next, using relabel_configs, only Endpoints with the Service Label k8s_app=kubelet are kept. Furthermore, only Endpoints that have https-metrics as a defined port name are kept. This reduced set of targets corresponds to Kubelet https-metrics scrape endpoints.
After scraping these endpoints, Prometheus applies the metric_relabel_configs section, which drops all metrics whose metric name matches the specified regex. You can extract a sample’s metric name using the __name__ meta-label. In this case Prometheus would drop a metric like container_network_tcp_usage_total(. . .). Prometheus keeps all other metrics. You can add additional metric_relabel_configs sections that replace and modify labels here.
metric_relabel_configs are commonly used to relabel and filter samples before ingestion, and limit the amount of data that gets persisted to storage. Using metric_relabel_configs, you can drastically reduce your Prometheus metrics usage by throwing out unneeded samples.
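For comparison, an allowlisting variant of the same stage might look like the sketch below, which keeps two cAdvisor metrics and then removes a single label from the series that remain. The choice of metrics is illustrative, and example_label is a hypothetical label name.

```yaml
metric_relabel_configs:
  # Allowlist: keep only these two metrics, drop every other series.
  - source_labels: [__name__]
    regex: container_(cpu_usage_seconds_total|memory_working_set_bytes)
    action: keep
  # Remove one label by setting it to an empty value.
  - source_labels: [example_label]
    regex: (.*)
    target_label: example_label
    replacement: ""
    action: replace
```

Before removing a label this way, confirm that the remaining labels still identify each series uniquely.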
If shipping samples to Grafana Cloud, you also have the option of persisting samples locally, but preventing shipping to remote storage. To do this, use a relabel_config object in the write_relabel_configs subsection of the remote_write section of your Prometheus config. This can be useful when local Prometheus storage is cheap and plentiful, but the set of metrics shipped to remote storage requires judicious curation to avoid excess costs.
Controlling remote write behavior using write_relabel_configs
Relabeling and filtering at this stage modifies or drops samples before Prometheus ships them to remote storage. Using this feature, you can store metrics locally but prevent them from shipping to Grafana Cloud. To learn more about remote_write, please see remote_write from the official Prometheus docs.
Prometheus applies this relabeling and dropping step after performing target selection using relabel_configs and metric selection and relabeling using metric_relabel_configs.
The following snippet of configuration demonstrates an “allowlisting” approach, where the specified metrics are shipped to remote storage, and all others dropped. Recall that these metrics will still get persisted to local storage unless this relabeling configuration takes place in the metric_relabel_configs section of a scrape job.
remote_write:
  - url: <Your Metrics instance remote_write endpoint>
    remote_timeout: 30s
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "apiserver_request_total|kubelet_node_config_error|kubelet_runtime_operations_errors_total"
        action: keep
    basic_auth:
      username: <your_remote_endpoint_username_here>
      password: <your_remote_endpoint_password_here>
    queue_config:
      capacity: 500
      max_shards: 1000
      min_shards: 1
      max_samples_per_send: 100
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 100ms
This piece of remote_write configuration sets the remote endpoint to which Prometheus will push samples. The write_relabel_configs section defines a keep action for all metrics matching the apiserver_request_total|kubelet_node_config_error|kubelet_runtime_operations_errors_total regex, dropping all others. You can additionally define remote_write-specific relabeling rules here.
Finally, this configures authentication credentials and the remote_write queue. To learn more about remote_write configuration parameters, please see remote_write from the Prometheus docs.
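An additional remote_write-specific rule could be chained after the allowlist, as in this sketch. It assumes your series carry a namespace label; the dev and staging values are hypothetical.

```yaml
remote_write:
  - url: <Your Metrics instance remote_write endpoint>
    write_relabel_configs:
      # First, keep only the allowlisted metric names.
      - source_labels: [__name__]
        regex: "apiserver_request_total|kubelet_node_config_error|kubelet_runtime_operations_errors_total"
        action: keep
      # Then, of those, stop shipping series from non-production namespaces.
      # These series are still stored locally.
      - source_labels: [namespace]
        regex: dev|staging
        action: drop
```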
In this guide, we’ve presented an overview of Prometheus’s powerful and flexible relabel_config feature and how you can leverage it to control and reduce your local and Grafana Cloud Prometheus usage.
Choosing which metrics and samples to scrape, store, and ship to Grafana Cloud can seem quite daunting at first. Curated sets of important metrics can be found in Mixins. Mixins are a set of preconfigured dashboards and alerts. The PromQL queries that power these dashboards and alerts reference a core set of “important” observability metrics. There are Mixins for Kubernetes, Consul, Jaeger, and many more. To learn more about them, please see Prometheus Monitoring Mixins. Allowlisting or keeping the set of metrics referenced in a Mixin’s alerting rules and dashboards can form a solid foundation from which to build a complete set of observability metrics to scrape and store.