How to quickly find unused metrics and get more value from Grafana Cloud

• 2021-07-02 • 4 min

“I wish there was a fast way to see the top metrics by cardinality that are never added to a dashboard.” — Steph Timms, Senior Systems Engineer at Mailchimp

As the complexity of software systems explodes, so does the amount of data that gets generated by instrumenting these systems. This poses a problem for our users — especially those who are in charge of observability teams and observability platforms at large enterprises. They have to strike the right balance between cost management and giving teams the freedom to instrument whatever they want. Often observability leaders are supporting dozens of teams that are using hundreds of dashboards. It’s not easy to figure out which team cares about which signal in a scalable way.

As we talked to a bunch of our Grafana Cloud users about this challenge, we realized that if we could answer the question of what metrics are not being used, we’d be able to give users a fast way to start to figure out which metrics matter.

We’re excited to share a recently released set of commands for cortex-tools — our command line tool for interacting with Grafana Cloud — that generates a list of metrics, ranked by cardinality, that are going unused.

In the context of these commands (the analyse commands), an unused metric is currently defined as a metric that has active series in Grafana Cloud storage but is not shown on any dashboard in a Grafana instance.

Okay, let’s try it out.

Getting started

It’s super easy to get started.

1. Install cortex-tools, a set of powerful command line tools for interacting with Cortex.
2. Create a Grafana API key.
3. Run the analyse grafana command to see a list of metrics that are charted in Grafana dashboards:
   ./cortextool analyse grafana --address=<grafana-address> --key=<api-key>
4. Run the analyse prometheus command to see a list of metrics, ranked by cardinality, that are not being used in Grafana dashboards:
   ./cortextool analyse prometheus --address=https://prometheus-us-central1.grafana.net/api/prom --id=<grafanacloud-instance-id> --key=<grafanacloud-api-key> --log.level=debug
   To get the address of your Cloud Prometheus query endpoint, navigate to Prometheus in the Grafana Cloud Portal.

The metrics that are not shown in your Grafana dashboards are prime candidates for removal. We still recommend checking with teams and stakeholders before removal, but this list should be a good starting point for thinking about your metrics usage. Remember that metrics used only for alerts or ad hoc querying, and metrics in a dashboard that uses template variables, will still be reported as unused.
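If you want to work with the result in a script, the list can be ranked programmatically. The sketch below assumes that the analyse prometheus command wrote a JSON file named prometheus-metrics.json, and that the file contains an additional_metric_counts list whose entries have metric and count fields. Both the file name and the field names are assumptions, so check them against the file your version of cortextool produces. The helper name top_unused is hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def top_unused(report: dict, limit: int = 10) -> list[tuple[str, int]]:
    """Return (metric, series count) pairs for unused metrics, highest count first."""
    # Assumed layout: {"additional_metric_counts": [{"metric": ..., "count": ...}, ...]}
    entries = report.get("additional_metric_counts") or []
    ranked = sorted(entries, key=lambda entry: entry["count"], reverse=True)
    return [(entry["metric"], entry["count"]) for entry in ranked[:limit]]


if __name__ == "__main__":
    path = Path("prometheus-metrics.json")  # assumed output file name
    if path.exists():
        for metric, count in top_unused(json.loads(path.read_text())):
            print(f"{count:>10}  {metric}")
```

Keeping the ranking in a small function makes it easy to share a short list with the teams that own the metrics before anything is removed.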

For more details on this feature, check out the docs.

Removing unused metrics

Now that you have a list of metrics that are going unused, how do you go about removing them from ingestion? Let’s say you see that metric_a and metric_b are not used in any Grafana dashboard, their cardinality is too high, and you don’t want them.

If you’re using Prometheus or the Grafana Agent to send metrics to Grafana Cloud, you need to modify your remote_write config to prevent metrics from being sent.

This is done by adding an entry to write_relabel_configs within your existing remote_write config.

For example:

remote_write:
- url: <Your Cloud Prometheus metrics instance remote_write endpoint>
  basic_auth:
    username: <Your Cloud Prometheus instance ID>
    password: <Your Cloud Prometheus API key>
  write_relabel_configs:
  - source_labels: [__name__]
    regex: metric_a|metric_b
    action: drop

You can find the remote_write URL, username, and API key configuration information in the Prometheus card of the Grafana Cloud portal.

This rule drops any series whose value for the label __name__ is metric_a or metric_b, so it is never sent. The __name__ label represents the name of the metric in Prometheus. The rule uses a regex to match against metric names, so you can add as many metrics as you want, provided you separate them with a | pipe.
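Prometheus anchors relabel regexes at both ends, so metric_a|metric_b matches only those exact names and not, for instance, metric_a_total. The sketch below imitates that behavior with Python's re.fullmatch to preview which names a rule would drop. The helper names build_drop_regex and would_drop are hypothetical, and Python's regex dialect differs from the RE2 syntax Prometheus uses, so treat it as a rough check that is only reliable for simple alternations like this one.

```python
import re


def build_drop_regex(metric_names):
    """Join metric names into the pipe-separated form used in the regex field."""
    return "|".join(metric_names)


def would_drop(metric_name, regex):
    """Approximate Prometheus matching: the whole name must match the regex."""
    return re.fullmatch(regex, metric_name) is not None


regex = build_drop_regex(["metric_a", "metric_b"])
print(regex)
for name in ["metric_a", "metric_b", "metric_a_total", "metric_c"]:
    print(name, "drop" if would_drop(name, regex) else "keep")
```

Running a check like this against the names from your unused list before editing remote_write can help avoid dropping a metric you meant to keep.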

That’s it! Thanks for following along. We’ve found the top unused metrics by cardinality, and we’ve stopped a couple of them from being sent.

What’s next

Grafana Cloud makes it easy to get started with metrics, logs, traces, and dashboards. (If you aren’t already using it, check out our free and paid Grafana Cloud plans for every use case, and sign up for a free trial.) Part of our mission is giving you control over your data. We’d love to hear from you as you try out this feature. Look out for more in this area, as we improve the tools to give you more insight into your usage of metrics, logs, and traces.
