How to quickly find unused metrics and get more value from Grafana Cloud

Devin Cheevers 2 Jul 2021 4 min read


“I wish there was a fast way to see the top metrics by cardinality that are never added to a dashboard.”
Steph Timms, Senior Systems Engineer at Mailchimp

As the complexity of software systems explodes, so does the amount of data that gets generated by instrumenting these systems. This poses a problem for our users — especially those who are in charge of observability teams and observability platforms at large enterprises. They have to strike the right balance between cost management and giving teams the freedom to instrument whatever they want. Often observability leaders are supporting dozens of teams that are using hundreds of dashboards. It’s not easy to figure out which team cares about which signal in a scalable way. 

As we talked to a bunch of our Grafana Cloud users about this challenge, we realized that if we could answer the question of what metrics are not being used, we’d be able to give users a fast way to start to figure out which metrics matter. 

We’re excited to share a recently released set of commands for cortex-tools — our command line tool for interacting with Grafana Cloud — that generates a list of metrics, ranked by cardinality, that are going unused.

In the context of these commands (the analyse commands), an unused metric is currently defined as a metric that has active series in Grafana Cloud storage but is not shown on any dashboard in a Grafana instance.
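To make that definition concrete, here is a minimal Python sketch of the idea. It is not how cortextool is implemented; `rank_unused` is a hypothetical helper, and the metric names and series counts are made-up sample data.

```python
def rank_unused(active_series_counts, dashboard_metrics):
    """Return (metric, series_count) pairs for metrics that have active
    series but appear on no dashboard, highest cardinality first."""
    unused = {
        name: count
        for name, count in active_series_counts.items()
        if name not in dashboard_metrics
    }
    # Sort by series count descending, then by name for a stable order.
    return sorted(unused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: active series per metric name.
active = {
    "http_requests_total": 1200,
    "metric_a": 54000,
    "metric_b": 8700,
    "up": 40,
}
# Hypothetical set of metric names found in dashboard queries.
on_dashboards = {"http_requests_total", "up"}

ranked = rank_unused(active, on_dashboards)
for name, count in ranked:
    print(f"{name}\t{count}")
```

With this sample data, only metric_a and metric_b are reported, in that order, because the other two names appear on a dashboard.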

Please note that this is an early iteration and there are some limitations. Critically, dashboards that use template variables are ignored. We strongly recommend not relying on this tool if you have dashboards that use template variables, as the metrics on those dashboards will be reported as “unused.” We hope to have this limitation fixed in the next few months. It’s also important to note that metrics that are only used for Grafana Alerts, or only queried outside of dashboards, are also reported as unused. For now, the command answers a single question: “Is this metric used in a dashboard?”

Okay, let’s try it out.

Getting started

It’s super easy to get started.

  1. First, install cortex-tools, a set of powerful command line tools for interacting with Cortex. 
  2. Create a Grafana API key. 
  3. Run the cortextool analyse grafana command, ./cortextool analyse grafana --address=<grafana-address> --key=<api-key>, to see a list of metrics that are charted in Grafana dashboards.
  4. Run the analyse prometheus command, ./cortextool analyse prometheus --address=https://prometheus-us-central1.grafana.net/api/prom --id=<grafanacloud-instance-id> --key=<grafanacloud-api-key> --log.level=debug, to see a list of metrics, ranked by cardinality, that are not being used in Grafana dashboards.

The metrics that are not shown in your Grafana dashboards are prime candidates for removal. We still recommend checking with teams and stakeholders before removal, but this list should be a good starting point for thinking about your metrics usage. Remember that metrics used for alerts and querying or metrics in a dashboard that uses template variables will be defined as unused. 
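Because alert-only and query-only metrics show up as unused, it can help to filter the list before taking it to stakeholders. The following Python sketch is one way to do that, under stated assumptions: `removal_candidates` is a hypothetical helper, the set of metrics used elsewhere is something you would have to collect yourself, the 1,000-series cutoff is arbitrary, and all data is invented for illustration.

```python
def removal_candidates(unused_counts, used_elsewhere, min_series=1000):
    """Filter the 'unused' list down to metrics worth discussing.

    unused_counts: dict of metric name -> active series count
    used_elsewhere: metric names known to back alerts, ad hoc queries,
        or templated dashboards (uses the tool cannot see)
    min_series: ignore metrics below this cardinality (arbitrary cutoff)
    """
    return sorted(
        (
            (name, count)
            for name, count in unused_counts.items()
            if name not in used_elsewhere and count >= min_series
        ),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical data for illustration only.
unused = {
    "metric_a": 54000,
    "metric_b": 8700,
    "disk_alert_bytes": 12000,
    "tiny_metric": 12,
}
alerts = {"disk_alert_bytes"}

candidates = removal_candidates(unused, alerts)
print(candidates)
```

Here disk_alert_bytes is kept out of the shortlist because an alert depends on it, and tiny_metric is skipped because dropping it would save very little.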

For more details on this feature, check out the docs.

Removing unused metrics

Now that you have a list of metrics that are going unused, how do you stop ingesting them? Let’s say you see that metric_a and metric_b are not used in any Grafana dashboard, their cardinality is too high, and you don’t want them.

If you’re using Prometheus or the Grafana Agent to send metrics to Grafana Cloud, you need to modify your remote_write config to prevent metrics from being sent.

This is done by adding an entry to write_relabel_configs within your existing remote_write config. 

For example:

remote_write:
- url: https://prometheus-us-central1.grafana.net/api/prom/push
  basic_auth:
    username: <tenant ID>
    password: <Grafana Cloud MetricsPublisher API key>
  write_relabel_configs:
  - source_labels: [__name__]
    regex: metric_a|metric_b
    action: drop

This rule looks for any metric whose value for the label __name__ is metric_a or metric_b. The __name__ label represents the name of the metric in Prometheus. The rule uses a regex to match against metric names, so you can add as many metrics as you want, provided you separate them with a | pipe. 
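One detail worth knowing is that Prometheus anchors relabel regexes at both ends, so metric_a in the rule above does not also match metric_a_total. The Python sketch below mimics that behavior with `re.fullmatch` and shows one way to build the regex value from a list of names. Both helpers are hypothetical, and Python's `re` is not the RE2 engine Prometheus uses, although the two agree for plain alternations like this one.

```python
import re


def build_drop_regex(metric_names):
    """Join metric names with | for use as a relabel regex value."""
    return "|".join(sorted(set(metric_names)))


def is_dropped(metric_name, regex):
    """Mimic `action: drop` on __name__. Prometheus anchors relabel
    regexes at both ends, which re.fullmatch reproduces here."""
    return re.fullmatch(regex, metric_name) is not None


regex = build_drop_regex(["metric_b", "metric_a"])
print(regex)                                # metric_a|metric_b
print(is_dropped("metric_a", regex))        # True
print(is_dropped("metric_a_total", regex))  # False: full match required
print(is_dropped("up", regex))              # False
```

If you do want to drop every metric that starts with a prefix, the regex has to say so explicitly, for example metric_a.* instead of metric_a.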

That’s it! Thanks for following along. We’ve found the top unused metrics by cardinality, and we’ve stopped a couple of them from being sent.

What’s next

Grafana Cloud makes it easy to get started with metrics, logs, traces, and dashboards. (If you aren’t already using it, check out our free and paid Grafana Cloud plans for every use case, and sign up for a free trial.) Part of our mission is giving you control over your data. We’d love to hear from you as you try out this feature. Look out for more in this area, as we improve the tools to give you more insight into your usage of metrics, logs, and traces.