Overview

Grafana Enterprise Metrics provides the ability to understand the cardinality of your metrics and labels using Cardinality management dashboards that are shipped with the Grafana Enterprise Metrics plugin or via the Admin API.

The APIs and dashboards help you understand the active time series in GEM. An active time series is one that has not yet been written to long-term storage.

Configuration

The API endpoints are disabled by default. Use one of the following approaches to enable or disable the endpoints for all tenants:

  • Add the CLI flag -querier.cardinality-analysis-enabled=true.
  • Set cardinality_analysis_enabled to true in the limits section of the global configuration file as shown below:
yaml
limits:
  cardinality_analysis_enabled: true

To selectively disable the endpoints for some tenants (when they are enabled for all tenants), or to enable the endpoints for some tenants (when they are globally disabled), use the Runtime Configuration file.

Limitations

  • The cardinality management dashboards only work for single-tenant data sources. Similarly, the cardinality management APIs will only return cardinality information for a single tenant at a time. You cannot get a global view of the cardinality of multiple tenants simultaneously. This means that any call to the API where you provide multiple tenants in the username field will fail. For example, team-a|team-b will fail, but team-a or team-b will succeed.
  • The cardinality management dashboards do not work for data sources that use label-based access controls. Similarly, calls to the cardinality management APIs that use a token for an access policy with label selectors also fail.
  • The cardinality management APIs and dashboards will only work if you are running GEM using block storage. They are incompatible with chunks storage.
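The single-tenant restriction can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of GEM: the helper name is_single_tenant is hypothetical, and it assumes only that multiple tenants are written as a |-separated list in the username field, as in the team-a|team-b example above.

```python
def is_single_tenant(username: str) -> bool:
    """Return True if the basic-auth username names exactly one tenant.

    Assumption: multiple tenants are joined with "|", e.g. "team-a|team-b".
    """
    return bool(username) and "|" not in username


# A single tenant is accepted; a multi-tenant username is rejected.
print(is_single_tenant("team-a"))
print(is_single_tenant("team-a|team-b"))
```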

Operational considerations

We do not expect this new and experimental API to negatively affect the performance of ingesters in a GEM cluster. To be sure, monitor the cluster after enabling this feature.

To monitor the performance of the cardinality endpoints, use the exposed GEM API endpoints metrics.

The following example query returns the queries-per-second to the cardinality management endpoints:

sum by (route) (
    rate(cortex_querier_request_duration_seconds_count{
        route=~"prometheus_api_v1_cardinality_label_values|prometheus_api_v1_cardinality_label_names"
    }[1m])
)

To monitor the performance of the whole cluster after enabling cardinality management, use the self-monitoring dashboards that are included in the GEM plugin.

Dashboards

The GEM plugin provides several useful dashboards that visualize and let you explore the data from this API.

Adding the cardinality management dashboards to Grafana Enterprise

The cardinality management dashboards are automatically installed when you install the Grafana Enterprise Metrics plugin. If you do not see the dashboards, or someone accidentally deletes them, add them back:

  1. Go to Configuration > Plugins > Grafana Enterprise Metrics > Dashboards.
  2. Install the dashboards: Cardinality management - overview, Cardinality management - metrics, and Cardinality management - labels.

Cardinality management - overview dashboard

This dashboard shows the cardinality for the selected data source.

Cardinality management - metrics dashboard

This dashboard helps you understand the cardinality of an individual metric. At the top of the dashboard, you can select which metric you want to explore.

Cardinality management - labels dashboard

This dashboard shows a cardinality report for the selected label. For a given label name, it shows you which label values are attached to the most series. It also shows you the highest cardinality metrics for a given label-value pair.

Scoping the dashboards to specific label-values

As a team lead or service owner, you can get a scoped view of the cardinality of the metrics you own, and use that to manage the costs of your team. As an administrator or operator of a Grafana instance, you can get a URL for the overview dashboard that is scoped to each of your teams. You can then share it with the team lead, and ask them to take action.

To facilitate these and other use cases, the cardinality dashboards include an ad hoc filter to further specify and refine the cardinality results. A single filter expression consists of a label, a value, and an operator such as = (equals) or != (does not equal). The ad hoc filter allows you to combine multiple filters. The Cardinality feature does not support the operators < and >. The cardinality queries yield results where all of the filter expressions evaluate to true, which allows you to define the subset of label values that you want to focus on.

The ad hoc filter appears at the top of the screen, and has the label Filter. The format of the filter expression is LABEL OPERATOR VALUE. When there are multiple filter expressions, they are separated by AND. To add a filter expression, click + on the right-most side of the filter expression list. To remove filter expressions one by one, select LABEL and then the “–remove filter–” option.
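To illustrate the AND semantics, the following Python sketch combines filter expressions into a single PromQL selector of the kind the cardinality endpoints accept in their selector parameter. How the dashboards translate filters internally is not described here, so treat this as an assumption: build_selector is a hypothetical helper, and it accepts only the two operators named above.

```python
# Assumption: only the operators named in the text are accepted here.
SUPPORTED_OPERATORS = {"=", "!="}


def build_selector(filters):
    """Combine (label, operator, value) filter expressions into one selector.

    Matchers inside a single PromQL selector are ANDed, which mirrors the
    "all filter expressions must be true" behavior of the ad hoc filter.
    """
    matchers = []
    for label, operator, value in filters:
        if operator not in SUPPORTED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator!r}")
        # Escape backslashes and double quotes inside the value.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        matchers.append(f'{label}{operator}"{escaped}"')
    return "{" + ", ".join(matchers) + "}"


# Hypothetical labels: series owned by one team, excluding one environment.
print(build_selector([("team", "=", "payments"), ("env", "!=", "dev")]))
```

Passing an operator such as < raises ValueError, matching the statement that < and > are not supported.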

The ad hoc filter persists as you navigate through the cardinality dashboards, through panel links to metrics or labels. Some links on the dashboards further refine the filter. The tables that appear at the bottom of both the metrics and labels dashboards contain links that add a filter expression appropriate for the link you have selected for further inspection.

Cardinality management HTTP API

You can use two API endpoints to understand a tenant’s metrics and label cardinality: label_names (/api/v1/cardinality/label_names) and label_values (/api/v1/cardinality/label_values). The cardinality management dashboards display information returned from these endpoints.

Because these endpoints generate their cardinality report using only values from currently opened TSDBs (time series databases) in the ingesters, two subsequent calls can return completely different results if an ingester cut or truncated an old block and opened a new one between calls.

Both API endpoints require authentication. Specifically, the user must provide a token that grants metrics:read access for that tenant.
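The curl examples in this document pass the tenant ID and token with -u, which is HTTP basic authentication. For clients that set headers by hand, the equivalent header can be built as in this minimal Python sketch. The helper name basic_auth_header and the sample credentials are hypothetical.

```python
import base64


def basic_auth_header(tenant_id: str, api_token: str) -> dict:
    """Build the Authorization header that `curl -u tenant:token` would send."""
    credentials = f"{tenant_id}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Placeholder values; use a real tenant ID and a token with metrics:read access.
headers = basic_auth_header("team-a", "secret")
print(headers["Authorization"].split()[0])
```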

Label names cardinality endpoint

GET,POST <prometheus-http-prefix>/api/v1/cardinality/label_names

# Legacy
GET,POST <legacy-http-prefix>/api/v1/cardinality/label_names

Returns realtime label names cardinality across all ingesters, for the authenticated tenant, in JSON format. It counts distinct label values per label name.

The items in the field cardinality are sorted by label_values_count in descending order and by label_name in ascending order.

The number of items returned is limited by the limit request parameter.
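The ordering and truncation described above can be reproduced on the client, for example to merge or re-rank results. This Python sketch only illustrates the documented sort order. It is not the server implementation, and top_label_names and the sample counts are hypothetical.

```python
def top_label_names(counts: dict, limit: int = 20) -> list:
    """Order label names like the cardinality field, then apply the limit.

    Sort by label_values_count descending, then by label_name ascending.
    """
    items = [
        {"label_name": name, "label_values_count": count}
        for name, count in counts.items()
    ]
    items.sort(key=lambda item: (-item["label_values_count"], item["label_name"]))
    return items[:limit]


# "host" and "task" tie on count, so the name breaks the tie.
for item in top_label_names({"task": 29, "worker": 162, "host": 29}, limit=2):
    print(item["label_name"], item["label_values_count"])
```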

Request parameters

  • selector - optional - specifies a PromQL selector that filters the series to analyze.
  • limit - optional - specifies the maximum number of items in the cardinality field of the response (default=20, min=0, max=500).
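A request URL for this endpoint can be assembled from the two parameters as in the following Python sketch. The function name label_names_url and the base URL are hypothetical. The base URL is assumed to already include the Prometheus HTTP prefix, and the limit check mirrors the documented range.

```python
from urllib.parse import urlencode


def label_names_url(base_url: str, selector=None, limit: int = 20) -> str:
    """Build a label_names request URL, validating limit against 0..500."""
    if not 0 <= limit <= 500:
        raise ValueError("limit must be between 0 and 500")
    params = {"limit": limit}
    if selector is not None:
        params["selector"] = selector
    path = "/api/v1/cardinality/label_names"
    # urlencode percent-encodes the braces, "=" and quotes in the selector.
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


# Hypothetical host; the selector matches the example in this section.
print(label_names_url(
    "http://gem.example:8080/prometheus",
    selector="{__name__='flower_events_created'}",
    limit=2,
))
```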

Example:

To understand which labels attached to the metric flower_events_created have the most values, use the following command:

console
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_names?limit=2&selector=\{__name__='flower_events_created'\}" | jq
json
{
  "label_values_count_total": 206,
  "label_names_count": 12,
  "cardinality": [
    {
      "label_name": "worker",
      "label_values_count": 162
    },
    {
      "label_name": "task",
      "label_values_count": 29
    }
  ]
}

From this we see that the metric flower_events_created has 12 different label names attached to it. Across those 12 label names, there are 206 total values. The label “worker” has 162 values, and the label “task” has 29 values. Not shown are the other label names, since the sample command set limit=2.
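The part of the response that limit cut off can be estimated from the totals. The Python sketch below works through the arithmetic for the sample response. It assumes that label_values_count_total is the sum of the per-label counts, which is how the explanation above reads, and unlisted_summary is a hypothetical helper.

```python
def unlisted_summary(response: dict) -> tuple:
    """Return (label names not listed, label values not listed).

    Assumption: label_values_count_total is the sum of label_values_count
    over every label name, listed or not.
    """
    listed = response["cardinality"]
    listed_values = sum(item["label_values_count"] for item in listed)
    unlisted_names = response["label_names_count"] - len(listed)
    unlisted_values = response["label_values_count_total"] - listed_values
    return unlisted_names, unlisted_values


sample = {
    "label_values_count_total": 206,
    "label_names_count": 12,
    "cardinality": [
        {"label_name": "worker", "label_values_count": 162},
        {"label_name": "task", "label_values_count": 29},
    ],
}
# 12 - 2 = 10 label names and 206 - (162 + 29) = 15 values are not listed.
print(unlisted_summary(sample))
```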

If the flower_events_created selector were omitted, the API call

console
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_names?limit=2" | jq

would return the label names with the highest count of values across the entire tenant.

Response schema

json
{
  "label_values_count_total": <number>,
  "label_names_count": <number>,
  "cardinality": [
    {
      "label_name": <string>,
      "label_values_count": <number>
    }
  ]
}

Label values cardinality endpoint

GET,POST <prometheus-http-prefix>/api/v1/cardinality/label_values

# Legacy
GET,POST <legacy-http-prefix>/api/v1/cardinality/label_values

Returns realtime label values cardinality associated with request parameter label_names[] across all ingesters, for the authenticated tenant, in JSON format. It returns the series count per label value for each label in the request parameter label_names[].

The items in the field labels are sorted by series_count in descending order and by label_name in ascending order. The items in the field cardinality are sorted by series_count in descending order and by label_value in ascending order.

The number of cardinality items is limited by the limit request parameter.

Request parameters

  • label_names[] - required - specifies the labels for which cardinality must be provided.
  • selector - optional - specifies a PromQL selector that filters the series to analyze.
  • limit - optional - specifies the maximum number of items in the cardinality field of the response (default=20, min=0, max=500).
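Because label_names[] can appear more than once, the query string is easiest to build from a list of key-value pairs rather than a dictionary. A minimal Python sketch follows; label_values_query is a hypothetical helper, and the brackets are percent-encoded by urlencode, which is standard URL encoding.

```python
from urllib.parse import urlencode


def label_values_query(label_names, selector=None, limit: int = 20) -> str:
    """Build the query string for the label_values endpoint."""
    if not label_names:
        raise ValueError("at least one label name is required")
    if not 0 <= limit <= 500:
        raise ValueError("limit must be between 0 and 500")
    # One ("label_names[]", name) pair per requested label.
    params = [("label_names[]", name) for name in label_names]
    params.append(("limit", limit))
    if selector is not None:
        params.append(("selector", selector))
    return urlencode(params)


print(label_values_query(["worker", "agent"], limit=2))
```

The resulting string can be appended after ? to the label_values URL for a GET request.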

Example 1 (label values cardinality):

To understand which label values have the highest number of flower_events_created series associated with them, run:

console
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_values?label_names[]=worker&label_names[]=agent&limit=2&selector=\{__name__='flower_events_created'\}" | jq
json
{
  "series_count_total": 5472781,
  "labels": [
    {
      "label_name": "worker",
      "label_values_count": 162,
      "series_count": 1307,
      "cardinality": [
        {
          "label_value": "aws-worker",
          "series_count": 67
        },
        {
          "label_value": "gcp-worker",
          "series_count": 66
        }
      ]
    },
    {
      "label_name": "agent",
      "label_values_count": 2,
      "series_count": 11,
      "cardinality": [
        {
          "label_value": "grafana-agent",
          "series_count": 10
        },
        {
          "label_value": "jaeger-agent",
          "series_count": 1
        }
      ]
    }
  ]
}

From this, we see that series_count_total reports 5,472,781 series. According to the response schema, this is the total number of series across opened TSDBs in all ingesters, not only the series named flower_events_created. Among the series that match the selector, 67 have worker=aws-worker and 66 have worker=gcp-worker. From the series_count, there are 1,307 series with the label worker (across all 162 values of worker).

Similarly, of the 5,472,781 total series, there are 10 series with agent=grafana-agent and 1 series with agent=jaeger-agent. From the series_count there are 11 total series with the label agent (across all 2 values of agent).
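To compare label values, it can help to express each value's series_count as a share of the series that carry the label. The following Python sketch does this for one entry of the labels array from the example response. The helper name value_shares is hypothetical.

```python
def value_shares(label_entry: dict) -> dict:
    """Map each listed label value to its share of the label's series_count."""
    total = label_entry["series_count"]
    if total == 0:
        return {}
    return {
        item["label_value"]: item["series_count"] / total
        for item in label_entry["cardinality"]
    }


# The "worker" entry from the example response above.
worker = {
    "label_name": "worker",
    "label_values_count": 162,
    "series_count": 1307,
    "cardinality": [
        {"label_value": "aws-worker", "series_count": 67},
        {"label_value": "gcp-worker", "series_count": 66},
    ],
}
for value, share in value_shares(worker).items():
    print(f"{value}: {share:.1%}")
```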

Example 2 (metric names cardinality):

To understand which metrics have the highest cardinality (that is, the most time series), look at the cardinality of the __name__ label:

console
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_values?label_names[]=__name__&limit=2" | jq
json
{
  "series_count_total": 1307,
  "labels": [
    {
      "label_name": "__name__",
      "label_values_count": 162,
      "series_count": 1307,
      "cardinality": [
        {
          "label_value": "flower_events_created",
          "series_count": 67
        },
        {
          "label_value": "flower_events_consumed",
          "series_count": 66
        }
      ]
    }
  ]
}

In this example, there are 1,307 total active time series for the tenant named tenant-id. Because there are 162 values for the label __name__, there are 162 metrics for this tenant. The label_value entries in the cardinality part of the payload are the names of the highest-cardinality metrics. In this example, the metric flower_events_created has 67 series associated with it and the metric flower_events_consumed has 66 series associated with it.

Response schema

json
{
  "series_count_total": <number>,
  "labels": [
    {
      "label_name": <string>,
      "label_values_count": <number>,
      "series_count": <number>,
      "cardinality": [
        {
          "label_value": <string>,
          "series_count": <number>
        }
      ]
    }
  ]
}
  • series_count_total - total number of series across opened TSDBs in all ingesters
  • labels[].label_name - label name requested via the request parameter label_names[]
  • labels[].label_values_count - total number of label values for the label name (note that dependent on the limit request parameter it is possible that not all label values are present in cardinality)
  • labels[].series_count - total number of series having labels[].label_name
  • labels[].cardinality[].label_value - label value associated with labels[].label_name
  • labels[].cardinality[].series_count - total number of series having label_value for label_name
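The schema maps naturally onto typed records in client code. The following Python sketch parses a decoded label_values response into dataclasses. The class and function names are hypothetical, and the field names are taken from the schema above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValueCardinality:
    label_value: str
    series_count: int


@dataclass
class LabelCardinality:
    label_name: str
    label_values_count: int
    series_count: int
    cardinality: List[ValueCardinality]


@dataclass
class LabelValuesResponse:
    series_count_total: int
    labels: List[LabelCardinality]


def parse_label_values_response(payload: dict) -> LabelValuesResponse:
    """Convert the decoded JSON payload into typed records."""
    labels = [
        LabelCardinality(
            label_name=entry["label_name"],
            label_values_count=entry["label_values_count"],
            series_count=entry["series_count"],
            cardinality=[ValueCardinality(**item) for item in entry["cardinality"]],
        )
        for entry in payload["labels"]
    ]
    return LabelValuesResponse(
        series_count_total=payload["series_count_total"],
        labels=labels,
    )
```

For example, parse_label_values_response(json.loads(body)).labels[0].cardinality[0].label_value returns the highest-cardinality value of the first requested label, where body is the response text.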