This is documentation for the next version of Metrics enterprise. For the latest stable release, go to the latest version.

Cardinality analysis

Overview

Grafana Enterprise Metrics (GEM) helps you understand the cardinality of your metrics and labels, either through the Cardinality analysis dashboards that ship with the Grafana Enterprise Metrics plugin or through an API.

The APIs and dashboards help you understand the active time series in GEM. An active time series is one that has not yet been written to long-term storage.

Configuration

The API endpoints are disabled by default. Use one of the following approaches to enable or disable the endpoints for all tenants:

  • Add the CLI flag -querier.cardinality-analysis-enabled=true.
  • Set cardinality_analysis_enabled to true in the limits block (the limits_config section) of the global configuration file, as shown below:
  limits:
    cardinality_analysis_enabled: true

To selectively disable the endpoints for some tenants (if they have been enabled for all tenants), or to enable them for some tenants (when they are globally disabled), use the Runtime Configuration file.

Limitations

  • The cardinality analysis dashboards only work for single-tenant data sources. Similarly, the cardinality analysis APIs will only return cardinality information for a single tenant at a time. You cannot get a global view of the cardinality of multiple tenants simultaneously. This means that any call to the API where you provide multiple tenants in the username field will fail. For example, team-a|team-b will fail, but team-a or team-b will succeed.
  • The cardinality analysis dashboards do not work for data sources that use label-based access controls. Similarly, calls to the cardinality analysis APIs that use a token for an access policy with label selectors also fail.
  • The cardinality analysis APIs and dashboards only work if you are running GEM with blocks storage. They are incompatible with chunks storage.
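
The single-tenant restriction in the first limitation can be checked on the client before a request is sent. The following is a minimal sketch, assuming that multiple tenants are always joined with the | character as in the team-a|team-b example. The function name is hypothetical and is not part of GEM.

```python
def is_single_tenant(username: str) -> bool:
    """Return True if the username names exactly one tenant.

    Assumption: multi-tenant usernames join tenant IDs with "|",
    as in the "team-a|team-b" example.
    """
    return "|" not in username and username.strip() != ""


print(is_single_tenant("team-a"))         # True
print(is_single_tenant("team-a|team-b"))  # False
```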

Operational considerations

We do not expect this new and experimental API to negatively affect the performance of ingesters in a GEM cluster. To be sure, monitor the cluster after enabling this feature.

To monitor the performance of the cardinality endpoints, use the metrics that GEM exposes for its API endpoints.

The following example query returns the queries-per-second to the cardinality analysis endpoints:

sum by (route) (
    rate(cortex_querier_request_duration_seconds_count{
        route=~"prometheus_api_v1_cardinality_label_values|prometheus_api_v1_cardinality_label_names"
    }[1m])
)
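
If the set of monitored routes changes, a query of this shape can be generated rather than edited by hand. The following sketch builds a per-route request-rate query from a list of route names. It assumes the _count series of the same metric, because the per-second rate of a _count series is a request rate. The function name and its defaults are illustrative only.

```python
def qps_query(routes, window="1m",
              metric="cortex_querier_request_duration_seconds_count"):
    """Build a PromQL query for the per-route request rate.

    Assumption: `metric` is the _count series of the request duration metric.
    """
    route_regex = "|".join(routes)
    return f'sum by (route) (rate({metric}{{route=~"{route_regex}"}}[{window}]))'


print(qps_query([
    "prometheus_api_v1_cardinality_label_values",
    "prometheus_api_v1_cardinality_label_names",
]))
```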

To monitor the performance of the whole cluster after enabling cardinality analysis, use the self-monitoring dashboards that are included in the GEM plugin.

Dashboards

The GEM plugin provides several useful dashboards that visualize and let you explore the data from this API.

Adding the cardinality analysis dashboards to Grafana Enterprise

The cardinality analysis dashboards are installed automatically when you install the Grafana Enterprise Metrics plugin. If you do not see the dashboards, or someone accidentally deletes them, add them back as follows:

  1. Go to Configuration > Plugins > Grafana Enterprise Metrics > Dashboards.
  2. Install the dashboards: Cardinality management - overview, Cardinality management - metrics, and Cardinality management - labels.

Cardinality management - overview dashboard

This dashboard shows the cardinality for the selected data source.

Cardinality management - metrics dashboard

This dashboard helps you understand the cardinality of an individual metric. At the top of the dashboard, you can select which metric you want to explore.

Cardinality management - labels dashboard

This dashboard shows a cardinality report for the selected label. For a given label name, it shows which label values are attached to the most series. It also shows the highest-cardinality metrics for a given label-value pair.

HTTP API

You can use two API endpoints to understand a tenant’s metrics and label cardinality: label_names (/api/v1/cardinality/label_names) and label_values (/api/v1/cardinality/label_values). The cardinality analysis dashboards display information returned from these endpoints.

Because these endpoints generate their cardinality report using only values from the currently open TSDBs (time series databases) in the ingesters, two consecutive calls can return completely different results if an ingester cut or truncated an old block and opened a new one between the calls.

Both API endpoints require authentication. Specifically, the caller must provide a token that grants metrics:read access for that tenant.
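
The curl examples in this document pass the tenant ID and the token with -u, which is HTTP Basic authentication with the tenant ID as the username and the token as the password. As a sketch of the same header for clients other than curl, using only the Python standard library (the function name is hypothetical):

```python
import base64


def basic_auth_header(tenant_id: str, api_token: str) -> dict:
    """Build the Authorization header that `curl -u tenant:token` would send."""
    credentials = f"{tenant_id}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# "team-a" and "secret" are placeholder values.
headers = basic_auth_header("team-a", "secret")
print(headers["Authorization"].startswith("Basic "))  # True
```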

Label names cardinality endpoint

GET,POST <prometheus-http-prefix>/api/v1/cardinality/label_names

# Legacy
GET,POST <legacy-http-prefix>/api/v1/cardinality/label_names

Returns the real-time label names cardinality across all ingesters, for the authenticated tenant, in JSON format. It counts the distinct label values per label name.

The items in the field cardinality are sorted by label_values_count in descending order and by label_name in ascending order.

The number of items returned is limited by the limit request parameter.
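
The ordering and truncation described above can be reproduced on the client, for example when combining several responses. This is a sketch of the stated rule only, not GEM's implementation. The function name and the sample label name queue are made up for illustration.

```python
def sort_and_limit(items, limit=20):
    """Order by label_values_count descending, then label_name ascending,
    and keep at most `limit` items."""
    ordered = sorted(
        items,
        key=lambda item: (-item["label_values_count"], item["label_name"]),
    )
    return ordered[:limit]


items = [
    {"label_name": "task", "label_values_count": 29},
    {"label_name": "worker", "label_values_count": 162},
    {"label_name": "queue", "label_values_count": 29},  # hypothetical label
]
print([i["label_name"] for i in sort_and_limit(items, limit=2)])  # ['worker', 'queue']
```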

Request parameters

  • selector - optional - specifies a PromQL selector that is used to filter the series to be analyzed.
  • limit - optional - specifies the maximum number of items in the cardinality field of the response (default=20, min=0, max=500).

Example:

To understand which labels attached to the metric flower_events_created have the most values, use the following command:

$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_names?limit=2&selector=\{__name__='flower_events_created'\}" | jq
{
   "label_values_count_total": 206,
   "label_names_count": 12,
   "cardinality": [
      {
         "label_name": "worker",
         "label_values_count": 162
      },
      {
         "label_name": "task",
         "label_values_count": 29
      }
   ]
}

From this we see that the metric flower_events_created has 12 different label names attached to it. Across those 12 label names, there are 206 total values. The label “worker” has 162 values, and the label “task” has 29 values. Not shown are the other label names, since the sample command set limit=2.
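
The amount of data left out by the limit can be derived from the same response. The following sketch assumes, as described above, that label_values_count_total is the sum of the value counts over all label names. The function name is hypothetical.

```python
response = {
    "label_values_count_total": 206,
    "label_names_count": 12,
    "cardinality": [
        {"label_name": "worker", "label_values_count": 162},
        {"label_name": "task", "label_values_count": 29},
    ],
}


def summarize_hidden(resp):
    """Count the label names and label values that the limit left out."""
    shown = resp["cardinality"]
    shown_values = sum(item["label_values_count"] for item in shown)
    return {
        "hidden_label_names": resp["label_names_count"] - len(shown),
        "hidden_label_values": resp["label_values_count_total"] - shown_values,
    }


print(summarize_hidden(response))
# {'hidden_label_names': 10, 'hidden_label_values': 15}
```

Here the 10 label names that are not shown share the remaining 15 values, so worker and task account for most of the label values on this metric.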

If the flower_events_created selector were omitted, the API call

$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_names?limit=2" | jq

would return the label names with the highest count of values across the entire tenant.

Response schema

{
  "label_values_count_total": <number>,
  "label_names_count": <number>,
  "cardinality": [
    {
      "label_name": <string>,
      "label_values_count": <number>
    }
  ]
}

Label values cardinality endpoint

GET,POST <prometheus-http-prefix>/api/v1/cardinality/label_values

# Legacy
GET,POST <legacy-http-prefix>/api/v1/cardinality/label_values

Returns the real-time label values cardinality for the labels given in the request parameter label_names[], across all ingesters, for the authenticated tenant, in JSON format. It returns the series count per label value for each label in label_names[].

The items in the field labels are sorted by series_count in descending order and by label_name in ascending order. The items in the field cardinality are sorted by series_count in descending order and by label_value in ascending order.

The number of cardinality items is limited by the limit request parameter.

Request parameters

  • label_names[] - required - specifies the labels for which cardinality must be provided.
  • selector - optional - specifies a PromQL selector that is used to filter the series to be analyzed.
  • limit - optional - specifies the maximum number of items in the cardinality field of the response (default=20, min=0, max=500).
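
Because label_names[] can be repeated, building the query string by hand is error-prone. The following sketch assembles the request URL with the Python standard library and applies the documented bounds on limit. The function name and gem.example.com are placeholders, and the /prometheus prefix is an assumption that depends on the HTTP prefix configured for your cluster.

```python
from urllib.parse import urlencode


def label_values_url(base_url, label_names, limit=20, selector=None):
    """Build a label_values request URL with one label_names[] per label."""
    if not label_names:
        raise ValueError("label_names[] is required")
    if not 0 <= limit <= 500:
        raise ValueError("limit must be between 0 and 500")
    params = [("label_names[]", name) for name in label_names]
    params.append(("limit", limit))
    if selector is not None:
        params.append(("selector", selector))
    # urlencode percent-encodes the brackets and the selector's braces.
    return f"{base_url}/prometheus/api/v1/cardinality/label_values?{urlencode(params)}"


print(label_values_url("https://gem.example.com", ["worker", "agent"], limit=2))
```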

Example 1 (label values cardinality):

To understand which label values have the highest number of flower_events_created series associated with them, run:

$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_values?label_names[]=worker&label_names[]=agent&limit=2&selector=\{__name__='flower_events_created'\}" | jq
{
   "series_count_total": 5472781,
   "labels": [
      {
         "label_name": "worker",
         "label_values_count": 162,
         "series_count": 1307,
         "cardinality": [
            {
               "label_value": "aws-worker",
               "series_count": 67
            },
            {
               "label_value": "gcp-worker",
               "series_count": 66
            }
         ]
      },
      {
         "label_name": "agent",
         "label_values_count": 2,
         "series_count": 11,
         "cardinality": [
            {
               "label_value": "grafana-agent",
               "series_count": 10
            },
            {
               "label_value": "jaeger-agent",
               "series_count": 1
            }
         ]
      }
   ]
}

From this, we see that there are 5,472,781 series with the metric name flower_events_created. Of those 5,472,781 series, there are 67 series with worker=aws-worker and 66 series with worker=gcp-worker. From the series_count, there are 1307 series with the label worker (across all 162 values of worker).

Similarly, of the 5,472,781 total series, there are 10 series with agent=grafana-agent and 1 series with agent=jaeger-agent. From the series_count there are 11 total series with the label agent (across all 2 values of agent).
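
The difference between a label's series_count and the values actually listed shows how much the limit hides. The following sketch computes it for the worker entry from the example response. It relies on each series having exactly one value for a given label, so the per-value counts do not overlap. The function name is hypothetical.

```python
worker = {
    "label_name": "worker",
    "label_values_count": 162,
    "series_count": 1307,
    "cardinality": [
        {"label_value": "aws-worker", "series_count": 67},
        {"label_value": "gcp-worker", "series_count": 66},
    ],
}


def unlisted(label_entry):
    """Return (values not listed, series not listed) for one labels[] entry."""
    listed = label_entry["cardinality"]
    listed_series = sum(v["series_count"] for v in listed)
    return (
        label_entry["label_values_count"] - len(listed),
        label_entry["series_count"] - listed_series,
    )


print(unlisted(worker))  # (160, 1174)
```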

Example 2 (metric names cardinality):

To understand which metrics have the highest cardinality (that is, the most time series), look at the cardinality of the __name__ label:

$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_values?label_names[]=__name__&limit=2" | jq
{
   "series_count_total": 1307,
   "labels": [
      {
         "label_name": "__name__",
         "label_values_count": 162,
         "series_count": 1307,
         "cardinality": [
            {
               "label_value": "flower_events_created",
               "series_count": 67
            },
            {
               "label_value": "flower_events_consumed",
               "series_count": 66
            }
         ]
      }
   ]
}

In this example, there are 1307 total active time series for the tenant named tenant-id. Because there are 162 values for the label __name__, there are 162 metrics for this tenant. The label_value entries in the cardinality part of the payload are the names of the highest-cardinality metrics. Here, the metric flower_events_created has 67 series associated with it, and the metric flower_events_consumed has 66.
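
To put these counts in proportion, each metric's series count can be compared with series_count_total. The following sketch does this for a __name__ response shaped like the example above. The function name is hypothetical.

```python
response = {
    "series_count_total": 1307,
    "labels": [
        {
            "label_name": "__name__",
            "label_values_count": 162,
            "series_count": 1307,
            "cardinality": [
                {"label_value": "flower_events_created", "series_count": 67},
                {"label_value": "flower_events_consumed", "series_count": 66},
            ],
        }
    ],
}


def metric_shares(resp):
    """Return (metric name, series count, fraction of all series) per listed metric."""
    total = resp["series_count_total"]
    names = next(l for l in resp["labels"] if l["label_name"] == "__name__")
    return [
        (v["label_value"], v["series_count"], v["series_count"] / total)
        for v in names["cardinality"]
    ]


for name, count, share in metric_shares(response):
    print(f"{name}: {count} series ({share:.1%} of the tenant's series)")
```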

Response schema

{
  "series_count_total": <number>,
  "labels": [
    {
      "label_name": <string>,
      "label_values_count": <number>,
      "series_count": <number>,
      "cardinality": [
        {
          "label_value": <string>,
          "series_count": <number>
        }
      ]
    }
  ]
}

  • series_count_total - total number of series across the open TSDBs in all ingesters.
  • labels[].label_name - label name requested via the request parameter label_names[].
  • labels[].label_values_count - total number of label values for the label name. Depending on the limit request parameter, not all label values may be present in cardinality.
  • labels[].series_count - total number of series that have the label labels[].label_name.
  • labels[].cardinality[].label_value - label value associated with labels[].label_name.
  • labels[].cardinality[].series_count - total number of series that have label_value for label_name.
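
For client code, the schema above maps directly onto a few typed records. The following is a sketch using Python dataclasses. The class and function names are illustrative and are not part of any GEM client library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelValueCardinality:
    label_value: str
    series_count: int


@dataclass
class LabelCardinality:
    label_name: str
    label_values_count: int
    series_count: int
    cardinality: List[LabelValueCardinality]


@dataclass
class LabelValuesResponse:
    series_count_total: int
    labels: List[LabelCardinality]


def parse_label_values_response(payload: dict) -> LabelValuesResponse:
    """Convert the decoded JSON payload into typed records."""
    labels = [
        LabelCardinality(
            label_name=entry["label_name"],
            label_values_count=entry["label_values_count"],
            series_count=entry["series_count"],
            cardinality=[LabelValueCardinality(**v) for v in entry["cardinality"]],
        )
        for entry in payload["labels"]
    ]
    return LabelValuesResponse(payload["series_count_total"], labels)
```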