Overview
Grafana Enterprise Metrics provides the ability to understand the cardinality of your metrics and labels using Cardinality management dashboards that are shipped with the Grafana Enterprise Metrics plugin or via the Admin API.
The APIs and dashboards help you understand the active time series in GEM. An active time series is one that has not yet been written to long-term storage.
Configuration
The API endpoints are disabled by default. Use one of the following approaches to enable or disable the endpoints for all tenants:
- Add the CLI flag -querier.cardinality-analysis-enabled=true.
- Set cardinality_analysis_enabled to true in the limits section of the global configuration file as shown below:
limits:
  cardinality_analysis_enabled: true
To selectively disable the endpoints for some tenants (if they have been enabled for all tenants), or enable the endpoints for some tenants (when they are globally disabled), use the Runtime Configuration file.
Limitations
- The cardinality management dashboards only work for single-tenant data sources. Similarly, the cardinality management APIs will only return cardinality information for a single tenant at a time. You cannot get a global view of the cardinality of multiple tenants simultaneously. This means that any call to the API where you provide multiple tenants in the username field will fail. For example, team-a|team-b will fail, but team-a or team-b will succeed.
- The cardinality management dashboards do not work for data sources that use label-based access controls. Similarly, calls to the cardinality management APIs that use a token for an access policy with label selectors also fail.
- The cardinality management APIs and dashboards will only work if you are running GEM using block storage. They are incompatible with chunks storage.
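The single-tenant limitation can be checked on the client before a request is sent. The following is a minimal sketch, assuming that multiple tenants are always written as a |-separated list in the username field, as in the team-a|team-b example above. The function name is hypothetical and is not part of GEM.

```python
def is_single_tenant(username: str) -> bool:
    """Return True if the username names exactly one tenant.

    Assumption: multiple tenants are written as a '|'-separated list
    (for example 'team-a|team-b'), which the cardinality endpoints reject.
    """
    return bool(username) and "|" not in username


print(is_single_tenant("team-a"))         # True
print(is_single_tenant("team-a|team-b"))  # False
```

A check like this only avoids a request that is known to fail. It does not replace the validation that GEM performs.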
Operational considerations
We do not expect this new and experimental API to negatively affect the performance of ingesters in a GEM cluster. To be sure, monitor the cluster after enabling this feature.
To monitor the performance of the cardinality endpoints, use the exposed GEM API endpoints metrics.
The following example query returns the queries-per-second to the cardinality management endpoints:
sum by (route) (
  rate(cortex_querier_request_duration_seconds_count{
    route=~"prometheus_api_v1_cardinality_label_values|prometheus_api_v1_cardinality_label_names"
  }[1m])
)
To monitor the performance of the whole cluster after enabling cardinality management, use the self-monitoring dashboards that are included in the GEM plugin.
Dashboards
The GEM plugin provides several useful dashboards that visualize and let you explore the data from this API.
Adding the cardinality management dashboards to Grafana Enterprise
The cardinality management dashboards are automatically installed if you install the Grafana Enterprise Metrics plugin. However, in the event that you do not see the dashboards or someone accidentally deletes them, add them back:
- Go to Configuration > Plugins > Grafana Enterprise Metrics > Dashboards.
- Install the dashboards: Cardinality management - overview, Cardinality management - metrics, and Cardinality management - labels.
Cardinality management - overview dashboard
This dashboard shows the cardinality for the selected data source.
Cardinality management - metrics dashboard
This dashboard helps you understand the cardinality of an individual metric. At the top of the dashboard, you can select which metric you want to explore.
Cardinality management - labels dashboard
This dashboard shows a cardinality report for the selected label. For a given label name, it shows you which label values are attached to the most series. It also shows you the highest cardinality metrics for a given label-value pair.
Scoping the dashboards to specific label-values
As a team lead or service owner, you can get a scoped view of the cardinality of the metrics you own, and use that to manage the costs of your team. As an administrator or operator of a Grafana instance, you can get a URL for the overview dashboard that is scoped to each of your teams. You can then share it with the team lead, and ask them to take action.
To facilitate these and other use-cases, the cardinality dashboards include an ad hoc filter to further specify and refine the cardinality results.
A single filter expression consists of a label, a value, and an operator such as = (equals), or != (does not equal). The ad hoc filter allows you to combine multiple filters.
The Cardinality feature does not support the operators < and >.
The cardinality queries yield results where all of the filter expressions evaluate to true, which allows you to define the subset of label values that you want to focus on.
The ad hoc filter appears on the top of the screen, and has the label Filter.
The format of the filter expression is LABEL OPERATOR VALUE. When there are multiple filter expressions, they are separated by AND.
To add a filter expression, click + on the right-most side of the filter expression list.
To remove filter expressions one-by-one, select LABEL and then the “–remove filter–” option.
The ad hoc filter persists as you navigate through the cardinality dashboards through panel links to metrics or labels. Some links on the dashboards further refine the filter. The tables that appear at the bottom of both the metrics and labels dashboards contain links that add a filter expression appropriate for the link you have selected for further inspection.
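To illustrate how a list of filter expressions combines, the sketch below turns LABEL OPERATOR VALUE triples into a PromQL selector, where comma-separated matchers are all required to match (the AND described above). That the dashboards translate filters into a selector this way is an assumption, and build_selector and the label names are hypothetical.

```python
# Operators named in the text above; < and > are not supported.
SUPPORTED_OPERATORS = {"=", "!="}


def build_selector(filters):
    """Combine (label, operator, value) triples into one PromQL selector.

    Matchers inside the braces are ANDed, so a series must satisfy
    every filter expression to be counted.
    """
    matchers = []
    for label, operator, value in filters:
        if operator not in SUPPORTED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator!r}")
        # Escape backslashes and double quotes inside the label value.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        matchers.append(f'{label}{operator}"{escaped}"')
    return "{" + ",".join(matchers) + "}"


# Hypothetical filters: series owned by one team, excluding one environment.
print(build_selector([("team", "=", "checkout"), ("env", "!=", "dev")]))
```

The resulting string has the same shape as the selector request parameter described in the HTTP API section.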
Cardinality management HTTP API
You can use two API endpoints to understand a tenant’s metrics and label cardinality: label_names (/api/v1/cardinality/label_names) and label_values (/api/v1/cardinality/label_values). The cardinality management dashboards display information returned from these endpoints.
Because these endpoints generate their cardinality report using only values from currently opened TSDBs (time series databases) in the ingesters, two subsequent calls can return completely different results if an ingester cut or truncated an old block and opened a new one between calls.
Both API endpoints require authentication. Specifically, the user must provide a token that grants metrics:read access for that tenant.
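The curl examples in this section pass the tenant ID and token with -u, which sends them as HTTP Basic credentials. As a minimal sketch, the equivalent Authorization header can be built in Python as follows. The function name and the sample values are hypothetical.

```python
import base64


def basic_auth_header(tenant_id: str, api_token: str) -> dict:
    """Build the HTTP Basic Authorization header that `curl -u` would send.

    The tenant ID is the username and the API token is the password.
    """
    credentials = f"{tenant_id}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Placeholder values for illustration only.
headers = basic_auth_header("team-a", "example-token")
print(sorted(headers))
```

The returned dictionary can be passed as request headers to an HTTP client of your choice.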
Label names cardinality endpoint
GET,POST <prometheus-http-prefix>/api/v1/cardinality/label_names
# Legacy
GET,POST <legacy-http-prefix>/api/v1/cardinality/label_names
Returns realtime label names cardinality across all ingesters, for the authenticated tenant, in JSON format.
It counts distinct label values per label name.
The items in the field cardinality are sorted by label_values_count in descending order and by label_name in ascending order.
The count of items returned is limited by the limit request parameter.
Request parameters
- selector - optional - specifies PromQL selector that will be used to filter series that must be analyzed.
- limit - optional - specifies max count of items in field cardinality in response (default=20, min=0, max=500)
Example:
To understand which labels attached to the metric flower_events_created have the most values, use the following command:
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_names?limit=2&selector=\{__name__='flower_events_created'\}" | jq
{
  "label_values_count_total": 206,
  "label_names_count": 12,
  "cardinality": [
    {
      "label_name": "worker",
      "label_values_count": 162
    },
    {
      "label_name": "task",
      "label_values_count": 29
    }
  ]
}
From this we see that the metric flower_events_created has 12 different label names attached to it. Across those 12 label names, there are 206 total values. The label “worker” has 162 values, and the label “task” has 29 values. The other label names are not shown because the sample command sets limit=2.
If the flower_events_created selector were omitted, the API call
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_names?limit=2" | jq
would return the label names with the highest count of values across the entire tenant.
Response schema
{
  "label_values_count_total": <number>,
  "label_names_count": <number>,
  "cardinality": [
    {
      "label_name": <string>,
      "label_values_count": <number>
    }
  ]
}
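Because limit truncates the cardinality list, it can help to compute how much of the report is not shown. The sketch below does this for the sample response from the example above. It assumes, as that example describes, that label_values_count_total is the sum of the per-label value counts. The function name is hypothetical.

```python
def summarize_label_names(payload: dict) -> dict:
    """Report how many label names and label values the limit cut off."""
    shown = payload["cardinality"]
    shown_values = sum(item["label_values_count"] for item in shown)
    return {
        "hidden_label_names": payload["label_names_count"] - len(shown),
        "hidden_label_values": payload["label_values_count_total"] - shown_values,
    }


# Sample response from the flower_events_created example (limit=2).
sample = {
    "label_values_count_total": 206,
    "label_names_count": 12,
    "cardinality": [
        {"label_name": "worker", "label_values_count": 162},
        {"label_name": "task", "label_values_count": 29},
    ],
}

print(summarize_label_names(sample))
# {'hidden_label_names': 10, 'hidden_label_values': 15}
```

Here the two labels shown account for 191 of the 206 values, so the remaining 10 label names share only 15 values between them.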
Label values cardinality endpoint
GET,POST <prometheus-http-prefix>/api/v1/cardinality/label_values
# Legacy
GET,POST <legacy-http-prefix>/api/v1/cardinality/label_values
Returns realtime label values cardinality associated with request parameter label_names[] across all ingesters, for the authenticated tenant, in JSON format.
It returns the series count per label value for each label in the request parameter label_names[].
The items in the field labels are sorted by series_count in descending order and by label_name in ascending order.
The items in the field cardinality are sorted by series_count in descending order and by label_value in ascending order.
The count of cardinality items is limited by request parameter limit.
Request parameters
- label_names[] - required - specifies labels for which cardinality must be provided.
- selector - optional - specifies PromQL selector that will be used to filter series that must be analyzed.
- limit - optional - specifies max count of items in field cardinality in response (default=20, min=0, max=500).
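A request with the parameters above could be assembled as in the following sketch. It repeats the label_names[] key once per label and checks limit against the documented range before encoding. The base URL, including the /prometheus prefix, is a placeholder, and label_values_url is a hypothetical helper rather than part of any GEM client.

```python
from urllib.parse import urlencode


def label_values_url(base_url, label_names, selector=None, limit=20):
    """Build the label values cardinality URL for one or more label names."""
    if not label_names:
        raise ValueError("label_names[] is required")
    if not 0 <= limit <= 500:
        raise ValueError("limit must be between 0 and 500")
    # A list of pairs lets the same key appear once per requested label.
    params = [("label_names[]", name) for name in label_names]
    params.append(("limit", limit))
    if selector is not None:
        params.append(("selector", selector))
    path = "/api/v1/cardinality/label_values"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"


# Placeholder host; replace with your own host, port, and HTTP prefix.
print(label_values_url(
    "https://gem.example.com/prometheus",
    ["worker", "agent"],
    selector="{__name__='flower_events_created'}",
    limit=2,
))
```

urlencode percent-encodes the brackets and the selector, so the URL needs no shell escaping of the kind used in the curl examples.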
Example 1 (label values cardinality):
To understand which label values have the highest number of flower_events_created series associated with them, run:
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_values?label_names[]=worker&label_names[]=agent&limit=2&selector=\{__name__='flower_events_created'\}" | jq
{
  "series_count_total": 5472781,
  "labels": [
    {
      "label_name": "worker",
      "label_values_count": 162,
      "series_count": 1307,
      "cardinality": [
        {
          "label_value": "aws-worker",
          "series_count": 67
        },
        {
          "label_value": "gcp-worker",
          "series_count": 66
        }
      ]
    },
    {
      "label_name": "agent",
      "label_values_count": 2,
      "series_count": 11,
      "cardinality": [
        {
          "label_value": "grafana-agent",
          "series_count": 10
        },
        {
          "label_value": "jaeger-agent",
          "series_count": 1
        }
      ]
    }
  ]
}
From this, we see that there are 5,472,781 series with the metric name flower_events_created.
Of those 5,472,781 series, there are 67 series with worker=aws-worker and 66 series with worker=gcp-worker. From the series_count, there are 1307 series with the label worker (across all 162 values of worker).
Similarly, of the 5,472,781 total series, there are 10 series with agent=grafana-agent and 1 series with agent=jaeger-agent. From the series_count there are 11 total series with the label agent (across all 2 values of agent).
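To see how concentrated a label is, each value's series_count can be divided by the series_count of its label. The following is a minimal sketch using the agent entry from the sample response above. The function name is hypothetical.

```python
def value_shares(label: dict) -> dict:
    """Map each returned label value to its share of the label's series."""
    total = label["series_count"]
    if total == 0:
        return {}
    return {
        item["label_value"]: item["series_count"] / total
        for item in label["cardinality"]
    }


# The "agent" entry from the sample response.
agent = {
    "label_name": "agent",
    "label_values_count": 2,
    "series_count": 11,
    "cardinality": [
        {"label_value": "grafana-agent", "series_count": 10},
        {"label_value": "jaeger-agent", "series_count": 1},
    ],
}

for value, share in value_shares(agent).items():
    print(f"{value}: {share:.1%}")
```

For agent, one value holds 10 of the 11 series. For worker, the two values shown hold 133 of 1307 series, so most of that label's series sit in the 160 values that limit=2 leaves out.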
Example 2 (metric names cardinality):
To understand which metrics have the highest cardinality (that is, the most time series), look at the cardinality of the __name__ label:
$ curl -u "<tenant-id>:$API_TOKEN" "<host and port>/prometheus/api/v1/cardinality/label_values?label_names[]=__name__&limit=2" | jq
{
  "series_count_total": 1307,
  "labels": [
    {
      "label_name": "__name__",
      "label_values_count": 162,
      "series_count": 1307,
      "cardinality": [
        {
          "label_value": "flower_events_created",
          "series_count": 67
        },
        {
          "label_value": "flower_events_consumed",
          "series_count": 66
        }
      ]
    }
  ]
}
In this example, there are 1307 total active time series for the tenant named tenant-id. Because there are 162 values for the label __name__, there are 162 metrics for this tenant. The label_value entries in the cardinality part of the payload are the names of the highest cardinality metrics. In this example, the metric flower_events_created has 67 series associated with it and the metric flower_events_consumed has 66 series associated with it.
Response schema
{
  "series_count_total": <number>,
  "labels": [
    {
      "label_name": <string>,
      "label_values_count": <number>,
      "series_count": <number>,
      "cardinality": [
        {
          "label_value": <string>,
          "series_count": <number>
        }
      ]
    }
  ]
}
- series_count_total - total number of series across opened TSDBs in all ingesters
- labels[].label_name - label name requested via the request parameter label_names[]
- labels[].label_values_count - total number of label values for the label name (note that dependent on the limit request parameter it is possible that not all label values are present in cardinality)
- labels[].series_count - total number of series having labels[].label_name
- labels[].cardinality[].label_value - label value associated with labels[].label_name
- labels[].cardinality[].series_count - total number of series having label_value for label_name
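The response schema above maps naturally onto typed records. The sketch below parses a decoded JSON payload into dataclasses. The class and function names are hypothetical, and only the field names come from the schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValueCardinality:
    label_value: str
    series_count: int


@dataclass
class LabelCardinality:
    label_name: str
    label_values_count: int
    series_count: int
    cardinality: List[ValueCardinality]


@dataclass
class LabelValuesResponse:
    series_count_total: int
    labels: List[LabelCardinality]


def parse_label_values(payload: dict) -> LabelValuesResponse:
    """Convert a decoded label_values response into typed records."""
    labels = [
        LabelCardinality(
            label_name=item["label_name"],
            label_values_count=item["label_values_count"],
            series_count=item["series_count"],
            cardinality=[ValueCardinality(**v) for v in item["cardinality"]],
        )
        for item in payload["labels"]
    ]
    return LabelValuesResponse(payload["series_count_total"], labels)
```

A payload missing one of the documented fields raises KeyError, which surfaces schema mismatches early instead of letting them pass silently.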