Troubleshoot your aggregated metrics query

Adaptive Metrics recommendations are based on the observed usage of the metrics from dashboards, query logs from the last 30 days, recording rules, and alerting rules. The aggregation rules suggested by the recommendations service will generate aggregations compatible with the previously observed usage, so no changes to that usage are required. This means that in the vast majority of cases, querying aggregated metrics will not be any different from querying raw metrics.

Understand PromQL aggregation operators

PromQL supports multiple aggregation operators that can be applied to the queried series. These operators transform a vector of time series into a vector with fewer time series, depending on the aggregation operator and the set of labels used in the query. For example:

metric{label="value"} is an unaggregated query that returns a vector of all time series matching the metric name and label set.
sum by (other_label) (metric{label="value"}) is an aggregated query that returns one time series for each one of the values of other_label, with the sum of all matching series for that label value.
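
As a concrete sketch of the two query shapes above, assume a hypothetical counter named http_requests_total that carries job and status labels. These names are illustrative only and are not taken from any particular setup.

```promql
# Unaggregated: returns every series matching the metric name and label matcher.
http_requests_total{job="api"}

# Aggregated: returns one series per distinct value of status,
# each holding the sum of all matching series with that status.
sum by (status) (http_requests_total{job="api"})
```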

Refer to the PromQL aggregation operators documentation for the full list of operators and how each one applies an aggregation.

Adaptive Metrics aggregations

Adaptive Metrics supports sum, count, min, and max aggregations, as well as a special aggregation type sum:counter that is used to persist the counter metric increases (accounting for the resets). The absolute values of sum:counter don’t have any meaning without a rate(), irate(), or increase() range-vector function applied to them.
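
For instance, a metric aggregated as sum:counter is meant to be read through one of those range-vector functions. The following is a minimal sketch, assuming a hypothetical counter named http_requests_total and arbitrarily chosen range windows.

```promql
# Per-second rate of increase, summed over all series.
sum(rate(http_requests_total[5m]))

# Total increase over the last hour, summed over all series.
sum(increase(http_requests_total[1h]))
```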

Inspect available labels

It’s a common pattern to query just the metric name to see what labels are available for that metric. This is an invalid usage of aggregated metrics, because their values are meaningless without the proper aggregation applied. For this reason, querying just the name of an aggregated metric will return a “Can’t query aggregated metric without aggregation” error.

To retrieve just the labels while ignoring the sample values, use the group aggregation. In particular, group without () is valid usage that removes no labels, as in group without () (metric).
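
The group approach can be sketched as follows, assuming a hypothetical aggregated metric named http_requests_total with a status label. The group operator sets every resulting value to 1, so only the label sets carry information.

```promql
# Lists every series of the metric with its full label set.
group without () (http_requests_total)

# Lists only the distinct values of a single label (here, the assumed status label).
group by (status) (http_requests_total)
```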

Alternatively and more conveniently, you can use Grafana’s query builder to inspect the available labels for a metric.

Troubleshoot missing data

A query that worked with raw metrics might return no data when applied to aggregated metrics. This can happen for several reasons, and this section helps you troubleshoot the issue.

Matching an aggregated label

When a query matches a value of a label that was previously aggregated, the result set is empty. This is the same as querying a non-existent label in raw Prometheus metrics.
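
As an illustration, assume a hypothetical metric http_requests_total whose pod label was aggregated away while its job label was kept, and which has the sum:counter aggregation configured. The label names and values here are assumptions made for the example.

```promql
# Returns an empty result: no stored series carries a pod label anymore.
sum(rate(http_requests_total{pod="api-7d9f"}[5m]))

# Returns data: it matches only on a label that was kept (assumed to be job).
sum(rate(http_requests_total{job="api"}[5m]))
```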

Use Grafana’s query builder to inspect the available labels for a metric.

Missing aggregation

Adaptive Metrics adds yet another dimension to the metrics: the aggregation. When querying aggregated metrics, the storage engine will select the aggregation required to fulfill the query. If the aggregation requested is not persisted, the query will return no data.

For example, if the query is sum(metric), the sum aggregation must have been configured beforehand for the query to produce results. Similarly, if the query is count(metric), then the count aggregation is required.

If a metric is configured to be aggregated as sum and count, then both sum(metric) and count(metric) queries will return data. In this case the avg(metric) query will also return data, as it can be computed from the sum and count aggregations.

However, if a metric is configured to be aggregated only as sum, the sum(rate(metric[1m])) query will not return any data, as the aggregation required to compute the rate(), irate(), and increase() functions is sum:counter.

Similarly, if only the sum:counter aggregation is configured, the sum(metric) query will not return any data, as it requires the sum aggregation.
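
The pairing between queries and the aggregation each one needs, as described above, can be summarized in the following sketch. Here metric is the same placeholder name used throughout this section.

```promql
# Needs the sum aggregation.
sum(metric)

# Needs the count aggregation.
count(metric)

# Needs both sum and count, from which the average is computed.
avg(metric)

# Needs the sum:counter aggregation; sum alone is not enough.
sum(rate(metric[1m]))
```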

Mix of aggregated and raw data

When the result set of a query contains both aggregated and raw data, the query engine will produce no data for the time range that contains both types of data. This is done on purpose, as during the transition period from raw to aggregated data (or vice versa) the results would be inconsistent.

For example, consider the instant query sum(metric) evaluated exactly at the time of the transition from raw to aggregated data. The result would include the aggregated samples, but the PromQL lookback would also include the raw samples from the previous 5 minutes, so the result would be twice the expected value.

Similarly, for a range query such as sum(rate(metric[1h])) that overlaps the transition period, it is hard to correctly correlate the counter resets between the raw and aggregated data. Additionally, the PromQL rate extrapolation would distort the result, as it is not aware of the transition period.

Troubleshoot `execution: Can't query aggregated metric...`

When exploring aggregated metrics, you may encounter the execution: Can't query aggregated metric... error. This section will help you understand the reasons for this error and how to fix it.

Query aggregated metrics without an aggregation

When series of a metric are aggregated, they lose their meaning if viewed without the appropriate aggregation applied.

For example, a counter aggregated as sum:counter stores only the required data to respond to the sum(rate(...[])), sum(increase(...[])), and sum(irate(...[])) queries, and the absolute values persisted in the storage have little relation to the absolute values written by the application. This is because the counter resets are accounted for in the aggregation, and also extra counter resets are introduced to the aggregated data by the aggregation service.

To prevent misinterpretation of the data, the query engine will return the execution: Can't query aggregated metric... error when trying to query aggregated metrics without an appropriate aggregation instead of returning meaningless values.

Query series without matching the metric name

Sometimes it is impossible for the query engine to determine the required aggregation for a query, or the names of the affected metrics. For example, the query count({job="something"}) can match both aggregated and non-aggregated metrics.

When the query engine can’t determine the required aggregation, it will assume that the persisted data is not aggregated, and will validate that assumption while processing results. The validation is performed to avoid providing wrong data (like an incorrect aggregation) to the aggregation operator, as the results would stop making sense.

If the assumption of having only raw data is violated, the query engine returns the execution: Can't query aggregated metric... error.

To fix this error, the query should select only raw series, or should include the metric name selector for the aggregated metrics.
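
A minimal sketch of the second fix, reusing the placeholder names from the example above. It assumes that the job label was not aggregated away and that the count aggregation is configured for metric.

```promql
# Ambiguous: no metric name, so it can match aggregated and raw series alike.
count({job="something"})

# Names the metric, so the query engine can determine the required aggregation.
count(metric{job="something"})
```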

Inspect aggregated data (rarely needed, for debugging)

Only in rare cases will you need to inspect aggregated data, which can be useful for debugging purposes.

How can I count the aggregated series?

If you want to know how many series your metric produces after aggregation, query count(group without() (metric)).

How can I inspect the aggregated data?

If you want to inspect the underlying aggregated data, include a matcher for the __aggregation__!="none" label in each vector selector of your query: metric{label="value", __aggregation__!="none"}.

If you include the __aggregation__!="none" label matcher, you are disabling query mapping, which would have otherwise been done for you.
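
The difference between the two modes can be sketched as follows, using the placeholder metric and label from above. The shape of the returned series depends on which aggregations are stored, so none is shown here.

```promql
# Default behavior: the query is mapped to the stored aggregation for you.
sum(metric{label="value"})

# Query mapping disabled for this selector: returns the underlying aggregated series as stored.
metric{label="value", __aggregation__!="none"}
```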

How can I inspect the stored aggregations?

You can query group by (__aggregation__) to see the aggregations stored for a given metric. In general group by (...) queries are valid regardless of the aggregation stored (because the group aggregation drops the values).
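
Written out in full for the placeholder metric, the query might look like the sketch below. Whether an additional __aggregation__!="none" matcher is needed inside the selector is not stated here, so treat the exact form as an assumption.

```promql
# One result series per aggregation stored for the metric.
group by (__aggregation__) (metric)
```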

How can I disable query-aggregations mapping for all queries issued from a data source or an external app?

Any query request that includes the X-Query-Aggregations: false header disables query mapping for that request. You can configure that header in your data source settings.

Limitations of query-aggregations

To properly query pre-aggregated data, every query is mapped to one that queries both raw and aggregated data (because we don’t know beforehand what’s in the storage). To perform that mapping, we need to know the metric name being queried. This is a limitation for queries that don’t select a specific metric name, like count({job="something"}), which can’t be evaluated when they match aggregated data.