---
title: "Define metrics aggregation rules | Grafana Cloud documentation"
description: "Define metrics aggregation rules."
---

# Define metrics aggregation rules

The *aggregations* service lets you aggregate metrics into lower-cardinality versions of themselves. You can define and apply your own aggregation rules, or apply the rules recommended by the recommendations service.

## Aggregation rule format

The aggregations service expects the following format:

| Field name             | Data type                  | Description                                                                                                                                                                                                                                                                                                  |
|------------------------|----------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `metric`               | string                     | The metric name or metric name matcher to which the aggregation rule applies.                                                                                                                                                                                                                                |
| `match_type`           | string (optional)          | The type of matching to be done against the value of the `metric` field. For valid values, see [substring matchers](#substring-matchers). If you do not specify `match_type`, the value is `exact`.                                                                                                          |
| `drop`                 | bool (optional)            | If set to `true`, the entire metric is dropped instead of aggregated. If you set this to `true`, you cannot use the `drop_labels` and `aggregations` fields. If you do not specify `drop`, the value is `false`.                                                                                             |
| `drop_labels`          | string array               | The list of labels that are aggregated away; each of these labels that is present in the original series has its value set to `<aggregated>`. You can specify either `drop_labels` or `keep_labels`, but you can’t use both fields within the same rule.                                                     |
| `keep_labels`          | string array               | The list of labels that are retained. The value of all labels not present in this list is replaced by `<aggregated>`. You can specify either `keep_labels` or `drop_labels`, but you can’t use both fields within the same rule.                                                                             |
| `aggregations`         | string array               | The list of aggregation functions to apply to the metric or metrics that are matched by this rule. For valid values, see [Supported aggregation types](#supported-aggregation-types).                                                                                                                        |
| `aggregation_interval` | string duration (optional) | The time span of raw samples that are included in a single emitted aggregated sample, which is also the interval at which aggregated samples are emitted. See [Configure the aggregation interval](#configure-the-aggregation-interval-and-the-dpm-of-the-aggregated-metric) for valid values. If you set `aggregation_interval`, you must also specify the `aggregation_delay` field. |
| `aggregation_delay`    | string duration (optional) | The delay after which the aggregated samples are emitted. See [Configure the aggregation delay](#configure-the-aggregation-delay) for valid values. If you set `aggregation_delay`, you must also specify the `aggregation_interval` field. |

The following example shows an aggregation rule for the metric `proxy_sql_queries_total`:

```json
{
  "metric": "proxy_sql_queries_total",
  "drop_labels": ["container", "instance", "namespace", "pod"],
  "aggregations": ["sum:counter"]
}
```
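
The same kind of reduction can be expressed with `keep_labels` instead of `drop_labels`. As a sketch, assuming `proxy_sql_queries_total` also carries hypothetical `cluster` and `job` labels (these names are illustrative and not taken from the rule above), the following rule retains only those two labels and replaces the value of every other label with `<aggregated>`:

```json
{
  "metric": "proxy_sql_queries_total",
  "keep_labels": ["cluster", "job"],
  "aggregations": ["sum:counter"]
}
```

Because only the listed labels are retained, a rule written this way also aggregates away any label that is added to the metric later, whereas a `drop_labels` rule affects only the labels it names.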

> Note
> 
> Adaptive Metrics now supports Mimir [native histograms](/docs/mimir/latest/send/native-histograms/).
> 
> For native histograms, the supported aggregation types are `sum:counter` and `count`.

## Supported aggregation types

The following values are supported for the `aggregations` field of an aggregation rule:

| Aggregation function | Definition                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|----------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `sum:counter`        | The running sum of all increases of raw series values. Applicable to counter type metrics, and correctly accounts for counter resets. A counter type metric is conceptually similar to elevation gain. For example, if a cyclist counts their elevation gain by peak, they can sum several peaks’ worth of elevation gain to understand how much they’ve climbed in total. The elevation gain for each peak over time is a raw series. If you specify the `sum:counter` aggregation with `"drop_labels": ["peak"]` for this metric, the per-peak raw series would be aggregated into one series that would tell the cyclist the total amount they climbed over time. From this aggregated data, they can no longer tell how much they have climbed in total for a given peak. |
| `sum`                | The sum of all values across the aggregated series at a given time stamp. The `sum` aggregation is not useful for counter type metrics; for counter type metrics, use `sum:counter` instead.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| `min`                | The minimum of all values across all the aggregated series at a given time stamp.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| `max`                | The maximum of all values across all the aggregated series at a given time stamp.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| `count`              | The number of raw series that feed into the aggregated series at a given time stamp.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
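
Several of these functions can be listed in the same rule. The following is a minimal sketch for a gauge type metric, assuming a hypothetical metric `room_temperature_celsius` with a hypothetical `sensor_id` label; neither name comes from the service:

```json
{
  "metric": "room_temperature_celsius",
  "drop_labels": ["sensor_id"],
  "aggregations": ["sum", "min", "max", "count"]
}
```

With a rule like this, the service would compute, at each time stamp, the sum, minimum, and maximum of the values across the sensors, along with the number of raw series that fed into them. The `sum` and `count` results can then be combined to derive an average.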

## Substring matchers

By default, a rule is applied to the metric name specified in the rule’s `metric` field. In addition, Adaptive Metrics allows you to write rules that apply to all metrics whose names match a given prefix or suffix. To apply rules to all such metrics, use the optional field `match_type` in your rule and set it to `prefix` or `suffix`.

The `match_type` field supports the following values:

- `exact`: Apply the rule to the metric whose name is specified in the rule’s `metric` field. Because metric names are unique, the rule only applies to one metric.
- `prefix`: Apply the rule to all metrics whose names start with the string in the rule’s `metric` field.
- `suffix`: Apply the rule to all metrics whose names end with the string in the rule’s `metric` field.

An example rule that matches all metrics beginning with `http_requests_total_`, and that aggregates away their `instance` label using the `sum:counter` function, looks as follows:

```json
{
  "metric": "http_requests_total_",
  "match_type": "prefix",
  "drop_labels": ["instance"],
  "aggregations": ["sum:counter"]
}
```

An exact match can override a substring matcher. In the following rule file, two rules potentially apply to the metric `http_requests_total_abc`. Because an exact match takes precedence over a prefix match, the second rule is applied, and both the `instance` and `pod` labels are aggregated away for `http_requests_total_abc`:

```json
[
  {
    "metric": "http_requests_total_",
    "match_type": "prefix",
    "drop_labels": ["instance"],
    "aggregations": ["sum:counter"]
  },
  {
    "metric": "http_requests_total_abc",
    "drop_labels": ["instance", "pod"],
    "aggregations": ["sum:counter"]
  }
]
```

If multiple substring matchers match a metric, the first matching rule in the list always wins. Consider a rule file with the following two rules:

```json
[
  {
    "metric": "http_requests_total_",
    "match_type": "prefix",
    "drop_labels": ["instance"],
    "aggregations": ["sum:counter"]
  },
  {
    "metric": "_abc",
    "match_type": "suffix",
    "drop_labels": ["pod"],
    "aggregations": ["sum:counter"]
  }
]
```

In this scenario, the metric `http_requests_total_abc` is matched by both rules. Because neither rule is an exact match, the first rule in the list takes precedence. This means that the `instance` label, not the `pod` label, is aggregated away for `http_requests_total_abc`.

## Configure an aggregation

As an illustration, think of a power grid that monitors the energy consumption of houses on different city streets. An example metric that expresses building consumption could be `electrical_throughput_total` with labels `street_name` and `building_number`. Given that you only care about the total energy consumption per street and the average consumption per building on a street, you could configure two aggregations where one sums the consumption of all buildings in a street and the other counts the buildings of the street.

Because the metric `electrical_throughput_total` is a counter, you need to use the `sum:counter` aggregation (instead of the `sum` aggregation) to handle counter resets correctly:

```json
{
  "metric": "electrical_throughput_total",
  "drop_labels": ["building_number"],
  "aggregations": ["sum:counter", "count"]
}
```

Based on the preceding configuration, the aggregation service would discard the label `building_number` from the aggregated metric `electrical_throughput_total`. In its place, it would compute and store aggregated values per street for this metric.

The `sum:counter` aggregation function computes the total electrical throughput of every street in the `street_name` label set. The `count` aggregation function computes the count of buildings per street. These two values can be used to compute an average consumption per building for each street.

However, because the `building_number` label has been discarded, it is no longer possible to understand how much power a specific building consumes.

Examples of `sum()`, `sum by()`, `count()`, and `count by()` functions are as follows:

- Sum the rate of electrical throughput per street:
  
  ```promql
  sum by (street_name) (rate(electrical_throughput_total[5m]))
  ```
- Sum the rate of electrical throughput for buildings on `<EXAMPLE-STREET>`:
  
  ```promql
  sum(rate(electrical_throughput_total{street_name="<EXAMPLE-STREET>"}[5m]))
  ```
- Count the number of buildings per street that are producing electrical throughput:
  
  ```promql
  count by (street_name) (electrical_throughput_total)
  ```
- Count the total number of buildings that are producing electrical throughput:
  
  ```promql
  count(electrical_throughput_total)
  ```
- Get the average rate of electrical throughput:
  
  ```promql
  avg(rate(electrical_throughput_total[5m]))
  ```

### Limits on the aggregation service

The Adaptive Metrics feature has limits, which are necessary to guarantee a highly reliable service. These limits are designed to adjust automatically to your usage of the service. This means that as long as your usage of Adaptive Metrics increases gradually, you should not expect to hit limits under normal circumstances. However, if your usage increases substantially over a short period of time, you might experience rate limiting. In this case, limits adapt to the changed usage pattern automatically after some time (usually within 24 hours). If you are experiencing sustained rate limiting beyond this time frame, contact Grafana Labs Support.

#### Number of aggregated series

The Adaptive Metrics aggregation service enforces limits on the number of series that can be aggregated. If these limits are exceeded, the aggregation service begins to discard incoming samples.

When this happens, you see an increase in `aggregator-too-many-aggregated-series` or `aggregator-too-many-raw-series` errors in the **Discarded Metrics Samples** panel of your billing dashboard.

#### Rate of samples to aggregate

There is also a limit on the rate at which samples can get forwarded to the Adaptive Metrics aggregation service. If this limit is exceeded, the API returns a `429` status code and you see an increase in `aggregations-max-ingestion-rate-exceeded` errors in the **Discarded Metrics Samples** panel of your billing dashboard.

## Drop a metric

You can also configure an aggregation rule that causes the entire metric to be dropped. If you don’t want to persist any time series at all for `electrical_throughput_total`, from the example in [Configure an aggregation](#configure-an-aggregation), you would configure a rule as follows:

```json
{
  "metric": "electrical_throughput_total",
  "drop": true
}
```

This might be useful in cases where a metric originates in many different locations and it would be hard to configure every site of origin to drop the metric on the client side.

> Note
> 
> Generally, aggregating a metric is preferable to dropping it entirely. By aggregating a metric, you can usually reduce its cardinality by 80-90% and still keep a lower-fidelity version of it in the database. This can be useful during the investigation of an incident. If you drop a metric, you reduce costs a bit more, but you eliminate all traces of the metric. This means that you do not see this metric when looking in the metric-name browser in **Grafana Explore**.

If you drop a metric, it shows up on the **Discarded Metrics Samples** panel with a label that provides context about why it was dropped.

Most of these labels are self-explanatory. The `requested-by-configuration` label means that the samples were dropped intentionally, by aggregation rules that you configured and that the aggregation service applies.

## Configure the aggregation interval and the DPM of the aggregated metric

The number of data points per minute (DPM) that are stored for the aggregated metric depends directly on the aggregation interval of the metric, which is the interval at which the aggregated samples are emitted.

The default `aggregation_interval` value matches the [included DPM per series of your organization](../../../understand-your-invoice/metrics-invoice/#included-dpm-per-series). For organizations with the default resolution of 1 DPM, this means a default interval setting of `60s`.

The valid values for `aggregation_interval` are `6s`, `10s`, `15s`, `20s`, `30s`, and `60s`, corresponding to 10 DPM, 6 DPM, 4 DPM, 3 DPM, 2 DPM, and 1 DPM respectively.

> Note
> 
> Changing the value of the `aggregation_interval` setting causes a small gap in the data for the affected aggregated metrics while the aggregation is being initialized with the new parameters.

If you want to increase the DPM of the aggregated metric, decrease the `aggregation_interval` to one of the supported values.
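
As a sketch, the following rule reuses the `electrical_throughput_total` example from [Configure an aggregation](#configure-an-aggregation) and sets `aggregation_interval` to `30s`, which corresponds to 2 DPM. Because a rule that sets `aggregation_interval` must also set `aggregation_delay`, the delay is set to `1m30s`, one of the valid values for that field. This particular combination is illustrative rather than a recommendation:

```json
{
  "metric": "electrical_throughput_total",
  "drop_labels": ["building_number"],
  "aggregations": ["sum:counter", "count"],
  "aggregation_interval": "30s",
  "aggregation_delay": "1m30s"
}
```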

> Note
> 
> You can set the `aggregation_interval` individually for each aggregation rule.
> 
> You can also ask Grafana Cloud support to set a global value for `aggregation_interval` as the default for all aggregation rules. Open a [support ticket in the Cloud Portal](/profile/org#support) to request this.

> Caution
> 
> By increasing the DPM of the aggregated metric you may incur [additional costs](../../../understand-your-invoice/metrics-invoice/#billing-calculations).

## Configure the aggregation delay

The `aggregation_delay` is the delay after which the aggregated samples are emitted. The default value is `90s`. The valid values for the `aggregation_delay` are: `15s`, `30s`, `60s`, `1m30s`, `2m`, `2m30s` and `3m`.

> Note
> 
> Changing the value of the `aggregation_delay` setting causes a small gap in the data for the affected aggregated metrics while the aggregation is being initialized with the new parameters.

Increase the `aggregation_delay` to emit the aggregated samples later and reduce the risk of excluding samples that arrive late (because of a lagging remote write client, for example).

The total delay between the time of the raw sample arriving at Grafana Cloud and the time that the aggregated sample becomes queryable is usually the sum of the `aggregation_interval` and the `aggregation_delay`. It’s possible that there are transient fluctuations in the real delay with which aggregations are produced. The `aggregation_delay` is a minimum that guarantees that aggregates never get emitted sooner than the configured duration.
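
As an illustration, the following sketch keeps a `60s` interval and raises the delay to `3m`, the largest valid value. It reuses the `electrical_throughput_total` example from [Configure an aggregation](#configure-an-aggregation); the choice of values is an assumption made for the example, not a recommendation:

```json
{
  "metric": "electrical_throughput_total",
  "drop_labels": ["building_number"],
  "aggregations": ["sum:counter", "count"],
  "aggregation_interval": "60s",
  "aggregation_delay": "3m"
}
```

With these settings, the total delay before an aggregated sample becomes queryable would usually be about 4 minutes (`60s` + `3m`), subject to the transient fluctuations described above.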

> Note
> 
> You can set the `aggregation_delay` individually for each aggregation rule.
> 
> You can also ask Grafana Cloud support to set a global value for `aggregation_delay` as the default for all aggregation rules. Open a [support ticket in the Cloud Portal](/profile/org#support) to request this.
