Reduce metrics costs via Adaptive Metrics
Adaptive Metrics is a cardinality optimization feature that lets you identify unused time series data and eliminate it through aggregation. Recommended rules identify which metrics to aggregate, based on how those metrics are used within your cloud environment.
Adaptive Metrics consists of the following services:
- The recommendations service generates recommended rules for aggregation.
- The aggregations service implements those rules.
After reviewing the recommended rules, you decide which rules to apply, using either the plugin or the API. You can also create your own rules through the API.
Supported metrics formats
Grafana Cloud accepts metrics data in a variety of formats, and Adaptive Metrics is compatible with the following subset of formats:
| Format | Supported | Notes |
| --- | --- | --- |
| Prometheus | Yes | Fully supported. However, few recommendations are generated if you do not send metric metadata. Newer versions of Prometheus and the Grafana Agent send metric metadata by default; it is not sent if you have intentionally disabled it or if you run an older version that does not send it by default. |
| OpenTelemetry | Yes | Recommendations are limited because metadata is not sent. |
| Influx Line protocol | Yes | Recommendations are limited because metadata is not sent. |
Check if you are sending metadata for your metrics
To check whether you are sending metrics metadata, send a request to the HTTP API:
curl -u "$METRICS_INSTANCE_ID:$API_KEY" "https://<cluster>.grafana.net/prometheus/api/v1/metadata"
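The response to the request above can be inspected programmatically to see which of your metrics have a usable type. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus metadata JSON shape (`{"status": ..., "data": {metric_name: [{"type": ..., "help": ..., "unit": ...}]}}`). The function name `summarize_metadata`, the metric names, and the embedded sample response are hypothetical and only serve the illustration.

```python
import json

def summarize_metadata(payload, expected_metrics):
    """Split metric names into those with and without a known type.

    `payload` is the decoded JSON body of the metadata endpoint.
    `expected_metrics` is a list of metric names you believe you are sending.
    """
    data = payload.get("data", {})
    with_type, without_type = [], []
    for name in expected_metrics:
        entries = data.get(name, [])
        # A metric counts as "typed" if any entry carries a concrete type.
        types = {
            entry.get("type")
            for entry in entries
            if entry.get("type") not in (None, "", "unknown")
        }
        (with_type if types else without_type).append(name)
    return with_type, without_type

# Hypothetical response body, shaped like the Prometheus metadata API.
sample_response = json.loads("""
{
  "status": "success",
  "data": {
    "http_requests_total": [
      {"type": "counter", "help": "Total HTTP requests.", "unit": ""}
    ],
    "queue_depth": [
      {"type": "unknown", "help": "", "unit": ""}
    ]
  }
}
""")

typed, untyped = summarize_metadata(
    sample_response,
    ["http_requests_total", "queue_depth", "cache_hits_total"],
)
print("metadata present:", typed)
print("metadata missing or unknown:", untyped)
```

Metrics that end up in the second list are the ones for which a type would have to be inferred from the name or usage patterns.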
Adaptive Metrics uses Prometheus metrics metadata stored in your Grafana Hosted Metrics instance to make sure that recommendations are safe to apply mathematically.
For example, for a counter-type metric, recommendations by Adaptive Metrics make sure that counter resets are handled correctly during aggregation.
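To illustrate why the metric type matters, the sketch below compares two ways of aggregating a pair of counter series when one of them resets. It is a simplified illustration with made-up sample values, not the Adaptive Metrics implementation: the `increase` helper is hypothetical and treats any drop in value as a counter reset, which is the usual Prometheus convention.

```python
def increase(samples):
    """Total increase of one counter series, treating any drop as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value
        # itself is the increase since the reset.
        total += cur - prev if cur >= prev else cur
    return total

# Two series of the same counter metric, differing only in a label
# that an aggregation rule would remove.
series_a = [100, 110, 120, 130]
series_b = [50, 60, 5, 15]  # resets between the 2nd and 3rd sample

# Naive approach: sum the raw values first, then compute the increase.
# The reset in series_b looks like a reset of the whole aggregate.
naive_sum = [a + b for a, b in zip(series_a, series_b)]
naive_increase = increase(naive_sum)

# Reset-aware approach: compute each series' increase, then sum.
correct_increase = increase(series_a) + increase(series_b)

print(naive_increase)    # 165.0
print(correct_increase)  # 55.0
```

Without knowing that the metric is a counter, an aggregation has no way to tell that the drop in `series_b` is a reset rather than a real decrease, which is why the type information is needed.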
If metrics metadata is not available for a metric, and Adaptive Metrics is unable to infer a metric’s type from its name or usage patterns, no recommendation will be produced for that metric. If you are using a metrics format other than Prometheus, metrics metadata is not preserved. As a result, there are fewer recommendations for those metrics.
Aggregation service: requirements on sample age
Only relatively recent raw samples can be aggregated. For metrics that are being aggregated, Grafana Cloud rejects any sample that arrives more than 90 seconds late: if the difference between the wall clock time at which a sample arrives at Grafana Cloud and the timestamp on that sample (which indicates when it was collected) is greater than 90 seconds, the sample is rejected.
If Grafana Cloud rejects samples for this reason, you will see an increase in aggregator-sample-too-old errors on the Discarded Metrics Samples panel of your billing dashboard.
This sample age requirement only applies to samples that belong to metrics that are being aggregated.
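The age check described above comes down to a single comparison. The sketch below is one way to estimate, on the sending side, whether a sample would exceed the limit. The helper names are hypothetical, timestamps are assumed to be Unix seconds, and the boundary follows the wording above (a delay strictly greater than 90 seconds is rejected).

```python
MAX_SAMPLE_DELAY_S = 90  # limit stated above for metrics being aggregated

def sample_delay_s(sample_ts, arrival_ts):
    """Seconds between when a sample was collected and when it arrived."""
    return arrival_ts - sample_ts

def would_be_rejected(sample_ts, arrival_ts, max_delay_s=MAX_SAMPLE_DELAY_S):
    """True if the sample is older than the allowed delay on arrival."""
    return sample_delay_s(sample_ts, arrival_ts) > max_delay_s

collected_at = 1_700_000_000  # hypothetical collection timestamp

print(would_be_rejected(collected_at, collected_at + 30))  # False
print(would_be_rejected(collected_at, collected_at + 90))  # False
print(would_be_rejected(collected_at, collected_at + 91))  # True
```

In practice the delay includes scrape-to-send buffering, retries, and network time, so a sender that batches or retries for longer than the limit is the most likely source of rejected samples.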
Why this happens
To compute an aggregation, we must wait for all raw samples associated with that metric to arrive. We don’t know how many samples will arrive, nor can we wait indefinitely for them, because the longer we wait, the longer it takes for the data to become queryable and visible in dashboards.
If a sample arrives after our configured waiting time, it is not taken into account when the aggregated value is computed. Because our metrics database is immutable once the aggregation has been computed, we cannot update the aggregated value to reflect this late-arriving data point.
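The trade-off described above can be sketched with a toy aggregator that sums samples per time window, waits a fixed time for stragglers, and then seals the result. This is an illustrative model only, not how the aggregation service is implemented: the class name, the 60-second window, and the use of a sum are all assumptions; only the 90-second waiting time comes from this page.

```python
from collections import defaultdict

class WindowAggregator:
    """Toy model: sum samples per window, then seal the window for good."""

    def __init__(self, interval_s=60, wait_s=90):
        self.interval_s = interval_s      # assumed aggregation window length
        self.wait_s = wait_s              # how long to wait for late samples
        self.open = defaultdict(float)    # window start -> running sum
        self.sealed = {}                  # window start -> final, immutable sum

    def _window_start(self, ts):
        return ts - (ts % self.interval_s)

    def flush(self, now):
        """Seal every window whose waiting time has elapsed."""
        for start in sorted(self.open):
            if now >= start + self.interval_s + self.wait_s:
                self.sealed[start] = self.open.pop(start)

    def add(self, sample_ts, value, now):
        """Accept a sample unless it arrives later than the waiting time."""
        self.flush(now)
        if now - sample_ts > self.wait_s:
            return False  # too old: the result can no longer include it
        self.open[self._window_start(sample_ts)] += value
        return True

agg = WindowAggregator()
print(agg.add(sample_ts=0, value=1.0, now=5))     # True
print(agg.add(sample_ts=30, value=2.0, now=40))   # True
print(agg.add(sample_ts=10, value=5.0, now=120))  # False, 110 s late
agg.flush(now=150)
print(agg.sealed)                                 # {0: 3.0}
```

A longer `wait_s` would accept more late samples, but every window would also become queryable that much later, which is the delay the waiting time is meant to bound.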
If you encounter issues querying a metric that has been aggregated, see Troubleshoot your aggregated metrics query. For any other questions or feedback, contact your Customer Success Manager or file a support request.