Grafana Cloud

Control metrics costs via Adaptive Metrics

Adaptive Metrics consists of a recommendations service that generates recommended rules for aggregation, and an aggregations service that implements those rules. You can interact with both of these services via an HTTP API, a CLI tool, or both.

You can also use the Adaptive Metrics application plugin, which is available from the Apps menu.

Note: The API is in an early stage of development and subject to change.
Caution: The CLI is deprecated and will be removed in the near future.

Supported metrics formats

While Grafana Cloud accepts metrics data in a variety of formats, Adaptive Metrics is only compatible with a subset of these formats:

| Metrics format | Supported? | Notes |
| --- | --- | --- |
| Prometheus | Yes | Fully supported. However, if you do not send metric metadata, few recommendations are generated. Newer versions of Prometheus and the Grafana Agent send metadata by default. Metadata is not sent if you intentionally disable it or if you run an older version that does not send it by default. |
| OpenTelemetry | Yes | Recommendations are limited because metadata is not sent. |
| Influx line protocol | Yes | Recommendations are limited because metadata is not sent. |
| Datadog | No | |
| Graphite | No | |

Check if you are sending metadata for your metrics

To check whether you are sending metrics metadata, send a request to the HTTP API metadata endpoint:

console
curl -u "$METRICS_INSTANCE_ID:$API_KEY" "https://<cluster>.grafana.net/prometheus/api/v1/metadata"

Note: Adaptive Metrics uses the Prometheus metrics metadata stored in your Grafana Hosted Metrics instance to ensure that recommendations are mathematically safe to apply. For example, for a counter metric, Adaptive Metrics recommendations take counter resets into account during aggregation. If metadata is not available for a metric and Adaptive Metrics cannot infer the metric’s type from its name or usage patterns, it produces no recommendation for that metric. Metrics sent in formats other than Prometheus do not preserve metadata, so those metrics receive fewer recommendations.
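
To script this check for one metric at a time, you can use the following POSIX shell sketch. It queries the same endpoint with a metric query parameter and reports whether any metadata came back. It assumes the endpoint behaves like the standard Prometheus metadata API: it accepts metric=<name> and returns {"status":"success","data":{...}}, with an empty data object when no metadata is known. CLUSTER (standing in for <cluster> in the hostname) and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: check whether metadata exists for a single metric.
# Assumptions: the endpoint accepts the standard Prometheus "metric" query
# parameter; METRICS_INSTANCE_ID, API_KEY, and the hypothetical CLUSTER
# variable are set.

# Read a metadata API response on stdin.
# Return 0 if "data" is non-empty, 1 if it is empty, 2 if the response is unexpected.
metadata_present() {
  # Remove all whitespace so pretty-printed and compact JSON compare the same way.
  body=$(tr -d '[:space:]')
  case $body in
    *'"data":{}'*) return 1 ;;  # empty data object: no metadata
    *'"data":{'*) return 0 ;;   # at least one metric entry
    *) return 2 ;;              # for example, an error message
  esac
}

# Hypothetical wrapper: fetch metadata for one metric and print the result.
check_metric_metadata() {
  status=0
  curl --silent --show-error -u "$METRICS_INSTANCE_ID:$API_KEY" \
    "https://$CLUSTER.grafana.net/prometheus/api/v1/metadata?metric=$1" |
    metadata_present || status=$?
  case $status in
    0) echo "metadata found for $1" ;;
    1) echo "no metadata for $1" ;;
    *) echo "unexpected response for $1" >&2 ;;
  esac
}
```

For example, check_metric_metadata agent_request_duration_seconds_sum tells you whether that metric has metadata and is therefore eligible for full recommendations.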

CLI workflow

Understand the high-level workflow with the CLI:

  1. Download recommendations of what metrics to aggregate.
  2. Use those recommendations to create your own set of aggregation rules.
  3. Upload that set of aggregation rules.

The CLI also enables you to view, edit, and delete existing aggregation rules that have already been applied.

Use the Adaptive Metrics CLI

Adaptive Metrics provides a CLI tool.

Before you begin

To use the CLI tool, gather the following information:

  • URL: In the form https://<your-grafana-cloud-prom-url>.grafana.net/. To find your URL value, go to your grafana.com account and check the Details page of your hosted Prometheus endpoint.
  • TENANT: The numeric instance ID where Adaptive Metrics is set up. To find your TENANT value, go to your grafana.com account and check the Details page of your hosted Prometheus endpoint for Username / Instance ID.
  • TOKEN: A token from a Grafana Cloud Access Policy. Make sure the access policy has the metrics:read and metrics:write scopes for the stack where you have enabled Adaptive Metrics.
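
Before you launch the CLI in the following steps, you can confirm that all three values are set. The following hypothetical POSIX shell helper is not part of the CLI. It prints the name of each listed variable that is unset or empty:

```shell
#!/bin/sh
# Hypothetical helper: report which of the named variables are unset or empty.
# Returns non-zero if any variable is missing.
missing_vars() {
  missing=0
  for name in "$@"; do
    # POSIX sh has no ${!name}, so use eval for indirect expansion.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, put missing_vars URL TENANT TOKEN || exit 1 at the top of a wrapper script that then launches the CLI.
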
  1. Download the Adaptive Metrics CLI:

    Go to the URL that is based on the build that corresponds to your platform:

  2. Depending on your operating system, you may have to run chmod +x ./adaptive-cli.<your-distribution> to change the file permissions on the CLI and make it executable.

  3. Launch the CLI using the following command:

    ./adaptive-cli.<your-distribution> --user $TENANT --url $URL --password $TOKEN

    In the previous command, substitute the values of $TENANT, $URL, and $TOKEN. For more information, see Before you begin.

  4. Use the show recommendations command to pull down the most recently generated recommendations from the recommendations service.

For built-in help documentation about the CLI tool, launch the tool in interactive mode (adding the --repl flag) and then type --help.

Example aggregation rule

Each aggregation rule looks similar to this:

  {
    "metric": "agent_request_duration_seconds_sum",
    "drop_labels": [
      "container",
      "instance",
      "method",
      "namespace",
      "pod",
      "provider",
      "status_code",
      "ws"
    ],
    "aggregations": [
      "sum:counter"
    ]
  }

In the preceding example:

  • metric is the name of the metric to be aggregated.
  • drop_labels is an array of the labels that will be removed by the aggregations service.
  • aggregations is an array of the aggregation types to calculate for this metric.
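
If you generate rules with a script instead of writing them by hand, a small helper can print one rule with these three fields. The following is a hypothetical POSIX shell sketch. It does no JSON escaping, so it assumes that metric and label names use the classic Prometheus character set (letters, digits, underscores, and colons in metric names). None of these characters need escaping in JSON.

```shell
#!/bin/sh
# Hypothetical helper: print one aggregation rule as compact JSON.
# Usage: make_rule <metric> <aggregation> [label_to_drop ...]
# No JSON escaping is done; names are assumed to follow the classic
# Prometheus character set, which never needs escaping.
make_rule() {
  metric=$1
  aggregation=$2
  shift 2
  labels=""
  for label in "$@"; do
    # Put a comma before every label except the first.
    labels="${labels:+$labels,}\"$label\""
  done
  printf '{"metric":"%s","drop_labels":[%s],"aggregations":["%s"]}\n' \
    "$metric" "$labels" "$aggregation"
}
```

For example, make_rule agent_request_duration_seconds_sum sum:counter namespace pod prints {"metric":"agent_request_duration_seconds_sum","drop_labels":["namespace","pod"],"aggregations":["sum:counter"]}.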

You can use an aggregation rule file to define multiple rules simultaneously.

The following example rule file is an array of one or more aggregation rules:

json
[
  {
    "metric": "agent_request_duration_seconds_sum",
    "drop_labels": ["namespace", "pod"],
    "aggregations": ["sum:counter"]
  },

  {
    "metric": "prometheus_request_duration_seconds_sum",
    "drop_labels": ["container", "instance", "ws"],
    "aggregations": ["sum:counter"]
  }
]
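
If you keep one rule per line (for example, as output from a script), the following hypothetical POSIX shell helper wraps those lines into the array format of a rule file. It assumes that each input line is already a complete, valid JSON rule object. It skips blank lines and does not validate the JSON.

```shell
#!/bin/sh
# Hypothetical helper: read one compact JSON rule per line on stdin and
# print them as a single JSON array suitable for a rule file.
rules_to_file() {
  printf '['
  sep=''
  # "|| [ -n "$rule" ]" also handles a final line without a trailing newline.
  while IFS= read -r rule || [ -n "$rule" ]; do
    [ -n "$rule" ] || continue   # skip blank lines
    printf '%s%s' "$sep" "$rule"
    sep=','
  done
  printf ']\n'
}
```

For example, rules_to_file < rules.txt > rules.json, where rules.txt is a hypothetical file with one rule per line.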

Apply aggregation rules

After you add (create aggregations), modify (edit aggregations), or delete (delete aggregations) an aggregation rule, the CLI’s show aggregations command reflects the change. Use this command to get the most current picture of which aggregation rules are active in your environment.

There is a delay between uploading new aggregation rules and those metrics aggregations taking effect in your environment. In most cases, the delay is approximately 5-10 minutes, but we currently have no mechanism to let you know precisely when new aggregations take effect.

To confirm, query a metric whose aggregation rule you added or changed, and check the value of its __dropped_labels__ label. When this value reflects your changes, your updated aggregation rules are live in your environment.
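
The following POSIX shell sketch automates that check. It polls every 60 seconds, for up to 20 minutes, until the __dropped_labels__ value differs from the value it saw when it started. Several parts are assumptions rather than documented Adaptive Metrics behavior:

- QUERY_URL is a hypothetical variable holding your stack’s Prometheus-compatible instant query endpoint (ending in /api/v1/query).
- The JSON extraction is a plain sed match, not a real parser.
- The function names are invented for illustration.

Start it right after you upload, so that the starting value still reflects the old rules.

```shell
#!/bin/sh
# Sketch only. Assumptions: QUERY_URL (hypothetical) is your stack's
# Prometheus-compatible instant query endpoint, and TENANT and TOKEN hold
# credentials with the metrics:read scope.

# Print the __dropped_labels__ value from a query response on stdin
# (the last one, if several series match). Print nothing if it is absent.
dropped_labels() {
  tr -d '\n' | sed -n 's/.*"__dropped_labels__": *"\([^"]*\)".*/\1/p'
}

# Run an instant query for the given selector and print its __dropped_labels__.
query_dropped_labels() {
  curl --silent --show-error -u "$TENANT:$TOKEN" \
    --data-urlencode "query=$1" "$QUERY_URL" | dropped_labels
}

# Poll every 60 seconds, up to 20 times, until the value changes.
wait_for_dropped_labels_change() {
  before=$(query_dropped_labels "$1")
  echo "starting value: ${before:-<none>}"
  tries=0
  while [ "$tries" -lt 20 ]; do
    sleep 60
    now=$(query_dropped_labels "$1")
    if [ "$now" != "$before" ]; then
      echo "changed to: ${now:-<none>}"
      return 0
    fi
    tries=$((tries + 1))
  done
  echo "no change after 20 minutes" >&2
  return 1
}
```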

We currently limit how often new aggregation rules can be applied. Although you can upload as many new versions of your aggregation rules as you like, those updates are only applied once every 10 minutes. If you make multiple updates in quick succession, the system applies your first received (oldest) update. Then, 10 minutes later, the most recently received update is applied. The intermediate updates never get applied.
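
Updates that land inside the 10-minute window are either delayed or never applied, so you might want a local warning before you upload again too soon. The following POSIX shell sketch is a hypothetical client-side guard, not an Adaptive Metrics feature. It records the time of each upload in a local state file (LAST_UPLOAD_FILE, an assumed name) and warns when the previous recorded upload was less than 600 seconds ago. It only knows about uploads made through it, so it approximates the server-side window rather than mirroring it.

```shell
#!/bin/sh
# Hypothetical client-side guard; not part of Adaptive Metrics.
# LAST_UPLOAD_FILE is an assumed local file that stores the last upload time.
LAST_UPLOAD_FILE=${LAST_UPLOAD_FILE:-.last_rules_upload}

# Succeed if fewer than 600 seconds separate two Unix timestamps ($1 <= $2).
within_window() {
  [ $(( $2 - $1 )) -lt 600 ]
}

# Call this just before each upload: warn if the previous recorded upload
# was under 10 minutes ago, then record the current time.
check_upload_window() {
  now=$(date +%s)
  if [ -f "$LAST_UPLOAD_FILE" ] && within_window "$(cat "$LAST_UPLOAD_FILE")" "$now"; then
    echo "warning: last upload was under 10 minutes ago; this update is applied no earlier than 10 minutes after the previous one, and only if no newer upload replaces it" >&2
  fi
  echo "$now" > "$LAST_UPLOAD_FILE"
}
```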

Adaptive Metrics API

The Adaptive Metrics CLI is a wrapper around an API. You can use the underlying API directly if you choose. This API is under active development and is subject to change.

List recommendations

Download our recommendations for metrics to aggregate using the following command. TENANT, TOKEN, and URL are the values described in Before you begin.

bash
curl -u "$TENANT:$TOKEN" "$URL/aggregations/recommendations"

TOKEN must belong to an access policy with the metrics:read scope.

To retrieve more information about each recommendation, add the optional verbose=true query parameter:

bash
curl -u "$TENANT:$TOKEN" "$URL/aggregations/recommendations?verbose=true"
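
If you want to compare recommendations between runs, the following hypothetical POSIX shell helper saves each download to a timestamped file in the current directory. It assumes TENANT, TOKEN, and URL are set as described in Before you begin. Add ?verbose=true to the URL if you want the more detailed output.

```shell
#!/bin/sh
# Hypothetical helper: download the current recommendations into a
# timestamped file in the current directory and print the file name.
# Assumes TENANT, TOKEN, and URL are set as described in "Before you begin".
save_recommendations() {
  out="recommendations-$(date +%Y%m%d-%H%M%S).json"
  # ${URL%/} removes a trailing slash so the request path has no "//".
  # --fail makes curl return non-zero on HTTP errors instead of saving
  # the error body as if it were a list of recommendations.
  if curl --fail --silent --show-error -u "$TENANT:$TOKEN" \
      "${URL%/}/aggregations/recommendations" > "$out"; then
    echo "$out"
  else
    rm -f "$out"
    return 1
  fi
}
```

Running diff on two saved files shows how the recommendations changed between runs.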

List current recommendations configuration

Download the current configuration of the recommendations service using the following command:

bash
curl -u "$TENANT:$TOKEN" "$URL/aggregations/recommendations/config"

TOKEN must belong to an access policy with the metrics:read scope.

The recommendations service exposes only one tunable parameter: keep_labels. Use it to list the labels that should never be recommended for aggregation. This is useful in organizations where certain labels are always expected on metrics, whether or not those labels have been queried recently.

For example, a response from the /aggregations/recommendations/config endpoint looks like this:

json
{
  "keep_labels": ["instance", "pod"]
}

The preceding response indicates that the recommendations service has been configured to never recommend aggregating the instance or pod labels.

Update recommendations configuration

Upload new recommendations configuration using the following command:

bash
curl -u "$TENANT:$TOKEN" --request POST --data @config.json "$URL/aggregations/recommendations/config"

TOKEN must belong to an access policy with the metrics:write scope.

This command uses the same endpoint described in List current recommendations configuration and expects the same JSON format.
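
The following hypothetical POSIX shell helper writes a config.json file in the same shape as the example response in List current recommendations configuration. It does no JSON escaping, so label names are assumed to be plain Prometheus label names. It also assumes that an upload replaces the stored configuration as a whole, so pass every label you want to keep, including labels that are already configured.

```shell
#!/bin/sh
# Hypothetical helper: print a recommendations config that sets keep_labels.
# Usage: write_keep_labels_config [label ...] > config.json
# Assumes the upload replaces the whole stored configuration.
write_keep_labels_config() {
  labels=""
  for label in "$@"; do
    # Put a comma before every label except the first.
    labels="${labels:+$labels,}\"$label\""
  done
  printf '{"keep_labels":[%s]}\n' "$labels"
}
```

For example, write_keep_labels_config instance pod > config.json produces the same configuration as the example response, which you can then upload with the preceding curl command.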

List currently applied aggregation rules

Download your existing aggregation rules:

bash
curl -u "$TENANT:$TOKEN" "$URL/aggregations/rules"

TOKEN must belong to an access policy with the metrics:read scope.
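
To see at a glance which metrics currently have rules, you can extract the metric field from the response. The following hypothetical POSIX shell helper does a plain text match on the "metric" key instead of parsing JSON. That is enough for a quick listing, but not for editing rules.

```shell
#!/bin/sh
# Hypothetical helper: read a rules response on stdin and print each rule's
# metric name on its own line, sorted. Plain text matching, not a JSON parser.
rule_metrics() {
  grep -o '"metric": *"[^"]*"' | sed 's/^"metric": *"\(.*\)"$/\1/' | sort
}
```

For example: curl -u "$TENANT:$TOKEN" "$URL/aggregations/rules" | rule_metrics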

Upload new aggregation rules

Uploading new aggregation rules is a multi-step process:

  1. Fetch the currently applied rules.
  2. Modify rules locally.
  3. Upload rules back.

Fetch the currently applied rules

Use this command:

bash
curl -u "$TENANT:$TOKEN" -D headers.txt "$URL/aggregations/rules" > rules.json

TOKEN must belong to an access policy with the metrics:read scope.

The preceding command uses the same endpoint described in List currently applied aggregation rules. It adds a -D headers.txt argument and saves the response body to rules.json.

The -D headers.txt argument stores the response headers in a file called headers.txt. You need this file to upload a new rules file, for example when you update the aggregation rules you already have in place. The information in these headers prevents update collisions. An update collision occurs when multiple users edit the rules file at the same time and overwrite one another’s changes.
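
Before you edit rules.json, you can confirm that headers.txt actually captured an ETag, because the upload step depends on it. The following hypothetical POSIX shell helper prints the saved ETag value, or nothing if there is none. It strips carriage returns because curl -D typically writes header lines with CRLF endings. If the file holds several header blocks (for example, after redirects), it prints the last ETag.

```shell
#!/bin/sh
# Hypothetical helper: print the ETag value from a file written by curl -D.
# Usage: etag_from_headers headers.txt
etag_from_headers() {
  # Drop CRLF carriage returns, match the header name in any letter case,
  # and keep only the last ETag if there are several.
  tr -d '\r' < "$1" | sed -n 's/^[Ee][Tt][Aa][Gg]: *//p' | tail -n 1
}
```

If etag_from_headers headers.txt prints nothing, re-run the fetch command before you make your edits.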

Modify the rules locally

Use your editor of choice to modify the rules.json file downloaded in the prior step.

Upload rules back

The API supports uploading an entire rules file.

Warning: THIS ACTION WILL OVERWRITE YOUR EXISTING RULE FILE. If you prefer to append to your existing rules, you must use the CLI instead.

To upload your modified rules.json file from the previous step, use the following shell script:

bash
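#!/usr/bin/env bash
# Usage: ./rules_upload.sh <rules-file.json>
set -euo pipefail

TMPFILE=$(mktemp)
trap 'rm -f "$TMPFILE"' EXIT

# Rewrite the saved ETag header as an If-Match header. tr drops the CRLF line
# endings from curl -D, and the bracket pattern matches "ETag" in any case.
grep -i '^etag:' headers.txt | tr -d '\r' | sed 's/^[Ee][Tt][Aa][Gg]:/If-Match:/' > "$TMPFILE" \
  || { echo "No ETag in headers.txt; re-fetch the rules first." >&2; exit 1; }

curl --request POST --header @"$TMPFILE" --data-binary @"$1" -u "$TENANT:$TOKEN" "$URL/aggregations/rules"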

TOKEN must belong to an access policy with the metrics:write scope.

The first pipeline reads the ETag header from headers.txt, the file that the earlier curl call saved when it downloaded the existing aggregation rules. It then writes that header to a temporary file as an If-Match header.

The curl --request POST command uploads your new rules file along with that If-Match header, so that the upload is rejected if the rules changed after you fetched them.

Save the shell script as rules_upload.sh and make it executable with chmod +x rules_upload.sh.

To run that script, use the following command:

bash
./rules_upload.sh <your_new_rules_file.json>

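Replace <your_new_rules_file.json> with the name of the rules file you want to upload. The script reads TENANT, TOKEN, and URL from the environment, so export them first. It also expects headers.txt in the current directory.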

Note: If you see the error the Etag supplied in the 'If-Match' header does not match the Etag of the rules you are trying to replace when you POST the new rules file, the headers you provided are either missing or stale. To fix this, re-fetch the rules file and headers, and look for any changes that others may have made since your last edit. For more information on ETag headers, see ETag.
Note: After you configure aggregation rules, your active series count might increase temporarily, because both the aggregated and the unaggregated series count as active at the same time. After a short period, the unaggregated series are no longer considered active, and you see a net reduction in active series.

Aggregation service: requirements on sample age

We can only aggregate raw samples that are relatively recent. For metrics that are being aggregated, Grafana Cloud rejects any sample that arrives more than 90 seconds late. A sample is late when its arrival time at Grafana Cloud (wall-clock time) is more than 90 seconds after its own timestamp (the time the sample was collected).

If Grafana Cloud rejects samples for this reason, you will see an increase in sample-too-old-for-aggregation or aggregator-sample-too-old errors on the Discarded Metrics Samples panel of your billing dashboard.

This sample age requirement only applies to samples that belong to metrics that are being aggregated.
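
The rule above is a single comparison between two timestamps. The following POSIX shell sketch restates it with Unix timestamps in seconds. The function name is hypothetical, and the sketch only mirrors the documented 90-second threshold, not how Grafana Cloud implements the check.

```shell
#!/bin/sh
# Hypothetical illustration of the documented rule for aggregated metrics.
# Usage: sample_too_old <arrival_unix_seconds> <sample_timestamp_unix_seconds>
# Succeeds (returns 0) if the sample would be rejected.
sample_too_old() {
  arrival=$1
  sample=$2
  # Rejected only when the delay is strictly greater than 90 seconds.
  [ $((arrival - sample)) -gt 90 ]
}
```

For a sample timestamped 1700000000, sample_too_old 1700000060 1700000000 returns false (60 seconds late, accepted), and sample_too_old 1700000120 1700000000 returns true (120 seconds late, rejected).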

Why this happens

To compute an aggregation, we must wait for all raw samples associated with that metric to arrive. We don’t know how many samples will arrive, and we can’t wait indefinitely, because the longer we wait, the longer it takes for the data to become queryable and visible in dashboards.

If a sample arrives after our configured waiting time, it does not get taken into account during the computation of the aggregated value. Because our metrics database is immutable once the aggregation has been computed, we cannot update the aggregated value to reflect this late arriving data point.

Troubleshooting

If you encounter issues querying a metric that has been aggregated, see Troubleshoot your aggregated metrics query. For any other questions or feedback, contact your Customer Success Manager or file a support request.

Security warning when running the CLI on macOS

If you try to run the CLI on macOS and get a security warning that it can’t be opened because Apple cannot check it, perform the following steps:

  1. Open System Settings.
  2. Navigate to Privacy & Security.
  3. Scroll down to Security.
  4. Locate the message about the blocked CLI and click Open Anyway.