This is archived documentation for v1.4.1.
Grafana Enterprise Metrics (GEM) can forward the metrics that the Ruler generates when it evaluates rules to any Prometheus remote-write compatible backend.
This works by loading rule groups into the Ruler with an extra config field as shown in the example below:
# A regular Cortex rule group
groups:
  - name: group_one
    interval: 5m
    rules:
      - expr: 'rate(prometheus_remote_storage_samples_in_total[5m])'
        record: 'prometheus_remote_storage_samples_in_total:rate5m'
  - name: group_two
    interval: 1m
    rules:
      - expr: 'rate(prometheus_remote_storage_samples_in_total[1m])'
        record: 'prometheus_remote_storage_samples_in_total:rate1m'
    remote_write:
      - url: 'http://user:email@example.com/api/v1/push'
In the above example, when group_two is loaded into Grafana Enterprise Metrics, the Ruler module evaluates the expression rate(prometheus_remote_storage_samples_in_total[1m]) and forwards the generated metric, named prometheus_remote_storage_samples_in_total:rate1m, to the URL configured under remote_write.
group_one continues to work as expected: the evaluated prometheus_remote_storage_samples_in_total:rate5m is stored within the same Cortex instance that is running the Ruler.
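The routing rule described above can be illustrated with a short Python sketch. It is not GEM code: the dictionary is a hand-written, already-parsed copy of the example rule file, and the function name destinations and the label "local" are hypothetical names chosen for this illustration.

```python
# Hypothetical in-memory copy of the example rule file (as if parsed from YAML).
rule_file = {
    "groups": [
        {
            "name": "group_one",
            "interval": "5m",
            "rules": [
                {
                    "expr": "rate(prometheus_remote_storage_samples_in_total[5m])",
                    "record": "prometheus_remote_storage_samples_in_total:rate5m",
                }
            ],
        },
        {
            "name": "group_two",
            "interval": "1m",
            "rules": [
                {
                    "expr": "rate(prometheus_remote_storage_samples_in_total[1m])",
                    "record": "prometheus_remote_storage_samples_in_total:rate1m",
                }
            ],
            "remote_write": [{"url": "http://user:email@example.com/api/v1/push"}],
        },
    ]
}


def destinations(rule_file):
    """Map each group name to where its recorded series would be sent.

    Groups with a remote_write field map to their configured URLs;
    all other groups map to the placeholder "local" (assumption: this
    stands for the Cortex instance running the Ruler).
    """
    routing = {}
    for group in rule_file.get("groups", []):
        urls = [rw["url"] for rw in group.get("remote_write", [])]
        routing[group["name"]] = urls or ["local"]
    return routing


if __name__ == "__main__":
    for name, dests in destinations(rule_file).items():
        print(name, "->", ", ".join(dests))
```

A check like this can be handy when reviewing a rule file, because it shows at a glance which groups leave the local instance.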
Remote-write rules are compatible with the following backends:
- Azure Blob Storage
The following backends are not supported:
- local filesystem
Write-ahead log (WAL)
When a rule group is configured with a remote-write config, GEM buffers the generated metrics in a write-ahead log (WAL) before forwarding them to the remote-write endpoint. This is done to increase reliability in case either the GEM instance or the remote endpoint crashes. If the GEM instance crashes, it reads from the WAL and continues to forward metrics to the configured backend from the last sent timestamp. If the remote endpoint crashes, GEM continues to retry requests until it is available again. If multiple rule groups have been configured to send to the same remote-write endpoint, the GEM instance will use a common WAL for the metrics generated by those rule groups.
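The behavior described above (buffer first, resume from the last sent position, retry while the endpoint is down, one shared log per endpoint) can be sketched as a toy model in Python. This is only an in-memory illustration under simplifying assumptions; GEM's actual WAL is stored on disk, and the class name ToyWAL, the endpoint URL, and the sample tuples are all hypothetical.

```python
from collections import defaultdict


class ToyWAL:
    """In-memory stand-in for a WAL: one ordered buffer per endpoint URL."""

    def __init__(self):
        self.entries = defaultdict(list)  # url -> [(timestamp, name, value), ...]
        self.sent = defaultdict(int)      # url -> count of samples already delivered

    def append(self, url, sample):
        # Samples are always written to the log before any send is attempted.
        self.entries[url].append(sample)

    def flush(self, url, send):
        """Deliver unsent samples in order, stopping at the first failure.

        Returns the number of samples still buffered for this endpoint.
        """
        for sample in self.entries[url][self.sent[url]:]:
            if not send(sample):
                break  # endpoint unavailable: keep the rest for a later retry
            self.sent[url] += 1
        return len(self.entries[url]) - self.sent[url]


# Two rule groups writing to the same (hypothetical) endpoint share one log.
url = "http://remote.example/api/v1/push"
wal = ToyWAL()
wal.append(url, (1, "rule_a", 3.0))  # sample from a first rule group
wal.append(url, (2, "rule_b", 4.0))  # sample from a second rule group

received = []


def endpoint_down(sample):
    return False


def endpoint_up(sample):
    received.append(sample)
    return True


remaining_while_down = wal.flush(url, endpoint_down)
remaining_after_recovery = wal.flush(url, endpoint_up)
print(remaining_while_down, remaining_after_recovery, received)
```

Because the cursor only advances after a successful send, a retry after an outage resumes from the first undelivered sample rather than dropping or reordering data.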
By default, the WAL is stored in the wal folder in the GEM binary working directory.
$ ls
metrics-enterprise-binary  wal/
The directory can be configured as shown:
ruler:
  remote_write:
    enabled: true
    wal_dir: /tmp/wal
The following is a complete example of the above mentioned config options using a ruler with sharding enabled and S3 as its rule storage backend:
ruler:
  external_url: localhost:9090
  rule_path: "/tmp/rules"
  storage:
    type: s3
    s3:
      endpoint: minio:9000
      access_key_id: cortex
      secret_access_key: supersecret
      bucketnames: "gem-ruler"
      insecure: true
      s3forcepathstyle: true
  poll_interval: 10s
  enable_api: true
  enable_sharding: true
  ring:
    kvstore:
      store: memberlist
  remote_write:
    enabled: true
    wal_dir: /tmp/wal
Loading remote-write groups
The cortextool project, as of version v0.3.1, is compatible with Prometheus rule files that contain the remote-write rule group syntax. You can download and use the latest version of cortextool.
You can also use the Docker image of cortextool:
docker pull grafana/cortex-tools:latest
Once you have GEM running with remote-write rule groups enabled, you can load remote-write rule groups using the following procedure.
- Save the following file to your workspace:
groups:
  - name: remote_write_group
    interval: 5m
    rules:
      - expr: 'sum(up)'
        record: 'sum_up'
    remote_write:
      - url: 'http://user:firstname.lastname@example.org/api/v1/push'
- Run the following command with cortextool:
$ cortextool rules sync \
    --rule-files=rules.yaml \
    --id=<instance-name> \
    --address=<gem-url> \
    --key=<valid-gem-write-token>
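If the sync step is run from a script (for example in CI), the same command can be assembled programmatically. The following Python sketch only builds and prints the argument list using the flags shown above; the function name build_sync_command and the instance name, address, and token values are hypothetical placeholders, not real credentials or endpoints.

```python
import shlex


def build_sync_command(rule_file, instance, address, token):
    """Return the cortextool argument list for syncing one rule file."""
    return [
        "cortextool", "rules", "sync",
        f"--rule-files={rule_file}",
        f"--id={instance}",
        f"--address={address}",
        f"--key={token}",
    ]


# Placeholder values for illustration only.
cmd = build_sync_command(
    "rules.yaml", "my-instance", "http://gem.example:8080", "example-token"
)
print(shlex.join(cmd))

# To actually run it, cortextool must be installed and on PATH:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a token or path contains special characters.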