Remote-write rule forwarding

Grafana Metrics Enterprise (GME) can forward the metrics that the Ruler evaluates to any Prometheus remote-write compatible backend.

To use this feature, load rule groups into the Ruler with an additional remote_write field, as shown in the example below:

# A regular Cortex rule group
groups:
  - name: group_one
    interval: 5m
    rules:
      - expr: 'rate(prometheus_remote_storage_samples_in_total[5m])'
        record: 'prometheus_remote_storage_samples_in_total:rate5m'
  - name: group_two
    interval: 1m
    rules:
      - expr: 'rate(prometheus_remote_storage_samples_in_total[1m])'
        record: 'prometheus_remote_storage_samples_in_total:rate1m'
    remote_write:
      - url: 'http://user:pass@example.com/api/v1/push'

In the above example, when group_two is loaded into Grafana Metrics Enterprise, the Ruler module evaluates the expression rate(prometheus_remote_storage_samples_in_total[1m]) every 1m and forwards the generated metric, named prometheus_remote_storage_samples_in_total:rate1m, to example.com. Meanwhile, group_one continues to work as a regular rule group: the evaluated metric prometheus_remote_storage_samples_in_total:rate5m is stored within the same Cortex instance that is running the Ruler.
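The routing described above can be modeled in a few lines of Python. This is only an illustrative sketch, not GME code: the dictionary mirrors the example rule file, and destinations is a hypothetical helper that reports where each group's recorded series would end up, assuming that a group without a remote_write field is stored locally.

```python
from urllib.parse import urlsplit

# The example rule file above, written as a Python dict for illustration.
RULE_FILE = {
    "groups": [
        {
            "name": "group_one",
            "interval": "5m",
            "rules": [{
                "expr": "rate(prometheus_remote_storage_samples_in_total[5m])",
                "record": "prometheus_remote_storage_samples_in_total:rate5m",
            }],
        },
        {
            "name": "group_two",
            "interval": "1m",
            "rules": [{
                "expr": "rate(prometheus_remote_storage_samples_in_total[1m])",
                "record": "prometheus_remote_storage_samples_in_total:rate1m",
            }],
            "remote_write": [{"url": "http://user:pass@example.com/api/v1/push"}],
        },
    ]
}


def destinations(rule_file):
    """Hypothetical helper: map each group name to its destinations."""
    result = {}
    for group in rule_file["groups"]:
        urls = [rw["url"] for rw in group.get("remote_write", [])]
        # Report only the host so that embedded credentials are not echoed.
        hosts = [urlsplit(url).hostname for url in urls]
        result[group["name"]] = hosts or ["local"]
    return result


print(destinations(RULE_FILE))
# {'group_one': ['local'], 'group_two': ['example.com']}
```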

Configuration

Rule Storage

Remote-write rule groups are compatible with the following rule storage backends:

  • Azure Blob Storage
  • GCS
  • S3
  • Swift

The following backends are not supported:

  • local filesystem
  • ConfigDB

Write-ahead log (WAL)

When a rule group is configured with a remote-write config, GME buffers the generated metrics in a write-ahead log (WAL) before forwarding them to the remote-write endpoint. This increases reliability if either the GME instance or the remote endpoint crashes. If the GME instance crashes, it reads from the WAL after restarting and continues to forward metrics to the configured backend from the last sent timestamp. If the remote endpoint crashes, GME keeps retrying requests until the endpoint is available again. If multiple rule groups are configured to send to the same remote-write endpoint, the GME instance uses a common WAL for the metrics generated by those rule groups.
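The buffering and retry behavior can be pictured with a small in-memory model. The sketch below is a conceptual illustration only: ToyWAL, its methods, and the sample strings are hypothetical, GME's actual WAL is stored on disk, and a simple position counter stands in for the last sent timestamp.

```python
from collections import defaultdict


class ToyWAL:
    """In-memory stand-in for a write-ahead log (not GME's on-disk format)."""

    def __init__(self):
        self.entries = []  # samples in the order they were generated
        self.acked = 0     # number of entries the endpoint has accepted

    def append(self, sample):
        self.entries.append(sample)

    def flush(self, send):
        """Send pending entries in order; stop at the first failure."""
        while self.acked < len(self.entries):
            if not send(self.entries[self.acked]):
                return False  # entry stays pending and is retried later
            self.acked += 1
        return True


# One WAL per remote-write URL, shared by every group that targets it.
wals = defaultdict(ToyWAL)
url = "http://example.com/api/v1/push"
wals[url].append("group_a sample 1")
wals[url].append("group_b sample 1")

endpoint = {"up": False}  # simulated remote endpoint state
delivered = []


def send(sample):
    if not endpoint["up"]:
        return False
    delivered.append(sample)
    return True


first_try = wals[url].flush(send)   # endpoint down: nothing is delivered
endpoint["up"] = True
second_try = wals[url].flush(send)  # endpoint back: pending entries go out
print(first_try, second_try, delivered)
# False True ['group_a sample 1', 'group_b sample 1']
```

Because the acknowledged position only advances after a successful send, a failed request leaves the entry in place, which is the property that lets forwarding resume where it stopped.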

By default, the WAL is stored in a directory named wal inside the working directory of the GME binary.

$ ls
metrics-enterprise-binary   wal/

To store the WAL in a different directory, set wal_dir:

enterprise_features:
  ruler_remote_write:
    wal_dir: /tmp/wal

Example

The following is a complete example of the configuration options described above, using a Ruler with sharding enabled and S3 as its rule storage backend:

ruler:
  external_url: localhost:9090
  rule_path: "/tmp/rules"
  storage:
    type: s3
    s3:
      endpoint: minio:9000
      access_key_id: cortex
      secret_access_key: supersecret
      bucketnames: "gme-ruler"
      insecure: true
      s3forcepathstyle: true
  poll_interval: 10s
  enable_api: true
  enable_sharding: true
  ring:
    kvstore:
      store: memberlist

enterprise_features:
  ruler_remote_write:
    enabled: true
    wal_dir: /tmp/wal

Loading remote-write groups

As of version v0.3.1, cortextool is compatible with Prometheus rule files that contain the remote-write rule group syntax. You can download and use the latest version of cortextool here.

You can also use the Docker image of cortextool:

$ docker pull grafana/cortex-tools:latest

Example usage

Once you have GME running with remote-write rule groups enabled, you can load remote-write rule groups using the following procedure.

  1. Save the following file to your workspace:

rules.yaml:

groups:
  - name: remote_write_group
    interval: 5m
    rules:
      - expr: 'sum(up)'
        record: 'sum_up'
    remote_write:
      - url: 'http://user:pass@example.com/api/v1/push'

  2. Run the following command with cortextool:

$ cortextool rules sync \
--rule-files=rules.yaml \
--id=<instance-name> \
--address=<gme-url> \
--key=<valid-gme-write-token>
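
If you run the sync from a script or CI job, the same command can be assembled programmatically. The following is a minimal sketch that uses only the flags shown above; sync_command is a hypothetical helper, and the instance name, address, and token values are placeholders that you must replace with your own.

```python
import shlex


def sync_command(rule_file, instance_id, address, key):
    """Hypothetical helper: build the argument list for cortextool rules sync."""
    return [
        "cortextool", "rules", "sync",
        f"--rule-files={rule_file}",
        f"--id={instance_id}",
        f"--address={address}",
        f"--key={key}",
    ]


# Placeholder values for illustration only.
argv = sync_command(
    "rules.yaml",
    "<instance-name>",
    "<gme-url>",
    "<valid-gme-write-token>",
)

# Quote the arguments so the command can be pasted into a shell safely.
print(shlex.join(argv))
```

To execute the command instead of printing it, the list can be passed to subprocess.run(argv, check=True), assuming cortextool is installed and on your PATH.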