Important: This documentation covers an older version. It is relevant only to the release noted; many of the features and functions have since been updated or replaced. Please view the current version.

(Optional) Grafana Mimir ruler

The ruler is an optional component that evaluates PromQL expressions defined in recording and alerting rules. Each tenant has a set of recording and alerting rules and can group those rules into namespaces.

Operational modes

The ruler supports two different rule evaluation modes:

Internal

This is the default mode. The ruler internally runs a querier and distributor, and evaluates recording and alerting rules in the ruler process itself. To evaluate rules, the ruler connects directly to ingesters and store-gateways, and writes any resulting series to the ingesters.

The built-in querier and distributor are configured through their respective configuration parameters.

Note: In this mode, the ruler uses no query acceleration techniques. Evaluating very high cardinality queries can take longer than the evaluation interval, which leads to missing data points in the evaluated recording rules.

Remote

In this mode, the ruler delegates rule evaluation to the query-frontend. When enabled, the ruler benefits from all the query acceleration techniques employed by the query-frontend, such as query sharding. To enable the remote operational mode, set the -ruler.query-frontend.address CLI flag or its respective YAML configuration parameter for the ruler. The ruler and the query-frontend communicate over gRPC, so you can use client-side load balancing by prefixing the query-frontend address URL with dns://.
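As a rough illustration, the remote mode might look like this in YAML. This is a sketch under assumptions: the hostname and port are hypothetical placeholders, and the YAML path is assumed to mirror the -ruler.query-frontend.address flag, so check the configuration reference for your release. The address uses the gRPC name syntax dns:///host:port, where the empty authority selects the default DNS server.

```yaml
ruler:
  query_frontend:
    # Hypothetical query-frontend gRPC address. The dns:// prefix lets the
    # gRPC client resolve all backends and balance requests across them.
    address: dns:///query-frontend.mimir.svc.cluster.local:9095
```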

Recording rules

The ruler evaluates the expressions in the recording rules at regular intervals and writes the results back to the ingesters.
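For illustration, a recording rule group in the Prometheus rule file format could look like the following. The group name, metric name, and interval are hypothetical and are not taken from this page.

```yaml
groups:
  - name: example_recording_rules      # hypothetical group name
    interval: 1m                        # evaluate the group once per minute
    rules:
      # Each evaluation writes the result as the series job:http_requests:rate5m.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```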

Alerting rules

The ruler evaluates the expressions in alerting rules at regular intervals. If the result includes any series, the alert becomes active. If the alerting rule defines a for duration, the alert first enters the PENDING state. After the alert has been active for the entire for duration, it enters the FIRING state. The ruler then notifies Alertmanagers of any FIRING alerts.
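The following sketch shows an alerting rule with a for duration, written in the Prometheus rule file format. The alert name, expression, threshold, and labels are hypothetical.

```yaml
groups:
  - name: example_alerting_rules       # hypothetical group name
    rules:
      - alert: HighErrorRate
        # The alert becomes active as soon as this expression returns a series.
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m])) > 1
        # The alert stays PENDING until the expression has returned a result
        # for 10 consecutive minutes, then it becomes FIRING.
        for: 10m
        labels:
          severity: page
        annotations:
          summary: High rate of HTTP 5xx responses
```

Without the for field, the alert would skip the PENDING state and fire on the first evaluation that returns a series.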

Configure the addresses of Alertmanagers with the -ruler.alertmanager-url flag. This flag supports the DNS service discovery format. For more information about DNS service discovery, refer to Supported discovery modes.

If you’re using Mimir’s Alertmanager, point the address to Alertmanager’s API. You can configure Alertmanager’s API prefix via the -http.alertmanager-http-prefix flag, which defaults to /alertmanager. For example, if Alertmanager is listening at http://mimir-alertmanager.namespace.svc.cluster.local and it is using the default API prefix, set -ruler.alertmanager-url to http://mimir-alertmanager.namespace.svc.cluster.local/alertmanager.
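The same setting could be expressed in YAML roughly as follows. The URL is the one from the example above; the YAML key is assumed to mirror the -ruler.alertmanager-url flag.

```yaml
ruler:
  # Mimir Alertmanager address, including the default /alertmanager API prefix.
  alertmanager_url: http://mimir-alertmanager.namespace.svc.cluster.local/alertmanager
```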

Federated rule groups

A federated rule group is a rule group with a non-empty source_tenants field.

The source_tenants field allows a rule group to aggregate data from multiple tenants during evaluation. The expression of each rule in the group is evaluated against the data of all tenants listed in source_tenants. If source_tenants is empty or omitted, the tenant under which the group is created is treated as the only source tenant.

The following example shows a federated rule group:

name: MyGroupName
source_tenants: ["tenant-a", "tenant-b"]
rules:
  - record: sum:metric
    expr: sum(metric)

In this example, the rules in MyGroupName are evaluated against the data of tenant-a and tenant-b.

Federated rule groups are skipped during evaluation by default. This feature depends on the cross-tenant query federation feature. To enable federated rules, set the -ruler.tenant-federation.enabled=true and -tenant-federation.enabled=true CLI flags (or their respective YAML configuration options).
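In YAML, enabling both features might look like the sketch below. The key names are assumed to mirror the two CLI flags.

```yaml
# Cross-tenant query federation, which federated rules depend on.
tenant_federation:
  enabled: true

ruler:
  # Without this setting, federated rule groups are skipped during evaluation.
  tenant_federation:
    enabled: true
```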

During evaluation, the query limits that apply to single tenants are also applied to each query in the rule group. For example, if tenant-a has a federated rule group with source_tenants: [tenant-b, tenant-c], then the query limits for tenant-b and tenant-c are applied. If any of these limits is exceeded, the whole evaluation fails and no partial results are saved. The same “no partial results” guarantee applies to queries that fail for other reasons (for example, ingester unavailability).

The time series used during evaluation of federated rules will have the __tenant_id__ label, similar to how it is present on series returned with cross-tenant query federation.

Note: Federated rule groups allow data from multiple source tenants to be written into a single destination tenant. This makes the existing separation of tenants’ data less clear. For example, suppose tenant-a has a federated rule group that aggregates over tenant-b’s data (for example, sum(metric_b)) and writes the result back into tenant-a’s storage (for example, as the metric sum:metric_b). Part of tenant-b’s data is now copied to tenant-a, albeit aggregated. Keep this in mind when configuring the access control layer in front of Mimir and when enabling federated rules via -ruler.tenant-federation.enabled.

Sharding

The ruler supports multi-tenancy and horizontal scalability. To achieve horizontal scalability, the ruler shards the execution of rules by rule group. Ruler replicas form their own hash ring, stored in the KV store, to divide the work of executing the rules.

To configure the rulers’ hash ring, refer to configuring hash rings.
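As a minimal sketch, the ruler ring's KV store could be selected as shown below. The memberlist backend and the YAML path are assumptions for illustration and are not stated on this page; the hash ring documentation is the authoritative source.

```yaml
ruler:
  ring:
    kvstore:
      # Assumed KV store backend for the ruler hash ring.
      store: memberlist
```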

Manage alerting and recording rules

You can manage alerting and recording rules in any of the following ways.

Via the `mimirtool` CLI tool

The mimirtool rules command offers utility subcommands for linting, formatting, and uploading rules to Grafana Mimir. For more information, refer to mimirtool rules.
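A rule file passed to mimirtool might look like the sketch below. It assumes that mimirtool accepts the Prometheus rule file format extended with a top-level namespace field; the namespace and group names are hypothetical, and the rule is the one from the federated example on this page.

```yaml
# Assumed mimirtool rule file layout: one namespace containing rule groups.
namespace: example_namespace
groups:
  - name: example_group
    rules:
      - record: sum:metric
        expr: sum(metric)
```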

Via the `grafana/mimir/operations/mimir-rules-action` GitHub Action

The GitHub Action mimir-rules-action wraps some of the functionality of mimirtool rules. For more information, refer to the documentation of the action.

Via the HTTP configuration API

The ruler HTTP configuration API enables tenants to create, update, and delete rule groups. For a complete list of endpoints and example requests, refer to ruler.

State

The ruler stores rule groups in the backend configured via -ruler-storage.backend. The ruler supports the following backends:

Amazon S3: -ruler-storage.backend=s3
Google Cloud Storage: -ruler-storage.backend=gcs
Microsoft Azure Storage: -ruler-storage.backend=azure
OpenStack Swift: -ruler-storage.backend=swift
Local storage: -ruler-storage.backend=local
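For example, an object storage backend might be configured in YAML roughly as follows. This is a sketch: the endpoint and bucket name are hypothetical, credentials are omitted, and the key names are assumed to mirror the -ruler-storage.* flags.

```yaml
ruler_storage:
  backend: s3
  s3:
    endpoint: s3.us-east-1.amazonaws.com   # hypothetical endpoint
    bucket_name: mimir-ruler               # hypothetical bucket
```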

Local storage

The local storage backend reads Prometheus recording rules from the local filesystem.

Note: Local storage is a read-only backend that does not support the creation and deletion of rules through the Configuration API.

When all rulers have the same rule files, local storage supports ruler sharding. To facilitate sharding in Kubernetes, mount a Kubernetes ConfigMap into every ruler pod.
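A ConfigMap that carries the rule files could look like the following sketch. The ConfigMap name, key, and tenant are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mimir-ruler-rules          # hypothetical name
data:
  # One Prometheus-format rule file for the hypothetical tenant "tenant-a".
  tenant-a.yaml: |
    groups:
      - name: MyGroupName
        rules:
          - record: sum:metric
            expr: sum(metric)
```

ConfigMap keys cannot contain a slash, so one way to get the per-tenant directory is to mount the volume at the rules directory and use the volume's items field to map the key tenant-a.yaml to the path tenant-a/rules.yaml.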

The following example shows a local storage definition:

-ruler-storage.backend=local
-ruler-storage.local.directory=/tmp/rules

The ruler looks for tenant rules in the /tmp/rules/<TENANT ID> directory. The ruler requires rule files to be in the Prometheus format.
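The two flags above could be written in YAML as sketched below, with the key names assumed to mirror the flags. The comment shows the resulting layout for a hypothetical tenant named tenant-a.

```yaml
ruler_storage:
  backend: local
  local:
    directory: /tmp/rules

# Expected layout on disk (tenant and file name are hypothetical):
#   /tmp/rules/tenant-a/rules.yaml   Prometheus-format rule file for tenant-a
```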
