Operating GEM on Grafana Labs

Compactor

Monitoring compactor health

Grafana Enterprise Metrics emits several metrics related to compactor health. The following queries are useful to get a high-level view of compactor activity. For users with self-monitoring enabled, please see the GEM system monitoring / compactor dashboard, which includes panels built from these queries.

Successful compactor jobs run per hour

sum(increase(cortex_compactor_runs_completed_total[1h]))

This value should be relatively stable when viewed over a long enough period of time, for example hours or days.

Failed compactor jobs run per hour

sum(increase(cortex_compactor_runs_failed_total[1h]))

Note: Restarting the compactor process interrupts in-progress compaction jobs. This increases the value of cortex_compactor_runs_failed_total, but it is not cause for concern as long as the restarts are expected. If the compactor crashes, this metric is not incremented, so monitor compactor process crashes separately.

Number of blocks per tenant

sum by (user) (cortex_bucket_blocks_count - cortex_bucket_blocks_marked_for_deletion_count)

This value should be relatively stable over a long enough period of time, for example several days. If the compactor is lagging behind, it will increase over time.

Monitoring bucket index health

When the bucket index is enabled, you can verify its health by monitoring the cortex_bucket_index_last_successful_update_timestamp_seconds metric. This metric tracks the last successful bucket index update per tenant. Use the following query to determine the index age for each tenant:

time() - cortex_bucket_index_last_successful_update_timestamp_seconds

The maximum index age should generally line up with the value of the -compactor.cleanup-interval flag.

Note: Some jitter is added to the cleanup interval to prevent all compactor replicas from running at the same moment every time the interval elapses. Additionally, the cleanup takes some time to perform. Because of this, you may see the index age slightly older than the cleanup interval. This is not cause for concern. We recommend configuring an alerting threshold when the index age exceeds (2 * compactor.cleanup-interval) + 5 minutes.
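As a rough illustration of the recommended threshold, the following Python sketch converts a cleanup interval into an alerting threshold in seconds. The function name and the 15-minute interval used in the example are assumptions for illustration only; substitute the value of -compactor.cleanup-interval from your own configuration.

```python
def index_age_alert_threshold_seconds(cleanup_interval_seconds):
    """Return (2 * cleanup interval) + 5 minutes, expressed in seconds."""
    five_minutes = 5 * 60
    return 2 * cleanup_interval_seconds + five_minutes


# Hypothetical example: a cleanup interval of 15 minutes (900 seconds).
threshold = index_age_alert_threshold_seconds(15 * 60)
print(threshold)  # 2100 seconds, that is, 35 minutes
```

You can compare the result of the index age query against this value in an alerting rule.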

Gateway

The Grafana Enterprise Metrics gateway is a service target. It can proxy requests to other Grafana Enterprise Metrics microservices. You can also use it for client-side load balancing of requests proxied to the distributors.

Configuration

The gateway has its own configuration block in the Grafana Enterprise Metrics configuration files.

gateway:
  proxy:
    default: <backend_proxy_config>
    [ admin_api: <backend_proxy_config> ]
    [ alertmanager: <backend_proxy_config> ]
    [ compactor: <backend_proxy_config> ]
    [ distributor: <backend_proxy_config> ]
    [ graphite: <backend_proxy_config> ]
    [ ingester: <backend_proxy_config> ]
    [ query_frontend: <backend_proxy_config> ]
    [ ruler: <backend_proxy_config> ]
    [ store_gateway: <backend_proxy_config> ]

You can also use flags to configure the gateway. Each flag is the path to the equivalent configuration field joined by the period (.) character and with underscores (_) replaced with dashes (-). For example, use the flag --gateway.proxy.store-gateway.url=<store-gateway url> to configure the store-gateway backend proxy URL.
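The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described; the function name is hypothetical and is not part of GEM.

```python
def config_path_to_flag(path):
    """Join the configuration field names with '.' and replace '_' with '-'."""
    return "--" + ".".join(part.replace("_", "-") for part in path)


# The path gateway -> proxy -> store_gateway -> url from the configuration block.
print(config_path_to_flag(["gateway", "proxy", "store_gateway", "url"]))
# --gateway.proxy.store-gateway.url
```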

<backend_proxy_config>

A backend_proxy_config section specifies the URL of the backend to be proxied.

url: <url> | default = <gateway.proxy.default.url>

Client-side load balancing

If you use a backend proxy URL beginning with dns:///, it creates a gRPC proxy with client-side round-robin load balancing instead of the default HTTP reverse proxy. To configure client-side load balancing for requests to the distributors, set the gateway.proxy.distributor.url to dns:///<distributor service>.

Note: There are three / characters in the preceding DNS URL meaning that you are using the default DNS authority. For details about DNS URLs, refer to RFC 4501.

Client-side load balancing is useful in ensuring that distributors are evenly loaded with requests. Prometheus remote-write clients use HTTP persistent connections, also known as HTTP keep-alive, to re-use a single TCP connection for multiple requests and responses resulting in reduced latency for subsequent requests.

Kubernetes Services balance load per connection, not per request: the initial TCP connection is made to a random endpoint, but once the connection is established, the same remote-write client talks to the same distributor for the lifetime of that connection. This can mean an uneven load for your distributors and worse cluster performance overall.

The Grafana Enterprise Metrics gateway solves this problem by exposing an HTTP server for receiving the client requests but using gRPC to talk to the distributors. The gRPC proxy maintains a list of endpoints returned from the DNS lookup and keeps persistent connections to each one. The proxies are also configured to perform per request client-side load balancing across the endpoints resulting in the best of persistent connections without the issues presented in the preceding paragraph.
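The difference between a connection pinned to one endpoint and per-request round-robin balancing can be illustrated with a small simulation. This is a simplified sketch, not the gateway's implementation; the endpoint names and both function names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def pinned_connection(endpoints, num_requests, chosen_index=0):
    """Every request reuses the single connection opened to one endpoint."""
    return Counter(endpoints[chosen_index] for _ in range(num_requests))


def round_robin(endpoints, num_requests):
    """Each request is sent to the next endpoint in turn."""
    picker = cycle(endpoints)
    return Counter(next(picker) for _ in range(num_requests))


distributors = ["distributor-0", "distributor-1", "distributor-2"]
print(pinned_connection(distributors, 9))  # all 9 requests hit one distributor
print(round_robin(distributors, 9))        # 3 requests per distributor
```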

OAuth integration

Grafana Enterprise Metrics supports the OpenID Connect (OIDC) core standard to validate tokens. This allows you to integrate GEM with an existing OAuth token provider at your organization.

To support OIDC, provide the URL of the OIDC provider (issuer) in the auth.admin.oidc.issuer-url setting. The provider must expose the OIDC Discovery endpoint (also known as the “well-known endpoint”) at <issuer-url>/.well-known/openid-configuration, as described in the OpenID Connect Discovery specification.

A JWT is included as the password in HTTP basic authentication or as part of a bearer token in bearer authentication. The bearer token should have two parts separated by a :. The first part is the tenant ID. The second part is the JWT.
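The two header forms can be sketched as follows. This is a minimal illustration, assuming that the tenant ID is used as the username in basic authentication (as in the curl examples elsewhere in this document); the function names and the placeholder values are hypothetical.

```python
import base64


def bearer_header(tenant_id, jwt):
    """Bearer authentication: the token is '<tenant ID>:<JWT>'."""
    return {"Authorization": f"Bearer {tenant_id}:{jwt}"}


def basic_header(tenant_id, jwt):
    """Basic authentication: the JWT is the password (username assumed to be the tenant ID)."""
    credentials = f"{tenant_id}:{jwt}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}


print(bearer_header("tenant-1", "<jwt>"))
print(basic_header("tenant-1", "<jwt>"))
```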

The JWT is validated against the OIDC provider specified above. If it is valid, an access policy name is extracted. The regular expression in auth.admin.oidc.access_policy_regex is run against each value in the JWT claim field specified in auth.admin.oidc.access_policy_claim, which can be either a single string or a list of strings.

A sub-match has to be present to extract the access policy. If the value in the JWT claim field is a string, then only the first sub-match is used. If it is a list of strings, then the first sub-match for each entry is used. You can use the regular expression (.*) to match the whole claim field.

The regular expression syntax is RE2.

Configuration

To use OIDC, specify the auth.type as enterprise. Here is an example authentication section:

auth:
  type: enterprise
  admin:
    oidc:
      issuer_url: https://accounts.authprovider.com/realms/example
      access_policy_claim: "sub"
      access_policy_regex: "pref-([0-9]+)-.*"

Here is an example payload section of a valid JWT:

{
  "sub": "pref-1234567890-abc",
  "name": "John Doe",
  "admin": true
}

The extracted access policy is 1234567890.
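The extraction step can be approximated in Python as a sanity check for a regular expression before you deploy it. This is a sketch under stated assumptions: GEM uses RE2, whereas Python's re module is a different engine (the simple pattern here behaves the same in both), the use of an unanchored search is an assumption, and the function name is hypothetical.

```python
import re


def extract_access_policies(claim_value, pattern):
    """Return the first sub-match for a string claim, or for each entry of a list claim."""
    values = [claim_value] if isinstance(claim_value, str) else list(claim_value)
    policies = []
    for value in values:
        match = re.search(pattern, value)
        # A sub-match (capture group) has to be present to extract a policy.
        if match and match.groups() and match.group(1) is not None:
            policies.append(match.group(1))
    return policies


print(extract_access_policies("pref-1234567890-abc", r"pref-([0-9]+)-.*"))
# ['1234567890']
```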

Note: OpenID Connect (OIDC) converts the encoded access policies to lowercase (downcase). For example, if your OpenID system has an access policy called Team1, then you need to create an access policy in GEM called team1 so the integration works.

Multiple access policies

You can provide an array of strings in the JWT claim field. If the array contains only one item, the behavior is the same as when the field contains a single string. If the array contains multiple access policies, they are aggregated into a “virtual” access policy. This virtual access policy provides metric read access to the union of all tenants contained in the original access policies. For example, given the configuration above and the following JWT:

{
  "sub": ["pref-1234567890-abc", "pref-9876543210-xyz"],
  "name": "John Doe",
  "admin": true
}

The resulting access policy would be an aggregate of 1234567890 and 9876543210. If 1234567890 provided read and write access to tenant-1, and 9876543210 provided read and write access to tenant-2 and tenant-3, the resulting virtual access policy would provide read-only access to tenant-1, tenant-2, and tenant-3. This generated access policy is cached for the period specified in auth.cache.ttl.duration, which defaults to 10m.
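The aggregation described above amounts to taking the union of the tenants and reducing the access to read-only. The following sketch illustrates that idea only; the dictionary layout and the function name are hypothetical and do not reflect how GEM represents access policies internally.

```python
def aggregate_read_only(policies):
    """Union the tenants of several access policies and grant read-only access."""
    tenants = set()
    for policy in policies:
        tenants.update(policy["tenants"])
    return {"tenants": sorted(tenants), "access": "read-only"}


# Hypothetical representation of the two access policies from the example.
policy_a = {"name": "1234567890", "tenants": ["tenant-1"], "access": "read-write"}
policy_b = {"name": "9876543210", "tenants": ["tenant-2", "tenant-3"], "access": "read-write"}

print(aggregate_read_only([policy_a, policy_b]))
# {'tenants': ['tenant-1', 'tenant-2', 'tenant-3'], 'access': 'read-only'}
```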

Remote-write rule forwarding

Grafana Enterprise Metrics (GEM) allows for forwarding metrics evaluated from the Ruler to any Prometheus remote-write compatible backend.

This works by loading rule groups into the Ruler with an extra config field as shown in the example below:

# A regular Grafana Mimir rule group
groups:
  - name: group_one
    interval: 5m
    rules:
      - expr: 'rate(prometheus_remote_storage_samples_in_total[5m])'
        record: 'prometheus_remote_storage_samples_in_total:rate5m'
  - name: group_two
    interval: 1m
    rules:
      - expr: 'rate(prometheus_remote_storage_samples_in_total[1m])'
        record: 'prometheus_remote_storage_samples_in_total:rate1m'
    remote_write:
      - url: 'http://user:pass@example.com/api/v1/push'

In the above example, when group_two is loaded into Grafana Enterprise Metrics, the Ruler evaluates the expression rate(prometheus_remote_storage_samples_in_total[1m]) every 1m and forwards the generated metric, named prometheus_remote_storage_samples_in_total:rate1m, to example.com. Meanwhile, group_one continues to work as a regular rule group: the evaluated metric prometheus_remote_storage_samples_in_total:rate5m is stored within the same GEM tenant that is running the Ruler.

Configuration

Rule Storage

Remote write rules are compatible with the following backends:

Azure Blob Storage
GCS
S3
Swift

The following backends are not supported:

local filesystem
ConfigDB

Write-ahead log (WAL)

When a rule group is configured with a remote-write config, GEM buffers the generated metrics in a write-ahead log (WAL) before forwarding them to the remote-write endpoint. This increases reliability in case either GEM or the remote endpoint crashes. If GEM crashes, it reads from the WAL and continues to forward metrics to the configured backend from the last sent timestamp. If the remote endpoint crashes, GEM continues to retry requests until the endpoint is available again.

If multiple rule groups are configured to send to the same remote-write endpoint, GEM uses a common WAL for the metrics generated by those rule groups.

The WAL is truncated at the interval specified by the ruler.remote-write.wal-truncate-frequency setting. WAL entries older than the time specified in the ruler.remote-write.max-wal-time setting are removed. WAL entries younger than the time specified in the ruler.remote-write.min-wal-time setting are not removed.

By default, the WAL is stored in the wal folder in the GEM binary working directory.

$ ls
enterprise-metrics-binary wal/

The directory can be configured as shown:

ruler:
  remote_write:
    enabled: true
    wal_dir: /tmp/wal
    min_wal_time: 1h
    max_wal_time: 5h
    wal_truncate_frequency: 1h

Example

The following is a complete example of the above mentioned config options using a ruler with sharding enabled and S3 as its rule storage backend:

ruler:
  external_url: localhost:9090
  rule_path: "/tmp/rules"
  storage:
    type: s3
    s3:
      endpoint: minio:9000
      access_key_id: cortex
      secret_access_key: supersecret
      bucketnames: "gem-ruler"
      insecure: true
      s3forcepathstyle: true
  poll_interval: 10s
  enable_api: true
  enable_sharding: true
  ring:
    kvstore:
      store: memberlist
  remote_write:
    enabled: true
    wal_dir: /tmp/wal
    min_wal_time: 1h
    max_wal_time: 5h
    wal_truncate_frequency: 1h

Loading remote-write groups

The mimirtool tool is compatible with Prometheus rule files that contain the remote-write rule group syntax. You can download and use the latest version of the mimirtool in the releases of Grafana Mimir.

You can also use the Docker image of mimirtool: docker pull grafana/mimirtool:latest

Example usage

Once you have GEM running with remote-write rule groups enabled you can load remote-write rule groups using the following procedure.

Save the following file to your workspace:

rules.yaml:

groups:
  - name: remote_write_group
    interval: 5m
    rules:
      - expr: "sum(up)"
        record: "sum_up"
    remote_write:
      - url: "http://user:pass@example.com/api/v1/push"

Run the following command with mimirtool:

mimirtool rules sync \
--rule-files=rules.yaml \
--id=<tenant-name> \
--address=<gem-url> \
--key=<valid-gem-write-token>

Cluster query federation

NOTE: Cluster query federation is an experimental feature. As such, the configuration settings, command line flags, or specifics of the implementation are subject to change.

Overview

Since version 1.4, Grafana Enterprise Metrics (GEM) includes the optional component federation-frontend.

The goal of this component is to provide an ability to aggregate data from multiple GEM clusters in a single PromQL query. The underlying target clusters are queried using the Prometheus remote_read API and Labels API.

The component itself does not require any other components of Grafana Mimir. Therefore, you can run it on its own. A common use case is aggregating the data from two GEM clusters that are running in different regions.

Configuration

A minimal configuration of the federation-frontend has to disable authentication, because the federation frontend forwards the Basic authentication and Bearer token that is supplied by its clients to the underlying target clusters. Also, to start the federation frontend, configure the target to be federation-frontend.

You need to configure a list of target clusters within the federation.proxy_targets block; currently, there are no equivalent CLI flags available. Each entry requires a name that contains an identifier that will be exposed using the __cluster__ label in the query results and a url that points to a Prometheus compatible API. For GEM, use the URL http://<gem-host>/prometheus.

Optionally, you can configure each proxy_target to have Basic auth credentials, which override the user-supplied ones.

Warning: When you configure Basic auth via the proxy_target configuration, its credentials there take precedence over the client-supplied ones. Without other preventive action, any client that can reach the federation frontend can perform queries by using those credentials.

In the following example, two clusters in two different regions are queried via the federation frontend:

multitenancy_enabled: false # The federation frontend does not do any authentication itself
target: federation-frontend # Run the federation frontend only
federation:
  proxy_targets:
    - name: us-west
      url: http://gem-us-west/prometheus
    - name: us-east
      url: http://gem-us-east/prometheus

Aggregate metrics from a local GEM cluster and a Grafana Cloud Metrics stack

The federation frontend allows you to get an aggregated view of metrics stored in a local GEM cluster and a hosted Grafana Cloud Metrics stack. With the following configuration, you can query both of the clusters as though they were one:

federation:
  proxy_targets:
    - name: own-data-center
      url: http://gem/prometheus
    - name: grafana-cloud
      url: https://prometheus-us-central1.grafana.net/api/prom
      basic_auth:
        username: <tenant-id>
        password: <token>

Warning: This gives any client that can reach the federation frontend access to your metrics data in Grafana Cloud Metrics without further authentication.

By using the authentication credentials of the local GEM cluster, you can execute a query against both clusters. To do so, set the access policy’s token as a variable for subsequent commands:

$ export API_TOKEN="the long token string you copied"

$ curl -s -u "<tenant-id>:$API_TOKEN" -G --data-urlencode "query=count(up) by (__cluster__)" http://federation-frontend/prometheus/api/v1/query | jq

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__cluster__": "own-data-center"
        },
        "value": [1623344524.382, "10"]
      },
      {
        "metric": {
          "__cluster__": "grafana-cloud"
        },
        "value": [1623344524.382, "4"]
      }
    ]
  }
}
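If you process such a response in a script rather than with jq, the per-cluster values can be read from the __cluster__ label. The following sketch parses a response of the shape shown above; the function name is hypothetical, and the sample payload simply repeats the example response.

```python
import json


def counts_by_cluster(response_text):
    """Map each __cluster__ label value to the numeric value of its sample."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("query did not succeed")
    return {
        sample["metric"]["__cluster__"]: float(sample["value"][1])
        for sample in body["data"]["result"]
    }


sample_response = """
{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"__cluster__": "own-data-center"}, "value": [1623344524.382, "10"]},
  {"metric": {"__cluster__": "grafana-cloud"}, "value": [1623344524.382, "4"]}
]}}
"""
print(counts_by_cluster(sample_response))
# {'own-data-center': 10.0, 'grafana-cloud': 4.0}
```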

Limitations of cluster query federation

This experimental feature comes with some limitations:

No result caching in the federation frontend
No support for alerting/ruler on a federation level
No support for metric metadata endpoint
No support for exemplars

If your use case is blocked by one of those limitations, then feel free to reach out through our support channels with a feature request.

Self-monitoring

NOTE: Self-monitoring is an experimental feature. As such, the configuration settings, command line flags, or specifics of the implementation are subject to change.

Overview

Since version 1.4, Grafana Enterprise Metrics (GEM) includes the ability to directly record self-monitoring metrics to allow you to easily monitor the health and stability of GEM itself. The metrics GEM collects about itself are written to a built-in __system__ tenant. The metrics written can be queried as usual using tokens created under the built-in __system__ access policy. Since version 1.8, GEM directly records exemplars as part of self-monitoring metrics.

The way self-monitoring works ensures that any metrics available from GEM via /metrics endpoints will be available directly in GEM without needing to be scraped by an external process. While these metrics would ordinarily need to be scraped using Prometheus or the Grafana Agent, with self-monitoring they will be available after following the quick setup described below.

This feature provides a simple, out-of-the-box way to monitor GEM itself with a minimum amount of configuration or extra dependencies. To get the maximum value from this feature, we recommend that you install the GEM plugin for Grafana, which automatically provisions a set of dashboards that use the self-monitoring metrics. The dashboards are in line with Grafana Labs’ best practices for understanding GEM system health. Self-monitoring is compatible with plugin versions >= 3.0.4 (which require Grafana 8). Grafana 7.5 users should use version 2.1.1.

Configuration

The sections below describe the steps needed to set up self monitoring.

Single binary mode

Self-monitoring is enabled by default. No action is necessary in single binary mode.

Microservices mode

To use self-monitoring in microservices mode, you need a hostname that you can use to address the gRPC port (9095 by default) of each of the GEM distributors. This could be a load balancer that balances between each distributor, a DNS A record that includes IPs for each distributor, or a Kubernetes service that balances between each gRPC port of the distributor pods. For the purposes of this example, we’ll assume that you are using a Kubernetes service and GEM is running in a namespace called enterprise-metrics.

Add the following section to your GEM configuration file used by each GEM pod or process.

instrumentation:
  distributor_client:
    address: dns:///distributor.enterprise-metrics.svc.cluster.local:9095

Alternatively, add the following command line flag to the arguments passed to each GEM pod or process.

-instrumentation.distributor-client.address='dns:///distributor.enterprise-metrics.svc.cluster.local:9095'

What is described above will give you system health metrics about the entire GEM cluster. To better understand GEM behavior, you also want to understand resource usage at a per-tenant level. In order to get the self-monitoring metrics you need to understand this behavior (and populate the “Per Tenant Usage” dashboards provisioned by the GEM plugin), you must also deploy the overrides-exporter component.

Exemplars

Since GEM 1.8, self-monitoring has the ability to directly record exemplars. However, recording of the exemplars under the __system__ tenant is still controlled by the same limits applied to all other tenants. This means that recording of exemplars for the __system__ tenant is disabled by default (as it is for all tenants) and must be enabled using the runtime configuration file or enabled globally.

Since the __system__ tenant is built into GEM itself and immutable, limits for it (such as enabling exemplars) cannot be set using the Admin API. Instead, if you wish to emit exemplars for the __system__ tenant you must override the max_global_exemplars_per_user setting for the __system__ tenant using the runtime configuration file or enable exemplars globally.

Here is an example of using the runtime configuration file:

overrides:
  __system__:
    max_global_exemplars_per_user: 300000

Verification

After you’ve deployed the configuration changes above, you’ll need to verify that self-monitoring is working correctly. We’ll learn how to query the self-monitoring metrics later, but to verify they’re working we can check a simple counter incremented when self-monitoring metrics are emitted.

Pick a single pod or process that is part of your GEM cluster. For this example, we’ll assume that you have picked an ingester.

Make a curl request to the /metrics endpoint of the ingester.

$ curl -s 'http://ingester-01.example.com/metrics' | grep 'cortex_self_monitoring_pushes_total'
# HELP cortex_self_monitoring_pushes_total Number of successes pushing self-monitoring metrics
# TYPE cortex_self_monitoring_pushes_total counter
cortex_self_monitoring_pushes_total 15

NOTE: If you are running GEM in a Kubernetes cluster, individual pods might not be directly accessible from outside the Kubernetes cluster. In this case you can make the request from another pod running in the Kubernetes cluster, or you can make use of the kubectl port-forward command.

If the metric above is 0 or doesn’t exist, check the logs for each GEM component looking for errors or warnings related to pushing metrics to a distributor.
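The same check can be scripted. The following sketch reads the value of an unlabelled counter from text in the Prometheus exposition format, such as the output of the /metrics endpoint. The function name is hypothetical, and the sample text repeats the output shown above; fetching the text from your own endpoint is left out.

```python
def read_counter(metrics_text, metric_name):
    """Return the value of an unlabelled metric, or None if it is not present."""
    for line in metrics_text.splitlines():
        fields = line.split()
        # Skip blank lines and the '# HELP' / '# TYPE' comment lines.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        if fields[0] == metric_name:
            return float(fields[1])
    return None


sample_metrics = """\
# HELP cortex_self_monitoring_pushes_total Number of successes pushing self-monitoring metrics
# TYPE cortex_self_monitoring_pushes_total counter
cortex_self_monitoring_pushes_total 15
"""
value = read_counter(sample_metrics, "cortex_self_monitoring_pushes_total")
print(value)  # 15.0
```

A result of None or 0 corresponds to the failure case described above.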

Querying

In order to query self-monitoring metrics directly, you’ll need to create a token associated with the __system__ access policy. The steps below assume you have already done this and copied down the token. The following examples further assume that your GEM cluster is available at the host gem.example.com over HTTPS.

First, set the token as a variable to use for the subsequent commands.

$ export API_TOKEN="the long token string you copied"

Next, we’ll make a request to the Prometheus query endpoint of GEM looking for a particular metric, in this case grafana_metrics_enterprise_build_info:

$ curl -s -u "__system__:$API_TOKEN" "https://gem.example.com/prometheus/api/v1/query?query=grafana_metrics_enterprise_build_info" | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "grafana_metrics_enterprise_build_info",
          "branch": "gem-release-1.4",
          "goversion": "go1.16.3",
          "instance": "ingester-01:80",
          "revision": "ccd12b7a",
          "target": "ingester",
          "version": "v1.4.1"
        },
        "value": [
          1622833381.751,
          "1"
        ]
      },
      {
        "metric": {
          "__name__": "grafana_metrics_enterprise_build_info",
          "branch": "gem-release-1.4",
          "goversion": "go1.16.3",
          "instance": "distributor-01:80",
          "revision": "ccd12b7a",
          "target": "distributor",
          "version": "v1.4.1"
        },
        "value": [
          1622833381.751,
          "1"
        ]
      },
      <...snip...>
    ]
  }
}

As you can see, querying self-monitoring metrics with GEM is the same process as querying any other type of metrics.

Implementation

Though you don’t need to be familiar with how self-monitoring works at a technical level, it’s detailed below in the hopes that it’s useful.

Gathering

Self-monitoring metrics are gathered internally the same way metrics exposed via the /metrics endpoint are: they are registered with a Prometheus Registerer on application startup. The metrics are updated during the normal course of running the application and periodically (every 15 seconds by default) flushed directly to a distributor. Any metric available from the /metrics endpoint of a GEM component is also available in the self-monitoring system.

The metrics are written to the distributor over its gRPC interface. This allows the self-monitoring system control over the exact tenant the metrics are stored under. This enables it to cleanly separate system metrics (under the __system__ tenant) from user data.

Injected labels

Normally, when metrics are scraped by Prometheus, labels are automatically added by Prometheus that identify where the metrics came from. Since self-monitoring metrics are not scraped by any external system, labels are automatically added internally to help identify which component the metrics came from.

The following labels are added to metrics emitted by the self-monitoring system.

instance: this label is made up of the node or host name a component is running on in combination with the HTTP port used. For example a value for this label in a GEM cluster running on Kubernetes might be ingester-1:80 or querier-5bf6ddccd7-hzbtn:80.
target: this label is made of a comma separated list of the targets a GEM process is running as (ingester, querier, etc.) or all in single binary mode.
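The two labels can be thought of as being built like this. This is only an illustration of the label format described above, not GEM's implementation; the function name is hypothetical.

```python
def self_monitoring_labels(hostname, http_port, targets):
    """Build the instance and target labels in the format described above."""
    return {
        "instance": f"{hostname}:{http_port}",
        "target": ",".join(targets),
    }


print(self_monitoring_labels("ingester-1", 80, ["ingester"]))
# {'instance': 'ingester-1:80', 'target': 'ingester'}
print(self_monitoring_labels("gem-0", 80, ["all"]))  # single binary mode
# {'instance': 'gem-0:80', 'target': 'all'}
```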

System tenant and access policy

To cleanly separate self-monitoring data from user data, GEM comes with a built-in __system__ tenant and __system__ access policy. All self-monitoring data is written to the __system__ tenant. The self-monitoring data may be queried using tokens associated with the __system__ access policy. Because these are built into GEM itself, they cannot be removed. However, writing self-monitoring metrics to the system tenant can be turned off using the flag -instrumentation.enabled=false or the associated configuration setting.

Recording rules

To use self-monitoring metrics to power the associated self-monitoring dashboards, the GEM ruler also includes built-in recording rules. These recording rules perform aggregations of self-monitoring metrics the same way the ruler aggregates other metrics. Because these recording rules are built into GEM itself, they cannot be removed. However, they can be turned off using the same flag that enables or disables self-monitoring, -instrumentation.enabled=false, or the associated configuration setting.

Overhead

Self-monitoring metrics are stored in GEM itself. Like any other metrics, they consume space in object storage. When self-monitoring is enabled in microservices mode, each GEM component (ingester, querier, and so on) emits approximately 2000 series, and GEM duplicates these series in the ingesters according to the replication factor.

To understand how many series will be written under the __system__ tenant as part of self-monitoring, you can use the following formula:

2000 * $NUMBER_OF_GEM_PROCESSES * $REPLICATION_FACTOR

Since these series are written to GEM in a similar way to other series, they’ll be deduplicated by the compactor in object storage to reduce space required. To understand how many series will end up in object storage via the __system__ tenant, you can use the following formula:

2000 * $NUMBER_OF_GEM_PROCESSES
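Both formulas can be evaluated with a short script. The following sketch uses the approximate figure of 2000 series per process from the text; the process count and replication factor in the example are hypothetical values, not recommendations.

```python
SERIES_PER_PROCESS = 2000  # approximate figure from the text above


def series_written(num_processes, replication_factor):
    """Series written under the __system__ tenant, including replication."""
    return SERIES_PER_PROCESS * num_processes * replication_factor


def series_in_object_storage(num_processes):
    """Series remaining in object storage after the compactor deduplicates them."""
    return SERIES_PER_PROCESS * num_processes


# Hypothetical cluster: 20 GEM processes with a replication factor of 3.
print(series_written(20, 3))         # 120000
print(series_in_object_storage(20))  # 40000
```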

Exemplars

About exemplars in GEM

An exemplar is a specific trace representative of a repeated pattern of data in a given time interval. It helps you identify higher cardinality metadata from specific events within time series data. To learn more about exemplars and how they can help you isolate and troubleshoot problems with your systems, see Introduction to exemplars.

Grafana Enterprise Metrics includes the ability to store exemplars in memory. Exemplar storage in GEM is implemented similarly to how it is in Prometheus: exemplars for all series are kept in memory in a fixed-size circular buffer.

The limits_config property can be used to control the size of the circular buffer, expressed as a number of exemplars. For reference, an exemplar with just a traceID=<jaeger-trace-id> label uses roughly 100 bytes of memory in the in-memory exemplar storage. If exemplar storage is enabled, GEM also appends the exemplars to the WAL for local persistence (for the WAL duration).
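To get a feel for how a fixed-size circular buffer behaves and how much memory a given limit implies, consider the following sketch. It uses Python's collections.deque as a stand-in for the buffer and the rough figure of 100 bytes per exemplar from the text; it is not GEM's implementation, and the names and the limit of 300000 exemplars are illustrative assumptions.

```python
from collections import deque

BYTES_PER_EXEMPLAR = 100  # rough figure for an exemplar with only a traceID label


def estimated_memory_bytes(max_exemplars):
    """Approximate memory used when the buffer is full."""
    return max_exemplars * BYTES_PER_EXEMPLAR


# A buffer of size 3: appending a fourth exemplar drops the oldest one.
buffer = deque(maxlen=3)
for trace_id in ["a", "b", "c", "d"]:
    buffer.append({"traceID": trace_id})

print([exemplar["traceID"] for exemplar in buffer])  # ['b', 'c', 'd']
print(estimated_memory_bytes(300000))                # 30000000 bytes, about 30 MB
```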