Cross-cluster query federation

Note

Cross-cluster query federation is an experimental feature. As such, the configuration settings, command line flags, and specifics of the implementation are subject to change.

Overview

Starting with version 1.4, Grafana Enterprise Metrics (GEM) includes the optional federation-frontend component, which combines data from multiple GEM clusters into a single PromQL query. The federation-frontend queries the underlying target clusters using either cross-cluster sharding or remote read.

You can run this component on its own, as it doesn’t require any other components of GEM. A common use case is combining the data from two GEM clusters that are running in different regions, as shown in the following diagram:

Cluster federation architecture

Configure the federation-frontend

To configure the federation-frontend component, you must disable authentication by disabling multitenancy. The federation-frontend forwards the Basic authentication and Bearer token supplied by its clients to the underlying target clusters.

Additionally, to start the federation-frontend, you must set the target option to federation-frontend.

You must list the target clusters in the federation.proxy_targets block of the YAML configuration file; there are no equivalent CLI flags. Each entry requires a name and a url. The name is an identifier that is exposed through the __cluster__ label in query results, and the url points to a GEM instance. For GEM clusters using the default Prometheus HTTP prefix, use the following URL: http://<gem-host>/prometheus.

Optionally, you can configure each proxy_target to have Basic auth credentials, which override the user-supplied ones.

Warning

When you configure Basic auth in a proxy_target, those credentials take precedence over the client-supplied ones. Unless you take other preventive measures, any client that can reach the federation-frontend can run queries using those credentials.
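
The precedence rule described above can be modeled in a few lines. The following is only an illustrative sketch, not GEM's implementation: the function name upstream_authorization and its parameters are hypothetical, and it assumes that credentials travel in a single Authorization header value.

```python
import base64
from typing import Optional, Tuple


def upstream_authorization(
    client_authorization: Optional[str],
    target_basic_auth: Optional[Tuple[str, str]],
) -> Optional[str]:
    """Pick the Authorization header value sent to one target cluster.

    Hypothetical model: credentials configured on the proxy target win,
    otherwise the client's header is forwarded unchanged.
    """
    if target_basic_auth is not None:
        username, password = target_basic_auth
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        return f"Basic {token}"
    # No per-target credentials: forward what the client sent.
    return client_authorization


# Without target credentials, the client's token is passed through.
print(upstream_authorization("Bearer client-token", None))
# With target credentials, the client's token is ignored for that target.
print(upstream_authorization("Bearer client-token", ("tenant-1", "secret")))
```

The second call shows why the warning matters: the client's own credentials play no role once the target has Basic auth configured.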

In the following example, two clusters in two different regions are queried via the federation-frontend:

yaml
multitenancy_enabled: false # The federation-frontend does not do any authentication itself
target: federation-frontend # Run the federation-frontend only

federation:
  proxy_targets:
    - name: us-west
      url: http://gem-us-west/prometheus
    - name: us-east
      url: http://gem-us-east/prometheus

More details on the configuration options available in proxy_targets, including authentication and TLS options, are available in the configuration reference.

Note

When the federation-frontend runs GEM version 2.16 or higher, it only supports target clusters that also run GEM version 2.16 or higher. If the federation-frontend is configured to use a target cluster running an older version of GEM, queries fail with the following error: the federation-frontend only works with GEM proxy targets; the proxy target "xxx" is not a GEM cluster.

Configure cross-cluster sharding

Cross-cluster sharding splits a query into parts, runs those parts on the individual clusters, and then combines the results on the federation-frontend. Compared to using remote read for all queries, this approach improves the performance of cross-cluster queries by distributing more of the computation and transferring less data over the network. It also takes advantage of all the query acceleration techniques available in the target GEM cluster, including query result caching, splitting, and sharding.

The federation-frontend component enables cross-cluster sharding by default. You can configure cross-cluster sharding using the cluster_sharding_enabled setting and the -federation.cluster-sharding-enabled CLI flag. For example:

yaml
federation:
  cluster_sharding_enabled: true

Not all queries can be evaluated with cross-cluster sharding. If the federation-frontend receives a query that cannot be evaluated with cross-cluster sharding, it automatically falls back to using remote read for that query.

Combine metrics from a local GEM cluster and a Grafana Cloud Metrics stack

The federation-frontend allows you to get a combined view of metrics stored in a local GEM cluster and a hosted Grafana Cloud Metrics stack. With the following configuration, you can query both of the clusters as though they were one:

yaml
federation:
  proxy_targets:
    - name: own-data-center
      url: http://gem/prometheus
    - name: grafana-cloud
      url: https://prometheus-us-central1.grafana.net/api/prom
      basic_auth:
        username: <tenant-id>
        password: <token>

Warning

This gives any client that can reach the federation-frontend access to your metrics data in Grafana Cloud Metrics without further authentication.

By using the authentication credentials of the local GEM cluster, you can execute a query against both clusters. To do so, store the access policy's token in a variable and pass it to the query command:

bash
$ export API_TOKEN="the long token string you copied"
$ curl -s -u "<tenant-id>:$API_TOKEN" -G --data-urlencode "query=count(up) by (__cluster__)" http://federation-frontend/prometheus/api/v1/query | jq
json
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__cluster__": "own-data-center"
        },
        "value": [1623344524.382, "10"]
      },
      {
        "metric": {
          "__cluster__": "grafana-cloud"
        },
        "value": [1623344524.382, "4"]
      }
    ]
  }
}
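
To work with a response like this programmatically, you can read the __cluster__ label on each series to separate the per-cluster values. The following is a minimal sketch that parses the JSON shown above; the helper name counts_by_cluster is hypothetical and not part of GEM.

```python
import json

# Response body copied from the example above.
RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__cluster__": "own-data-center"}, "value": [1623344524.382, "10"]},
      {"metric": {"__cluster__": "grafana-cloud"}, "value": [1623344524.382, "4"]}
    ]
  }
}
"""


def counts_by_cluster(body: str) -> dict:
    """Map each __cluster__ label value to the sample value of an instant vector."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise ValueError("query failed")
    counts = {}
    for series in payload["data"]["result"]:
        cluster = series["metric"]["__cluster__"]
        _timestamp, value = series["value"]
        # The Prometheus HTTP API encodes sample values as strings.
        counts[cluster] = float(value)
    return counts


print(counts_by_cluster(RESPONSE))
# → {'own-data-center': 10.0, 'grafana-cloud': 4.0}
```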

Configure partial results

By default, the federation-frontend returns an error for any query when one or more of the target clusters returns an error or does not respond.

Alternatively, the federation-frontend can compute partial results based on the successful responses from target clusters and ignore the responses from target clusters that return an error or do not respond.

You can enable partial results with the following CLI flags and configuration options:

  • For range and instant queries: -federation.partial-queries-enabled CLI flag or federation.partial_queries_enabled YAML configuration file option
  • For metadata queries, which are label names, label values and series queries: -federation.partial-metadata-enabled CLI flag or federation.partial_metadata_enabled YAML configuration file option

It is also possible to enable or disable partial results for a single request by setting an HTTP header on the query request:

  • For range and instant queries: X-Partial-Queries-Enabled: true to enable partial results, X-Partial-Queries-Enabled: false to disable partial results
  • For metadata queries: X-Partial-Metadata-Enabled: true to enable partial results, X-Partial-Metadata-Enabled: false to disable partial results
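
As an illustration of the per-request headers listed above, the following sketch builds an instant query request with Python's standard library and sets X-Partial-Queries-Enabled on it. It constructs the request without sending it. The helper name build_instant_query is hypothetical, and the base URL assumes the same federation-frontend address used in the curl example earlier in this document.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the federation-frontend is reachable at this address,
# as in the curl example earlier in this document.
BASE_URL = "http://federation-frontend/prometheus"


def build_instant_query(promql: str, partial: Optional[bool] = None) -> Request:
    """Build (but do not send) an instant query request.

    partial=None leaves the header off, so only the configured
    setting is in effect for that request.
    """
    query_string = urlencode({"query": promql})
    request = Request(f"{BASE_URL}/api/v1/query?{query_string}")
    if partial is not None:
        request.add_header("X-Partial-Queries-Enabled", "true" if partial else "false")
    return request


req = build_instant_query("count(up) by (__cluster__)", partial=True)
print(req.full_url)
print(req.header_items())
```

To send the request, pass it to urllib.request.urlopen together with whatever authentication your target clusters require. A metadata query would use X-Partial-Metadata-Enabled in the same way.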

For range and instant queries, a warning is included in the response from the federation-frontend and displayed in Grafana if only a subset of target clusters’ responses are used.

Enabling partial queries has the same effect regardless of whether the query is evaluated with remote read or cross-cluster sharding. For queries where cross-cluster sharding sends multiple requests to each target cluster for a single query, either all responses from a target cluster are used, or none are used, including when only some fail. For example, if the query is sum(foo) + sum(bar), and a cluster returns a successful response for sum(foo) but not for sum(bar), then the successful response for sum(foo) from that cluster is discarded as well.
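
The all-or-nothing rule can be sketched as follows. This is a simplified model for illustration, not the federation-frontend's actual code: the function usable_clusters, the data layout, and the warning text are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each cluster maps to the outcomes of its sub-requests:
# a float on success, None on error or timeout.
Outcomes = Dict[str, List[Optional[float]]]


def usable_clusters(outcomes: Outcomes, partial_enabled: bool) -> Tuple[List[str], List[str]]:
    """Return (clusters whose responses are used, warnings).

    A cluster is used only if every one of its sub-requests succeeded.
    """
    used, dropped = [], []
    for cluster, results in outcomes.items():
        if all(r is not None for r in results):
            used.append(cluster)
        else:
            dropped.append(cluster)
    if dropped and not partial_enabled:
        # Default behaviour: any failing target fails the whole query.
        raise RuntimeError(f"target clusters failed: {', '.join(dropped)}")
    # Illustrative warning text only; the real message may differ.
    warnings = [f"partial result: ignored cluster {c}" for c in dropped]
    return used, warnings


# sum(foo) + sum(bar): us-east answers sum(foo) but fails sum(bar).
outcomes = {
    "us-west": [12.0, 7.0],
    "us-east": [5.0, None],
}
used, warnings = usable_clusters(outcomes, partial_enabled=True)
print(used)      # → ['us-west']
print(warnings)  # → ['partial result: ignored cluster us-east']
```

In this model, the successful sum(foo) value from us-east is discarded along with the failed sum(bar), which matches the behavior described above.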

For metadata queries, no warning is included in the response from the federation-frontend if only a subset of target clusters’ responses are used.

Limitations of cross-cluster query federation

This experimental feature has the following limitations:

  • No result caching in the federation-frontend
  • No support for alerting/ruler on a federation level
  • No support for endpoints other than range queries, instant queries, and metadata queries. Metadata queries consist of label names queries, label values queries, and series queries.
  • No support for exemplars

If one of these limitations blocks your use case, reach out through our support channels with a feature request.