Cross-cluster query federation
Note
Cross-cluster query federation is an experimental feature. As such, the configuration settings, command line flags, and specifics of the implementation are subject to change.
Overview
Starting with version 1.4, Grafana Enterprise Metrics (GEM) includes the optional federation-frontend component. This component combines data from multiple GEM clusters into a single PromQL query. The federation-frontend component queries the underlying target clusters using the following methods:
- Mimir query sharding adapted for the target clusters. This is also known as cluster sharding.
- The Prometheus remote_read API
- The Metadata APIs
You can run this component on its own, as it doesn’t require any other components of GEM. A common use case is combining the data from two GEM clusters that are running in different regions, as shown in the following diagram:
Configure the federation-frontend
To configure the federation-frontend component, you must disable authentication by disabling multitenancy. The federation-frontend forwards the Basic authentication credentials and Bearer tokens supplied by its clients to the underlying target clusters.
Additionally, to start the federation-frontend, you must set the target to federation-frontend.
You must configure a list of target clusters within the federation.proxy_targets block in the YAML configuration file. There are no equivalent CLI flags available.
Each entry requires a name, which is an identifier that's exposed using the __cluster__ label in the query results, and a url that points to a GEM instance. For GEM clusters using the default Prometheus HTTP prefix, use the following URL: http://<gem-host>/prometheus.
Optionally, you can configure each proxy_target to have Basic auth credentials, which override the user-supplied ones.
Warning
When you configure Basic auth via the proxy_target configuration, those credentials take precedence over the client-supplied ones. Without other preventive action, any client that can reach the federation-frontend can perform queries by using those credentials.
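The precedence rule in this warning can be illustrated with a minimal Python sketch. The helper name upstream_authorization and its arguments are hypothetical and exist only for illustration. The sketch models the documented behavior and is not the federation-frontend's actual implementation:

```python
import base64
from typing import Optional, Tuple


def upstream_authorization(
    client_header: Optional[str],
    target_basic_auth: Optional[Tuple[str, str]],
) -> Optional[str]:
    """Return the Authorization header value sent to one target cluster.

    Hypothetical helper: it models only the precedence rule described above.
    """
    if target_basic_auth is not None:
        username, password = target_basic_auth
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        # Credentials configured on the proxy target win, whatever the client sent.
        return f"Basic {token}"
    # No per-target credentials: the client's header is forwarded unchanged.
    return client_header
```

In this model, a client that sends no credentials at all still reaches a target that has Basic auth configured, which is why access to the federation-frontend itself needs to be restricted.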
In the following example, two clusters in two different regions are queried via the federation-frontend:
multitenancy_enabled: false # The federation-frontend does not do any authentication itself
target: federation-frontend # Run the federation-frontend only
federation:
  proxy_targets:
    - name: us-west
      url: http://gem-us-west/prometheus
    - name: us-east
      url: http://gem-us-east/prometheus
More details on the configuration options available in proxy_targets, including authentication and TLS options, are available in the configuration reference.
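As a rough aid, the requirements described in this section can be checked before you start the component. The following Python sketch assumes the YAML file has already been loaded into a dictionary. The function name check_federation_config is hypothetical, and the checks cover only the requirements stated in this section, not the full configuration schema:

```python
from typing import Any, Dict, List


def check_federation_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Multitenancy must be explicitly disabled.
    if config.get("multitenancy_enabled") is not False:
        problems.append("multitenancy_enabled is not set to false")
    if config.get("target") != "federation-frontend":
        problems.append("target is not federation-frontend")
    targets = (config.get("federation") or {}).get("proxy_targets") or []
    if not targets:
        problems.append("federation.proxy_targets is empty")
    for index, entry in enumerate(targets):
        # Each entry requires both a name and a url.
        for key in ("name", "url"):
            if not entry.get(key):
                problems.append(f"proxy_targets[{index}] is missing {key}")
    return problems


example = {
    "multitenancy_enabled": False,
    "target": "federation-frontend",
    "federation": {
        "proxy_targets": [
            {"name": "us-west", "url": "http://gem-us-west/prometheus"},
            {"name": "us-east", "url": "http://gem-us-east/prometheus"},
        ]
    },
}
print(check_federation_config(example))
```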
Note
In GEM version 2.16 or higher, cross-cluster query federation only supports target clusters that also run GEM version 2.16 or higher. If the federation-frontend is configured to use a target cluster running an older version of GEM, queries fail with the following error:
the federation-frontend only works with GEM proxy targets; the proxy target "xxx" is not a GEM cluster
Configure cross-cluster sharding
Cross-cluster sharding splits queries across the target clusters, runs the resulting queries on those individual clusters, and then combines the results on the federation-frontend. Compared to using remote read for all queries, this approach improves the performance of cross-cluster queries by distributing more of the computation and transferring less data over the network. It also takes advantage of all the query acceleration techniques available in the target GEM cluster, including query result caching, splitting, and sharding.
The federation-frontend component enables cross-cluster sharding by default. You can configure cross-cluster sharding using the cluster_sharding_enabled setting or the -federation.cluster-sharding-enabled CLI flag. For example:
federation:
  cluster_sharding_enabled: true
Not all queries can be evaluated with cross-cluster sharding. If the federation-frontend receives a query that cannot be evaluated with cross-cluster sharding, it automatically falls back to using remote read for that query.
Combine metrics from a local GEM cluster and Grafana Cloud Metrics stack
The federation-frontend allows you to get a combined view of metrics stored in a local GEM cluster and a hosted Grafana Cloud Metrics stack. With the following configuration, you can query both of the clusters as though they were one:
federation:
  proxy_targets:
    - name: own-data-center
      url: http://gem/prometheus
    - name: grafana-cloud
      url: https://prometheus-us-central1.grafana.net/api/prom
      basic_auth:
        username: <tenant-id>
        password: <token>
Warning
This gives any client that can reach the federation-frontend access to your metrics data in Grafana Cloud Metrics without further authentication.
By using the authentication credentials of the local GEM cluster, you can execute a query against both clusters. To do so, set the access policy’s token as a variable for subsequent commands:
$ export API_TOKEN="the long token string you copied"
$ curl -s -u "<tenant-id>:$API_TOKEN" -G --data-urlencode "query=count(up) by (__cluster__)" http://federation-frontend/prometheus/api/v1/query | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__cluster__": "own-data-center"
        },
        "value": [1623344524.382, "10"]
      },
      {
        "metric": {
          "__cluster__": "grafana-cloud"
        },
        "value": [1623344524.382, "4"]
      }
    ]
  }
}
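If you process this response in a script instead of with jq, the per-cluster values can be read from the __cluster__ label. The following Python sketch assumes a response body in the instant-query vector format shown above. The function name counts_by_cluster is hypothetical:

```python
import json
from typing import Dict


def counts_by_cluster(response_body: str) -> Dict[str, float]:
    """Map each __cluster__ label value to the sample value in the response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    counts = {}
    for sample in payload["data"]["result"]:
        cluster = sample["metric"].get("__cluster__", "")
        _timestamp, value = sample["value"]
        counts[cluster] = float(value)  # Sample values arrive as strings.
    return counts


body = """
{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"__cluster__": "own-data-center"}, "value": [1623344524.382, "10"]},
  {"metric": {"__cluster__": "grafana-cloud"}, "value": [1623344524.382, "4"]}
]}}
"""
print(counts_by_cluster(body))
```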
Configure partial results
By default, the federation-frontend returns an error for any query when one or more of the target clusters returns an error or does not respond.
Alternatively, the federation-frontend can compute partial results based on the successful responses from target clusters and ignore the responses from target clusters that return an error or do not respond.
You can enable partial results with the following CLI flags and configuration options:
- For range and instant queries: the -federation.partial-queries-enabled CLI flag or the federation.partial_queries_enabled YAML configuration file option
- For metadata queries, which are label names, label values, and series queries: the -federation.partial-metadata-enabled CLI flag or the federation.partial_metadata_enabled YAML configuration file option
It is also possible to enable or disable partial results by setting an HTTP header on the query request:
- For range and instant queries: X-Partial-Queries-Enabled: true to enable partial results, X-Partial-Queries-Enabled: false to disable partial results
- For metadata queries: X-Partial-Metadata-Enabled: true to enable partial results, X-Partial-Metadata-Enabled: false to disable partial results
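A client can set the per-request header when it builds the query. The following Python sketch uses only the standard library and builds the request without sending it. The function name build_instant_query and its parameters are hypothetical, and the base URL in the usage note reuses the http://federation-frontend/prometheus address from the earlier curl example:

```python
import base64
import urllib.parse
import urllib.request


def build_instant_query(base_url, query, username, token, partial=None):
    """Build (but do not send) an instant query request.

    partial=True or partial=False sets X-Partial-Queries-Enabled accordingly.
    partial=None omits the header, so the configured default applies.
    """
    params = urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(f"{base_url}/api/v1/query?{params}")
    credentials = base64.b64encode(f"{username}:{token}".encode()).decode()
    request.add_header("Authorization", f"Basic {credentials}")
    if partial is not None:
        request.add_header(
            "X-Partial-Queries-Enabled", "true" if partial else "false"
        )
    return request
```

For example, build_instant_query("http://federation-frontend/prometheus", "count(up) by (__cluster__)", "<tenant-id>", "<token>", partial=True) returns a request that you can pass to urllib.request.urlopen. For metadata queries, the same pattern applies with the X-Partial-Metadata-Enabled header.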
For range and instant queries, a warning is included in the response from the federation-frontend and displayed in Grafana if only a subset of target clusters’ responses are used.
Enabling partial queries has the same effect regardless of whether the query is evaluated with remote read or cross-cluster sharding.
For queries where cross-cluster sharding sends multiple requests to each target cluster for a single query, either all responses from a target cluster are used or none are used, even when only some of the requests fail. For example, if the query is sum(foo) + sum(bar), and a cluster returns a successful response for sum(foo) but not for sum(bar), then the successful response for sum(foo) from that cluster is discarded as well.
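The all-or-nothing rule can be illustrated with a minimal Python sketch. It models only the behavior described above and is not the federation-frontend's implementation. The function name usable_responses, the use of None for a failed or missing response, and the placeholder result strings are all assumptions made for illustration:

```python
from typing import Dict, List, Optional


def usable_responses(
    responses: Dict[str, List[Optional[str]]],
) -> Dict[str, List[str]]:
    """Keep a cluster's responses only if every one of its requests succeeded.

    None stands for a failed or missing response (an assumption of this sketch).
    """
    kept = {}
    for cluster, parts in responses.items():
        if all(part is not None for part in parts):
            kept[cluster] = list(parts)
    return kept


# sum(foo) + sum(bar), modeled as two requests per target cluster.
responses = {
    "us-west": ["sum(foo) result", "sum(bar) result"],
    "us-east": ["sum(foo) result", None],  # The sum(bar) request failed.
}
print(usable_responses(responses))  # us-east is dropped entirely.
```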
For metadata queries, no warning is included in the response from the federation-frontend if only a subset of target clusters’ responses are used.
Limitations of cross-cluster query federation
This experimental feature has the following limitations:
- No result caching in the federation-frontend
- No support for alerting/ruler on a federation level
- No support for endpoints other than range queries, instant queries, and metadata queries. Metadata queries consist of label names queries, label values queries, and series queries.
- No support for exemplars
If your use case is blocked by one of these limitations, feel free to reach out through our support channels with a feature request.