We’re thrilled to announce the release of Grafana Enterprise Metrics (GEM) 1.5. While this release packs in a ton of enhancements and bug fixes, we’d like to dive into two particularly exciting features: per-tenant usage metrics and a wildcard tenant for queries.
Per-tenant usage metrics
One of the things that makes GEM great for observability teams trying to offer Prometheus-as-a-service at their companies is its native support for multi-tenancy. This enables observability teams to run one GEM cluster to support multiple teams, but give each team the experience of having a dedicated metrics database. Because teams are each assigned their own tenant, their data and queries are isolated from one another. Operators can also set limits on each tenant to ensure that the cluster’s resources are shared fairly among users.
Given how heavily many GEM operators rely on multi-tenancy, we knew it would be valuable for them to understand system usage at a per-tenant granularity. How many active series does TenantA have relative to TenantB? How many queries is TenantB making relative to TenantC? In organizations where tenants map directly to developer teams, answering these questions enables observability teams to determine who is sending the most metrics, writing the most alerting and recording rules, and running the heaviest queries. This is important when it comes time to allocate costs or identify whose usage is spiraling out of control.
So what did we do? Piggy-backing on the development we’d done to enable self-monitoring in GEM 1.4, we added telemetry metrics to GEM for each tenant’s usage and limit settings. GEM then scrapes itself to gather these metrics and writes them to its dedicated system monitoring tenant, where they can then be queried via PromQL as if they were any other time series.
To make this information even easier to consume, we created a set of pre-built dashboards that get automatically installed with the GEM plugin. Observability teams can start from an overview dashboard that displays all tenants, and then drill down into a more detailed view of whatever tenant they’re interested in learning more about.
The dashboards also show where a tenant’s usage is relative to its limits. GEM operators can now easily spot tenants that are running up against their limits and make the call on whether to increase those limits or work with the teams to reduce their usage.
We recommend setting usage limits for every tenant to avoid the risk of any one tenant taking down GEM. For this reason, the per-tenant usage dashboards also highlight tenants with unset limits so GEM admins can go back and properly configure them.
Wildcard tenant name for queries
GEM’s admin-api allows users to create access policies that grant read access to multiple tenants. For example, you could create a read-access policy that lets you read Tenant1, Tenant2, and Tenant3’s metrics.
When querying GEM, you use the user field to specify which of those tenants you want to read. If you want to query Tenant1, set user to Tenant1. If you want to query Tenant1 and Tenant2, use the pipe (|) character, setting user to Tenant1|Tenant2. If you want to query all three tenants at once, set user to Tenant1|Tenant2|Tenant3.
While this works fine in my simple example above, you can see how it starts to get painful when you want to query a large set of tenants. Even with a smaller number of tenants, you can have issues if you forget or misspell the names of the particular tenants you have access to.
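As a rough illustration of why this gets tedious, here is a minimal shell sketch that assembles the pipe-delimited user value from a list of tenant names. The join_tenants helper is hypothetical (it is not part of GEM), and the tenant names are the ones from the example above.

```shell
#!/usr/bin/env bash

# join_tenants is a hypothetical helper, not part of GEM.
# It joins its arguments with "|" to form the user field for a multi-tenant query.
join_tenants() {
  local IFS='|'          # "$*" joins arguments using the first character of IFS
  printf '%s\n' "$*"
}

# Tenant names taken from the example above.
user="$(join_tenants Tenant1 Tenant2 Tenant3)"
echo "$user"
```

The resulting value would then be passed to curl as -u "$user:$API_TOKEN". Keeping it inside double quotes matters, because an unquoted | would be interpreted by the shell as a pipe. Every tenant still has to be listed by hand, and a misspelled name is not caught by the helper.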
So in GEM 1.5, we set out to alleviate some of this pain with the help of the wildcard (*) character. Set the user field to * in your query, and GEM will run the query against all tenants your read-access policy gives you access to, without having to enumerate them explicitly. In the example above, the clunky Tenant1|Tenant2|Tenant3 string gets replaced with a single character (*).
curl -u "TeamA|TeamB|TeamC|TeamD:$API_TOKEN" 'http://enterprise-metrics/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
curl -u "*:$API_TOKEN" 'http://enterprise-metrics/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
This is a small change, but a really nice quality of life improvement for users. And when used in conjunction with GEM’s existing ability to create wildcard read-access policies, it unlocks one particularly powerful use case. A wildcard read-access policy allows the user to query any tenant in the GEM cluster. Specify a * for the user field and a token for a wildcard read-access policy as your password, and GEM will run your query against all tenants in the cluster. With this approach, you’re guaranteed to always get data for all tenants in the cluster, without having to update the user field whenever a tenant is added to or removed from GEM.
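For scripted use, one way to wrap the wildcard query is sketched below. It assumes a bash-compatible shell and the same host, endpoint path, and API_TOKEN variable as the curl examples above. The gem_query_all function and the GEM_URL and CURL variables are hypothetical names introduced for illustration and are not part of GEM. The sketch relies on curl's -G and --data-urlencode options so that PromQL containing spaces or parentheses is URL-encoded rather than pasted into the URL by hand.

```shell
#!/usr/bin/env bash

# gem_query_all is a hypothetical wrapper, not part of GEM.
# It runs an instant query with "*" as the user, so GEM evaluates it against
# every tenant the token's read-access policy covers.
gem_query_all() {
  local promql="$1"
  # CURL can be overridden (for example CURL=echo) to inspect the arguments
  # without sending a request. GEM_URL defaults to the host used above.
  "${CURL:-curl}" -G \
    -u "*:${API_TOKEN}" \
    --data-urlencode "query=${promql}" \
    "${GEM_URL:-http://enterprise-metrics}/api/v1/query"
}

# Dry run: print the arguments that would be passed to curl.
CURL=echo gem_query_all 'sum by (job) (up)'
```

Dropping the CURL=echo prefix sends the request for real. The double quotes around the user and token keep the shell from expanding the asterisk into file names.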
Note: While wildcards are incredibly powerful, they can also be very dangerous if used incorrectly. You can end up running massive queries that might knock over your cluster, or giving away access to data you intended to keep private. Use them thoughtfully!
With that caveat, I encourage you to go check out GEM 1.5 and test out these new features for yourself!
If you’d like to learn more about Grafana Enterprise Metrics, you can watch the “Running Prometheus-as-a-service with Grafana Enterprise Metrics” webinar on demand and tune in live today, Aug. 26, at 9:30 PT / 12:30 ET / 16:30 UTC for “Intro to metrics with Grafana: Prometheus, Graphite, and beyond.” You can also read more about GEM in the docs, and contact us if you’d like to try it out!