Example of dynamic thresholds per dimension

In Grafana Alerting, each alert rule supports only one condition expression.

That’s enough in many cases—most alerts use a fixed numeric threshold like latency > 3s or error_rate > 5% to determine their state.

As your alerting setup grows, you may find that different targets require different threshold values.

Instead of duplicating alert rules, you can assign a different threshold value to each target—while keeping the same condition. This simplifies alert maintenance.

This example shows how to do that using multi-dimensional alerts and a Math expression.

Example overview

You’re monitoring latency across multiple API services. Initially, you want to get alerted if the 95th percentile latency (p95_api_latency) exceeds 3 seconds, so your alert rule uses a single static threshold:

p95_api_latency > 3

But the team quickly finds that some services require stricter thresholds. For example, latency for payment APIs should stay under 1.5s, while background jobs can tolerate up to 5s. The team establishes different thresholds per service:

  • p95_api_latency{service="checkout-api"}: must stay under 1.5s.
  • p95_api_latency{service="auth-api"}: also strict, 1.5s.
  • p95_api_latency{service="catalog-api"}: less critical, 3s.
  • p95_api_latency{service="async-tasks"}: background jobs can tolerate up to 5s.

You want to avoid creating one alert rule per service—this is harder to maintain.

In Grafana Alerting, you can define one alert rule that monitors multiple similar components, as in this scenario. This is called multi-dimensional alerts: one alert rule, many alert instances—one per unique label set.

But there’s an issue: Grafana supports only one alert condition per rule.

One alert rule
├─ One condition (e.g., $A > 3)
│  └─ Applies to all returned series in $A
│     ├─ {service="checkout-api"}
│     ├─ {service="auth-api"}
│     ├─ {service="catalog-api"}
│     └─ {service="async-tasks"}

To evaluate per-service thresholds, you need a distinct threshold value for each returned series.

Dynamic thresholds using a Math expression

You can create a dynamic alert condition by operating on two queries with a Math expression.

  • $A for query results (e.g., p95_api_latency).
  • $B for per-service thresholds (from CSV data or another query).
  • $A > $B is the Math expression that defines the alert condition.

Grafana evaluates the Math expression per series: it joins series from $A and $B based on their shared labels before applying the expression.

Here’s an example of an arithmetic operation:

  • $A returns series {host="web01"} 30 and {host="web02"} 20.
  • $B returns series {host="web01"} 10 and {host="web02"} 0.
  • $A + $B returns {host="web01"} 40 and {host="web02"} 20.

In practice, you must align your threshold input with the label sets returned by your alert query.
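
For instance, in the latency scenario above, the thresholds could be supplied as static table data, with one row per service label returned by $A (the same CSV approach is used in the TestData walkthrough below):

service,value
checkout-api,1.5
auth-api,1.5
catalog-api,3
async-tasks,5

Each service value then joins with the matching {service=...} series from $A, giving every series its own threshold.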

The following table illustrates how a per-service threshold is evaluated in the previous example:

| $A: p95 latency query | $B: threshold value | $C: $A > $B | State |
| --- | --- | --- | --- |
| {service="checkout-api"} 3 | {service="checkout-api"} 1.5 | {service="checkout-api"} 1 | Firing |
| {service="auth-api"} 1 | {service="auth-api"} 1.5 | {service="auth-api"} 0 | Normal |
| {service="catalog-api"} 2 | {service="catalog-api"} 3 | {service="catalog-api"} 0 | Normal |
| {service="async-tasks"} 3 | {service="async-tasks"} 5 | {service="async-tasks"} 0 | Normal |

In this example:

  • $A comes from the p95_api_latency query.
  • $B is manually defined with a threshold value for each series in $A.
  • The alert condition compares $A > $B using a Math relational operator (e.g., >, <, >=, <=, ==, !=) that joins series by matching labels.
  • Grafana evaluates the alert condition and sets the firing state where the condition is true.

The Math expression works as long as each series in $A matches exactly one series in $B; the two queries must align to produce a one-to-one match between their series.

Caution

If a series in one query doesn’t match any series in the other, it’s excluded from the result and a warning message is displayed:

1 items dropped from union(s): ["$A > $B": ($B: {service=payment-api})]

Labels in both series don’t need to be identical. If the labels of one series are a subset of the other’s, they can join. For example:

  • $A returns series {host="web01", job="event"} 30 and {host="web02", job="event"} 20.
  • $B returns series {host="web01"} 10 and {host="web02"} 0.
  • $A + $B returns {host="web01", job="event"} 40 and {host="web02", job="event"} 20.

Try it with TestData

You can use the TestData data source to replicate this example:

  1. Add the TestData data source through the Connections menu.

  2. Create an alert rule.

    Navigate to Alerting > Alert rules and click New alert rule.

  3. Simulate a query ($A) that returns latencies for each service.

    Select TestData as the data source and configure the scenario.

    • Scenario: Random Walk

    • Alias: latency

    • Labels: service=api-$seriesIndex

    • Series count: 4

    • Start value: 1

    • Min: 1, Max: 4

      This uses $seriesIndex to assign unique service labels: api-0, api-1, etc.

    The TestData data source returns 4 series to simulate latencies for distinct API services.
  4. Define per-service thresholds with static data.

    Add a new query ($B) and select TestData as the data source.

    From Scenario, select CSV Content and paste this CSV:

     service,value
     api-0,1.5
     api-1,1.5
     api-2,3
     api-3,5

    The service column must match the labels from $A.

    The value column is a numeric value used for the alert comparison.

    For details on CSV format requirements, see table data examples.

  5. Add a new Reduce expression ($C).

    • Type: Reduce
    • Input: A
    • Function: Mean
    • Name: C

    This calculates the average latency for each service: api-0, api-1, etc.

  6. Add a new Math expression.

    • Type: Math
    • Expression: $C > $B
    • Set this expression as the alert condition.

    This fires if the average latency ($C) exceeds the threshold from $B for any service.

  7. Preview the alert.

    Alert preview evaluating multiple series with distinct threshold values

    Give it a try using Grafana Play

    With Grafana Play, you can explore practical examples and see how they work, which can help accelerate your development. You can see this setup in this alert example.

Other use cases

This example showed how to build a single alert rule with different thresholds per series using multi-dimensional alerts and Math expressions.

This approach scales well when monitoring similar components with distinct reliability goals.

By aligning series from two queries, you can apply a dynamic threshold—one value per label set—without duplicating rules.

While this example uses static CSV content to define thresholds, the same technique works in other scenarios:

  • Dynamic thresholds from queries or recording rules: Fetch threshold values from a real-time query, or from custom recording rules.
  • Combine multiple conditions: Build more advanced threshold logic by combining multiple conditions—such as latency, error rate, or traffic volume.

For example, you can define a PromQL expression that sets a latency threshold which adjusts based on traffic—allowing higher response times during periods of high load.

(
  # Fires when p95 latency > 2s during usual traffic (≤ 1000 req/s)
  service:latency:p95 > 2 and service:request_rate:rate1m <= 1000
)
or
(
  # Fires when p95 latency > 4s during high traffic (> 1000 req/s)
  service:latency:p95 > 4 and service:request_rate:rate1m > 1000
)
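
As a sketch of the first option above, the per-service thresholds themselves could also come from Prometheus recording rules rather than static CSV. The rule group below is a hypothetical example (the service:latency_threshold:seconds name is an assumption, not an existing metric); the alert rule's $B query would select it, and the $A > $B condition joins on the shared service label just as in the CSV example.

groups:
  - name: latency-thresholds
    rules:
      # Hypothetical constant-threshold series, one per service.
      # The service label must match the label sets returned by $A.
      - record: service:latency_threshold:seconds
        expr: vector(1.5)
        labels:
          service: checkout-api
      - record: service:latency_threshold:seconds
        expr: vector(1.5)
        labels:
          service: auth-api
      - record: service:latency_threshold:seconds
        expr: vector(3)
        labels:
          service: catalog-api
      - record: service:latency_threshold:seconds
        expr: vector(5)
        labels:
          service: async-tasks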