SLI example for latency

This guide provides examples of how to define latency SLIs using different Prometheus metric types. The basic SLO example for demonstration purposes is as follows:

SLI category	SLI description	Target	Time window
Latency	Requests respond within 2 seconds	99%	28d

The SLI in this example includes all requests, and the SLO defines the target percentage.

When possible, avoid using percentiles in SLIs, such as 95th percentile latency with a 99% target, to maintain simplicity and consistency across SLO types. Refer to Building good SLOs—CRE life lessons from Google Cloud for more on this topic.

Before you begin, read the SLI availability examples to understand how SLIs are defined in Grafana SLO:

Note
Grafana expects SLIs to parse as a ratio-like query: numerator / denominator.
The SLI query result must return a ratio between 0 and 1, where 1 means 100% of events were successful. This is required to evaluate whether the SLI meets the SLO target.
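As a rough illustration of the example SLO above, the following Python sketch computes a latency SLI as a ratio between 0 and 1 and compares it with the 99% target. The function name `latency_sli` and the sample latencies are hypothetical. Grafana SLO evaluates the ratio from your Prometheus query, not from code like this.

```python
def latency_sli(latencies_s, threshold_s=2.0):
    """Return the fraction of requests that responded within threshold_s."""
    if not latencies_s:
        return None  # no events: the ratio is undefined
    good = sum(1 for latency in latencies_s if latency < threshold_s)
    return good / len(latencies_s)


# Hypothetical sample: 200 requests, 3 of which took 2 seconds or longer.
latencies = [0.3] * 197 + [2.4, 3.1, 5.0]
sli = latency_sli(latencies)
target = 0.99  # the 99% SLO target expressed as a ratio

print(f"SLI = {sli:.3f}, meets target: {sli >= target}")
```

In this sample, 197 of 200 requests are within the threshold, so the SLI falls below the 0.99 target.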

Screenshot of the graph result of an SLI ratio

Probe latency (using Prometheus Gauge)

This example uses the probe_duration_seconds metric from Synthetic Monitoring probes to verify public latency. For details on how Synthetic Monitoring probes work, see the SLI availability examples using probes.

Metric	Type	Description
`probe_duration_seconds`	Gauge	How long the probe took to complete in seconds

In the Grafana SLO wizard, you can create SLIs using two options:

Ratio query builder: Enter counter metrics for successful and total events.
Advanced: Enter the ratio SLI query directly.

Because probe_duration_seconds is not a counter metric, choose the Advanced option to create the SLI query.

SLIs are defined as ratio-like queries, either as the ratio of successful events or the ratio of successful event rates:

# ratio of successful event rates formula
Success ratio = rate of successful events (over a period)
                /  
                rate of total events (over a period)

# ratio of successful events formula
Success ratio = number of successful events (over a period)
                /  
                total number of events (over a period)
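The two formulas give the same ratio when both sides use the same period, because the period cancels out in the division. A minimal Python sketch of that arithmetic, using hypothetical function names and event counts:

```python
import math


def success_ratio_from_counts(good_events, total_events):
    """Ratio of successful events over a period."""
    return good_events / total_events


def success_ratio_from_rates(good_events, total_events, window_s):
    """Ratio of successful event rates over the same period."""
    good_rate = good_events / window_s    # successful events per second
    total_rate = total_events / window_s  # total events per second
    return good_rate / total_rate


# Hypothetical numbers: 594 of 600 events succeeded in a 5-minute window.
by_counts = success_ratio_from_counts(594, 600)
by_rates = success_ratio_from_rates(594, 600, window_s=300)

print(by_counts, by_rates, math.isclose(by_counts, by_rates))
```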

With gauge metrics, you can implement the ratio of successful events formula as follows:

# number of successful probe requests over the interval
sum(
  count_over_time(
    (probe_duration_seconds{job="<JOB_NAME>"} < 2)[$__interval:]
  )
)
/
# number of total probe requests over the interval
sum(
  count_over_time(
    probe_duration_seconds{job="<JOB_NAME>"}[$__interval:]
  )
)

Here’s the breakdown of the numerator query:

# number of successful probe requests over the interval
sum(
  count_over_time(
    (probe_duration_seconds{job="<JOB_NAME>"} < 2)[$__interval:]
  )
)

probe_duration_seconds{job="<JOB_NAME>"} < 2
Returns probe latency samples. The < 2 comparison keeps only the samples where latency is within the SLI threshold (less than two seconds).
Passing samples keep their original value and failing samples are dropped, so only successful samples remain.
[$__interval:]
Evaluates the previous expression over the past $__interval.
Because count_over_time works only on range vectors, the query uses a subquery [:] to turn the filtered result into a range vector covering that period.
count_over_time(...)
Counts the samples in that range vector, that is, the number of successful probe requests.
sum(...)
Aggregates the counts across all series (dimensions).

The numerator is then divided by the total number of probe requests over the same interval using a similar query:

/
# number of total probe requests over the interval
sum(
  count_over_time(
    probe_duration_seconds{job="<JOB_NAME>"}[$__interval:]
  )
)
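The arithmetic of the full gauge query can be sketched in Python as follows. This is an illustration only: the series names and sample values are hypothetical, and the sketch does not reproduce how PromQL handles empty results.

```python
def gauge_latency_sli(samples_by_series, threshold_s=2.0):
    """Model sum(count_over_time(x < t)) / sum(count_over_time(x))."""
    good = 0
    total = 0
    for samples in samples_by_series.values():
        # Numerator: count only the samples under the threshold.
        good += sum(1 for value in samples if value < threshold_s)
        # Denominator: count every sample in the interval.
        total += len(samples)
    return good / total if total else None


# Hypothetical probe durations (seconds) for two series in one interval.
probe_samples = {
    "probe-a": [0.8, 1.2, 2.6, 0.9],
    "probe-b": [1.9, 2.1, 1.4, 1.1],
}
sli = gauge_latency_sli(probe_samples)
print(sli)
```

Six of the eight samples are under two seconds, so the ratio for this interval is 6/8.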

Note
$__rate_interval is recommended for calculating rate() in other examples. When you use other _over_time() functions that don’t require at least two data points, it’s better to use $__interval to achieve finer error budget resolution by evaluating SLIs at smaller time intervals.

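Alternatively, the numerator can use bool and sum_over_time. With bool, failing samples become 0 instead of being dropped, so the numerator returns 0 rather than no data when every probe in the interval fails: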

# number of successful probe requests over the interval
# `bool` returns a binary 0/1 series and `sum_over_time` sums 1s for successes
sum(
    sum_over_time(
      (probe_duration_seconds{job="<JOB_NAME>"} < bool 2)[$__interval:]
    )
)
/
# number of total probe requests over the interval
sum(
  count_over_time(
    probe_duration_seconds{job="<JOB_NAME>"}[$__interval:]
  )
)
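The difference between the two numerator variants shows up when every probe in the interval is slow. The Python sketch below models that case under simplifying assumptions: `None` stands in for an empty PromQL result, and all function names and sample values are hypothetical.

```python
def count_filtered(samples, threshold_s):
    """Model count_over_time(x < t): failing samples are dropped.

    Returns None (no data) when no sample passes the filter.
    """
    kept = [value for value in samples if value < threshold_s]
    return len(kept) if kept else None


def sum_bool(samples, threshold_s):
    """Model sum_over_time(x < bool t): each sample becomes 1 or 0."""
    if not samples:
        return None
    return sum(1 if value < threshold_s else 0 for value in samples)


def ratio(numerator, denominator):
    # A missing operand yields no result, as in PromQL vector division.
    if numerator is None or denominator is None:
        return None
    return numerator / denominator


all_slow = [2.5, 3.0, 4.2]      # every probe exceeds 2 seconds
mixed = [0.5, 3.0, 1.0, 1.5]    # three of four probes are fast

print(ratio(count_filtered(all_slow, 2), len(all_slow)))  # filter variant
print(ratio(sum_bool(all_slow, 2), len(all_slow)))        # bool variant
print(ratio(count_filtered(mixed, 2), len(mixed)))
print(ratio(sum_bool(mixed, 2), len(mixed)))
```

When at least one probe succeeds, both variants give the same ratio. They differ only when there are no successes: the filter variant produces no data, while the bool variant produces 0.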

Probe latency (using Classic Histogram)

This SLI example uses the probe_all_duration_seconds histogram metric, which requires a different SLI query.

Metric	Type	Description
`probe_all_duration_seconds`	Histogram	How long the probe took to complete in seconds

Classic histogram metrics in Prometheus store samples based on their value (latency in this case) and expose additional series:

*_count: Returns the total number of samples across all latencies.
*_bucket: Returns the cumulative number of samples at or below each configured bucket boundary (the le label). The buckets for this metric are 0, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, and +Inf.

You can use a histogram metric to return the number of successful samples if the metric includes a bucket for the specific SLI threshold.

However, probe_all_duration_seconds does not include a bucket for 2s, so it cannot be used to filter histogram samples at that threshold. For alternatives, refer to Handle a threshold not available as a bucket.
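To see why the threshold must match a bucket, the following Python sketch builds cumulative bucket counts the way classic histogram *_bucket series report them. It uses a hypothetical subset of the bucket boundaries and made-up observations.

```python
# Hypothetical subset of bucket upper bounds (the `le` label values).
BUCKET_BOUNDS = [0.5, 1, 2.5, 5, 10, float("inf")]


def cumulative_buckets(observations, bounds=BUCKET_BOUNDS):
    """Return {le: number of observations <= le}, like *_bucket series."""
    return {le: sum(1 for obs in observations if obs <= le) for le in bounds}


# Hypothetical probe durations in seconds.
observations = [0.2, 0.7, 1.4, 1.9, 2.2, 2.4, 3.0, 6.0]
buckets = cumulative_buckets(observations)
total = buckets[float("inf")]  # equals the *_count series

print(buckets)
print("ratio within 2.5s:", buckets[2.5] / total)
print("bucket for 2s available:", 2 in buckets)
```

The histogram only records how many observations fall at or below each boundary. The four observations between 1 and 2.5 seconds are counted together, so the number under 2 seconds cannot be read from the buckets.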

This example uses a different threshold (2.5s) for demonstration purposes. Use the Ratio option to build the SLI query as follows:

Ratio query builder	Value	Description
Success metric	`probe_all_duration_seconds_bucket{job="<JOB_NAME>", le="2.5"}`	Number of probe requests that completed within 2.5s
Total metric	`probe_all_duration_seconds_count{job="<JOB_NAME>"}`	Total number of probe requests
Grouping	(leave empty)	Creates a single SLI dimension. See the multidimensional SLI example.

Click Run queries to generate the final SLI ratio query:

Screenshot of the Grafana SLO wizard creating an SLI for latency using a Prometheus histogram metric

The auto-generated SLI implements the ratio of successful event rates formula:

Success ratio = rate of successful events (over a period)
                /  
                rate of total events (over a period)

The SLI query returns a ratio between 0 and 1, where 1 means 100% of events were successful.

To learn why the auto-generated SLI is formed this way and how it works, refer to the breakdown of the ratio SLI query of the HTTP availability example.

Handle a threshold not available as a bucket

With classic histograms, it’s common for your SLI threshold to not match an existing histogram bucket, as in this example:

The SLI searches for responses under 2 seconds.
But the available buckets are configured for 1 and 2.5, not 2.

In this case, probe_all_duration_seconds_bucket{job="<JOB_NAME>", le="2"} does not work, and you should consider other approaches:

Add a bucket for your threshold: If you control the instrumentation, update the histogram metric to include a bucket for the exact SLI threshold.
Use a fallback metric: Check whether another latency metric is available, such as the gauge in the previous example or the native histogram in the next example.
Approximate using the nearest bucket: Use the nearest higher or lower bucket. Document this clearly and adapt your SLO settings, because the SLO no longer matches the intended SLI threshold.
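Because classic histogram buckets are cumulative, the nearest lower bucket gives a stricter (lower) success ratio and the nearest higher bucket gives a looser (higher) one. The Python sketch below bounds the ratio for a 2-second threshold under those assumptions. The helper `ratio_bounds` and the bucket counts are hypothetical.

```python
def ratio_bounds(bucket_counts, total, threshold):
    """Bound the success ratio at `threshold` using the nearest buckets.

    bucket_counts maps each bucket upper bound (le) to its cumulative count.
    """
    lower_les = [le for le in bucket_counts if le <= threshold]
    upper_les = [le for le in bucket_counts if le >= threshold]
    # Nearest lower bucket: counts only requests known to be fast enough.
    low = bucket_counts[max(lower_les)] / total if lower_les else 0.0
    # Nearest higher bucket: also counts requests slightly over the threshold.
    high = bucket_counts[min(upper_les)] / total if upper_les else 1.0
    return low, high


# Hypothetical cumulative counts: no bucket exists for le=2.
bucket_counts = {1: 2, 2.5: 6, 5: 7}
total_requests = 8

low, high = ratio_bounds(bucket_counts, total_requests, threshold=2)
print(f"success ratio at 2s is between {low} and {high}")
```

With these counts, the le="1" bucket reports 2 of 8 requests and the le="2.5" bucket reports 6 of 8, so the ratio at 2 seconds lies somewhere between the two. The gap shows how far an approximated SLO can drift from the intended threshold.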

HTTP latency (using Native Histogram)

In contrast to classic histograms, Prometheus native histograms can estimate the number of observations within arbitrary ranges—not just preconfigured buckets.

histogram_fraction() returns the estimated fraction of observations within a specified range, as a value between 0 and 1.

This function must be used with rate() or increase() to correctly handle counter resets. The SLI query should follow the ratio of successful event rates formula:

# ratio of successful event rates formula
Success ratio = rate of successful events (over a period)
                /  
                rate of total events (over a period)

This example uses http_request_duration_seconds, a native histogram metric that records request latency. The complete SLI formula looks like this:

(
histogram_fraction(0, 2, sum(rate(http_request_duration_seconds[$__rate_interval])))
*
histogram_count(         sum(rate(http_request_duration_seconds[$__rate_interval])))
)
/
histogram_count(         sum(rate(http_request_duration_seconds[$__rate_interval])))

rate(http_request_duration_seconds[$__rate_interval]) calculates the per-second rate of observations over the interval.
sum(...) aggregates across all dimensions (for example, all endpoints or status codes).
histogram_fraction(0, 2, ...) estimates the fraction of the request rate with latency between 0 and 2 seconds during the interval.
histogram_count(...) returns the total request rate for the interval.

The query follows the ratio-style format (numerator/denominator) of event-based SLIs:

Numerator: histogram_fraction(...) * histogram_count(...).
Denominator: histogram_count(...).
It returns a ratio between 0 and 1, where 1 means 100% of requests were under 2 seconds.
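The numerator and denominator share the same request rate, so the ratio reduces to the fraction itself. The explicit form keeps the query in the numerator / denominator shape that Grafana expects. A small Python sketch of that arithmetic, with hypothetical input values standing in for the results of histogram_fraction() and histogram_count():

```python
import math


def native_histogram_sli(fraction_under_threshold, request_rate):
    """Model (fraction * count) / count from the native histogram query."""
    good_rate = fraction_under_threshold * request_rate  # numerator
    total_rate = request_rate                            # denominator
    if total_rate == 0:
        # Choice made for this sketch: report no result when there is no traffic.
        return None
    return good_rate / total_rate


# Hypothetical values: 97% of requests under 2s at 120 requests per second.
sli = native_histogram_sli(0.97, 120.0)
print(math.isclose(sli, 0.97))
```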

Note
For consistency, this example uses histogram_count(sum()).
However, sum(histogram_count()) returns the same query result and is often more efficient.
