From results to thresholds

Choosing threshold values

Start with your observed values and add headroom for normal variation:

Metric	Observed	Threshold	Headroom
p95 latency	320 ms	`p(95)<500`	~56% above observed
Error rate	0%	`rate<0.01`	Allows up to 1%
Check pass rate	100%	`rate>0.99`	Allows up to 1% failure
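
The thresholds in the table could be written as a thresholds object along these lines. This is a minimal sketch that assumes a k6 script and the k6 built-in metrics `http_req_duration`, `http_req_failed`, and `checks`; the table gives only the threshold expressions, so the metric names are an assumption.

```javascript
// Assumption: k6 built-in metric names. The expressions come from the table.
const thresholds = {
  http_req_duration: ['p(95)<500'], // observed p95 latency: 320 ms
  http_req_failed: ['rate<0.01'],   // observed error rate: 0%
  checks: ['rate>0.99'],            // observed check pass rate: 100%
};

// In a k6 script, this object would be exported as part of the options:
// export const options = { thresholds };
```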

Trade-offs in threshold decisions

Threshold-setting involves judgment, not just formulas. Consider these trade-offs for your system:

Decision	Tighter threshold	Looser threshold
Headroom	Catches small regressions early	Avoids false failures from normal variation
p95 vs p99	p95 reflects most users’ experience	p99 catches tail latency affecting your slowest users
Error tolerance	`rate<0.001` is strict, good for payment flows	`rate<0.01` is practical for non-critical endpoints

A common starting approach: use p95 with 30-50% headroom for general APIs. Switch to p99 with tighter headroom for latency-sensitive paths like checkout or authentication.
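
One way to apply different gates to different paths is to scope a threshold to tagged requests. The sketch below assumes k6 and its `metric{tag:value}` threshold syntax. The `name:checkout` tag and the 800 ms limit are hypothetical placeholders, not measured values; replace them with your own tags and a limit derived from your baseline.

```javascript
// Assumption: requests on the checkout path carry a hypothetical tag name:checkout.
const thresholds = {
  // General APIs: p95 with moderate headroom.
  http_req_duration: ['p(95)<500'],
  http_req_failed: ['rate<0.01'],

  // Latency-sensitive path: p99 with a limit taken from its own baseline.
  // 800 is a placeholder, not an observed value.
  'http_req_duration{name:checkout}': ['p(99)<800'],
  // Stricter error tolerance for payment flows.
  'http_req_failed{name:checkout}': ['rate<0.001'],
};

// In a k6 script: export const options = { thresholds };
```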

You do not guess threshold values. You measure them first. Run once without thresholds, note what your system actually does under your load profile, then set limits with headroom on top of those numbers.
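
The measure-then-add-headroom step is simple arithmetic, sketched below. `latencyThreshold` is a hypothetical helper written for illustration, not part of k6 or any library, and the 50% headroom is only an example value.

```javascript
// Hypothetical helper: turn a measured baseline into a threshold
// expression by adding a fractional headroom on top of it.
function latencyThreshold(observedMs, headroom, percentile = 95) {
  if (observedMs <= 0 || headroom < 0) {
    throw new RangeError('observedMs must be positive and headroom non-negative');
  }
  const limitMs = Math.round(observedMs * (1 + headroom));
  return `p(${percentile})<${limitMs}`;
}

// Observed p95 of 320 ms with 50% headroom.
const expression = latencyThreshold(320, 0.5);
console.log(expression); // p(95)<480
```

Rounding the result up to a memorable number (for example 500 instead of 480) is a judgment call; it only adds a little more headroom.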

The table on this slide shows a concrete example: observed ninety-fifth percentile latency, error rate, and check pass rate, plus suggested thresholds and why the headroom exists. The trade-offs section walks through tighter versus looser gates and when to favor ninety-fifth versus ninety-ninth percentile.

After a baseline passes reliably, reuse the same script in continuous integration or on a schedule so performance regressions surface before users do.
