From results to thresholds

Choosing threshold values

Start with your observed values and add headroom for normal variation:

| Metric | Observed | Threshold | Headroom |
| --- | --- | --- | --- |
| p95 latency | 320 ms | p(95)<500 | ~55% above observed |
| Error rate | 0% | rate<0.01 | Allows up to 1% |
| Check pass rate | 100% | rate>0.99 | Allows up to 1% failure |
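
As a rough sketch of how the first row could be derived, the snippet below adds headroom to the observed p95 and rounds up to a readable limit. It assumes the threshold expressions are meant for k6 (the `p(95)<500` and `rate<0.01` syntax matches that tool), and it uses k6's built-in metric names `http_req_duration`, `http_req_failed`, and `checks`. The helper `withHeadroom` and the 50 ms rounding step are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: add headroom to an observed value, then round up
// to the next multiple of `stepMs` so the limit is easy to read.
function withHeadroom(observedMs, headroomFraction, stepMs) {
  const raw = observedMs * (1 + headroomFraction);
  return Math.ceil(raw / stepMs) * stepMs;
}

// Observed p95 from the baseline run in the table above.
const observedP95Ms = 320;

// 320 ms plus 55% headroom is 496 ms, rounded up to 500 ms.
const p95LimitMs = withHeadroom(observedP95Ms, 0.55, 50);

// Threshold map in the shape k6 reads from options.thresholds
// (assumption: the course targets k6).
const thresholds = {
  http_req_duration: [`p(95)<${p95LimitMs}`],
  http_req_failed: ['rate<0.01'], // allows up to 1% failed requests
  checks: ['rate>0.99'],          // allows up to 1% failed checks
};

// In a k6 script this would be exposed as:
//   export const options = { thresholds };
```

Keeping the observed value and the headroom as separate inputs makes it easier to revisit the limit after a later baseline run.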

Trade-offs in threshold decisions

Threshold-setting involves judgment, not just formulas. Consider these trade-offs for your system:

| Decision | Tighter threshold | Looser threshold |
| --- | --- | --- |
| Headroom | Catches small regressions early | Avoids false failures from normal variation |
| p95 vs p99 | p95 reflects most users’ experience | p99 catches tail latency affecting your slowest users |
| Error tolerance | rate<0.001 is strict, good for payment flows | rate<0.01 is practical for non-critical endpoints |

A common starting approach: use p95 with 30-50% headroom for general APIs. Switch to p99 with tighter headroom for latency-sensitive paths like checkout or authentication.
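
To see why the choice of percentile matters, here is a small sketch that compares p95 and p99 on a made-up latency sample with a slow tail. The `percentile` function uses the simple nearest-rank method, which is an assumption for illustration; load-testing tools may interpolate between samples and report slightly different values.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of all
// samples at or below it. Simplified for illustration.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up data: 97 requests at 300 ms and 3 slow requests at 1200 ms.
const latencies = [
  ...Array(97).fill(300),
  ...Array(3).fill(1200),
];

const p95 = percentile(latencies, 95);
const p99 = percentile(latencies, 99);

// A 500 ms limit passes on p95 but fails on p99 for this sample.
const limitMs = 500;
const p95Passes = p95 < limitMs;
const p99Passes = p99 < limitMs;
```

In this sample the 3% of slow requests are invisible to a p95 gate and caught by a p99 gate, which is why p99 suits paths where tail latency matters.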

Script

You do not guess threshold values. You measure them first. Run once without thresholds, note what your system actually does under your load profile, then set limits with headroom on top of those numbers.

The table on this slide shows a concrete example: observed ninety-fifth percentile latency, error rate, and check pass rate, plus suggested thresholds and why the headroom exists. The trade-offs section walks through tighter versus looser gates and when to favor ninety-fifth versus ninety-ninth percentile.

After a baseline passes reliably, reuse the same script in continuous integration or on a schedule so performance regressions surface before users notice them.