---
title: "The baseline methodology | Grafana Labs"
description: "The observe-set-validate workflow that turns test results into automated quality gates"
---


## Scenario

A change **passes functional tests** and feels fine when you click through it alone, but the **p95 latency** you care about was never recorded. Under **dozens of concurrent users**, the same build might already be slow or erratic, and you would not find out until a release or an incident. A baseline turns “it felt OK” into **numbers you can compare** next week.

## The baseline workflow

| Step            | What you do                                        | What it produces                   |
|-----------------|----------------------------------------------------|------------------------------------|
| **1. Observe**  | Run without thresholds at realistic load           | Actual p95, error rate, throughput |
| **2. Set**      | Add thresholds based on observed values + headroom | Script with pass/fail criteria     |
| **3. Validate** | Re-run with thresholds, confirm consistency        | Working quality gate               |
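
The Observe and Set steps can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the sample latencies, the nearest-rank percentile method, the 20% headroom, and the function names `percentile` and `set_threshold` are all assumptions chosen for the example.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct (1-100)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, which avoids float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def set_threshold(observed_ms, headroom_pct=20):
    """Limit = observed value plus headroom (20% is an assumed default)."""
    return observed_ms * (100 + headroom_pct) / 100


# Step 1, Observe: hypothetical response times (ms) from a run with no thresholds.
observed_ms = list(range(100, 300, 10))  # 100, 110, ..., 290

baseline_p95 = percentile(observed_ms, 95)

# Step 2, Set: derive the limit from what was measured, not from a guess.
limit_ms = set_threshold(baseline_p95)

print(f"observed p95 = {baseline_p95} ms, threshold = {limit_ms} ms")
```

The right amount of headroom depends on how much your results vary from run to run, so treat the 20% here as a placeholder.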

## Why this order matters

- **Observe before you set thresholds.** Use measured values. Guessed limits end up too loose to catch regressions or too tight to pass, and either outcome breaks trust in the gate.
- **Validate before you wire CI.** Confirm the thresholds are stable first. Flaky gates waste the whole team’s time.
- **Leave headroom on p95.** A gap between the value you observed and the limit absorbs normal run-to-run jitter while still catching real regressions.
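
As a rough sketch of the Validate step, the check below replays several runs against a limit and reports any that breach it. The p95 values, the 336 ms limit, and the helper name `breaches` are hypothetical and stand in for your own repeated runs.

```python
def breaches(run_p95s_ms, limit_ms):
    """Return the runs whose p95 exceeded the limit."""
    return [p95 for p95 in run_p95s_ms if p95 > limit_ms]


LIMIT_MS = 336.0  # assumed: an observed p95 of 280 ms plus 20% headroom

# Hypothetical p95 values from five re-runs of the same build.
repeat_runs_ms = [275.0, 290.0, 283.0, 301.0, 279.0]

# Normal jitter stays inside the headroom, so the gate is stable.
stable = not breaches(repeat_runs_ms, LIMIT_MS)

# A hypothetical regressed build still trips the same limit.
regressed = breaches(repeat_runs_ms + [420.0], LIMIT_MS)

print("gate stable across re-runs:", stable)
print("runs over the limit after regression:", regressed)
```

If unchanged code breaches the limit during validation, the headroom is probably too tight to wire into CI yet.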

## The shift

| Before baselines                 | After baselines                  |
|----------------------------------|----------------------------------|
| “The results looked okay to me”  | The test passed with exit code 0 |
| Someone reviews metrics manually | CI/CD decides automatically      |
| Regressions found in production  | Regressions blocked at merge     |
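
The exit code is what lets CI/CD decide without a person reading metrics. The sketch below shows the general idea under assumed names and values: `gate_exit_code`, the metric keys, and the limits are illustrative and do not describe any particular tool's behavior.

```python
def gate_exit_code(measured, limits):
    """Return 0 if every metric is within its limit, otherwise 1."""
    failures = [name for name, limit in limits.items() if measured[name] > limit]
    for name in failures:
        print(f"FAIL {name}: {measured[name]} > {limits[name]}")
    return 1 if failures else 0


# Assumed limits, derived from a baseline run plus headroom.
limits = {"p95_ms": 336.0, "error_rate": 0.01}

# Hypothetical results from two builds.
healthy = {"p95_ms": 301.0, "error_rate": 0.002}
regressed = {"p95_ms": 420.0, "error_rate": 0.002}

healthy_code = gate_exit_code(healthy, limits)      # within limits
regressed_code = gate_exit_code(regressed, limits)  # p95 over the limit

# In a CI job, pass the result to sys.exit() so a non-zero code blocks the merge.
print(healthy_code, regressed_code)
```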
