Operations at Level 3

Transaction-level operations

At Level 3, alerting works at the aggregate level — percentiles, error rates, and SLOs. Traces are for investigation after an alert fires, not the source of the alert itself.

Alerting

What to alert on	Example
Latency percentiles	p99 latency > 2s for the payment flow across all users
Span error rates	Database span error rate > threshold
SLO burn rate	Critical path success rate burning error budget too fast
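The span error rate row is an aggregate check: it looks at the share of failing spans of one kind, not at any single failure. Below is a minimal Python sketch of that check. The `Span` record, its `kind` labels, and the 5% threshold are hypothetical. A real backend evaluates this over a time window of stored data rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical minimal span record; real tracing backends store many more fields.
    name: str
    kind: str       # illustrative labels such as "db" or "http", not a standard
    is_error: bool


def span_error_rate(spans: list[Span], kind: str) -> float:
    """Fraction of spans of the given kind that ended in an error."""
    matching = [s for s in spans if s.kind == kind]
    if not matching:
        return 0.0  # no spans of this kind in the window, so nothing is failing
    return sum(s.is_error for s in matching) / len(matching)


def should_alert(spans: list[Span], kind: str, threshold: float) -> bool:
    """Alert on the aggregate rate, never on an individual failed span."""
    return span_error_rate(spans, kind) > threshold


# 200 database spans in the window, 12 of which failed.
window = [Span("SELECT orders", "db", i < 12) for i in range(200)]
print(span_error_rate(window, "db"), should_alert(window, "db", threshold=0.05))  # 0.06 True
```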

SLOs

SLO type	Example
Transaction success	99% of checkout flows complete successfully
Critical path latency	95% of payment transactions < 1s
End-to-end latency	90% of user journeys < 3s total
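Each SLO in the table reduces to a ratio of good events to total events, compared against a target. Here is a minimal Python sketch of that idea, assuming you already have per-transaction outcomes and durations for the evaluation window. The function names are illustrative and not part of any product API.

```python
def success_sli(outcomes: list[bool]) -> float:
    """Share of transactions (for example, checkout flows) that completed successfully."""
    if not outcomes:
        return 1.0  # policy choice (an assumption): no traffic counts as meeting the objective
    return sum(outcomes) / len(outcomes)


def latency_sli(durations_s: list[float], threshold_s: float) -> float:
    """Share of transactions that finished strictly under the latency threshold."""
    if not durations_s:
        return 1.0  # same no-traffic policy as above
    return sum(d < threshold_s for d in durations_s) / len(durations_s)


def meets_slo(sli: float, target: float) -> bool:
    return sli >= target


checkouts = [True] * 992 + [False] * 8   # 992 of 1,000 checkouts succeeded
payments_s = [0.4] * 96 + [1.5] * 4      # 96 of 100 payments took under 1 s

print(success_sli(checkouts), meets_slo(success_sli(checkouts), 0.99))              # 0.992 True
print(latency_sli(payments_s, 1.0), meets_slo(latency_sli(payments_s, 1.0), 0.95))  # 0.96 True
```

The end-to-end row works the same way: `latency_sli(journey_durations, 3.0)` compared with a 0.90 target.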

Dashboards

Dashboard type	What you see
Trace analysis	Span breakdown, latency distribution
Flame graphs	Where code spends time (profiling)
Frontend performance	Core Web Vitals, user experience metrics
AI/LLM metrics	Token usage, model latency, prompt analysis

Investigation

Tool	How you use it at Level 3
Trace Explorer	Search traces by attributes, find slow spans
Trace-to-logs	Jump from trace span to related logs
Trace-to-profiles	See code-level performance for a trace
Session replay	Watch what the user actually experienced

At Level 4, you’ll alert on custom metrics and KPIs.

At Level 3, your operational practices get much more precise — but alerting still works at the aggregate level, not on individual transactions.

Alerting on individual transactions would generate thousands or millions of alerts every time there’s a problem. A single user experiencing a slow request might just have a bad network connection — that’s not something that needs your attention.

What does need your attention is an impact on your SLO. Imagine an alert that fires when p99 latency for the payment flow exceeds 2 seconds across your user base, or when your critical path success rate drops below 99 percent. These aggregate signals tell you something is genuinely wrong.
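To see why a percentile alert ignores one user's bad connection but catches a real regression, here is a minimal Python sketch using the nearest-rank percentile over raw durations. The sample data is made up for illustration. Metrics backends often estimate percentiles from histogram buckets instead of sorting raw samples, so treat this as a model of the idea rather than of any specific tool.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def p99_alert(durations_s: list[float], threshold_s: float = 2.0) -> bool:
    """Fire when p99 latency across the whole window exceeds the threshold."""
    return percentile(durations_s, 99) > threshold_s


# A few users on bad connections: 5 very slow requests out of 1,000.
healthy = [0.3] * 995 + [9.0] * 5
# A real regression: 30 of 1,000 payment requests take 2.5 s.
degraded = [0.3] * 970 + [2.5] * 30

print(percentile(healthy, 99), p99_alert(healthy))    # 0.3 False
print(percentile(degraded, 99), p99_alert(degraded))  # 2.5 True
```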

That’s the pattern: SLOs and percentile alerts tell you there’s a problem. Traces give you the tools to investigate why.

Your SLOs at this level map directly to user experience: “99 percent of checkout flows complete successfully” or “95 percent of payment transactions complete in under 1 second.”
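An SLO like "99 percent of checkout flows complete successfully" leaves a 1 percent error budget, and the SLO burn rate alert from the table measures how fast that budget is being spent. Here is a minimal Python sketch, assuming you have failure and total counts for a long and a short window. A burn rate of 1.0 spends the budget exactly over the SLO period. The 14.4 threshold follows the multiwindow example in the Google SRE Workbook; treat it and the window sizes as starting points, not rules.

```python
def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error ratio divided by the error budget (1 - target)."""
    if total == 0:
        return 0.0  # no traffic, no budget spent
    error_budget = 1 - slo_target  # about 0.01 for a 99% SLO
    return (failed / total) / error_budget


def should_page(long_window: tuple[int, int], short_window: tuple[int, int],
                slo_target: float, threshold: float = 14.4) -> bool:
    """Fire only if both windows burn fast: the long window shows sustained impact,
    the short window confirms the problem is still happening."""
    return (burn_rate(*long_window, slo_target) > threshold
            and burn_rate(*short_window, slo_target) > threshold)


# (failed, total) for the last hour and the last 5 minutes, 99% checkout SLO.
print(should_page((1_500, 10_000), (200, 1_000), slo_target=0.99))  # True: still burning
print(should_page((1_500, 10_000), (5, 1_000), slo_target=0.99))    # False: already recovered
```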

For investigation, you’ve got powerful tools: Trace Explorer for searching traces by attributes, trace-to-logs for jumping from a span to relevant log lines, trace-to-profiles for seeing code-level performance, and session replay for watching what the user actually experienced.
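The investigation flow, filtering spans by attributes and then pivoting from a trace to its logs, can be modeled in a few lines. This Python sketch does not use Trace Explorer or its query language; the `TraceSpan` and `LogLine` records and the attribute names are hypothetical. It also shows the assumption that trace-to-logs depends on: the application must write the trace ID into its log lines.

```python
from dataclasses import dataclass, field


@dataclass
class TraceSpan:
    # Hypothetical span record with free-form attributes.
    trace_id: str
    name: str
    duration_s: float
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class LogLine:
    trace_id: str  # assumes the app logs the active trace ID
    message: str


def find_slow_spans(spans: list[TraceSpan], min_duration_s: float,
                    filters: dict[str, str]) -> list[TraceSpan]:
    """Spans that ran at least min_duration_s and match every attribute filter."""
    return [
        s for s in spans
        if s.duration_s >= min_duration_s
        and all(s.attributes.get(k) == v for k, v in filters.items())
    ]


def logs_for_trace(logs: list[LogLine], trace_id: str) -> list[str]:
    """Trace-to-logs: select log lines that carry the same trace ID."""
    return [line.message for line in logs if line.trace_id == trace_id]


spans = [
    TraceSpan("t1", "POST /pay", 2.8, {"service": "payments", "region": "eu"}),
    TraceSpan("t2", "POST /pay", 0.2, {"service": "payments", "region": "eu"}),
    TraceSpan("t3", "GET /cart", 3.1, {"service": "cart", "region": "eu"}),
]
logs = [LogLine("t1", "card processor timeout, retrying"), LogLine("t2", "payment ok")]

slow = find_slow_spans(spans, 2.0, {"service": "payments"})
print([s.trace_id for s in slow])               # ['t1']
print(logs_for_trace(logs, slow[0].trace_id))   # ['card processor timeout, retrying']
```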

At Level 4, these same concepts apply to custom metrics, but we’ll get to that next.
