Operations at Level 3

Transaction-level operations

At Level 3, alerting works at the aggregate level: percentiles, error rates, and SLOs. Traces are for investigating after an alert fires; they are not the source of the alert itself.

Alerting

| What to alert on | Example |
| --- | --- |
| Latency percentiles | P99 latency > 2s for payment flow across all users |
| Span error rates | Database span error rate > threshold |
| SLO burn rate | Critical path success rate burning error budget too fast |
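To make "burning error budget too fast" concrete, here is a minimal sketch of a burn-rate check. It assumes the usual definition (observed error rate divided by the error budget implied by the SLO target). The function names and the `max_burn_rate` default are hypothetical and not taken from any particular product.

```python
def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    1.0 means the budget would last exactly the SLO period;
    higher values mean it runs out sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    error_budget = 1.0 - slo_target        # e.g. 0.01 for a 99% target
    observed_error_rate = failed / total   # aggregate over all transactions
    return observed_error_rate / error_budget


def should_alert(failed: int, total: int, slo_target: float = 0.99,
                 max_burn_rate: float = 10.0) -> bool:
    """Fire only when the aggregate burn rate exceeds the chosen limit.

    max_burn_rate is an illustrative assumption; tune it to your SLO window.
    """
    return burn_rate(failed, total, slo_target) > max_burn_rate


# 50 failed checkouts out of 1000 against a 99% target burns budget about 5x too fast.
print(should_alert(failed=50, total=1000))    # below the hypothetical limit of 10
print(should_alert(failed=200, total=1000))   # about 20x, above the limit
```

The check takes only aggregate counts for a time window, which matches the slide's point that the alert comes from the aggregate signal and not from any single trace.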

SLOs

| SLO type | Example |
| --- | --- |
| Transaction success | 99% of checkout flows complete successfully |
| Critical path latency | 95% of payment transactions < 1s |
| End-to-end latency | 90% of user journeys < 3s total |
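A latency SLO such as "95% of payment transactions < 1s" can be evaluated from trace durations as a simple ratio. The sketch below assumes the durations (in seconds) have already been extracted from root spans; `latency_slo_compliance` and `latency_slo_met` are hypothetical helper names.

```python
from typing import Sequence


def latency_slo_compliance(durations_s: Sequence[float], threshold_s: float) -> float:
    """Fraction of transactions that finished faster than threshold_s."""
    if not durations_s:
        return 1.0  # assumption: an empty window counts as compliant
    within = sum(1 for d in durations_s if d < threshold_s)
    return within / len(durations_s)


def latency_slo_met(durations_s: Sequence[float], threshold_s: float = 1.0,
                    target: float = 0.95) -> bool:
    """True when at least `target` of transactions beat the threshold."""
    return latency_slo_compliance(durations_s, threshold_s) >= target


# 96 fast payments and 4 slow ones: 96% are under 1s, so the 95% target holds.
payments = [0.4] * 96 + [1.5] * 4
print(latency_slo_met(payments))  # True
```

The same shape works for the success-rate SLO by counting successful transactions in place of fast ones.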

Dashboards

| Dashboard type | What you see |
| --- | --- |
| Trace analysis | Span breakdown, latency distribution |
| Flame graphs | Where code spends time (profiling) |
| Frontend performance | Core Web Vitals, user experience metrics |
| AI/LLM metrics | Token usage, model latency, prompt analysis |

Investigation

| Tool | How you use it at Level 3 |
| --- | --- |
| Trace Explorer | Search traces by attributes, find slow spans |
| Trace-to-logs | Jump from trace span to related logs |
| Trace-to-profiles | See code-level performance for a trace |
| Session replay | Watch what the user actually experienced |

At Level 4, you’ll alert on custom metrics and KPIs.

Script

At Level 3, your operational practices get much more precise, but alerting still works at the aggregate level, not on individual transactions.

Alerting on individual transactions would generate thousands or millions of alerts every time there’s a problem. A single user experiencing a slow request might just have a bad network connection, and that’s not something that needs your attention.

What does need your attention is anything that impacts your SLOs. Imagine an alert that fires when p99 latency for the payment flow exceeds 2 seconds across your user base, or when your critical path success rate drops below 99 percent. These aggregate signals tell you something is genuinely wrong.
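As a rough illustration of why the aggregate matters, the sketch below computes p99 with the nearest-rank method over a window of payment-flow durations and compares it to the 2-second threshold. The helper names and the sample data are made up for the example; a real backend would typically compute percentiles from histograms.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)  # 1-based rank
    return ordered[rank - 1]


def p99_alert(durations_s: Sequence[float], threshold_s: float = 2.0) -> bool:
    """Alert only when the 99th percentile of the window exceeds the threshold."""
    return percentile(durations_s, 99) > threshold_s


# One user on a bad connection: a single 8s request among 1000 does not move p99.
one_slow_user = [0.3] * 999 + [8.0]
print(p99_alert(one_slow_user))      # False

# A real regression: 3% of payment requests take 3s, so p99 crosses 2s.
broad_slowdown = [0.3] * 970 + [3.0] * 30
print(p99_alert(broad_slowdown))     # True
```

The first window stays quiet even though one transaction was very slow, while the second fires because enough users are affected to shift the percentile.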

That’s the pattern: SLOs and percentile alerts tell you there’s a problem. Traces give you the tools to investigate why.

Your SLOs at this level map directly to user experience: “99 percent of checkout flows complete successfully” or “95 percent of payment transactions complete in under 1 second.”

For investigation, you’ve got powerful tools: Trace Explorer for searching traces by attributes, trace-to-logs for jumping from a span to relevant log lines, trace-to-profiles for seeing code-level performance, and session replay for watching what the user actually experienced.

At Level 4, these same concepts apply to custom metrics, but we’ll get to that next.