Transaction-level operations
At Level 3, alerting works at the aggregate level: percentiles, error rates, and SLOs. Traces are for investigating after an alert fires; they are not the source of the alert itself.
Alerting
| What to alert on | Example |
|---|---|
| Latency percentiles | p99 latency > 2s for payment flow across all users |
| Span error rates | Database span error rate > threshold |
| SLO burn rate | Critical path success rate burning error budget too fast |
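As a rough illustration of the burn-rate row, burn rate is commonly defined as the observed error rate divided by the error rate the SLO allows. The sketch below assumes that definition; the function name, the sample counts, and the alert threshold of 2.0 are hypothetical and not taken from any particular product.

```python
def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the error budget is being spent exactly on pace;
    higher values mean it will run out before the SLO window ends.
    """
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    error_budget = 1.0 - slo_target       # e.g. 0.01 for a 99% SLO
    observed_error_rate = failed / total
    return observed_error_rate / error_budget


# Hypothetical window: 10,000 critical-path transactions, 300 failures, 99% SLO.
rate = burn_rate(failed=300, total=10_000, slo_target=0.99)

# Illustrative threshold only; choose one that fits your own SLO window.
should_alert = rate > 2.0
```

Here the budget is being spent roughly three times faster than the SLO allows, so the aggregate alert would fire even though no individual transaction was inspected.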
SLOs
| SLO type | Example |
|---|---|
| Transaction success | 99% of checkout flows complete successfully |
| Critical path latency | 95% of payment transactions < 1s |
| End-to-end latency | 90% of user journeys < 3s total |
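Each latency SLO in the table reduces to the same check: what fraction of transactions finished under the threshold, and is that fraction at or above the target? A minimal sketch, assuming latencies are available as a plain list of seconds; the function name and sample values are made up for illustration.

```python
def slo_compliance(latencies_s: list[float], threshold_s: float) -> float:
    """Fraction of transactions that completed in under threshold_s seconds."""
    if not latencies_s:
        return 1.0  # no transactions means nothing violated the SLO
    within = sum(1 for latency in latencies_s if latency < threshold_s)
    return within / len(latencies_s)


# Hypothetical payment transaction latencies, in seconds.
payment_latencies = [0.2, 0.4, 0.5, 0.7, 0.8, 0.9, 0.9, 0.95, 1.4, 2.1]

compliance = slo_compliance(payment_latencies, threshold_s=1.0)

# "95% of payment transactions < 1s" from the table above.
slo_met = compliance >= 0.95
```

In this sample, 8 of 10 transactions finish in under 1 second, so compliance is 80% and the 95% target is missed.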
Dashboards
| Dashboard type | What you see |
|---|---|
| Trace analysis | Span breakdown, latency distribution |
| Flame graphs | Where code spends time (profiling) |
| Frontend performance | Core Web Vitals, user experience metrics |
| AI/LLM metrics | Token usage, model latency, prompt analysis |
Investigation
| Tool | How you use it at Level 3 |
|---|---|
| Trace Explorer | Search traces by attributes, find slow spans |
| Trace-to-logs | Jump from trace span to related logs |
| Trace-to-profiles | See code-level performance for a trace |
| Session replay | Watch what the user actually experienced |
At Level 4, you’ll alert on custom metrics and KPIs.
Script
At Level 3, your operational practices get much more precise, but alerting still works at the aggregate level, not on individual transactions.
Alerting on individual transactions would generate thousands or millions of alerts every time there’s a problem. A single user experiencing a slow request might just have a bad network connection, and that’s not something that needs your attention.
What does need your attention is when your SLO is impacted. Imagine an alert that fires when p99 latency for the payment flow exceeds 2 seconds across your user base. Or when your critical path success rate drops below 99 percent. These aggregate signals tell you something is genuinely wrong.
That’s the pattern: SLOs and percentile alerts tell you there’s a problem. Traces give you the tools to investigate why.
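To make the difference between one slow request and a real aggregate problem concrete, here is a small sketch of a p99 check. It assumes a nearest-rank percentile over raw latency samples; the function name and sample data are hypothetical, and real alerting systems typically compute percentiles from histograms rather than raw lists.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


THRESHOLD_S = 2.0  # the 2-second payment-flow threshold from the example

# One user on a bad connection out of 1,000 requests: p99 is unaffected.
one_slow_user = [0.3] * 999 + [10.0]
p99_one_slow = percentile(one_slow_user, 99)
alert_one_slow = p99_one_slow > THRESHOLD_S

# Three percent of requests are slow: p99 crosses the threshold.
widespread = [0.3] * 97 + [2.5, 3.0, 4.0]
p99_widespread = percentile(widespread, 99)
alert_widespread = p99_widespread > THRESHOLD_S
```

The first sample stays quiet because a single outlier does not move the 99th percentile; the second fires because enough requests are slow to shift it.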
Your SLOs at this level map directly to user experience: “99 percent of checkout flows complete successfully” or “95 percent of payment transactions complete in under 1 second.”
For investigation, you’ve got powerful tools: Trace Explorer for searching traces by attributes, trace-to-logs for jumping from a span to relevant log lines, trace-to-profiles for seeing code-level performance, and session replay for watching what the user actually experienced.
At Level 4, these same concepts apply to custom metrics, but we’ll get to that next.
