Determine your use case
Before you start investigating, identify your use case to choose the right approach and metric type.
Your use case determines which RED metric you start with and how you navigate through your tracing data. You might know exactly what’s wrong, or you might need to explore to find issues.
Why this concept matters
Identifying your use case up front guides you to the right RED metric and workflow, so you spend less time exploring and reach root causes faster.
Grafana Traces Drilldown supports three main types of investigations: error investigation, performance analysis, and activity monitoring. Each use case has a different starting point and workflow.
How it works
Each use case maps to a specific RED metric and investigation workflow. Your investigation goal determines which metric you start with and which tabs and views are most useful.
Error investigation uses the Errors metric to find failed requests and their root causes. Performance analysis uses the Duration metric to identify slow operations and latency bottlenecks. Activity monitoring uses the Rate metric to understand service communication patterns and request flows.
Traces Drilldown adapts its interface based on your selected metric. When you choose Errors, you see error-specific tabs like Exceptions and Root cause errors. When you choose Duration, you see latency-focused tabs like Root cause latency and Slow traces. When you choose Rate, you see Service structure to visualize service communication.
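Under the hood, these views are built on TraceQL metrics queries against your Tempo data source. As a rough sketch only (not the exact queries the app generates), the three metric types correspond to queries like the following, each restricted to root spans:

```
{ nestedSetParent < 0 } | rate()
{ nestedSetParent < 0 && status = error } | rate()
{ nestedSetParent < 0 } | quantile_over_time(duration, .9, .95, .99)
```

The first query approximates Rate, the second Errors, and the third the Duration percentiles; `nestedSetParent < 0` is one way to limit a TraceQL query to root spans.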
Use case 1: Investigate errors
Use this when you know requests are failing or you’ve seen error alerts.
You might have noticed:
- Error alerts from your monitoring system
- Failed requests in your application logs
- User reports of errors or failed operations
- Spikes in error rates on dashboards
How to start
- Select Errors as your metric type
- Start with Root spans to see service-level error patterns
- Use the Comparison tab to identify which attributes correlate with errors
- Use the Breakdown tab to see which services or operations have the most errors
- Use the Exceptions tab to find common error messages
- Use Root cause errors to see the error chain structure
When to switch to All spans: If you need to find errors deeper in the call chain, like database errors or downstream service failures that don’t appear at the root level, switch to All spans.
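One way to confirm that errors are hiding below the root is to run a structural TraceQL query yourself, for example in Explore. The following sketch looks for error spans that are descendants of root spans that did not error:

```
{ nestedSetParent < 0 && status != error } >> { status = error }
```

Traces matched by a query like this don't surface as errors in a Root spans view, but do in All spans.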
Example scenarios
You know a service is failing but not why:
- Select Errors metric and Root spans
- Filter by the service name
- Use Comparison to see which attributes differ between successful and failed requests
- Use Root cause errors to see the error chain structure
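Conceptually, the Comparison tab contrasts the failing population against the successful one. If the failing service were named `checkout` (a hypothetical name), the two populations correspond roughly to these manual TraceQL filters:

```
{ resource.service.name = "checkout" && status = error }
{ resource.service.name = "checkout" && status != error }
```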
You see error alerts but don’t know the source:
- Select Errors metric and Root spans
- Use Breakdown to see which services have the most errors
- Drill into the problematic service using filters
- Use Comparison to identify what’s different about the failing requests
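A manual equivalent of this breakdown, sketched as a TraceQL metrics query that groups the root-span error rate by service:

```
{ nestedSetParent < 0 && status = error } | rate() by (resource.service.name)
```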
You need to find internal errors:
- Start with Errors metric and Root spans to see service-level patterns
- If errors don’t appear at the root level, switch to All spans
- This reveals database errors, downstream service failures, or internal operation errors
- Use Exceptions to find common error messages
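For example, if your instrumentation follows the OpenTelemetry `db.system` semantic convention, a query like this (the `postgresql` value is only illustrative) surfaces failing database spans anywhere in the call chain:

```
{ status = error && span.db.system = "postgresql" }
```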
Use case 2: Analyze performance
Use this when you want to identify slow operations, latency bottlenecks, or optimize response times.
You might be investigating:
- Slow response times reported by users
- High latency alerts
- Performance degradation over time
- A need to optimize specific operations
How to start
- Select Duration as your metric type
- Start with Root spans for end-to-end request latency
- Use the duration heatmap to identify latency patterns
- Select percentiles (p90, p95, p99) based on your SLA requirements
- Use Root cause latency to see which operations are slowest
- Use Slow traces to examine individual slow requests
- Use Breakdown to see duration by different attributes like service, environment, or region
When to switch to All spans: If you need to find slow internal operations like database queries or background jobs that don’t appear at the root level, switch to All spans.
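The duration heatmap corresponds roughly to a TraceQL metrics histogram over root-span duration; as a sketch you could run manually:

```
{ nestedSetParent < 0 } | histogram_over_time(duration)
```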
Example scenarios
Users report slow responses:
- Select Duration metric and Root spans
- Look at the heatmap for latency spikes
- Use Root cause latency to see which service operations are causing delays
- Use Slow traces to examine individual slow requests
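To pull individual slow requests yourself, a filter like this returns traces whose root span exceeded a threshold (the 2s value is only an example; pick one that matches your SLA):

```
{ nestedSetParent < 0 && duration > 2s }
```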
You want to optimize a specific endpoint:
- Select Duration metric and Root spans
- Add filters for the endpoint
- Use Breakdown to see duration by different attributes like service, environment, or region
- Select appropriate percentiles (p90, p95, p99) based on your optimization goals
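As a sketch, filtering to a single endpoint and computing its latency percentiles might look like this in TraceQL, assuming your spans carry the OpenTelemetry `http.route` attribute and using a hypothetical route:

```
{ nestedSetParent < 0 && span.http.route = "/api/checkout" } | quantile_over_time(duration, .9, .95, .99)
```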
You need to find slow database queries:
- Select Duration metric and All spans (database queries appear as child spans)
- Filter by database-related attributes
- Use Breakdown to see which queries are slowest
- Examine the slowest spans in Slow traces to identify problematic queries
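A rough manual equivalent, again assuming the OpenTelemetry `db.system` attribute and using the span name as a stand-in for the query or operation:

```
{ span.db.system = "postgresql" } | quantile_over_time(duration, .9) by (name)
```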
Use case 3: Monitor activity
Use this when you want to understand service communication patterns, request flows, or overall system activity.
You might want to:
- Understand how services communicate
- Monitor request rates and patterns
- Identify unusual activity spikes
- Map service dependencies
How to start
- Select Rate as your metric type
- Start with Root spans for service-level request rates
- Use Service structure to visualize service-to-service communication
- Use Breakdown to see request rates by different attributes
- Use Comparison to identify unusual patterns compared to baseline
- Use the Traces tab to examine individual requests
When to switch to All spans: If you need to see internal operations or child spans within traces, switch to All spans. Most activity monitoring use cases work well with Root spans.
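As a sketch, the request-rate breakdown by service corresponds to a TraceQL metrics query like:

```
{ nestedSetParent < 0 } | rate() by (resource.service.name)
```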
Example scenarios
You want to understand service dependencies:
- Select Rate metric and Root spans
- Use Service structure to see how services call each other
- Identify the communication patterns and dependencies
- Use Traces to examine individual request flows
You notice unusual activity spikes:
- Select Rate metric and Root spans
- Use Breakdown to see which services or operations have increased rates
- Use Comparison to compare against normal baseline behavior
- Switch to Errors or Duration if the spike indicates problems
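To drill into a single suspect service (the `checkout` name is hypothetical), a per-operation rate sketch looks like:

```
{ resource.service.name = "checkout" && nestedSetParent < 0 } | rate() by (name)
```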
You’re doing capacity planning:
- Select Rate metric and Root spans
- Use Breakdown by service, environment, or region
- Understand request distribution patterns
- Use Service structure to see communication volumes between services
Choose your starting point
Your starting point depends on what you already know:
You know what’s wrong:
- Errors present → Start with Errors metric and Root spans
- Performance issues → Start with Duration metric and Root spans
- Specific service affected → Add a filter for that service first, then select the appropriate metric
You need to explore:
- Start with Rate metric and Root spans to get an overview
- Look for unusual patterns in the graphs
- Switch to Errors or Duration based on what you find
You’re doing proactive analysis:
- Start with Rate metric and Root spans to understand normal patterns
- Use Comparison to identify deviations from baseline
- Switch to Errors or Duration when you find areas of concern
Related concepts
- RED metrics - Understanding Rate, Errors, and Duration metrics
- Traces and spans - How traces and spans work in distributed systems
Related tasks
After you’ve determined your use case:
- Choose a RED metric to match your investigation goal
- Choose root or full span data based on the depth you need
- Analyze tracing data using the appropriate tabs for your metric type
- Add filters to refine your investigation as you discover patterns



