Note
Anomaly detection is currently in public preview. Grafana Labs offers limited support, and breaking changes might occur prior to the feature being made generally available.
Investigate anomalies
Adaptive Traces monitors your services for anomalies. When the system detects an anomaly, it temporarily samples relevant traces and surfaces them, allowing you to investigate problems you might not have known to look for.
How it works
- The system analyzes your past data to understand what “normal” looks like. For example, it learns the typical latency for a specific database query.
- It then monitors new, incoming traces for significant changes from that established baseline, like a sudden spike in latency.
- When a trace is identified as an anomaly (or an outlier), it is retained rather than dropped by normal sampling. This allows you to investigate rare or intermittent issues that might otherwise be lost.
- The system creates a temporary sampling policy to capture related traces.
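The baseline-and-deviation idea above can be illustrated with a minimal Python sketch. This page doesn't describe the actual detection algorithm, so the sketch assumes a simple baseline (mean and sample standard deviation of past latencies) and a z-score threshold. The functions `build_baseline`, `is_anomalous`, and `should_keep`, the `z_threshold` default of 3.0, and the sample latencies are all hypothetical and are not part of Adaptive Traces.

```python
import statistics


def build_baseline(latencies_ms):
    """Summarize historical latencies as (mean, sample standard deviation)."""
    return statistics.mean(latencies_ms), statistics.stdev(latencies_ms)


def is_anomalous(latency_ms, baseline, z_threshold=3.0):
    """Flag a latency that deviates from the baseline by more than
    z_threshold standard deviations (hypothetical rule)."""
    mean, stdev = baseline
    if stdev == 0:
        # With no variation in the history, any different value is a deviation.
        return latency_ms != mean
    return abs(latency_ms - mean) / stdev > z_threshold


def should_keep(latency_ms, baseline, sampled_by_normal_policy):
    """Retain outliers even when normal sampling would drop the trace."""
    return is_anomalous(latency_ms, baseline) or sampled_by_normal_policy


# Hypothetical history of latencies (ms) for one database query.
past = [100, 105, 98, 102, 101, 99, 103, 97]
baseline = build_baseline(past)

print(is_anomalous(450, baseline))  # True: a sudden latency spike
print(is_anomalous(104, baseline))  # False: within normal variation
print(should_keep(450, baseline, sampled_by_normal_policy=False))  # True
```

A z-score is only one possible way to define "significant change"; a real system might use percentiles, seasonality-aware models, or per-span baselines instead.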
Your workflow
Anomaly detection integrates into your Adaptive Traces workflow to speed up troubleshooting.
Automatic monitoring and control
Anomaly detection is enabled by default, so you get automatic monitoring without any initial setup. While it's enabled, the system continuously analyzes your trace data in the background to find unusual patterns. You can turn it off on the Overview page.
Anomaly detection and investigation
When the system detects an anomaly, such as a sudden spike in latency for a specific service, it appears in the UI as an auto-applied recommendation on the Overview page.
Until it expires, it's also listed on the Policies page.
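To make the lifecycle of an auto-applied recommendation more concrete, the following Python sketch models a temporary anomaly policy that stays listed only until it expires. The class `TemporaryAnomalyPolicy`, its fields, the `active_policies` helper, and the example service and span names are hypothetical illustrations loosely based on the details the UI shows (service, span, type, threshold, expiration); they are not the Adaptive Traces data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TemporaryAnomalyPolicy:
    """Hypothetical model of an auto-applied anomaly sampling policy."""
    service: str
    span_name: str
    anomaly_type: str      # e.g. "latency"; hypothetical label
    threshold_ms: float    # spans slower than this are sampled
    created_at: datetime
    ttl: timedelta         # how long the policy stays in effect

    @property
    def expires_at(self) -> datetime:
        return self.created_at + self.ttl

    def is_active(self, now: datetime) -> bool:
        # The policy is valid until its expiration time passes.
        return now < self.expires_at

    def matches(self, service: str, span_name: str, duration_ms: float) -> bool:
        # Sample spans from the affected service and span that exceed the threshold.
        return (
            service == self.service
            and span_name == self.span_name
            and duration_ms > self.threshold_ms
        )


def active_policies(policies, now):
    """Return only the policies that would still be listed (not yet expired)."""
    return [p for p in policies if p.is_active(now)]


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
policy = TemporaryAnomalyPolicy(
    "checkout", "SELECT orders", "latency", 250.0, start, timedelta(hours=1)
)
print(policy.matches("checkout", "SELECT orders", 480.0))            # True
print(len(active_policies([policy], start + timedelta(hours=2))))    # 0
```

Modeling expiration as a creation time plus a time-to-live keeps the "listed until it expires" rule to a single comparison.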
Start by expanding the anomaly you want to examine. Expanding it takes you from the high-level overview to a detailed view of what happened and which services were affected.
Drilldown and analysis
From the investigation view, you can click Drilldown to jump directly to the traces related to that anomaly, scoped to the exact time range in which the issue occurred. Instead of manually searching through thousands of traces, you go straight to the handful of examples that show the problematic behavior, so you can find the root cause quickly.
Review and investigate
Anomaly detection automatically flags unusual behavior in your trace data. You can move from a high-level, automatically detected anomaly directly to the specific traces you need to troubleshoot the issue, which reduces the time it takes to find the root cause and resolve it.
Complete the following steps to investigate anomalies.
Navigate to the Recommendations History on the Overview page to view any auto-applied recommendations.
Anomalies are highlighted as auto-applied anomalies in the Recommendations History, and as Anomaly detected in the Policies list.
Expand the anomaly you want to examine.
The expanded view shows what the system detected, where it occurred (service and span), and when it happened (timestamps), along with details such as the threshold, type, and expiration.
From here you can edit or delete the policy, or drill down into the traces.
To analyze the raw data, click Drilldown.
A pre-filtered view opens that shows only the traces related to the anomaly, scoped to the exact time range in which it occurred. From here you can examine the specific traces you need to investigate the root cause.
From Drilldown, you can:
- See anomaly patterns in charts such as span rate, error rate, and duration.
- Group and filter traces by attributes (for example, service name, cluster, or environment).
- Open the Traces tab to examine individual traces in detail.
Deactivate anomaly detection
To deactivate anomaly detection, uncheck the Anomaly recommendations are enabled checkbox on the Overview page.