Troubleshoot a performance issue

This guide walks you through a common end-to-end workflow for using Adaptive Profiles to automatically detect, diagnose, and resolve an intermittent performance issue in one of your services.

Goal

In this scenario, you use Adaptive Profiles to:

  • Automatically capture a high-resolution profile during a performance issue.
  • Use an auto-generated insight to pinpoint the root cause of the issue.
  • Use Drilldown to view a detailed flame graph, validate the finding, and resolve the problem.

Solution

You are responsible for a critical service that has been experiencing intermittent latency spikes. The issue is difficult to diagnose because it’s hard to predict when it will happen, and running a continuous high-resolution profile is too expensive.

Step 1 (automatic): an event is detected

You don’t have to do anything for this step. While your service runs, Adaptive Profiles continuously analyzes its performance at a cost-effective baseline resolution.

Suddenly, the system’s automatic analysis detects a spike in CPU usage in your service’s segment. In response, Adaptive Profiles automatically increases the profiling resolution to capture high-detail data for the next 10 minutes. This ensures that the critical moment of the performance issue is captured without you needing to intervene manually.
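To make the escalation behavior concrete, here is a minimal sketch of the idea: sample at a baseline rate, and switch to a higher rate for a fixed 10-minute window when CPU usage spikes. This is an illustration only, not how Adaptive Profiles is implemented. The class name, the sampling rates, and the "twice the recent average" spike rule are all assumptions made for the example; only the 10-minute window comes from the description above.

```python
from dataclasses import dataclass, field
from statistics import mean

# All values below are hypothetical, except the 10-minute window.
BASELINE_HZ = 10          # assumed baseline sampling rate
HIGH_RES_HZ = 100         # assumed elevated sampling rate
HIGH_RES_WINDOW_S = 600   # the 10-minute high-resolution window
SPIKE_FACTOR = 2.0        # assumed rule: spike = 2x the recent average


@dataclass
class ResolutionController:
    """Hypothetical controller that picks a sampling rate per CPU reading."""
    history: list = field(default_factory=list)
    high_res_until: float = 0.0

    def observe(self, now: float, cpu: float) -> int:
        """Record a CPU reading (0.0-1.0) and return the sampling rate to use."""
        # Only look for a new spike when no high-resolution window is active.
        if self.history and now >= self.high_res_until:
            recent_average = mean(self.history[-30:])
            if recent_average > 0 and cpu > SPIKE_FACTOR * recent_average:
                self.high_res_until = now + HIGH_RES_WINDOW_S
        self.history.append(cpu)
        return HIGH_RES_HZ if now < self.high_res_until else BASELINE_HZ


controller = ResolutionController()
for minute in range(5):
    controller.observe(minute * 60, 0.2)   # steady load, baseline rate
rate_during_spike = controller.observe(300, 0.9)  # spike starts the window
```

In this sketch the window closes on its own after 600 seconds, which mirrors the point of the step: detailed data is collected only around the event, not continuously.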

Step 2 (automatic): an insight is generated

When the high-resolution profiling period ends, the detailed data is ready for analysis.

The system analyzes the profiles and identifies a specific, problematic function in your code that is consuming the most resources. Because a valuable, actionable root cause was found, the system generates an insight, which appears in your Insights overview.
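As a rough illustration of what "the function consuming the most resources" means, the sketch below ranks functions by self time in a CPU profile. It assumes the profile is available in the folded-stack text format commonly used for flame graphs (semicolon-separated frames followed by a sample count). The helper name and the sample data are hypothetical, and this is not a description of the analysis Adaptive Profiles actually runs.

```python
from collections import Counter


def top_self_functions(folded_lines, n=3):
    """Rank functions by self samples in folded-stack lines.

    Each line looks like "root;caller;leaf 42". Samples are attributed
    to the leaf frame, the function that was running when sampled.
    """
    self_samples = Counter()
    total = 0
    for line in folded_lines:
        stack, _, count = line.strip().rpartition(" ")
        if not stack:
            continue  # skip blank or malformed lines
        samples = int(count)
        leaf = stack.split(";")[-1]
        self_samples[leaf] += samples
        total += samples
    # Return (function, samples, share of all samples) for the top n.
    return [(name, s, s / total) for name, s in self_samples.most_common(n)]


# Hypothetical profile data; function names are made up for the example.
profile = [
    "main;handle_request;parse_body 12",
    "main;handle_request;build_report;format_rows 70",
    "main;handle_request;build_report 8",
    "main;gc 10",
]

for name, samples, share in top_self_functions(profile):
    print(f"{name}: {samples} samples ({share:.0%})")
```

With this made-up data, `format_rows` accounts for 70 of 100 samples, which is the kind of dominant frame an insight would point you to.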

Step 3: investigate the insight

Now it’s your turn to act. Navigate to the insight in Grafana.

  1. Open the insight.

    You immediately see a summary and recommended improvements that point directly to the problematic function.

  2. Analyze the flame graph to understand the call stack and see exactly where the application is spending its time.

  3. From here, click Go to Drilldown to access a more detailed and interactive view of the flame graph for deeper investigation.

Step 4: resolve the issue

By analyzing the flame graph, you confirm that an inefficient loop within the identified function is the cause of the latency. With this precise information, you can now make a targeted code change to fix the performance issue.
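The scenario does not say what the inefficient loop looks like, so the following is a purely hypothetical example of the kind of targeted fix a flame graph can lead to. It assumes the hot function repeatedly checks membership in a list, which costs a linear scan per check, and replaces the list with a set. The function and parameter names are invented for illustration.

```python
def find_known_slow(requested_ids, known_ids):
    """Before: each `in` check scans the whole list.

    Cost grows with len(requested_ids) * len(known_ids).
    """
    return [r for r in requested_ids if r in known_ids]


def find_known_fast(requested_ids, known_ids):
    """After: build a set once, then each check is a hash lookup."""
    known = set(known_ids)
    return [r for r in requested_ids if r in known]


# Both versions return the same result; only the cost differs.
requested = [3, 7, 42, 99]
known = list(range(50))
assert find_known_slow(requested, known) == find_known_fast(requested, known)
```

After deploying a change like this, you would expect the function's share of the flame graph to shrink in later profiles, which is one way to validate the fix.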

Once you have addressed the problem, you can dismiss the insight, which moves it to your Insights History for future reference.

Outcome

You have successfully used Adaptive Profiles to resolve a complex performance issue with minimal effort.

  • You saved on costs by paying for high-resolution data only during the critical event.
  • You reduced your mean time to resolution (MTTR) from hours of manual searching to minutes of targeted investigation.
  • You were proactive, letting the system automatically detect and flag the issue for you.