Improved anomaly detection and faster root cause analysis: the latest features in Grafana Cloud Application Observability

13 Jun, 2024 · 5 min read

In recent years, “the biggest needs we’ve heard from our customers have been to make it easier to understand their observability data, to extend observability into the application layer, and to get deeper, contextualized analytics,” said Tom Wilkie, CTO of Grafana Labs, at ObservabilityCON 2023.

In response, last year we introduced Grafana Cloud Application Observability, an opinionated, out-of-the-box solution designed to improve the reliability of modern applications. With native support for both OpenTelemetry and Prometheus, Application Observability in Grafana Cloud helps developers and SREs unify application and infrastructure insights. That capability is essential for accelerating anomaly detection and root cause analysis, reducing MTTR, and advancing your overall observability strategy.

Since we announced the general availability of Application Observability last fall, we’ve been hard at work, developing new features to further enhance the user experience and enable Grafana Cloud users to gain deeper insights into application performance.

Here’s a look at some of the ways you can work with telemetry signals in Application Observability to improve anomaly detection and better understand the behavior of your services.

Note: To see a demo of the latest features in Application Observability, you can check out the YouTube video below.

Analyze performance over time with time frame comparison

When dealing with data, and especially with data over time, there is always the nagging question: “Is what I am seeing normal, or an outlier?”

To help answer this question, we’ve added time frame comparison to the Application Observability user experience. It lets you compare a service’s metrics across time periods, such as today vs. yesterday or the current month vs. previous months.

Imagine this: you are running a big promotion for your e-commerce business and want to know whether the rate of requests to the productcatalog service is still within normal parameters. To do this in Grafana, you would have to manually create a panel that queries two different lookback periods. In Application Observability, you can simply select the comparison checkbox to see a band of expected values alongside the current values. This way, you can quickly confirm that your services are still performing as expected.
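To make the idea concrete, here is a minimal sketch of what a time frame comparison computes. It takes the request rate in the current window and compares it with the same-length window shifted back by a lookback period, similar in spirit to PromQL's `offset` modifier. The helper names, the `(timestamp, count)` sample format, and the numbers are all hypothetical. This is not how Application Observability implements the feature.

```python
from datetime import datetime, timedelta


def window_rate(samples, start, end):
    """Requests per second for samples whose timestamp is in [start, end)."""
    total = sum(count for ts, count in samples if start <= ts < end)
    return total / (end - start).total_seconds()


def compare_time_frames(samples, now, window, lookback):
    """Compare the rate in [now - window, now) with the same-length window
    shifted back by `lookback` (for example, today vs. yesterday)."""
    current = window_rate(samples, now - window, now)
    previous = window_rate(samples, now - window - lookback, now - lookback)
    # Relative change vs. the earlier window; None if the earlier window is empty.
    change = (current - previous) / previous if previous else None
    return current, previous, change


# Hypothetical request counts for a "productcatalog" service.
samples = [
    (datetime(2024, 6, 12, 11, 30), 1800),  # yesterday, same hour
    (datetime(2024, 6, 13, 11, 15), 1500),  # today
    (datetime(2024, 6, 13, 11, 45), 1200),  # today
]
now = datetime(2024, 6, 13, 12, 0)
current, previous, change = compare_time_frames(
    samples, now, window=timedelta(hours=1), lookback=timedelta(days=1)
)
print(f"current={current:.2f} req/s, yesterday={previous:.2f} req/s, change={change:+.0%}")
```

With this data the sketch prints `current=0.75 req/s, yesterday=0.50 req/s, change=+50%`. That result is the kind of "is this normal?" signal the comparison view surfaces visually.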

A gif showing the time frame comparison feature.

Identify anomalies with automatic baselining

But wait: what about seasonality? To continue with our example above, if you run an e-commerce business, your busiest months are likely in the second half of the year. To account for this seasonality and ensure apples-to-apples comparisons, we introduced automatic baselining, which lets you compare the current time frame against a baseline within the time frame comparison.

Automatic baselining uses standard deviation to help you determine whether what you’re seeing is a normal pattern or deviates from the expected range, so you can start troubleshooting quickly.
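The post says only that automatic baselining is based on standard deviation. The sketch below shows one common way to apply that idea, and it should not be read as Grafana's actual algorithm. It builds an expected band of mean ± k sample standard deviations from historical values and flags current points outside that band. The multiplier `k=2.0`, the function names, and the data are assumptions. To respect seasonality, `history` should come from a comparable period, such as the same hour on previous days.

```python
from statistics import mean, stdev


def baseline_band(history, k=2.0):
    """Expected range: mean +/- k sample standard deviations of `history`."""
    mu = mean(history)
    sigma = stdev(history)  # requires at least two data points
    return mu - k * sigma, mu + k * sigma


def flag_anomalies(history, current, k=2.0):
    """Return (index, value) pairs from `current` that fall outside the band."""
    low, high = baseline_band(history, k)
    return [(i, v) for i, v in enumerate(current) if not low <= v <= high]


# Hypothetical requests/s for the same hour on previous days (the baseline)
history = [100, 102, 98, 101, 99]
# Hypothetical requests/s observed now
current = [100, 103, 130, 97]

low, high = baseline_band(history)
print(f"expected range: {low:.2f} to {high:.2f}")
print("outside the baseline:", flag_anomalies(history, current))
```

Here the band is roughly 96.84 to 103.16, so only the value 130 (index 2) is flagged. A larger `k` flags fewer points, and a smaller one flags more.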

A gif showing the automatic baseline feature.

Narrow down problem dimensions with group-by and filter-by

Once you discover an anomaly, either through an alert or in the user interface, the next step is to narrow down the problem: is there a variable that could explain the anomaly?

In Grafana Cloud Application Observability, you can explore how different attributes and their values influence application performance. These attributes are characteristics that are native to your application, such as its deployment coordinates or domain-specific attributes like a department name or geographic location.

Let’s assume you found an anomaly through the baseline comparison and want to analyze why. You can now use the group-by feature to break down the panel by an attribute like k8s.cluster.name and get rate, errors, and duration (RED) metrics for each cluster that hosts your workload. This can help you see whether one specific cluster or location is performing poorly compared to the others.
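A group-by boils down to bucketing requests by an attribute value and computing rate, errors, and duration per bucket. The sketch below does that over an in-memory list of spans. The span dictionary layout, the cluster names, the `make_span` and `red_by_attribute` helpers, and the use of average duration are all hypothetical simplifications. A real panel would typically query metrics generated from traces and often show percentiles rather than averages.

```python
from collections import defaultdict


def make_span(cluster, duration_ms, error):
    """Build a hypothetical, simplified span record."""
    return {"attributes": {"k8s.cluster.name": cluster},
            "duration_ms": duration_ms, "error": error}


def red_by_attribute(spans, attribute, window_seconds):
    """Group spans by one attribute and compute rate, error ratio and mean duration."""
    groups = defaultdict(list)
    for span in spans:
        groups[span["attributes"].get(attribute, "<unset>")].append(span)

    result = {}
    for value, members in groups.items():
        errors = sum(1 for s in members if s["error"])
        result[value] = {
            "rate": len(members) / window_seconds,  # requests per second
            "error_ratio": errors / len(members),
            "avg_duration_ms": sum(s["duration_ms"] for s in members) / len(members),
        }
    return result


spans = [
    make_span("us-east", 40, False),
    make_span("us-east", 60, False),
    make_span("eu-west", 900, True),
    make_span("eu-west", 700, False),
]
result = red_by_attribute(spans, "k8s.cluster.name", window_seconds=60)
for cluster, red in sorted(result.items()):
    print(cluster, red)
```

In this toy data, eu-west stands out with a 50% error ratio and an 800 ms average duration, while us-east has no errors and a 50 ms average.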

A gif showing the group by feature.

Once you’ve found the outlier, you can then use the filter-by feature to manage which data is visible based on attribute values. For example, if you grouped data by geographical region and identified that errors are occurring only in the Europe region, you can then filter the data to visualize only the Europe geographical region. You can then repeat that step to further segment data and identify the issue.
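The filter-then-regroup loop described above can be sketched in a few lines. First, keep only spans whose attributes match the chosen values. Then break the remaining errors down by another attribute to segment further. The `region` attribute key, the version numbers, and the helper names are hypothetical. `service.version` is a standard OpenTelemetry resource attribute, used here only as an example second dimension.

```python
from collections import Counter


def make_span(region, version, error):
    """Build a hypothetical, simplified span record."""
    return {"attributes": {"region": region, "service.version": version}, "error": error}


def filter_by(spans, criteria):
    """Keep spans whose attributes match every key/value pair in `criteria`."""
    return [s for s in spans
            if all(s["attributes"].get(k) == v for k, v in criteria.items())]


def errors_by(spans, attribute):
    """Count erroring spans per value of `attribute`."""
    return Counter(s["attributes"].get(attribute, "<unset>") for s in spans if s["error"])


spans = [
    make_span("europe", "1.4.0", True),
    make_span("europe", "1.4.0", True),
    make_span("europe", "1.3.2", False),
    make_span("us", "1.4.0", False),
]

# Step 1: keep only the region where errors were observed.
europe = filter_by(spans, {"region": "europe"})
# Step 2: segment again, this time by service version.
print(errors_by(europe, "service.version"))
```

Here, filtering to Europe and regrouping by version points at release 1.4.0. You could then filter on that value and repeat with yet another attribute.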

A gif showing the filter by feature.

Faster root cause analysis with in-context navigation

When you see a pattern on a panel, you often want to dig in further, right? Now you can, directly in Application Observability. Thanks to Grafana data links, you can navigate from panels to the traces and logs for a specific point in time.

Let’s walk through another example: imagine you see a spike in duration for service transactions. You want to determine which transactions are taking so long to process, and why that’s the case. By simply clicking on the graph, you can navigate directly to the traces of those transactions within your specified timeframe and filters. This accelerates root cause analysis by quickly pinpointing the problematic traces.
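Conceptually, a data link turns the point you clicked, plus the filters already applied to the panel, into a time range and a trace search. The sketch below builds a TraceQL query string for that purpose. It assumes a ± one-step time window around the clicked timestamp, that filter attributes are resource-scoped, and a minimum-duration threshold. The function name, service name, and cluster value are hypothetical, and this is not the exact link Application Observability generates.

```python
from datetime import datetime, timedelta


def trace_search_for_click(clicked_at, step, service, filters, min_duration_ms):
    """Build a (start, end, TraceQL query) tuple for a clicked point on a panel.

    Assumes filters are resource attributes and that slow transactions are
    those longer than `min_duration_ms`.
    """
    start, end = clicked_at - step, clicked_at + step
    conditions = [f'resource.service.name = "{service}"']
    conditions += [f'resource.{key} = "{value}"' for key, value in sorted(filters.items())]
    conditions.append(f"duration > {min_duration_ms}ms")
    return start, end, "{ " + " && ".join(conditions) + " }"


start, end, query = trace_search_for_click(
    clicked_at=datetime(2024, 6, 13, 12, 0),
    step=timedelta(minutes=1),
    service="productcatalog",
    filters={"k8s.cluster.name": "eu-west"},
    min_duration_ms=500,
)
print(start, end)
print(query)
```

Carrying the panel's filters into the query is the key design point. The traces you land on are the ones behind the spike you clicked, not every trace for the service.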

A gif showing in-context navigation.

Get started with Application Observability

If you haven’t tried Application Observability in Grafana Cloud, it’s easy to get started with the following steps:

  1. Opt in to Grafana Cloud metrics generation, if it is not already enabled.
  2. Instrument your application using OpenTelemetry.
  3. Use Grafana Alloy with the OpenTelemetry (OTLP) integration (recommended) or the OpenTelemetry Collector to send telemetry data to Grafana Cloud.
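For steps 2 and 3, many OpenTelemetry SDKs can be configured through the standard `OTEL_*` environment variables. The sketch below only assembles those variables for an application that exports OTLP over HTTP to a local Grafana Alloy instance. Several details are assumptions: that Alloy has an OTLP receiver listening on its conventional HTTP port `4318` at `localhost`, the service, namespace, and environment names, and the choice of resource attributes. Check the Application Observability documentation for the attributes it expects.

```python
def otel_env(service_name, namespace, environment, endpoint="http://localhost:4318"):
    """Assemble standard OpenTelemetry SDK environment variables.

    The endpoint assumes a local Grafana Alloy instance with an OTLP HTTP
    receiver; adjust it to wherever your collector actually listens.
    """
    resource_attributes = {
        "service.namespace": namespace,
        "deployment.environment": environment,
    }
    return {
        "OTEL_SERVICE_NAME": service_name,
        # Comma-separated key=value pairs, as the OpenTelemetry spec defines.
        "OTEL_RESOURCE_ATTRIBUTES": ",".join(f"{k}={v}" for k, v in resource_attributes.items()),
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    }


env = otel_env("productcatalog", "shop", "production")
if __name__ == "__main__":
    # Print shell exports you could paste before starting the instrumented app.
    for key, value in env.items():
        print(f"export {key}={value}")
```

For a Python service, these variables could be merged into the process environment before launching the app with the `opentelemetry-instrument` wrapper from the OpenTelemetry Python distribution.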

For full implementation details and best practices, you can also reference our Application Observability documentation.

Grafana Cloud is the easiest way to get started with application observability. We have a generous forever-free tier that includes 2,232 host hours per month and more. Sign up for free now!