
Grafana Labs Co-founder Woods: Market maturity, OpenTelemetry, and AI are reshaping observability

2025-09-26 9 min

As organizations navigate increasingly complex tech environments, unified observability practices have become essential.

That was one of the main takeaways from Grafana Labs Co-founder Anthony Woods’ recent appearance on “Tech Keys by Mercari India,” a podcast hosted by Vaibhav Khurana, Head of Platform Engineering at Mercari India.

“There’s a lot of what we call ’technology debt,’ which is where you’ve got a whole bunch of different tools and different teams using it that you need to get on top of, and really we see that the solution for that is these unified observability platforms, a place where you can kind of bring all of your data together in one place,” Woods said.

The pair discuss emerging trends in observability and critical insights from our 3rd annual Observability Survey. Khurana, a Grafana Cloud user, is actively setting up a dedicated central observability team, aligning with industry-leading practices highlighted during the conversation.

Check out the full episode below, or continue reading to get the key highlights from their discussion, including Woods’ thoughts on the evolution of observability, the importance of OpenTelemetry, and the role of AI going forward.

Monitoring vs. observability

Khurana: When do you actually think, from your perspective, monitoring transitioned into observability, and what do you think is the difference between monitoring and observability?

Woods: One of the big drivers has been the shift to microservices. There’s increasing complexity in the software where it’s no longer good enough just to know whether it’s up. You actually need to have the telemetry data, not just to tell you when things are broken, but really to be able to provide you insights and understanding about how your system is working so that you can troubleshoot it.

We are big believers that there’s value in all the telemetry. We talk about the three pillars of observability—your metrics, logs, and traces—and then we are very excited about continuous profiling being the next pillar, but we think there’s value in all of them. So metrics give you that high-level overview of what’s happening—just at a quick view so you can see where there are trends and changes in your environment. 

And then you often need more detail. Logs are easy, we’ve had them for generations, and then we see distributed tracing come in, which really helps with this. How does the request flow through all of these microservices? That gives you a lot of insights.
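
To make the tracing point concrete, here’s a minimal sketch (ours, not from the podcast) of what instrumenting a single request hop looks like with the OpenTelemetry Python SDK. The service and span names are illustrative; the idea is that each hop gets a span, and the shared trace ID is what lets a tracing backend reconstruct how a request flowed through the microservices.

```python
# Minimal sketch: tracing one request hop with the OpenTelemetry Python SDK.
# Names ("checkout-service", "charge-payment") are illustrative only.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")

def handle_checkout(order_id: str) -> None:
    # One span per hop; the shared trace ID is what lets a tracing backend
    # stitch the request flow back together across microservices.
    with tracer.start_as_current_span("checkout") as span:
        span.set_attribute("order.id", order_id)
        with tracer.start_as_current_span("charge-payment"):
            pass  # call the downstream payment service, propagating context
```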

Tackling data overload, cost, and value

Khurana: One of the results of the Observability Survey that caught my eye is the trend of executive buy-in, or that execs are also interested in getting observability systems for their organizations. Earlier it was mainly at the engineer level or maybe at the tech lead level or directors. But CXOs are coming into the picture now. What do you think about that?

Woods: That’s definitely the experience that we’re seeing engaging with our customers in the industry. One of the big reasons for that is what we talk about internally as crossing the chasm: moving from early adopters of this technology into the mainstream.

The early adopters, they’re your practitioners that love this technology. They’re going to go out looking for what’s new and shiny out in the industry and try to bring it into their organizations. And then we have other organizations that have a different level of resources available to invest in this kind of technology. They’re kind of out in the fringes waiting to see what sticks and what works. They’ve seen the success that these early adopters have had in adopting new observability practices and the value that they’re getting out of it. They’re realizing they want that value, but those organizations are very different in how they adopt technology. They are more top-down.

Khurana: Whenever we are buying anything, we are spending money on top of it. In other words: how much do I have to pay for it, and whether it’s worth paying for the software or not. Why can’t I run it on my system? Why can’t it be just a file of logs and a grep on top of it? Why do I pay for Grafana or any other system for that matter?

Woods: We’ve certainly seen trends in the industry recently where there’s a big gap between cost and value. That’s definitely top of mind. That’s certainly a lot of the things that came out of our Observability Survey: people are worried about cost and how they get value without the costs exploding. 

We want to make our open source as good as it can be, and there’s a huge amount of value in that and the cost is very low—certainly not zero because you have to have your engineering resources to go and focus on it. But we think for us, we are in the business of delivering value. When we talk to our customers, cost is a thing, but we want to focus on value. Are you getting value from your observability strategy? Is it helping? And that’s really the most important thing. 

We’re seeing this trend certainly in the industry with this change in the way we build software and this move into microservices where tools that you might’ve been using in the past don’t make sense in this new model. The cost profile of them is very different.

So when we think about observability, it is a journey. Every organization is on that journey. They’re just at a different stage along the path. And certainly we see when you’re early in that observability journey, you don’t know what data you’re going to need.

SLO-driven signals and OpenTelemetry

Khurana: Can you share any insights on new changes Grafana Labs is working on, or overall changes in the industry that you’re seeing to cater to alert fatigue?

Woods: Service-level objectives (SLOs) are the big one. The other one is just maturity in your incident response processes. So this is another thing that came out of the Observability Survey, and it’s actually something more specific to APAC, where we see a much higher percentage of people who see a lack of well-defined incident response processes as a big hindrance to improving their observability.

When things go wrong, you want to actually fix them. This is the whole SRE model where it’s not just about stemming the bleeding and saying, “Oh, I restarted the service, it’s good to go now,” because it’s probably going to fail again and you’ll have to restart it. You want to actually spend the time to go and understand what the actual root cause was, and then go and fix that problem where it actually exists—rather than just constantly putting band-aids on things.
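
As a rough illustration of how SLOs cut down alert fatigue (our sketch, not a Grafana Labs recipe): instead of paging on every error spike, you alert on how fast the error budget is being consumed, so only sustained or severe problems wake someone up. The thresholds below are the commonly cited multi-window values and are assumptions, not prescriptions.

```python
# Sketch: error-budget burn rate, the quantity SLO-based alerts key off.
def burn_rate(error_ratio: float, slo_target: float = 0.999) -> float:
    """Return how many times faster than "sustainable" the budget is burning.

    1.0 means the error budget lasts exactly the SLO window; common
    multi-window policies page at roughly 14x (fast burn) and open a
    ticket at roughly 1-2x (slow burn). These thresholds are illustrative.
    """
    error_budget = 1.0 - slo_target          # e.g. 0.1% for a 99.9% SLO
    return error_ratio / error_budget

# 0.5% of requests failing against a 99.9% SLO burns the budget 5x too fast:
print(burn_rate(0.005))  # -> 5.0, worth a ticket but not necessarily a page
```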

Khurana: One thing that I’ve also faced a lot of times is that my service is working well and good. My dashboards are fine, but one of the dependent services is now managed by some other team, and they don’t have the same way of dashboarding, or they have different dashboards or no dashboards at all, depending on the maturity of that product. How are tools and other things solving this kind of problem?

Woods: That’s another thing from the Observability Survey, where we see there’s a lot of what we call “technology debt,” which is where you’ve got a whole bunch of different tools and different teams using it that you need to get on top of, and really we see that the solution for that is these unified observability platforms, a place where you can kind of bring all of your data together in one place—the pipe dream.

On observability specifically, we’re seeing this great adoption of open standards and vendor-agnostic tooling, which I think is really important.

People are going through this pain right now where they’ve got these legacy monitoring tools and they’re like, “It’s not working.” The cost profile is way worse, it doesn’t work with how we build our software now, but the migration cost is very expensive. We are very excited about OpenTelemetry as an ecosystem—this vendor-agnostic, open-standards approach. Instrumenting your applications is a lot of work; you should only have to do it once, and you should have the freedom to then go and decide which vendor you’re going to give that data to, and the flexibility to change that in the future without having to go and instrument everything again.

73% of organizations are looking at or already using OpenTelemetry and open standards, as well as the Prometheus ecosystem. We see that obviously still being heavily used, and we see a lot of organizations that are using both, and that’s great. This whole value proposition around open standards and not being locked into a specific vendor is really important, especially when you’re making a big strategic long-term decision about where you want your observability to go and what your strategy is. It makes a lot of sense to come up with a strategy where you’ve got choice and flexibility to make changes over time.
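
To illustrate the “instrument once, choose the backend later” point, here’s a small sketch (ours, using the OpenTelemetry Python SDK) where the application code only knows about the vendor-neutral OTLP protocol. Which vendor actually receives the data is a deployment-time setting, such as the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable, so switching backends doesn’t mean re-instrumenting.

```python
# Sketch: export via the vendor-neutral OTLP protocol; the destination is
# configuration (e.g. OTEL_EXPORTER_OTLP_ENDPOINT), not application code.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
# Point OTEL_EXPORTER_OTLP_ENDPOINT at any OTLP-compatible backend and the
# instrumentation in your services stays exactly the same.
```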

AI and observability

Woods: We are getting into the realm of generative AI where it’s really kind of looking at that data and understanding what it means and being able to make recommendations.

Khurana: What’s your take on AI? 

Woods: Obviously it’s changing the world. I think it’s going to be very impactful. I’m really excited about where things are going. If you’d asked me maybe 10 months ago, I would’ve been a skeptic. But we’ve actually had some really great innovations happening inside our organization, with some product capabilities and useful capabilities coming out of AI.

We announced our Grafana Assistant at GrafanaCON, and this just blew us away with how well it worked. The whole thing is built on an agentic model tightly integrated into our Grafana Cloud environment. It understands the context: it knows what dashboard you’re looking at, it knows what our technology stack looks like. It takes that engineering approach of looking at the data and, based on the response, knowing where to go and look next to find those root causes of the problems.

Instead of engineers having to go and explore, we can leverage generative AI to do that exploration and just come and tell us what we need to know. A great example is that when an incident happens, you might get your slow-burn SLO alert firing. Your on-call engineer will say, “OK, there’s an incident,” and they’ll declare an incident, and by the time they log into Grafana to look at it, there will be a summary of the investigation that’s already happened and what the AI thinks the root cause is.

Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!