The state of observability in 2025: a deep dive on our third annual Observability Survey
Across companies of all shapes and sizes, observability practices are maturing and getting attention at the highest levels. At the same time, cost and complexity continue to hinder efforts as teams look to emerging tools to help simplify their processes in hopes of better outcomes.
With so much in flux, we went into our third annual Observability Survey hoping to get a window into the ways the community is approaching observability and where it wants it to go next. And the response was overwhelming, more than quadrupling the total from last year’s survey and showing just how important observability is to SREs, developers, and even the C-suite.
Keep reading to learn how to get the most out of this year’s survey results, or skip ahead to dive right into the findings.
How to interact with the Observability Survey

In total, we collected 1,255 responses online and at industry events around the world, making it the largest community-driven survey on the state of observability. We also gathered a wide range of opinions, with input from people working at companies big and small, across more than a dozen industries, and in a range of roles that have different needs and perspectives.
In fact, we collected so many great responses that we decided to do things a little differently this year and give you multiple ways to glean the information:
- The main report, which includes all the top-line stats and charts you need to quickly grasp what’s happening across the observability landscape today. You’ll find information on tools and techniques, the role of OSS, the challenges teams are facing, and the solutions they hope to adopt going forward.
- A Grafana dashboard, which includes more than a dozen interactive panels to help visualize the stats and dig deeper. You can even filter by industry, region, role, and company size to see how your organization stacks up to its peers.
- A series of YouTube videos we’ll be rolling out in the coming days featuring some of our internal experts sharing their thoughts on the survey results.
- This blog, which will provide analysis on the key takeaways to help you understand what it all means and what comes next, including insights on:
  - Market maturity and C-suite engagement
  - The importance of OSS—and Prometheus and OpenTelemetry in particular
  - All the complexity teams are juggling and where AI/ML may help
  - The continuing focus on cost controls
  - The role of SLOs, full stack observability, and other emerging solutions
And don’t worry, it’s all totally free; you don’t even have to hand over your email address. We’re grateful that so many people took the time to share their opinions, and we want to make this data accessible to everyone who cares about observability. So keep reading to go deeper on the state of observability today.
The observability space is maturing—and the C-suite is paying attention
If you want a sense of how observability practices are maturing, look no further than the telemetry being collected—and traces in particular. Organizations overwhelmingly rely on metrics (95%) and logs (87%), but those are also carryovers from legacy monitoring techniques. Organizations have been slower to take the leap into traces, which are often considered the third pillar of observability. Traces are critical for end-to-end visibility into today’s modern, distributed systems, so it’s encouraging to see more than half (57%) of all companies use them.

You can also gauge maturity by the ways organizations approach observability, and today’s companies are more than twice as likely to follow the most mature approach than the least mature one. (See the chart below for all “approach” definitions.)

We tend to think of the “operations team” approach as the least mature since it isolates different groups and doesn’t ensure observability is factored into every stage of the application lifecycle. Conversely, we see “centralized observability (support)” as the most mature since it takes a single-pane-of-glass approach that’s also scalable across growing teams. When you compare the two, these centralized teams are more likely to use traces (63% vs. 50%). They’re also less likely to be concerned about too much noise (36% vs. 44%), complexity (29% vs. 49%), or having to convince management about the value of observability (18% vs. 28%).
Market awareness is on the rise
Another way to measure maturity is to look at larger shifts in market awareness. For example, emerging tools and techniques are typically used by early adopters more willing to take risks and gain deep expertise to get a competitive advantage. As the market expands, certain aspects become commoditized and late-stage adopters look to reap those same benefits via managed services. We’re starting to see that in observability, as 37% of organizations “mostly” or “only” use SaaS—a 42% increase year over year—rather than managing their own stacks.
Organizations that adopt the most mature form of centralized observability are also more likely to use SaaS, which follows a trend we’re seeing with our own users. And it makes sense, since those teams are no longer spending time operating a massive time series database or log aggregation system. Instead, they’re devoting that time to getting the most value from the technology and driving adoption throughout their organizations.
And there’s no bigger sign that a technology has hit a critical mass than when it lands on the C-suite’s radar. That’s what’s happening to observability, as “CTO/C-suite” was the most common response (33%) when we asked the highest level at which observability is considered critical to the business. That was followed by the director level (23%), leaving less than a third saying observability isn’t considered critical beyond individual teams (developers at 16% and observability teams/contributors at 12%).

Most organizations take a bottom-up approach to observability, with developers and SREs adopting tools to better understand the state of their systems. But the expansion of support to virtually all levels of these organizations appears to be having a positive impact on maturity, too. Companies whose C-suite sees observability as business-critical are more likely to adopt tools and practices such as traces, profiles, SLOs, OpenTelemetry, and unified application and infrastructure observability.
Open source is at the heart of today’s observability landscape
We’ve long believed that OSS would win out in observability. There are simply too many data types, too many data sources, and too many tools to not have open source and open standards underpin it all. And so far, the data is proving that to be true:
- Roughly three-quarters (76%) of companies are using open source licensing* for observability, and they’re nearly four times as likely to use an open source license exclusively as a commercial license exclusively (30% vs. 8%).
- 71% are using both Prometheus and OpenTelemetry in some capacity†, and more than a third (34%) are using both in production in some capacity††.
- At least half of all organizations increased their investments in Prometheus and OpenTelemetry—for the second year in a row.
- Eight of the 10 most used observability technologies are open source.
There’s also some interesting data that points to the importance of open source, even if it’s not explicitly framed that way. For example, less than half of all respondents (41%) listed “based on open source software/technologies” as a top criterion for selecting new observability tools. However, respondents could select multiple criteria, including two other hallmarks of open source and open standards: “ease of switching to another tool in the future” (35%) and “interoperability with other tools used at my organization” (46%). In total, 59% cited at least one of these three as an important selection criterion.
Similarly, when we asked about requirements for an OpenTelemetry backend, 57% cited the ability to “ensure vendor-neutrality and flexibility,” and only 49% cited “compatibility with existing monitoring/observability systems (e.g., Prometheus).” But collectively, 75% cited at least one of these as an important criterion.
There’s even data that appears contradictory at first glance, but may actually point to an evolving view on the value of open source, in practical terms. For example, among those who said they’re using “mostly” or “only” commercial licenses for observability, 62% said they’re using Prometheus in production in some capacity††, and 56% are using OpenTelemetry in production in some capacity††. This shows that even when organizations don’t explicitly prioritize open source and open standards, it can still be a critical part of their observability practice.
Another example is those who only use SaaS for observability. You’d assume that group would mostly or exclusively operate under commercial licensing, and yet 35% of them said they’re only using open source licensing for observability. Though we can’t know for certain, it’s possible this group was referring to SaaS that is underpinned by OSS and therefore carries many of the inherited benefits, such as interoperability and lack of vendor lock-in.
The growing ubiquity of Prometheus and OpenTelemetry
Prometheus has been the industry standard for monitoring applications and services for years, and it’s still going strong today, with more than two-thirds (67%) of all companies using it in production in some capacity††.
OpenTelemetry is newer than Prometheus, but it’s quickly emerging as an essential framework for standardizing telemetry data. In fact, total usage (investigating, POC, in production, using extensively, or exclusively) for both OSS projects is relatively close, with 86% using Prometheus and 79% using OpenTelemetry†.
And neither one appears to be going anywhere, with only a small minority decreasing their investment in Prometheus (7%) or OpenTelemetry (5%) over the past year.

But a closer look at the data shows OpenTelemetry could be on the cusp of even more growth. For example, despite the similarities in overall interest, only 19% of Prometheus users are either investigating it or building POCs, while 38% of OpenTelemetry users are doing the same, showing that it’s still early days for the project. And while 21% said they aren’t currently using OpenTelemetry or don’t know if their organization is, when we asked about requirements for an OpenTelemetry backend, only 6% said they have no plans to use OpenTelemetry at all.
OpenTelemetry users also tend to use more mature observability practices. For example, they’re more likely to use a centralized observability (support) approach than those who don’t use OpenTelemetry (39% vs. 35%).
It will be interesting to see how this newer wave of users impacts the future direction of vendors’ OpenTelemetry solutions, as you can already see differences between those just starting out (investigating or building a POC) and those using it in production in some capacity (in production, using extensively, using exclusively). Those using OpenTelemetry in production put a higher premium on support for a variety of telemetry types (61% vs. 51%), compatibility with existing systems (56% vs. 51%), cost-effectiveness (49% vs. 41%), and scalability (44% vs. 36%) for their compatible backend.
We’ll also be watching how long-time Prometheus users set their expectations as they make OpenTelemetry a bigger part of their observability strategy. Compared to non-Prometheus users, they’re far more likely to want an OpenTelemetry backend that’s compatible with existing systems (56% vs. 35%). This points to a real need for a bridge between the two as OpenTelemetry progresses toward more of an industry standard.
* Combines “open source only,” “mostly open source,” and “roughly equal” split between open source and commercial licensing
† Combines “We are investigating it,” “We are building a POC,” “We are using it in production,” “We are using it extensively,” and “We are using it exclusively”
†† Combines “We are using it in production,” “We are using it extensively,” and “We are using it exclusively”
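For teams looking to build that kind of bridge between Prometheus and OpenTelemetry today, one common pattern is to run the OpenTelemetry Collector (specifically the contrib distribution, which includes the Prometheus receiver) as a scraper that forwards existing Prometheus metrics to an OTLP-compatible backend. A minimal sketch of such a config—the job name, targets, and endpoint below are placeholders, not real systems:

```yaml
# Hypothetical OpenTelemetry Collector config: scrape existing Prometheus
# exposition endpoints and forward the metrics over OTLP.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "example-app"        # placeholder job name
          scrape_interval: 15s
          static_configs:
            - targets: ["localhost:9090"] # placeholder target

exporters:
  otlphttp:
    endpoint: "https://otlp.example.com"  # placeholder backend endpoint

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlphttp]
```

The appeal of this setup is that Prometheus-instrumented services don’t have to change their exposition format at all, while the backend consumes OTLP—one way to meet the “compatibility with existing systems” requirement respondents cited.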
Teams are juggling lots of complexity, and they’re betting on AI to help tame it
As we mentioned earlier, part of the reason open standards have become so essential is the vast number of tools and data sources being used today. In fact, when we asked respondents to list their observability technologies currently in use, they cited more than 100 different tools, services, and projects. Of course, the average number of tools in use is much smaller (eight), but it still represents a major challenge.

And then there’s all the data sources that have to be managed. For the second straight year, the average Grafana user has 16 data sources configured in the platform, though a little over half of all users have five or fewer.

We didn’t ask which specific data sources they used, so it’s possible the average was skewed by respondents who configure the same data type across multiple instances. Still, even if there’s overlap in data types, the 25% of organizations managing double-digit data sources (including the 5% with more than 100!) likely have their hands full, even though that wouldn’t be as complex as configuring dozens of unique data sources.
It’s no wonder “complexity/overhead associated with setting up and maintaining tooling” was the most cited concern about observability in this year’s survey. It’s also no surprise that as your company grows, so does its complexity. For example, the smallest companies (10 or fewer employees) average four different observability technologies, while the largest companies (more than 5,000 employees) average 10. Similarly, the smallest companies average only six data sources configured in Grafana, compared to 24 at the largest enterprises.
But despite the differences, the percentage of organizations concerned about complexity is fairly uniform across all company sizes, suggesting that it’s an issue at almost any scale.
Complexity challenges spill into other areas as well. Alert fatigue is the No. 1 obstacle to faster incident response, outpacing the next closest responses by an almost 2:1 margin.

It’s also reflected in the criteria organizations use when selecting new observability technology, with 57% saying “ease of use/learning curve for new users” was a top priority. Nearly a third also cited “familiarity/adoption with your organization” as an important criterion.
The role of AI in observability
So far, organizations aren’t leaning too hard into AI/ML as the solution to all this complexity, with just 19% citing it as an important selection criterion. However, a closer look at the organizations already managing a sizable observability footprint could portend changes on the horizon, as those operating at a larger scale are more inclined to want this functionality today. For example, 26% of companies with more than 5,000 employees prioritize AI/ML capabilities, as do 28% of organizations with more than 20 data sources.
Of course, as we’ve all seen, this is a rapidly evolving space. It’s worth noting that we began collecting survey responses last September, so it’s possible that attitudes have shifted since then and could shift even further when we ask the question next year. For example, perhaps the relatively low percentage of people who had AI/ML as part of their selection criteria can be tied back to skepticism about hype and getting stuck buying vaporware.
But as this space continues to advance, you can expect it to play a larger role in the tools on the market and in users’ expectations. That’s why we asked respondents which AI/ML “wishlist” features would benefit their observability practice the most, and it should come as no surprise that the top two requests—training-based alerts and faster root cause analysis—tie back to the complexity issues we just discussed.

And while those two items were almost universally ranked as the top two across every demographic, the exact ranking varied by a number of factors. Smaller organizations, those using SaaS, and those with fewer observability technologies tend to favor training-based alerts by a fairly large margin, while larger organizations, those using self-managed setups, and those with more technologies tend to favor faster root cause analysis.
Cost controls remain a priority, but the focus is shifting to value
Cost is always top of mind for businesses, especially these days, so it isn’t a surprise that three-quarters of companies say cost is an important criterion when selecting observability technologies. There’s also a strong relationship between cost concerns and cost as a selection criterion: 88% of those concerned that observability costs too much are prioritizing cost in their selections, as are 85% of those who say costs are too difficult to predict and budget for.
There’s also the question of how much you should spend on observability relative to the rest of your infrastructure. There’s no industry standard for what percentage that should translate to, and that was clear from the survey responses. The average across all organizations was 17%‡, though some respondents said it’s 0% (presumably because they’re using OSS tooling, though that doesn’t factor in overhead costs), while others said it’s upwards of 50%. However, the median and mode both came in at 10%, so perhaps that’s a benchmark to keep in mind going forward.
In terms of biggest concerns, there wasn’t a clear frontrunner. Only a minority of respondents explicitly cited cost—either too high (37%) or too unpredictable (29%)—but there are other expenses to consider beyond your monthly observability bill. In fact, you could argue that the most commonly cited concerns (complexity/overhead and signal-to-noise ratio) carry their own hidden costs.
Managing a complex system on your own can translate to lots of engineering hours at scale, which quickly gets expensive. And the signal-to-noise problem can be a byproduct of collecting too much data, which can be especially problematic if you use a vendor that charges based on telemetry ingestion. But the underlying fear associated with both of these concerns is the potential for outages and an inability to respond promptly—a scenario that could have much larger financial consequences than any observability bill.
Moreover, cost savings was the least important outcome for organizations using or interested in service level objectives (SLOs), with teams instead focusing on MTTR, accountability, and reduced alert noise. Taken collectively, these stats indicate that organizations are focused on getting value from their tools and techniques rather than just hunting for the cheapest option.

In fact, the percentage of respondents who cited “convincing management of the value of observability” as a top concern fell year over year (28% vs. 23%). That makes sense when you consider that three-quarters of all companies say observability is business-critical at either the CTO, VP, or director level, with CTO being the most common response (33%).
‡ This was an optional, open-ended question. Inconsistent or inaccurate responses were removed from the dataset, leaving a base of 294 responses.
SLOs and other emerging tools and techniques are starting to take hold
One of the best ways to combat cost and complexity concerns is through SLOs, which establish measurable goals related to the quality of service provided to users. Though not entirely new, they haven’t quite cemented themselves in the observability ethos, in part because it’s as much about cultural change as it is about the actual technology.
Still, nearly three-quarters (73%) of all organizations are actively investigating or using SLOs today, and adoption rates are higher among those using more mature tools and techniques, including traces, profiles, and centralized observability, as well as those juggling more observability technologies and data sources.

Adoption varies by role, with SREs (29%) much more likely to say their organization is using them in production in some capacity (in production, using extensively, using exclusively) than developers (18%). There are also varying degrees of interest at the managerial level, with 32% of engineering directors saying their organization uses them, compared to just 14% of CTOs.
In terms of what organizations hope to get out of SLOs, the most common response was reduced MTTR (33%), followed by better accountability (25%), reduced alert noise (16%), and cost savings (14%).
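In practice, an SLO is usually tracked against an error budget: the fraction of requests the objective allows to fail over a given window, which teams can spend on releases and incidents. A minimal sketch of the arithmetic in Python, using made-up request counts (not survey data):

```python
# Minimal error-budget sketch for an availability SLO.
# The numbers below are hypothetical, for illustration only.

def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    slo_target: e.g., 0.999 for a 99.9% availability objective.
    A negative result means the budget is overspent (SLO breached).
    """
    # The budget is the number of failures the SLO tolerates.
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures == 0:
        return 1.0 if failed_requests == 0 else 0.0
    return 1 - (failed_requests / allowed_failures)

# A 99.9% SLO over 1,000,000 requests allows 1,000 failures;
# after 250 failures, 75% of the budget remains.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.0%} of error budget remaining")  # prints "75% of error budget remaining"
```

Framing reliability this way is part of why SLO adopters in the survey prioritized reduced MTTR and alert noise over raw cost savings: alerts fire on budget burn rather than on every blip.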
Full-stack observability, FinOps, and LLM observability
Another emerging area that’s getting even more attention today is unified application and infrastructure observability, with 85% of all organizations either using or looking into it to get visibility into their entire software stack. It’s especially popular with companies moving beyond just logs and metrics, with 45% of those using profiles also using full-stack observability in production in some capacity (in production, using extensively, using exclusively), as well as 42% of those using traces, compared to just 34% across all organizations.

We also asked about two other emerging areas: LLM observability and FinOps. More than half of all organizations are either looking into or using both, but neither is seeing a ton of use in production in any capacity: 7% for LLM observability and 15% for FinOps.
Methodology
A total of 1,255 observability practitioners and leaders around the world participated in our third annual Observability Survey between Sept. 18, 2024, and Jan. 2, 2025. We developed the questions internally and promoted the survey online through our blog, website, social media channels, and newsletters, and through the help of our Grafana Champions. Our Events and Community teams also collected responses in-person at ObservabilityCON 2024 and ObservabilityCON on the Road, as well as third-party events like AWS re:Invent, KubeCon EU, KubeCon India, and local Meetups.
The data analysis was conducted with Censuswide. Censuswide abides by and employs members of the Market Research Society and follows the MRS code of conduct and ESOMAR principles. Censuswide is also a member of the British Polling Council.
Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!