AI in observability at Grafana Labs: Making observability easy and accessible for everyone
Did you know that observability has been around for more than six decades? It all goes back to a Hungarian-American inventor named Rudolf Kálmán, who explored how the internal state of a machine could be inferred from its external outputs.
Kálmán wrote about monitoring single-input, single-output systems, but our demands are very different today. We need to observe monoliths, microservices, clusters, pods, regions, and much more. And we collect metrics, logs, traces, and profiles to better understand what’s going on in our stacks.
Yes, a lot has changed over the years, but even bigger changes may be on the horizon thanks to AI. In fact, we’ve come to believe that AI is proving to be more than just a trend in observability: It will likely be integral to how organizations keep their systems fully operational going forward.
In this blog, we’ll share more about our perspective on the role of AI in observability and how that’s influencing the tools we’re building to help our users.
How we’re building AI-based observability solutions for different audiences
Large organizations often collect petabytes of information every month, which can create a real challenge for the engineers who have to sift through all that data, especially if they’ve just been alerted to an incident they need to work through. That’s why AI and observability are a perfect match—it’s all about data.
If you look at a typical observability workflow, it’s clearly not a singular task. It’s actually lots of different tasks (working through an incident, doing routine check-ups, building dashboards, etc.) that are chained together to produce an outcome. Some of these tasks are suitable for AI, while others are better left to people. This is because some tasks are easily automatable, while others are highly ambiguous. For those ambiguous tasks, AI can only take on a support role today—not an executing one.
It’s also important to understand that every user has different needs. AI allows us to deliver a more personalized experience than ever, so we want to double down on delivering experiences that matter to you.
When we put all this together, we think about AI in observability across four dimensions:
- Operators: Operators use Grafana mainly to manage their stacks. This also includes people who use Grafana outside the typical scope of observability (for general business intelligence topics, personal hobbies, etc.).
- Developers: Developers use Grafana on a technical level. They instrument applications, send data to Grafana, and check traces. They might also check profiles to improve their applications and stacks.
- Interactive: For us, “interactive” means that a user triggers an action, which then allows AI to jump in and provide assistance.
- Proactive: In this case, AI is triggered by events (like a change to the codebase) or periodic occurrences (like once-a-day events).
These dimensions of course overlap. For example, users can be operators and developers if they use different parts of Grafana for different things. The same goes for interactive and proactive workflows—they can intertwine with each other, and some AI features might have interactive and proactive triggers.
Ultimately, these dimensions help us target different experiences within Grafana. For example, we put our desired outcomes into a matrix that includes those dimensions (like the one below), and we use that as a guide to build features that cater to different audiences.

When applying this model, it’s also important to keep in mind that transitions between these experiences are fluid. As we just discussed, there is overlap, so some features might even touch on all four dimensions.
Open source plus AI is a superpower
Grafana is an open source project that has evolved significantly over time—just like many of our other open source offerings. Our work, processes, and the public contributions in our forums and in our GitHub repositories are available to anyone.
And since AI needs data to train on, Grafana and our other OSS projects have a natural edge over closed source software. Most models are at least partially trained on our public resources, so we don’t have to worry about feeding them extensive context and documentation just so they “know” how Grafana works.
As a result, the models that we’ve used have shown promising performance almost immediately. There’s no need to explain what PromQL or LogQL are—the models already know about them and can even write queries with them.
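To make this concrete, here is the kind of translation a model can typically do out of the box, turning a plain-English question into a query. The metric, label, and service names below are hypothetical, so treat these as illustrative sketches rather than queries for any specific stack:

```
# "What's the p95 request latency per service over the last 5 minutes?" (PromQL)
histogram_quantile(0.95,
  sum by (le, service) (rate(http_request_duration_seconds_bucket[5m])))

# "Show me recent error logs from the checkout service" (LogQL)
{service="checkout"} |= "error"
```

Because PromQL and LogQL are publicly documented and widely discussed, a model usually produces syntactically valid queries like these without being given any extra reference material.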
This is yet another reason why we value open source: sharing knowledge openly benefits not just us, but the entire community that builds, documents, and discusses observability in public.
Keeping humans in the loop
With proper guidance, AI can take on tedious, time-consuming tasks. But AI sometimes struggles to connect all the dots, which is why engineers should ultimately be empowered to take the appropriate remediation actions. That’s why we’ve made “human-in-the-loop” (HITL) a core part of our design principles.
HITL is a design approach in which AI systems are supervised and controlled by people—in other words, the AI assists you. A good example of this is Grafana Assistant. It uses a chat interface to connect you with the AI, and the tools under the hood integrate deeply with Grafana APIs. This combination lets you unlock the power of AI without losing any control.
As AI systems progress, our perspective here might shift. Basic capabilities might need little to no supervision, while more complex tasks will still benefit from human involvement. Over time, we expect to hand more work off to LLM agents, freeing people to focus on more important matters.
Talk about outcomes, not tasks or roles
When companies talk about building AI to support people, oftentimes the conversation revolves around supporting tasks or roles. We don’t think this is the best way to look at it.
Obviously, most tasks and roles were defined before there was easy access to AI, so it only makes sense that AI was never integral to them. The standard workaround these days is to layer AI on top of those roles and tasks. This can certainly help, but it’s also short-sighted. AI also allows us to redefine tasks and roles, so rather than trying to box users and ourselves into an older way of thinking, we want to build solutions by looking at outcomes first, then working backwards.
For example, a desired outcome could be quick access to any dashboard you can imagine. To achieve this, we first look at the steps a user takes to reach this outcome today. Next, we define the steps AI could take to support this effort.
The current way of doing it is a good place to start, but it’s certainly not a hard line we must adhere to. If it makes sense to build another workflow that gets us to this outcome faster and also feels more natural, we want to build that workflow and not be held back by steps that were defined in a time before AI.
AI is here to stay
AI is here to stay, be it in observability or in other areas of our lives. At Grafana Labs, it’s one of our core priorities—something we see as a long-term investment that will ensure observability becomes as easy and accessible as possible.
In the future, we believe AI will be a democratizing tool that allows engineers to utilize observability without becoming experts in it first. A first step in that direction is Grafana Assistant, our context-aware agent that can build dashboards, write queries, explain best practices, and more.
We’re excited for you to try out our assistant to see how it can help improve your observability practices. (You can even use it to help new users get onboarded to Grafana faster!) To get started, either click on the Grafana Assistant symbol in the top-right corner of the Grafana Cloud UI, or find it in the main navigation menu on the left side of the page.
And if you want to learn more about how we’re applying these principles in practice, sign up for ObservabilityCON 2025 in London or an ObservabilityCON on the Road near you for all our upcoming AI-powered observability announcements.
FAQ: Grafana Cloud AI & Grafana Assistant
What is Grafana Assistant?
Grafana Assistant is an AI-powered agent in Grafana Cloud that helps you query, build, and troubleshoot faster using natural language. It simplifies common workflows like writing PromQL, LogQL, or TraceQL queries, and creating dashboards — all while keeping you in control. Learn more in our blog post.
How does Grafana Cloud use AI in observability?
Grafana Cloud’s AI features support engineers and operators throughout the observability lifecycle—from detection and triage to explanation and resolution. We focus on explainable, assistive AI that enhances your workflow.
What problems does Grafana Assistant solve?
Grafana Assistant helps reduce toil and improve productivity by enabling you to:
- Write and debug queries faster
- Build and optimize dashboards
- Investigate issues and anomalies
- Understand telemetry trends and patterns
- Navigate Grafana more intuitively
What is Grafana Labs’ approach to building AI into observability?
We build around:
- Human-in-the-loop interaction for trust and transparency
- Outcome-first experiences that focus on real user value
- Multi-signal support, including correlating data across metrics, logs, traces, and profiles
Does Grafana OSS have AI capabilities?
By default, Grafana OSS doesn’t include the built-in AI features found in Grafana Cloud, but you can enable AI-powered workflows using the LLM app plugin. This open source plugin connects securely to providers like OpenAI or Azure OpenAI, allowing you to generate queries, explore dashboards, and interact with Grafana using natural language. It also provides an MCP (Model Context Protocol) server, which allows you to grant your favorite AI application access to your Grafana instance.
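As a rough sketch, wiring an MCP-capable client to a Grafana instance usually comes down to pointing the client at the MCP server and supplying credentials. The binary name and environment variable names below are assumptions based on common MCP client conventions; check the plugin’s documentation for the exact configuration your client expects:

```
{
  "mcpServers": {
    "grafana": {
      "command": "mcp-grafana",
      "env": {
        "GRAFANA_URL": "http://localhost:3000",
        "GRAFANA_API_KEY": "<service-account-token>"
      }
    }
  }
}
```

With a configuration like this in place, the AI application can call the server’s tools to read dashboards, data sources, and queries from your Grafana instance on your behalf.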
Why isn’t Assistant open source?
Grafana Assistant runs in Grafana Cloud to support enterprise needs and manage infrastructure at scale. We’re committed to OSS and continue to invest heavily in it—including open sourcing tools like the LLM plugin and MCP server, so the community can build their own AI-powered experiences into Grafana OSS.
Do Grafana Cloud’s AI capabilities take actions on their own?
Today, we focus on human-in-the-loop workflows that keep engineers in control while reducing toil. But as AI systems mature and prove more reliable, some tasks may require less oversight. We’re building a foundation that supports both: transparent, assistive AI now, with the flexibility to evolve into more autonomous capabilities where it makes sense.
Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!