Going beyond AI chat response: How we're building an agentic system to drive Grafana

2025-06-30 7 min

As we look at the role AI can play in Grafana going forward, we want to move beyond the simple chat responses that dominate the world of LLMs today and into agentic systems—AI that can understand, reason, and act on your behalf.

The ultimate goal is to make it easy to get things done in Grafana using natural language—whether you’re a seasoned SRE or a new developer. In the AI world, we call this moving from chat completion to task completion: instead of just receiving information and reacting to it yourself, you have an experience where the task is actually done for you.

In this post, we’ll recap the key steps on this path that were recently shared at GrafanaCON 2025, from teaching LLMs to “speak” Grafana to the powerful agentic workflows this will unlock.

From simple chat to ‘speaking’ Grafana

The first generations of LLMs gave developers a way to access the knowledge of the internet—albeit a static snapshot of it—using natural language in a single API call. And that’s useful, because we can feed some predefined context and instructions to the LLM and get a natural-language answer back.

This has already enabled some new experiences in the Grafana ecosystem, such as Flame graph AI: you provide the LLM with a flame graph, and it gives you a human-readable explanation of the performance bottleneck. It’s a single action that leads to a single outcome—an approach that excels in a very specific setting, but not much beyond it.
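
To make this concrete, here’s a minimal sketch of that single-call pattern, assuming an OpenAI-compatible chat completions API via the openai Node package; the model name and prompt wording are illustrative, not what Flame graph AI actually uses:

```typescript
// A minimal sketch of the single-call pattern: predefined context and instructions in,
// a natural-language answer out. Assumes an OpenAI-compatible API via the openai Node
// package; the model name and prompt wording are illustrative, not Flame graph AI's.
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function explainFlameGraph(flameGraphText: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "You are a performance engineer. Explain the main bottleneck in the flame graph the user provides.",
      },
      { role: "user", content: flameGraphText },
    ],
  });
  // One request, one response: no tools, no follow-up actions, no knowledge of your stack.
  return response.choices[0].message.content ?? "";
}
```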

It has another limitation, too: That LLM knows about the internet, but it doesn’t know about your world. It can’t see your dashboards, it can’t query your data sources, and it certainly can’t tell you which of your services has an active incident right now.

So, we had to teach LLMs to “speak” Grafana.

The key technology that unlocks this is the Model Context Protocol (MCP) combined with tool calling, which allows an LLM to not only generate text but also call predefined functions or APIs. When a user asks a question, the LLM can now decide: “Which tool should I use to help me make sense of this request?” Using these tools, the LLM can pull in real-time, contextually useful information.
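
As a rough illustration of how that decision point works (not Grafana’s actual implementation), the sketch below offers the model a single hypothetical get_incidents tool via an OpenAI-style chat completions API; if the model decides it needs live data, it returns a tool call instead of text, and we execute it and feed the result back:

```typescript
// A sketch of the tool-calling loop, not Grafana's actual implementation. We offer the
// model one hypothetical tool; if it decides the question needs live data, it returns
// a tool call instead of text, and we execute the call and feed the result back.
import OpenAI from "openai";

const openai = new OpenAI();

const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_incidents",
      description: "List the active incidents in the connected Grafana instance",
      parameters: { type: "object", properties: {}, required: [] },
    },
  },
];

async function ask(question: string): Promise<string | null> {
  const first = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: question }],
    tools,
  });

  const toolCall = first.choices[0].message.tool_calls?.[0];
  if (!toolCall) return first.choices[0].message.content; // no live data needed

  // Stand-in for the real tool execution (e.g., a Grafana API or MCP call).
  const incidents = JSON.stringify([{ title: "Checkout latency spike", severity: "critical" }]);

  // Hand the result back so the model can formulate a grounded, context-aware answer.
  const second = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "user", content: question },
      first.choices[0].message,
      { role: "tool", tool_call_id: toolCall.id, content: incidents },
    ],
    tools,
  });
  return second.choices[0].message.content;
}
```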

To make this work for our projects and products, we created the open source Grafana MCP Server, which exposes core Grafana functionality—searching dashboards, querying Prometheus, managing incidents—as a set of tools that any MCP-compatible AI can use.

This opens the door to asking questions like, “Which active incidents do I have?” The LLM sees the question, recognizes it needs live data, calls the get_incidents tool from the Grafana MCP Server, receives the list of incidents from your Grafana instance, and then formulates an accurate, context-aware answer.
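
Wiring this up yourself might look something like the following sketch, which uses the official MCP TypeScript SDK to launch the Grafana MCP Server over stdio. The mcp-grafana binary name and environment variables follow the project’s README, and the get_incidents tool name is taken from the example above; actual names may vary by server version:

```typescript
// A sketch of connecting an MCP client to the Grafana MCP Server over stdio, using the
// official MCP TypeScript SDK. The mcp-grafana binary name and environment variables
// follow the project's README; tool names can vary by server version.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "mcp-grafana",
  env: {
    GRAFANA_URL: "http://localhost:3000",
    GRAFANA_API_KEY: process.env.GRAFANA_API_KEY ?? "",
  },
});

const client = new Client({ name: "demo-client", version: "0.1.0" });
await client.connect(transport);

// Discover the Grafana tools the server exposes (search dashboards, query Prometheus, ...).
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));

// Call the incident tool from the example above.
const incidents = await client.callTool({ name: "get_incidents", arguments: {} });
console.log(incidents.content);
```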

Putting an agent to work

To illustrate how agentic systems can help you during development, imagine that you’re working on a new Node.js service. You have it instrumented to expose some basic Prometheus metrics, and you have a Grafana instance ready to go. However, you haven’t had time to build your dashboards.

You could follow the typical process: Navigate the Grafana UI, add a data source, and manually create panels—all of which takes time and knowledge. Or, you could simply open an AI code editor like Cursor, pair it with our MCP server, and type a natural language prompt: “Can you create a dashboard based on the metrics in this code?”

Here, the agent doesn’t just guess. It initiates a multi-step plan, using the tools at its disposal from the Grafana MCP Server. It reasons that to build a useful dashboard, it must first:

  1. Call the list_datasources tool to see what’s available in your Grafana instance.
  2. Call the list_prometheus_metric_names tool to analyze the metrics your app is exposing.
  3. Call tools to inspect the labels for those metrics to understand their structure.

After gathering this context, it generates a complete, multi-panel dashboard in your Grafana instance with a single command. In seconds, a dashboard appears, created from scratch with no manual effort.
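
Conceptually, that plan corresponds to a sequence of tool calls like the sketch below, reusing the MCP client from the previous snippet. A real agent chooses and orders these steps itself, and the label-inspection and dashboard-creation tool names and argument shapes here are illustrative assumptions:

```typescript
// A conceptual sketch of the agent's plan as explicit tool calls, reusing the MCP
// client from the previous snippet. The label-inspection and dashboard tool names
// and argument shapes are assumptions, not the server's exact API.
const datasources = await client.callTool({
  name: "list_datasources",
  arguments: {},
});

const metricNames = await client.callTool({
  name: "list_prometheus_metric_names",
  arguments: { datasourceUid: "prometheus" }, // assumed argument shape
});

// Inspect labels to understand each metric's structure (tool name illustrative).
const labels = await client.callTool({
  name: "list_prometheus_label_names",
  arguments: { datasourceUid: "prometheus" },
});

// With that context gathered, assemble the panels and create the dashboard
// (again, an illustrative tool name and payload).
await client.callTool({
  name: "update_dashboard",
  arguments: { dashboard: { title: "Node.js service overview", panels: [] } },
});
```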

But let’s take this a step further. Let’s say you notice your new dashboard is missing latency metrics. So you ask the agent: “Can you add latency metrics to this?” But there’s just one problem: You don’t have a latency metric set up yet.

And here’s the really interesting part: The agent understands that the problem isn’t with the dashboard, but with the source code. It knows it can’t visualize a metric that doesn’t exist, so it explains that your app needs to be instrumented to measure request duration and then generates the exact code changes needed to add a Prometheus histogram metric.

With a simple confirmation, the agent edits your index.js file. Once the new metric is exposed, a final prompt of “Can you also add it to the dashboard?” is all it takes for the agent to update the existing dashboard with new panels for p95 and average request latency.
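
The generated instrumentation might look something like this sketch for an Express app using the prom-client library; the metric name, labels, and buckets are illustrative choices, not the agent’s literal output:

```typescript
// What the agent's suggested instrumentation might look like in an Express app, using
// the prom-client library. The metric name, labels, and buckets are illustrative.
import express from "express";
import client from "prom-client";

const app = express();

const httpRequestDuration = new client.Histogram({
  name: "http_request_duration_seconds",
  help: "Duration of HTTP requests in seconds",
  labelNames: ["method", "route", "status_code"],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
});

// Time every request and record it with its method, route, and status code.
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on("finish", () => {
    end({
      method: req.method,
      route: req.route?.path ?? req.path,
      status_code: res.statusCode,
    });
  });
  next();
});

// Expose the metrics endpoint for Prometheus to scrape.
app.get("/metrics", async (_req, res) => {
  res.set("Content-Type", client.register.contentType);
  res.send(await client.register.metrics());
});

app.listen(3001);
```

From a histogram like this, a p95 panel would typically be driven by a PromQL query such as `histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))`.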

This entire workflow—from identifying missing instrumentation to modifying code to updating observability—was completed in a single, continuous conversation. This is what we mean by an agent that can drive Grafana. Combining both code and observability allows you to be faster and more productive.

Building agents that drive Grafana

Beyond the developer experience, the next logical step was to bring this agentic experience into Grafana itself. People use Grafana to monitor their services so they can keep them running reliably and troubleshoot issues quickly. But it takes a lot of time, effort, and knowledge to understand and operate complex systems. Think about how many people have never used Grafana, have never set up an alert, or have never interacted with instrumentation. Or think about new folks who have just joined your team and don’t yet have your knowledge and expertise of the system.

This is where the Grafana Assistant comes in, allowing you to just talk to Grafana and get things done. Announced at GrafanaCON 2025, Grafana Assistant deeply integrates with the Grafana ecosystem, bringing your relevant observability context with you in an easy-to-use sidebar.

We’re not just democratizing data; we’re democratizing the observability experience itself—allowing you to go places, find things, create a dashboard, or ask questions about your data, all using natural language. This is where Grafana Assistant shines, allowing everyday users to work with Grafana more efficiently.

The challenges: Moving toward reliable, modular agents

The ultimate goal here is to make it easier to get the most out of Grafana, but building it isn’t easy. The “ask anything” experience is hard to create because there are a million questions people could ask about their Grafana instance.

Observability workflows are complex, and every user’s environment is unique. So in order to build a robust and reliable agent, we’re focusing on a few key areas:

  1. Evaluation: We’ve moved beyond “vibe testing” (i.e., “Does it feel right?”) to a framework of reproducible evaluations. We test our agents against a set of scenarios in a controlled Grafana environment to ensure that every change improves quality and doesn’t cause regressions.
  2. Reducing token noise: APIs often return structured data like JSON. While JSON is great for machines, it’s “noisy” for LLMs. We’ve found that by pre-processing API responses into more natural, semantically rich sentences, we can reduce token usage by up to 4x, which lowers cost, decreases latency, and actually improves the LLM’s performance (see the sketch after this list).
  3. Multi-agent architecture: We started with single, monolithic agents that try to do everything. Now we’re looking into building a system of “expert agents”: a central coordinator agent receives the user’s request and delegates it to a specialized agent—like a dashboard agent, a query agent, or a support agent. This modular design is easier to debug and extend, and it makes the entire system more robust.
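
To illustrate the token-noise idea from point 2, here’s a toy sketch (not Grafana’s actual preprocessing) that flattens a hypothetical incident payload into compact sentences before it reaches the model’s context:

```typescript
// A toy illustration of token-noise reduction (not Grafana's actual preprocessing):
// flatten a hypothetical JSON incident payload into compact, natural sentences
// before it reaches the model's context window.
interface Incident {
  id: string;
  title: string;
  severity: string;
  status: string;
  createdAt: string;
}

function incidentsToProse(incidents: Incident[]): string {
  if (incidents.length === 0) return "There are no active incidents.";
  return incidents
    .map(
      (i) =>
        `Incident ${i.id} ("${i.title}") is ${i.severity} and ${i.status}, opened ${i.createdAt}.`
    )
    .join(" ");
}

// Raw JSON spends tokens on keys, quotes, and braces; the sentence form is both
// cheaper and closer to the text the model was trained on.
const prose = incidentsToProse([
  { id: "42", title: "Checkout latency spike", severity: "critical", status: "active", createdAt: "2025-06-30" },
]);
```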

Putting this into practice with Grafana Assistant

This journey from simple chat to tool-using, multi-step agents is what led us to build Grafana Assistant, but we’re just getting started. The AI space is ever-evolving, and we want to be at the front of it, delivering AI workflows that matter. To that end, we’re experimenting with data structures, specialized agents, and evaluation frameworks.

If you’re interested in getting involved with what we’re building, here’s an easy way to get started:

Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!
