<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Grafana Labs blog on Grafana Labs</title><link>https://grafana.com/blog/</link><description>Recent content in Grafana Labs blog on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/blog/index.xml" rel="self" type="application/rss+xml"/><item><title>Business metrics in Grafana Cloud: Get an AI assist to help securely analyze your data</title><link>https://grafana.com/blog/business-metrics-in-grafana-cloud-get-an-ai-assist-to-help-securely-analyze-your-data/</link><pubDate>Wed, 08 Apr 2026 18:26:00</pubDate><author>Matt Wimpelberg</author><guid>https://grafana.com/blog/business-metrics-in-grafana-cloud-get-an-ai-assist-to-help-securely-analyze-your-data/</guid><description>&lt;p>For modern businesses, the data landscape demands security &lt;em>and&lt;/em> flexibility. &lt;/p>&lt;p>You need to connect your observability platform to rich, proprietary datasets that often reside in private networks without compromising security or managing complex network infrastructure. You may also face an extra layer of complexity in order to effectively query and visualize that data. Luckily, modern artificial intelligence tools have made these previously complicated processes much simpler.&lt;/p>&lt;p>This is where &lt;strong>Grafana Cloud's private data source connect (&lt;u>&lt;a href="https://grafana.com/docs/grafana-cloud/connect-externally-hosted/private-data-source-connect/">PDC&lt;/a>&lt;/u>)&lt;/strong> truly shines, offering a secure, elegant solution to bring relational data like business metrics and analytics directly into your dashboards. 
This secure connection also allows Grafana Assistant to access the data and leverage the power of AI to query and visualize it.&lt;/p>&lt;p>In this blog post, we’ll demonstrate how you can access that private data securely in Grafana Cloud and how to use our AI assistant to translate complex database queries into human-readable language and visualizations.&lt;/p>&lt;h2>Making business analytics easy with PDC, Assistant, and Postgres&lt;/h2>&lt;p>Observability was born from the need to give engineers deeper visibility into their workloads, but the scope of how it's used is quickly expanding. In fact, half of all organizations today use observability tools to track business-related metrics such as security, compliance, revenue, order tracking, customer conversions, and more, according to our &lt;u>&lt;a href="https://grafana.com/observability-survey/">2026 Observability Survey&lt;/a>&lt;/u>.&lt;/p>&lt;p>So while Grafana Cloud was built by engineers for engineers, it's also powerful and flexible enough to meet a wide range of needs, including business analytics. In this section, we'll briefly describe the tools you'll need to get started and tell you a little bit about the data source we'll use to demo this functionality.&lt;/p>&lt;h3>The power of PDC&lt;/h3>&lt;p>PDC is a key feature for enterprise-grade observability. It establishes a &lt;strong>secure, encrypted, private connection&lt;/strong> between your Grafana Cloud instance and data sources hosted within your private networks.&lt;/p>&lt;p>Here's how it works: A lightweight PDC agent is deployed in your private network. This agent creates a customer-controlled SSH tunnel back to Grafana Cloud, securely routing all queries. This critical design choice means:&lt;/p>&lt;ul>&lt;li>&lt;strong>Security first:&lt;/strong> Your databases are never exposed to the public internet. 
Traffic is encrypted end-to-end.&lt;/li>&lt;li>&lt;strong>Simplicity:&lt;/strong> You avoid the complexity of managing VPNs, NAT gateways, or intricate network-level access controls.&lt;/li>&lt;li>&lt;strong>Scalability:&lt;/strong> The agent can be deployed for high availability and easily scaled to meet your query demands.&lt;/li>&lt;li>&lt;strong>Local experience:&lt;/strong> You configure the data source in Grafana as if it were running locally within your private network.&lt;/li>&lt;/ul>&lt;h3>PostgreSQL: analytics beyond metrics&lt;/h3>&lt;p>While tools like &lt;u>&lt;a href="https://grafana.com/docs/grafana/latest/fundamentals/getting-started/first-dashboards/get-started-grafana-prometheus/">Prometheus&lt;/a>&lt;/u> are essential for scraping and querying time series metrics from infrastructure and applications, many critical business insights live in relational databases. PostgreSQL, with its robust support for complex queries, joins, and rich datasets, is the perfect complement to pure metrics-based observability.&lt;/p>&lt;p>Consider the example of the &lt;strong>World Happiness Report&lt;/strong>, which is a &lt;u>&lt;a href="https://www.worldhappiness.report/">research-based global report&lt;/a>&lt;/u> that ranks countries by how happy their people say they are, and explores the social and economic factors behind those differences. This dataset is full of relational context: countries, years, GDP per capita, life expectancy, and social support. 
Visualizing this data requires sophisticated queries that are not easily performed using traditional metrics-optimized sources.&lt;/p>&lt;p>By connecting PostgreSQL via PDC, you can:&lt;/p>&lt;ul>&lt;li>Query relational data like business metrics, customer survey results, or rich time series data&lt;/li>&lt;li>Perform complex joins to enrich time series metrics with contextual data&lt;/li>&lt;li>Unlock deep analytics directly within your Grafana dashboards&lt;/li>&lt;/ul>&lt;h3>Grafana Assistant: Query the data with natural language&lt;/h3>&lt;p>&lt;u>&lt;a href="https://grafana.com/docs/grafana-cloud/machine-learning/assistant/">Grafana Assistant&lt;/a>&lt;/u> is our LLM purpose-built for Grafana Cloud. It's an invaluable AI-powered feature that significantly accelerates the dashboard creation process, as it lets you leverage natural language prompts to generate complex queries and refine visualizations quickly. &lt;/p>&lt;p>In this demo, Grafana Assistant was used to rapidly construct and fine-tune the prebuilt dashboard, demonstrating how it can quickly turn raw PostgreSQL data into meaningful, happiness-focused visualizations.&lt;/p>&lt;h2>AI-powered dashboard generation from PostgreSQL&lt;/h2>&lt;p>When connecting to a rich data source like PostgreSQL via PDC, Assistant acts as an intelligent translator between your analytical goal and the necessary SQL.&lt;/p>&lt;p>Here's how Assistant works with the PostgreSQL data source:&lt;/p>&lt;ol>&lt;li>&lt;strong>Natural language query translation:&lt;/strong> Instead of manually writing complex SQL joins and aggregations, a user can simply prompt the assistant. 
For example: &lt;em>"Show me the trend of 'Life Ladder' score over time for the top 5 happiest countries in 2024."&lt;/em>&lt;/li>&lt;li>&lt;strong>SQL generation:&lt;/strong> The AI processes this prompt, understands the structure of the connected PostgreSQL schema (e.g., table names, column names like &lt;code>Life Ladder&lt;/code>, &lt;code>country_name&lt;/code>, &lt;code>year&lt;/code>), and automatically generates the precise SQL query required to fetch the data.&lt;/li>&lt;li>&lt;strong>Visualization suggestion and refinement:&lt;/strong> Once the query runs, Assistant analyzes the returned dataset (e.g., time series data, categorical rankings). It then suggests the most appropriate visualization type (e.g., &lt;u>&lt;a href="https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/time-series/">time series panel&lt;/a>&lt;/u> for trends, &lt;u>&lt;a href="https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/bar-chart/">bar chart&lt;/a>&lt;/u> for rankings) and generates the panel configuration, including axis labels and legends.&lt;/li>&lt;/ol>&lt;p>This capability drastically lowers the barrier to entry for users who may not be SQL experts, allowing them to rapidly prototype and deploy complex analytical dashboards based on their private relational data.&lt;/p>&lt;h2>Automated setup: A Terraform blueprint for secure observability&lt;/h2>&lt;p>To demonstrate this modern observability pattern, we've created a comprehensive Terraform repository that automates the entire setup. 
This blueprint embodies the principle of "&lt;u>&lt;a href="https://en.wikipedia.org/wiki/Infrastructure_as_code">infrastructure as code&lt;/a>&lt;/u>" for your secure data connections.&lt;/p>&lt;p>Everything you need to set things up can be found in this public GitHub repo: &lt;u>&lt;a href="https://github.com/mwimpelberg28/grafana_happiness">https://github.com/mwimpelberg28/grafana_happiness&lt;/a>&lt;/u>&lt;/p>&lt;p>The blueprint includes the following components:&lt;/p>&lt;ul>&lt;li>&lt;strong>Amazon RDS PostgreSQL instance:&lt;/strong> Provisioned securely within a private Amazon VPC, preloaded with the World Happiness Report dataset&lt;/li>&lt;li>&lt;strong>PDC agent deployment:&lt;/strong> The PDC agent is deployed within the same private VPC to establish the secure tunnel and enforce network restrictions&lt;/li>&lt;li>&lt;strong>Grafana Terraform provider:&lt;/strong> Used to programmatically create the secure PostgreSQL data source, configured specifically to route queries over the PDC tunnel&lt;/li>&lt;li>&lt;strong>Prebuilt Grafana dashboard:&lt;/strong> A ready-to-use dashboard featuring PostgreSQL queries to visualize happiness data, including:&lt;ul>&lt;li>Time series panels tracking happiness scores over time&lt;/li>&lt;li>Bar charts ranking countries by key metrics&lt;/li>&lt;li>Tables correlating happiness metrics with factors like GDP per capita and life expectancy&lt;/li>&lt;/ul>&lt;/li>&lt;/ul>&lt;h2>Getting started with PDC and PostgreSQL&lt;/h2>&lt;p>Before you start deploying the Terraform setup, you need to configure the connection credentials within Grafana Cloud. This involves setting up an access policy and a service account to authenticate the PDC agent.&lt;/p>&lt;h3>Setup instructions&lt;/h3>&lt;p>1. Find the cluster your Grafana stack is deployed in. You will use this as the value of the &lt;code>pdc_cluster&lt;/code> Terraform variable.&lt;/p>&lt;p>2. Create a new service account with the Admin role within Grafana. 
In your Grafana Cloud instance, navigate to &lt;strong>Administration &lt;/strong>>&lt;strong> Users and access &lt;/strong>>&lt;strong> Service accounts&lt;/strong> and then click &lt;strong>Add service account&lt;/strong>.&lt;/p>&lt;p>Once the service account is created, click &lt;strong>Add service account token&lt;/strong>. Copy the token and set it as the value of the &lt;code>sa_token&lt;/code> Terraform variable.&lt;/p>&lt;p>3. Next, create a new access policy by navigating to &lt;strong>Administration &lt;/strong>>&lt;strong> Users and access &lt;/strong>>&lt;strong> Cloud access policies&lt;/strong>, then click &lt;strong>Create access policy&lt;/strong>. Name your policy and then add the required permissions.&lt;/p>&lt;p>You will need to click &lt;strong>Add scope&lt;/strong> in order to add the &lt;code>stacks:read&lt;/code> permission. After you’ve created the policy, click &lt;strong>Add token&lt;/strong>, name it (and optionally set an expiration date), then copy the token to the &lt;code>cloud_access_policy_token&lt;/code> Terraform variable.&lt;/p>&lt;p>4. Now that we’ve finished setting up connection credentials in Grafana, let’s set the remaining Terraform variables:&lt;/p>&lt;table>&lt;thead>&lt;tr>&lt;th>Variable&lt;/th>&lt;th>Example value&lt;/th>&lt;/tr>&lt;/thead>&lt;tbody>&lt;tr>&lt;td>&lt;code>grafana_url&lt;/code>&lt;/td>&lt;td>https://&lt;stack-name>.grafana.net&lt;/td>&lt;/tr>&lt;tr>&lt;td>&lt;code>grafana_slug&lt;/code>&lt;/td>&lt;td>&lt;stack-name>&lt;/td>&lt;/tr>&lt;tr>&lt;td>&lt;code>vpc_name&lt;/code>&lt;/td>&lt;td>happiness-demo-vpc&lt;/td>&lt;/tr>&lt;/tbody>&lt;/table>&lt;p>5. Now you are ready to provision the infrastructure needed to support the demo. From the &lt;strong>grafana_happiness&lt;/strong> repository directory, run &lt;code>terraform init&lt;/code> to download the project’s dependencies, then &lt;code>terraform apply&lt;/code> to create the infrastructure and Grafana resources. Be aware that provisioning the infrastructure and resources can take eight minutes or more.&lt;/p>&lt;p>6. 
Once &lt;code>terraform apply&lt;/code> has completed successfully, you will find a new dashboard at &lt;strong>Dashboards &lt;/strong>>&lt;strong> Happiness&lt;/strong> >&lt;strong> World Happiness Index&lt;/strong>:&lt;/p>&lt;p>Now you are visualizing data from your new private data source using PDC!&lt;/p>&lt;p>7. This demo deploys real cloud infrastructure that costs money, so remember to run &lt;code>terraform destroy&lt;/code> when you are done exploring so you don’t incur unwanted expenses.&lt;/p>&lt;h2>What's next&lt;/h2>&lt;p>We encourage you to use the provided demo as a starting point for your business analytics journey with Grafana Cloud. And now that you are connected to your private network, you can use any of the supported data sources for &lt;u>&lt;a href="https://grafana.com/docs/grafana-cloud/machine-learning/assistant/get-started/">Grafana Assistant&lt;/a>&lt;/u> to help you analyze and visualize your business data. &lt;/p>&lt;p>Assistant can run ad hoc queries, build dashboards, and help you gain insight into your data without having to learn numerous query languages. Assistant can also provide the translated SQL queries if you need to use them in other systems. 
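&lt;/p>&lt;p>To make that last point concrete, here's a sketch of the kind of SQL a prompt like "show me the top 5 happiest countries in 2024" might translate to. The table and column names below are illustrative assumptions rather than the demo's exact schema, and the snippet runs the query against an in-memory SQLite stand-in (with made-up scores) instead of the real PostgreSQL database:&lt;/p>

```python
# Hypothetical sketch: the kind of SQL an AI assistant might generate for
# "top 5 happiest countries in 2024". Table and column names (happiness,
# country_name, year, life_ladder) are assumptions, not the demo's real
# schema, and the scores are illustrative sample values.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE happiness (country_name TEXT, year INTEGER, life_ladder REAL)")
rows = [
    ("Finland", 2024, 7.74), ("Denmark", 2024, 7.58), ("Iceland", 2024, 7.53),
    ("Sweden", 2024, 7.34), ("Israel", 2024, 7.30), ("Mexico", 2024, 6.68),
]
conn.executemany("INSERT INTO happiness VALUES (?, ?, ?)", rows)

# The generated SQL itself: filter by year, rank by score, keep the top 5.
query = """
    SELECT country_name, life_ladder
    FROM happiness
    WHERE year = 2024
    ORDER BY life_ladder DESC
    LIMIT 5
"""
top5 = conn.execute(query).fetchall()
for country, score in top5:
    print(country, score)
```

&lt;p>Minus the SQLite scaffolding, the same kind of statement could be pasted into any PostgreSQL client or reused in another reporting tool. &lt;/p>&lt;p>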
And check out &lt;u>&lt;a href="https://grafana.com/docs/grafana-cloud/machine-learning/assistant/guides/querying/">this guide&lt;/a>&lt;/u> for even more use cases to explore in your journey with Grafana Assistant.&lt;/p></description></item><item><title>Query fair usage in Grafana Cloud: What it is and how it affects your logs observability practice</title><link>https://grafana.com/blog/query-fair-usage-in-grafana-cloud-what-is-it-and-how-it-affects-your-logs-observability-practice/</link><pubDate>Tue, 07 Apr 2026 18:32:14</pubDate><author>Russ Erbe</author><guid>https://grafana.com/blog/query-fair-usage-in-grafana-cloud-what-is-it-and-how-it-affects-your-logs-observability-practice/</guid><description>&lt;p>In Grafana Cloud we use a simple yet generous formula that lets you query up to 100x your monthly ingested log volume in gigabytes for free. This works for the vast majority of our customers, but if you aren’t careful and strategic with your usage, you could find yourself with an overage bill. &lt;/p>&lt;p>We obviously don't want to surprise any customer with an unexpected bill, so in this blog post you'll learn how to find your usage ratio in your Grafana dashboard so you can understand what you're looking at, where the queries are coming from, and who your top users are. You'll also learn some best practices for applying good policies to your observability practice going forward.&lt;/p>&lt;h2>What is 'query fair usage' and why does the policy exist?&lt;/h2>&lt;p>In Grafana Cloud Logs, “query fair usage” refers to a pricing policy that lets you query up to &lt;strong>100x your monthly ingested log volume in GBs&lt;/strong> at no additional charge. It is primarily a billing mechanism designed to allow typical usage without extra charges, while preventing abuse from extremely resource-intensive queries.&lt;/p>&lt;p>Grafana Cloud’s query engine (especially for Loki logs) can scan huge amounts of data. 
Without guardrails, a few heavy queries could create disproportionate load. The fair‑use policy:&lt;/p>&lt;ul>&lt;li>Keeps costs predictable&lt;/li>&lt;li>Encourages efficient querying&lt;/li>&lt;li>Protects shared infrastructure&lt;/li>&lt;li>Still allows generous exploration of logs&lt;/li>&lt;/ul>&lt;p>&lt;strong>Example&lt;/strong>&lt;/p>&lt;p>If you ingest &lt;strong>50 GB&lt;/strong> of logs in a month:&lt;/p>&lt;ul>&lt;li>Your fair‑use query allowance = &lt;strong>50 GB × 100 = 5,000 GB&lt;/strong> of queried logs.&lt;/li>&lt;/ul>&lt;p>As long as your queries scan &lt;strong>≤ 5,000 GB&lt;/strong>, you pay nothing extra for queries.&lt;/p>&lt;h2>How to find and monitor your query fair usage&lt;/h2>&lt;h3>How to view query usage&lt;/h3>&lt;p>For starters, you can view the cost of a query &lt;em>before&lt;/em> you run it. When you write queries in Explore, Grafana provides an estimate of how much data will be scanned when the query runs.&lt;/p>&lt;p>But if you want to know your usage ratio, you have a few options. The first, and the one most people are familiar with, is going to the billing dashboard to view your current query ratio.&lt;/p>&lt;ul>&lt;li>From the Grafana main menu, click the dashboard icon.&lt;/li>&lt;li>Select the &lt;strong>Billing/Usage&lt;/strong> dashboard.&lt;/li>&lt;li>Scroll down to the &lt;strong>Logs Ingestion and Query Details&lt;/strong> section.&lt;/li>&lt;li>Expand the section and scroll to the &lt;strong>Query Usage Ratio&lt;/strong> panel.&lt;/li>&lt;/ul>&lt;p>The screenshot below is an example of what you might see. This is from a large environment with seven different Grafana instances.&lt;/p>&lt;p>Another way to view your usage ratio is to use the newly redesigned &lt;u>&lt;a href="https://grafana.com/docs/grafana-cloud/cost-management-and-billing/">Cost Management and billing&lt;/a>&lt;/u> experience. 
This is the method you should get familiar with, as the billing dashboard has been deprecated and will be removed in the future.&lt;/p>&lt;p>Once you're there, simply scroll down to &lt;strong>Products&lt;/strong>, and locate and expand &lt;strong>Logs&lt;/strong>. From there, go to the bottom of the page, where you will find the log query rate as well as the query usage ratio.&lt;/p>&lt;h3>Determine the source of query usage&lt;/h3>&lt;p>To help you track down the source of query usage, we built the &lt;u>&lt;a href="https://grafana.com/grafana/dashboards/21936-usage-insights-6-loki-query-fair-usage-drilldown/?tab=revisions">Loki query fair usage dashboard&lt;/a>&lt;/u>. For Grafana Cloud customers using hosted Grafana, this dashboard is automatically provisioned on each of your hosted Grafana instances.&lt;/p>&lt;p>The dashboard shows a breakdown of your query usage by query type (dashboard, grafana-alert, and Explore/other), by query bytes and query count. For each type, there are rows showing more detailed information on the highest-volume queries, including:&lt;/p>&lt;ul>&lt;li>The originating query&lt;/li>&lt;li>The Grafana username that submitted the query (if relevant)&lt;/li>&lt;li>Rule and dashboard names&lt;/li>&lt;li>Query size in bytes&lt;/li>&lt;li>Query execution frequency&lt;/li>&lt;/ul>&lt;p>And here's a breakdown of the different panels in the dashboard:&lt;/p>&lt;ul>&lt;li>&lt;strong>Grafana-alerts&lt;/strong> refers to rules managed within Grafana under Grafana Alerting, found at &lt;strong>Home&lt;/strong> > &lt;strong>Alerts &amp; IRM&lt;/strong> > &lt;strong>Alerting&lt;/strong>. These rules can be alerts or recording rules. These are separate from the rules you upload to Loki with &lt;code>cortextool&lt;/code> or &lt;code>lokitool&lt;/code> using the Grafana Cloud APIs.&lt;/li>&lt;li>&lt;strong>Explore/other&lt;/strong> refers to a subset of queries executed against Loki that come from the Explore page. 
It doesn’t include those coming from the Grafana Logs Drilldown app. It does include queries that come from a non-Grafana frontend source such as &lt;code>logcli&lt;/code>. Explore queries likely have a &lt;code>grafana_username&lt;/code> populated in the dashboard; queries from other sources don’t.&lt;/li>&lt;li>&lt;strong>Estimated interval&lt;/strong> is an estimate of how frequently the query ran over the selected time period. It’s the number of times the query ran divided by the total time range.&lt;/li>&lt;/ul>&lt;h2>Understanding your Grafana Cloud invoice for logs&lt;/h2>&lt;p>At this point, you have learned what query fair usage is, where to find it in your billing dashboard, and how to determine the source of your query usage. Now let's get an understanding of how that could impact your monthly invoice. &lt;/p>&lt;p>Grafana Cloud calculates logs usage by looking at the following components:&lt;/p>&lt;ul>&lt;li>&lt;strong>GBs ingested: &lt;/strong>The total number of GBs ingested into Grafana Cloud on a monthly basis.&lt;/li>&lt;li>&lt;strong>GBs retained: &lt;/strong>How long log data is retained within Grafana Cloud.&lt;/li>&lt;/ul>&lt;p>The minimum retention period is 30 days, and you can purchase additional retention in 30-day increments. Retention is customizable per stack or by individual streams within the same stack. To enable this, contact Grafana Support.&lt;/p>&lt;p>&lt;strong>Note:&lt;/strong> Retention period changes are not retroactive. 
Once the retention is increased, the current logs will be stored following the new retention period, but logs already outside the old retention period will not be recovered.&lt;/p>&lt;h2>Billing calculations&lt;/h2>&lt;p>Billing is based on usage, and usage is determined by these primary factors:&lt;/p>&lt;ul>&lt;li>The number of GBs ingested per month&lt;/li>&lt;li>The number of months of retention&lt;/li>&lt;/ul>&lt;p>&lt;strong>Note: &lt;/strong>For customers exceeding the 100x fair use policy for GBs queried per month, the following billing calculation applies:&lt;/p>&lt;p>&lt;code>logs billable gb = max(ingested gb, queried gb / fair use query ratio)&lt;/code>&lt;/p>&lt;p>This calculation is performed on a per-stack level.&lt;/p>&lt;p>Even though you can see what your query fair usage is at any time, you are only billed at the beginning of each month, for the previous month. Early in the billing cycle your usage ratio will often be high, but it typically drops as the month goes on and more logs are ingested.&lt;/p>&lt;h3>Example&lt;/h3>&lt;ul>&lt;li>Ingested: 50 GB&lt;/li>&lt;li>Queried: 7,000 GB&lt;/li>&lt;li>Fair‑use threshold: 5,000 GB&lt;/li>&lt;/ul>&lt;p>Billable GB = max(50, 7,000 / 100) = max(50, 70) = &lt;strong>70 GB&lt;/strong>&lt;/p>&lt;p>So you would be billed for &lt;strong>70 GB&lt;/strong> instead of 50 GB at whatever your set rate is per GB in your Grafana contract.&lt;/p>&lt;h2>Managing your query fair usage costs&lt;/h2>&lt;p>Now that we've walked through the usage policy and how it could impact your costs, let's finish with some tips for avoiding potential overages in the future.&lt;/p>&lt;h3>Recommendations&lt;/h3>&lt;p>One common source of excess queries is misconfigured Grafana/Loki-managed alerting rules—for example, querying one hour of data but running that query every minute.&lt;/p>&lt;p>For alerting rules using the Loki data source:&lt;/p>&lt;ul>&lt;li>Use instant queries 
instead of range queries for all rules. An instant query executes exactly one time and produces one data point for each series matched by your label selectors. Range queries are effectively instant queries executed multiple times. For more details, refer to the &lt;a href="https://grafana.com/blog/2023/07/05/how-to-run-faster-loki-metric-queries-with-more-accurate-results/">How to run faster Loki metric queries with more accurate results&lt;/a> blog post.&lt;/li>&lt;li>Look at the evaluation period and interval period and make their intervals match the amount of time queried. That is, a rule that runs once per minute should have a query range of &lt;code>1m&lt;/code>.&lt;/li>&lt;/ul>&lt;p>&lt;strong>Note: &lt;/strong>We also recommend checking alert rules run by the scheduler (recorded queries).&lt;/p>&lt;h3>Best practices for fair query usage&lt;/h3>&lt;p>To stay within fair usage policies and optimize performance:&lt;/p>&lt;ul>&lt;li>&lt;strong>Filter early:&lt;/strong> Use label selectors and log pipeline filters at the beginning of your queries (e.g., in LogQL) to reduce the data set before applying more complex operations.&lt;/li>&lt;li>&lt;strong>Avoid wide scans:&lt;/strong> Be cautious with queries that scan large time ranges or entire datasets, especially when using &lt;u>&lt;a href="https://grafana.com/docs/grafana-cloud/machine-learning/assistant/">Grafana Assistant&lt;/a>&lt;/u>.&lt;/li>&lt;li>&lt;strong>Narrow time ranges:&lt;/strong> Select a smaller time frame instead of querying “last 30 days” by default.&lt;/li>&lt;li>&lt;strong>Use aggregation and recording rules:&lt;/strong> Define Prometheus recording rules to pre-calculate frequently used, resource-heavy expressions into new metrics. Querying these pre-aggregated metrics is much more efficient than calculating them ad-hoc.&lt;/li>&lt;li>&lt;strong>Avoid wildcard-heavy filters: &lt;/strong>They force Loki to scan massive amounts of data, which will consume your query usage quickly. 
A precise label selector like &lt;code>{cluster="us-central1"}&lt;/code> can reduce the search space from, say, 100 TB down to 1 TB.&lt;/li>&lt;li>&lt;strong>Use labels wisely:&lt;/strong> Loki’s indexing model rewards good label design. For more details on how to do this, check out our &lt;u>&lt;a href="https://grafana.com/blog/the-concise-guide-to-grafana-loki-everything-you-need-to-know-about-labels/">concise guide to labels in Loki&lt;/a>&lt;/u>.&lt;/li>&lt;li>&lt;strong>Optimize alert rules: &lt;/strong>Ensure your log-based alert rules follow best practices, as they are a common cause of excessive query usage. And for more information on how to address poorly defined alert rules, check out our &lt;u>&lt;a href="https://grafana.com/docs/grafana-cloud/cost-management-and-billing/analyze-costs/reduce-costs/logs-costs/control-query-usage-costs/">docs page on this topic&lt;/a>&lt;/u>.&lt;/li>&lt;li>&lt;strong>Monitor usage dashboards:&lt;/strong> Regularly check the "Billing and Usage" dashboards in your Grafana Cloud portal to understand your consumption patterns and configure alerts for unexpected spikes. You can also use Grafana's "Usage insights" feature (available in Grafana Enterprise and Grafana Cloud) to identify heavy-hitting queries or unused dashboards.&lt;/li>&lt;/ul>&lt;h2>Configure Loki query limit policies&lt;/h2>&lt;p>If, even with all of these best practices in place, you still find your company exceeding the query fair usage policy, Grafana recently introduced a way to put up guardrails that prevent expensive queries.&lt;/p>&lt;p>&lt;strong>Note: &lt;/strong>Loki query limit policies are currently in &lt;a href="https://grafana.com/docs/release-life-cycle/">public preview&lt;/a>. Grafana Labs offers limited support, and breaking changes might occur prior to the feature being made generally available.&lt;/p>&lt;p>This feature is disabled by default. 
Contact Grafana Support to enable query limit policies using the &lt;code>lokiQueryLimitsContext&lt;/code> feature flag.&lt;/p>&lt;p>Loki query limit policies provide fine-grained control over how users query your Grafana Cloud Logs data. You can configure these policies as attributes on &lt;a href="https://grafana.com/docs/grafana-cloud/security-and-account-management/authentication-and-permissions/access-policies/">access policies&lt;/a> to limit query result sizes.&lt;/p>&lt;p>When a query exceeds a configured limit, users receive meaningful error messages that explain why the query was rejected and how to adjust it.&lt;/p>&lt;h3>How query limit policies work&lt;/h3>&lt;p>Query limit policies are applied as &lt;code>lokiQueryPolicy&lt;/code> attributes on access policies. When a user makes a request using a token associated with an access policy that has query limits configured, Loki validates the entire time period of the query against those limits before execution.&lt;/p>&lt;p>Query limit policies are not enforced for Loki-managed or Grafana-managed alerts.&lt;/p>&lt;p>To learn more about &lt;strong>Loki query limit policies&lt;/strong> and how to configure them, see the &lt;u>&lt;a href="https://grafana.com/docs/grafana-cloud/cost-management-and-billing/analyze-costs/logs-costs/log-query-limit-policies/">Grafana documentation&lt;/a>&lt;/u>.&lt;/p></description></item><item><title>Observability in Go: Where to start and what matters most</title><link>https://grafana.com/blog/observability-in-go-where-to-start-and-what-matters-most/</link><pubDate>Mon, 06 Apr 2026 15:51:58</pubDate><author>Grafana Labs Team</author><guid>https://grafana.com/blog/observability-in-go-where-to-start-and-what-matters-most/</guid><description>&lt;p>Sometimes the hardest part of debugging a system isn’t fixing the problem—it’s figuring out what’s actually happening in the first place.&lt;/p>&lt;p>In this episode of “Grafana’s Big Tent” podcast, host 
Mat Ryer, Principal Software Engineer at Grafana Labs, is joined by Donia Chaiehloudj, Senior Software Engineer at Isovalent (Cisco) and co-author of &lt;u>&lt;a href="https://www.manning.com/books/learn-go-with-pocket-sized-projects">“Learn Go with Pocket-Sized Projects,”&lt;/a>&lt;/u> along with Charles Korn, Principal Software Engineer at Grafana Labs and Bryan Boreham, Distinguished Engineer at Grafana Labs, to talk about observability in &lt;u>&lt;a href="https://go.dev/">Go&lt;/a>&lt;/u>.&lt;/p>&lt;p>They dig into where to start (hint: logs are often the first step) and how context, metrics, traces, and profiling fit together as systems grow more complex. Along the way, they share practical lessons on turning logs into metrics, avoiding common pitfalls with context and tracing, using &lt;u>&lt;a href="https://github.com/google/pprof">pprof&lt;/a>&lt;/u> effectively, and what &lt;u>&lt;a href="https://ebpf.io/">eBPF&lt;/a>&lt;/u> unlocks when you need visibility beyond your application.&lt;/p>&lt;p>You can watch the full episode in the YouTube video below, or listen on &lt;u>&lt;a href="https://open.spotify.com/show/3beQvS8to0rYs1gxOnPrfD">Spotify&lt;/a>&lt;/u> or &lt;u>&lt;a href="https://podcasts.apple.com/us/podcast/grafanas-big-tent/id1616725129">Apple Podcasts&lt;/a>&lt;/u>.&lt;/p>&lt;p>&lt;em>(Note: The following are highlights from episode 8, season 3 of “Grafana’s Big Tent” podcast. This transcript has been edited for length and clarity.)&lt;/em>&lt;/p>&lt;h2>&lt;strong>Starting a Go project: Where observability begins&lt;/strong>&lt;/h2>&lt;p>&lt;strong>Donia Chaiehloudj: &lt;/strong>I would go simple to start. We know that we are always refactoring along the way and that priorities change, like real life. 
But I would try to go for the Go standard library as much as possible, because we know that it’s stable and not going to be archived tomorrow.&lt;/p>&lt;p>I would also go for well-known libraries that are not standard, but are used by a lot of people and are well-maintained, even though we know that contributors are fewer and fewer in the open source world. I would also think about some standardization from the beginning—for your data, your context, the way you want to trace, that kind of thing. &lt;/p>&lt;p>&lt;strong>Mat Ryer: &lt;/strong>Yeah, I think that makes sense. I like your point that you're going to refactor. Things are going to change. That kind of takes a bit of pressure off. It doesn't have to be perfect straight away.&lt;/p>&lt;h2>&lt;strong>Starting with logs—and turning them into metrics&lt;/strong>&lt;/h2>&lt;p>&lt;strong>Charles Korn:&lt;/strong> The thing I use most often, at least in the stuff that I'm working on at the moment, is logs. From logs, you can derive metrics if you really need to, so that's probably where I'd start. They're really easy to get started with. You can dump them into a file, you can dump them to the console, and you can start shipping them off to a system like &lt;u>&lt;a href="https://grafana.com/oss/loki/">Loki&lt;/a>&lt;/u>.&lt;/p>&lt;p>&lt;strong>Mat:&lt;/strong> Yeah, I think starting with logs is quite natural.&lt;/p>&lt;p>&lt;strong>Charles:&lt;/strong> We've got a bunch of Go services at Grafana Labs, and unfortunately, occasionally they panic, and they dump the stack trace to standard error, and it gets picked up by our logging system.&lt;/p>&lt;p>And it's really useful to be able to show that on a graph—how often a thing's panicking. We actually have a system where it'll look at the logs, count the number of things that look like a panic, and turn that into a metric. And then we can alert on that metric just like any other metric. 
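&lt;/p>&lt;p>The pattern Charles describes is language-agnostic: scan the log stream, count lines that look like a panic, and export that count as a metric you can graph and alert on. Here's a minimal sketch in Python (the sample log lines and the panic-matching regex are invented for illustration):&lt;/p>

```python
# Minimal sketch of the log-to-metric pattern described above: scan log
# lines, count the ones that look like the start of a Go panic, and expose
# that count as a metric value. The sample log lines are invented.
import re

PANIC_RE = re.compile(r"^panic: |runtime error:")

def count_panics(lines):
    """Return how many log lines look like a Go panic."""
    return sum(1 for line in lines if PANIC_RE.search(line))

logs = [
    "level=info msg=\"request handled\" duration=12ms",
    "panic: runtime error: invalid memory address or nil pointer dereference",
    "level=error msg=\"retrying upstream\"",
    "panic: close of closed channel",
]

# In a real system this number would be pushed to, or scraped by, a
# metrics backend and alerted on like any other metric.
print("panic_total", count_panics(logs))
```

&lt;p>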
That's really helpful.&lt;/p>&lt;p>&lt;strong>Mat: &lt;/strong>Yeah, so you literally then get a graph that shows you how many panics you're having.&lt;/p>&lt;p>&lt;strong>Charles&lt;/strong>: Exactly. And you can have alerts on that.&lt;/p>&lt;h2>&lt;strong>Tracing and why context matters&lt;/strong>&lt;/h2>&lt;p>&lt;strong>Bryan Boreham&lt;/strong>: Tracing adds that explicit parent-child relationship, and everything's always got a beginning and an end. So I think tracing is kind of the superpower to figure out, with any complicated program, what happened. &lt;/p>&lt;p>&lt;strong>Mat:&lt;/strong> So the idea being it's spending more resources in that bit, and therefore, if you're going to optimize something, go for the big-hanging fruit, would you say?&lt;/p>&lt;p>&lt;strong>Bryan: &lt;/strong>I suppose so.&lt;/p>&lt;p>&lt;strong>Mat&lt;/strong>: So you mentioned that you would do that only in advanced projects or complicated projects. How do you know when it's time to reach for tracing?&lt;/p>&lt;p>&lt;strong>Bryan&lt;/strong>: Well, for myself, I said 20 or 30 lines, but it gets complicated. So my bar is very low. My ability to concentrate on things is quite poor.&lt;/p>&lt;p>It also depends, because tracing is quite complicated, or people find it complicated to set up. With logs, you just stick it in a file and then read it, so it's orders of magnitude different.&lt;/p>&lt;p>But tracing really comes into its own when you have multiple bits in what you call a distributed system, multiple frontend and backends, or multiple bits of backend, or something like that. 
You pass the same ID around to everything because it's related, and then they're all logging or they're all reporting using that same ID, and that allows you to then tie the whole process together across all these multiple systems.&lt;/p>&lt;p>&lt;strong>Mat&lt;/strong>: Yeah, it's very cool, and of course makes sense at scale.&lt;/p>&lt;h2>&lt;strong>Errors, tradeoffs, and observability in Go&lt;/strong>&lt;/h2>&lt;p>&lt;strong>Charles&lt;/strong>: One thing I do miss sometimes coming from other languages is that you've got an exception type, and each of those exceptions is a particular type. It's a file-not-found error or a network error or whatever it is.&lt;/p>&lt;p>Whereas with Go, most of those things are just strings. So if you're going to do any kind of analysis, like how many file-not-found errors did I get, that could be quite tricky in Go because they're just a whole bunch of strings.&lt;/p>&lt;p>But at the same time, it makes it really simple to create these really rich errors. They're really easy, as an engineer trying to solve a problem, to get that context of what's going on. It's a bit of good and bad.&lt;/p>&lt;p>&lt;strong>Donia&lt;/strong>: So I started my career with Go. And for me, it was very natural to have error types. That was like, "no, I want to create new types of errors" if I had something specific. And it was a reflex to just check if there was the type of error that I wanted already in the library.&lt;/p>&lt;p>I did one year of Java in a company and I was playing with exceptions, and I was confused, actually. I want to define my own error type, because it's something very specific. And I want to type it for that type of library that I'm dealing with.&lt;/p>&lt;p>So what Charles is talking about is really interesting—the way exceptions can be, in general, easier maybe. 
But I find that error types and being more granular is easier to read in the code and to understand when you're debugging, too.&lt;/p>&lt;h2>&lt;strong>Profiling with pprof&lt;/strong>&lt;/h2>&lt;p>&lt;strong>Mat: &lt;/strong>So then profiles. I know Go has pprof. What is pprof?&lt;/p>&lt;p>&lt;strong>Charles&lt;/strong>: It's a tool that allows you to measure the performance of your Go application. And it can show you a bunch of different profiles. The one that I use most often is CPU time. It's literally just how much time is spent in different functions. And the other one I spend a lot of time looking at is memory consumption, like peak in-use memory consumption.&lt;/p>&lt;p>&lt;strong>Donia: &lt;/strong>I was very intimidated at the beginning of my career by pprof, actually. Do you have any advice for someone getting started with it?&lt;/p>&lt;p>&lt;strong>Bryan&lt;/strong>: I was just thinking to myself, actually, that there were one or two gotchas. The big one that catches some people is that they dive into CPU profiling when they don't actually have a CPU problem.&lt;/p>&lt;p>They're not running out of CPU. They've got a program that's slow, and so they think, "Oh, profiling." Then it turns out that this program is slow because it's waiting on some other program, like a database, and a profile will not show you that.&lt;/p>&lt;p>The simplest way to watch out for it is to watch your CPU meter. If it's ticking along at sort of 0.1 CPU usage or something like that, then it's very unlikely that profiling is going to get you anywhere. Whereas if the fans are all running, 18 CPUs going in parallel, then that's probably a good one to point the CPU profiler at.&lt;/p>&lt;p>The next thing is that it's almost always memory allocation in Go that is causing issues. 
If you do have a CPU problem, look at the memory profile, is my next top tip.&lt;/p>&lt;h2>&lt;strong>eBPF and observing the “dark side” of systems&lt;/strong>&lt;/h2>&lt;p>&lt;strong>Donia&lt;/strong>: eBPF, for people who maybe don't know what it is, is a way to write C programs, BPF programs, in the Linux kernel to dynamically observe or secure your kernel.&lt;/p>&lt;p>That's very powerful. But it can be very daunting and costly to write BPF programs. So having Go wrappers on top of that is very interesting.&lt;/p>&lt;p>Something I personally like about eBPF is that you can actually access dark sides of your kernel that you can't access from user space.&lt;/p>&lt;h2>&lt;strong>What Go could improve&lt;/strong>&lt;/h2>&lt;p>&lt;strong>Charles&lt;/strong>: One thing that would be really helpful is if, when you put out a stack trace, it could say, “give me the pointer address of this pointer, and print out this value from that struct.” The other thing that's kind of related is errors. I'd love to be able to get a stack trace reliably for an error.&lt;/p>&lt;p>&lt;strong>Bryan&lt;/strong>: I would love more flexibility, and it's probably more in the debugging tool than in Go itself.&lt;/p>&lt;p>&lt;em>“Grafana’s Big Tent” podcast wants to hear from you. 
If you have a great story to share, want to join the conversation, or have any feedback, please contact the Big Tent team at &lt;/em>&lt;em>&lt;strong>&lt;a href="mailto:bigtent@grafana.com">bigtent@grafana.com&lt;/a>&lt;/strong>&lt;/em>&lt;em>.&lt;/em>&lt;/p></description></item><item><title>Finding performance bottlenecks with Pyroscope and Alloy: An example using TON blockchain</title><link>https://grafana.com/blog/finding-performance-bottlenecks-with-pyroscope-and-alloy-an-example-using-ton-blockchain/</link><pubDate>Mon, 30 Mar 2026 16:22:31</pubDate><author>Anatoly Korniltsev</author><guid>https://grafana.com/blog/finding-performance-bottlenecks-with-pyroscope-and-alloy-an-example-using-ton-blockchain/</guid><description>&lt;p>Performance optimization often feels like searching for a needle in a haystack. You know your code is slow, but where exactly is the bottleneck? &lt;/p>&lt;p>This is where continuous profiling comes in. &lt;/p>&lt;p>In this blog post, we’ll explore how continuous profiling with &lt;u>&lt;a href="/oss/alloy-opentelemetry-collector/">Alloy&lt;/a>&lt;/u> and &lt;u>&lt;a href="/oss/pyroscope/">Pyroscope&lt;/a>&lt;/u> can transform the way you approach performance optimization. Using real-world examples from last year’s &lt;u>&lt;a href="https://contest.com/docs/BlockValidationChallenge">TON blockchain optimization contest&lt;/a>&lt;/u>, a C++ developer challenge, we’ll explore how modern profiling tools accelerate the optimization process. &lt;/p>&lt;h2>First, some background on the contest&lt;/h2>&lt;p>&lt;u>&lt;a href="https://en.wikipedia.org/wiki/TON_(blockchain)">The Open Network (TON)&lt;/a>&lt;/u> blockchain optimization contest is a C++ optimization challenge where contestants have to squeeze every microsecond out of a blockchain validation algorithm.&lt;/p>&lt;p>The challenge was straightforward: participants were given the reference implementation based on the original block validation algorithm in TON. 
Their task was to optimize the implementation, which had to be consistent with the reference algorithm. Scores were based on execution time.&lt;/p>&lt;p>While we did not directly participate in the contest, a handful of Pyroscope engineers ran several contestant submissions locally and profiled them. This allowed us to observe where the optimized implementations spent their time and how specific changes affected performance.&lt;/p>&lt;p>We used Alloy, an open source OpenTelemetry collector with built-in Prometheus pipelines and support for metrics, logs, traces, and profiles. Specifically, we leveraged Alloy’s &lt;code>pyroscope.ebpf&lt;/code> component, an eBPF-based CPU profiler, to capture detailed profiling data and send it to &lt;u>&lt;a href="/products/cloud/">Grafana Cloud&lt;/a>&lt;/u> for analysis. This approach allowed us to identify hotspots and track optimization progress.&lt;/p>&lt;p>With Alloy’s eBPF-based profiling, we were able to gain immediate visibility into performance bottlenecks without modifying a single line of contestant code.&lt;/p>&lt;h2>Alloy setup&lt;/h2>&lt;p>Setting up eBPF-based profiling with Alloy requires minimal configuration:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>pyroscope.write "staging" {
  endpoint {
    url = "&lt;URL>"
    basic_auth {
      username = "&lt;User>"
      password = "&lt;Password>"
    }
  }
}
pyroscope.ebpf "default" {
  targets_only = false
  forward_to = [pyroscope.write.staging.receiver]
  demangle = "full"
}&lt;/code>&lt;/pre>&lt;p>Replace &lt;code>&lt;URL>&lt;/code> with your Pyroscope server URL, and &lt;code>&lt;User>&lt;/code> and &lt;code>&lt;Password>&lt;/code> with your Grafana Cloud credentials if sending data to the cloud. For local setups, you can skip the authentication and point to a local Pyroscope instance.&lt;/p>&lt;p>The profiler runs with root privileges and starts immediately:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>sudo ./alloy run ./ebpf.alloy.txt&lt;/code>&lt;/pre>&lt;p>Once running, it profiles the entire system and sends data to your configured endpoint.&lt;/p>&lt;p>For the contest, we compiled with &lt;code>clang&lt;/code> using &lt;code>RelWithDebInfo&lt;/code> to preserve symbols for proper flame graph visualization:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>CC=clang CXX=clang++ cmake ../ton -DCMAKE_BUILD_TYPE=RelWithDebInfo
make contest-grader -j
./contest/grader/contest-grader --threads 8 --tests ../../tests&lt;/code>&lt;/pre>&lt;h2>Crypto library optimizations&lt;/h2>&lt;p>Looking at the reference implementation flame graph, we can see that &lt;code>vm::DataCell::create&lt;/code> (DataCell deserialization) consumes about 14% of the total execution time. This function is responsible for creating and validating cells, which are TON's fundamental data structure. Each cell can store up to 1023 bits of data and references to other cells, forming a directed acyclic graph.&lt;/p>&lt;p>The SHA256 computation happens because every cell in TON has a cryptographic hash that serves as its unique identifier. During deserialization, the system must compute SHA256 hashes to verify data integrity, prevent circular references, and enable efficient deduplication. This hash computation involves serializing the cell's data, descriptor bytes, reference depths, and reference hashes into a single byte string that gets hashed with SHA256.&lt;/p>&lt;p>Another crypto operation hotspot is &lt;code>vm::exec_ed25519_check_signature&lt;/code>, which implements the TVM bytecode operation for Ed25519 signature verification. This operation is frequently called during smart contract execution and transaction validation.&lt;/p>&lt;p>These cryptographic operations represent natural optimization targets, as they consume significant CPU time during blockchain validation.&lt;/p>&lt;h3>SHA256 alternative implementation&lt;/h3>&lt;p>Sometimes the most effective optimizations are the simplest ones. One contestant took the low-hanging fruit approach and replaced the default OpenSSL SHA256 implementation with an alternative from SerenityOS. 
This submission (&lt;u>&lt;a href="https://contest.com/ton-block-validation/entry6294">entry6294&lt;/a>&lt;/u>) swapped out the library routine with one from &lt;u>&lt;a href="https://git.tu-berlin.de/leon.a.albrecht/serenity-mirror/-/blob/057abb9023a30ea226a7b979a9f53f4f9dbe3c93/Userland/Libraries/LibCrypto/Hash/SHA2.cpp">SerenityOS's crypto library&lt;/a>&lt;/u>.&lt;/p>&lt;p>The flame graph diff shows the impact: a ~2% total speedup. While this might seem modest, every percentage point matters in competitive optimization. It's unclear why the SerenityOS implementation was faster, but the execution time and flame graph diff data confirmed the improvement.&lt;/p>&lt;h3>SHA256 single feed &lt;/h3>&lt;p>Beyond replacing the SHA256 implementation, contestants also optimized how the algorithm is used. One particularly effective optimization consolidated multiple SHA256 feed operations into a single call within &lt;code>CellChecker::compute_hash&lt;/code>. This &lt;u>&lt;a href="https://github.com/ton-blockchain/ton/pull/1590">pull request&lt;/a>&lt;/u> demonstrates how algorithmic improvements can be more impactful than library replacements.&lt;/p>&lt;p>The change sped up &lt;code>DataCell::create&lt;/code> by 20% and improved overall verification performance by 3.5%. By reducing the overhead of multiple hash update calls and leveraging more efficient batched processing, this optimization showed that understanding the usage patterns of cryptographic functions can lead to gains.&lt;/p>&lt;h3>ED25519&lt;/h3>&lt;p>Another straightforward optimization targeted the Ed25519 signature verification in &lt;code>vm::exec_ed25519_check_signature&lt;/code>. Like the SHA256 case, this involved replacing the default OpenSSL implementation with an alternative that uses handwritten assembly for x86_64.&lt;/p>&lt;p>While this approach sacrifices portability for performance, the results justified the trade-off in a contest environment. 
The assembly-optimized implementation delivered a ~1.5% speedup, demonstrating how platform-specific optimizations can provide measurable gains even for well-established cryptographic operations.&lt;/p>&lt;h2>Ordered collections replacements &lt;/h2>&lt;p>Another low-hanging fruit optimization involved replacing &lt;code>std::map&lt;/code> with &lt;code>std::unordered_set&lt;/code> in &lt;code>CellStorageStat::add_used_storage()&lt;/code>. The original implementation used a map to track visited cells:&lt;/p>&lt;pre>&lt;code>- std::map&lt;vm::Cell::Hash, CellInfo> seen;
+ std::unordered_set&lt;vm::Cell::Hash> seen;&lt;/code>&lt;/pre>&lt;p>This seemingly trivial change provided a ~10% speedup. The performance improvement came from the difference between these data structures: &lt;code>std::map&lt;/code> maintains elements in sorted order using a balanced binary tree (typically a red-black tree), providing O(log n) lookup time. In contrast, &lt;code>std::unordered_set&lt;/code> uses a hash table with O(1) average lookup time.&lt;/p>&lt;p>Since the collection is only used for memoization to avoid reprocessing the same cells, ordering is unnecessary. The hash-based lookup eliminated the overhead of tree traversal and comparison operations, making cell deduplication significantly faster.&lt;/p>&lt;h2>Custom profilers&lt;/h2>&lt;p>Interestingly, contestant submissions and the TON codebase itself included custom-built profiling solutions. This demonstrates the lack of ready-to-use, gold-standard profilers in the C++ ecosystem, forcing developers to implement their own instrumentation when they need deeper insights.&lt;/p>&lt;h3>Tracing profiler&lt;/h3>&lt;p>One contestant implemented a manual instrumentation tracing profiler with RAII-style timing blocks. The system used a &lt;code>PROFILER(name)&lt;/code> macro that created static IDs for O(1) record lookup and automatically measured execution time using RAII destructors. While lightweight and precise, it required manual code instrumentation at every point of interest.&lt;/p>&lt;p>The profiler aggregated timing data by call site and provided sorted output showing the most expensive operations first. This approach offered fine-grained control over what gets measured but came with the overhead of manual instrumentation and potential code clutter.&lt;/p>&lt;h3>Memory profiler&lt;/h3>&lt;p>The TON monorepo includes a sophisticated memory allocation profiler (&lt;code>memprof&lt;/code>) that intercepts all malloc/free calls and C++ new/delete operators. 
It captures full stack traces for each allocation, aggregates them by call site, and maintains a hash table of unique allocation patterns.&lt;/p>&lt;p>The profiler uses fast assembly-based stack walking on x86_64 with fallback to standard backtrace functions. It can track memory usage patterns, identify leaks, and provide detailed allocation statistics, which are essential for optimizing memory-intensive blockchain validation.&lt;/p>&lt;p>These custom profiling implementations highlighted a common challenge in C++ optimization work: the absence of standardized, production-ready profiling tools forces developers to reinvent the wheel. eBPF-based profiling with tools like Alloy offers an attractive alternative, providing comprehensive system-wide profiling without requiring custom instrumentation or code modifications.&lt;/p>&lt;h2>Wrapping up&lt;/h2>&lt;p>You can learn more about each implementation in the contest &lt;u>&lt;a href="https://contest.com/ton-block-validation">here&lt;/a>&lt;/u>; winners are also listed anonymously on that page.&lt;/p>&lt;p>Looking back on the contest, the flame graph visualizations in Pyroscope made it easy to spot hotspots like &lt;code>DataCell::create&lt;/code> consuming 14% of execution time, while flame graph diffs clearly showed the impact of each optimization attempt.&lt;/p>&lt;p>What's particularly striking is how contestants achieved significant speedups through relatively simple changes: swapping crypto libraries, replacing ordered collections with hash tables, and optimizing algorithmic patterns. These optimizations, ranging from 1.5% to 20% improvements per change, demonstrate that performance gains often come from understanding your data structures and choosing the right tool for the job. &lt;/p>&lt;p>The big take-away for me was that modern profiling tools like Pyroscope and Alloy are making performance optimization more accessible and data-driven. 
Whether you're optimizing blockchain validators or any other performance-critical application, continuous profiling should be in your optimization toolkit from day one.&lt;/p>&lt;p>&lt;em>&lt;a href="/products/cloud/?pg=ai-observability-mcp-servers&amp;plcmt=footer-cta">Grafana Cloud&lt;/a>&lt;/em>&lt;em> is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. &lt;/em>&lt;em>&lt;a href="/auth/sign-up/create-user/?pg=ai-observability-mcp-servers&amp;plcmt=footer-cta">Sign up for free now!&lt;/a>&lt;/em>&lt;/p></description></item><item><title>Grafana security release: Critical and high severity security fixes for CVE-2026-27876 and CVE-2026-27880</title><link>https://grafana.com/blog/grafana-security-release-critical-and-high-severity-security-fixes-for-cve-2026-27876-and-cve-2026-27880/</link><pubDate>Thu, 26 Mar 2026 04:43:23</pubDate><author>Mariell Hoversholm</author><guid>https://grafana.com/blog/grafana-security-release-critical-and-high-severity-security-fixes-for-cve-2026-27876-and-cve-2026-27880/</guid><description>&lt;p>Today we are releasing Grafana 12.4.2 along with patches for Grafana 12.3, 12.2, 12.1, and 11.6, which include critical and high severity security fixes. 
We recommend that you install the newly released versions as soon as possible.&lt;/p>&lt;p>Grafana 12.4.2 with security fixes:&lt;/p>&lt;ul>&lt;li>&lt;a href="/grafana/download/12.4.2">Download Grafana 12.4.2&lt;/a>&lt;/li>&lt;/ul>&lt;p>Grafana 12.3.6 with security fixes:&lt;/p>&lt;ul>&lt;li>&lt;a href="/grafana/download/12.3.6">Download Grafana 12.3.6&lt;/a>&lt;/li>&lt;/ul>&lt;p>Grafana 12.2.8 with security fixes: &lt;/p>&lt;ul>&lt;li>&lt;a href="/grafana/download/12.2.8">Download Grafana 12.2.8&lt;/a>&lt;/li>&lt;/ul>&lt;p>Grafana 12.1.10 with security fixes:&lt;/p>&lt;ul>&lt;li>&lt;a href="/grafana/download/12.1.10">Download Grafana 12.1.10&lt;/a>&lt;/li>&lt;/ul>&lt;p>Grafana 11.6.14 with security fixes:&lt;/p>&lt;ul>&lt;li>&lt;a href="/grafana/download/11.6.14">Download Grafana 11.6.14&lt;/a>&lt;/li>&lt;/ul>&lt;p>As per our security policy, Grafana Labs customers have received security patched versions two weeks in advance under embargo, and Grafana Cloud has been patched. &lt;/p>&lt;p>We have also coordinated closely with all cloud providers licensed to offer Grafana Cloud. They received early notification under embargo and confirmed that their offerings are secure at the time of this announcement. This is applicable to Amazon Managed Grafana and Azure Managed Grafana.&lt;/p>&lt;h2>CVE-2026-27876: SQL expressions arbitrary file write enabling remote code execution&lt;/h2>&lt;p>Grafana's SQL expressions feature enables transforming query data with familiar SQL syntax. 
This syntax, however, also permitted writing arbitrary files to the file system in such a way that one could chain several attack vectors to achieve remote code execution.&lt;/p>&lt;p>The CVSS score for this vulnerability is 9.1 CRITICAL (&lt;u>&lt;a href="https://www.first.org/cvss/calculator/3.1">CVSS link&lt;/a>&lt;/u>).&lt;/p>&lt;p>&lt;strong>The following prerequisites are required for this vulnerability:&lt;/strong> &lt;/p>&lt;ul>&lt;li>Access to execute data source queries (Viewer permissions or higher)&lt;/li>&lt;li>The sqlExpressions feature toggle must be enabled on the Grafana instance.&lt;/li>&lt;/ul>&lt;h2>Impact&lt;/h2>&lt;p>An attacker with access to execute data source queries could overwrite a Sqlyze driver or write an AWS data source configuration file in order to achieve full remote code execution. We have confirmed this vulnerability could be exploited to acquire an SSH connection to the Grafana host.&lt;/p>&lt;h2>Impacted versions&lt;/h2>&lt;p>Grafana versions v11.6.0 and later are impacted by this vulnerability. &lt;/p>&lt;h2>Solutions and mitigations&lt;/h2>&lt;p>We recommend upgrading to one of the patched versions listed above as soon as possible.&lt;/p>&lt;p>If an upgrade is not immediately possible, the following workarounds reduce risk. Note: these may cause disruption to Grafana users and do not fully remediate the vulnerability.&lt;/p>&lt;p>&lt;strong>Option 1:&lt;/strong> Disable the sqlExpressions feature toggle.&lt;/p>&lt;p>&lt;strong>Option 2:&lt;/strong> Perform ALL of the following:&lt;/p>&lt;ul>&lt;li>If you have Sqlyze installed: update to at least v1.5.0 or disable it.&lt;/li>&lt;li>Disable all AWS data sources you have installed.&lt;/li>&lt;/ul>&lt;h2>CVE-2026-27880: Unauthenticated denial-of-service via OpenFeature endpoint&lt;/h2>&lt;p>Grafana's OpenFeature feature flag validation endpoints do not require authentication and accept unbounded user input. 
This input is read into memory.&lt;/p>&lt;p>The CVSS score for this vulnerability is 7.5 HIGH (&lt;u>&lt;a href="https://www.first.org/cvss/calculator/3.1">CVSS link&lt;/a>&lt;/u>).&lt;/p>&lt;h2>Impact&lt;/h2>&lt;p>An attacker could crash the Grafana server by sending requests that exhaust available memory.&lt;/p>&lt;h2>Impacted versions&lt;/h2>&lt;p>Grafana versions v12.1.0 and later are impacted by this vulnerability. &lt;/p>&lt;h2>Solutions and mitigations&lt;/h2>&lt;p>We recommend upgrading to one of the patched versions listed above as soon as possible.&lt;/p>&lt;p>If an upgrade is not immediately possible, any of the following workarounds reduces risk:&lt;/p>&lt;ul>&lt;li>Deploy Grafana in a highly available environment with automatic restarts.&lt;/li>&lt;li>Implement a reverse proxy in front of Grafana that limits input payload size. Cloudflare does this by default. Nginx supports this &lt;u>&lt;a href="https://nginx.org/en/docs/http/ngx_http_core_module.html#client_max_body_size">via explicit configuration&lt;/a>&lt;/u>.&lt;/li>&lt;/ul>&lt;h2>Timeline and post-incident review&lt;/h2>&lt;p>Here is a detailed incident timeline. 
All times are in UTC.&lt;/p>&lt;p>&lt;strong>CVE-2026-27876&lt;/strong>&lt;/p>&lt;table>&lt;thead>&lt;tr>&lt;th>Date/Time (UTC)&lt;/th>&lt;th>Event&lt;/th>&lt;/tr>&lt;/thead>&lt;tbody>&lt;tr>&lt;td>2025-02-06&lt;/td>&lt;td>sqlExpressions feature reimplemented with MySQL syntax and released in v11.6.0&lt;/td>&lt;/tr>&lt;tr>&lt;td>2026-02-23 13:33&lt;/td>&lt;td>Internal incident declared&lt;/td>&lt;/tr>&lt;tr>&lt;td>2026-02-23 15:08&lt;/td>&lt;td>Grafana Cloud patched&lt;/td>&lt;/tr>&lt;tr>&lt;td>2026-03-09&lt;/td>&lt;td>Private release issued to customers under embargo&lt;/td>&lt;/tr>&lt;tr>&lt;td>2026-03-25&lt;/td>&lt;td>Public release&lt;/td>&lt;/tr>&lt;tr>&lt;td>2026-03-26 04:00&lt;/td>&lt;td>Blog published&lt;/td>&lt;/tr>&lt;/tbody>&lt;/table>&lt;p>&lt;strong>CVE-2026-27880&lt;/strong>&lt;/p>&lt;table>&lt;thead>&lt;tr>&lt;th>Date/Time (UTC)&lt;/th>&lt;th>Event&lt;/th>&lt;/tr>&lt;/thead>&lt;tbody>&lt;tr>&lt;td>2025-06-27&lt;/td>&lt;td>New OpenFeature evaluation endpoint introduced and released in v12.1.0&lt;/td>&lt;/tr>&lt;tr>&lt;td>2026-02-24 13:12&lt;/td>&lt;td>Internal incident declared&lt;/td>&lt;/tr>&lt;tr>&lt;td>2026-02-24 17:49&lt;/td>&lt;td>Grafana Cloud stacks not behind Cloudflare were patched; Cloudflare-backed stacks were not affected&lt;/td>&lt;/tr>&lt;tr>&lt;td>2026-03-09&lt;/td>&lt;td>Private release issued to customers under embargo&lt;/td>&lt;/tr>&lt;tr>&lt;td>2026-03-25&lt;/td>&lt;td>Public release&lt;/td>&lt;/tr>&lt;tr>&lt;td>2026-03-26 04:00&lt;/td>&lt;td>Blog published&lt;/td>&lt;/tr>&lt;/tbody>&lt;/table>&lt;h2>Acknowledgements&lt;/h2>&lt;p>We would like to thank Liad Eliyahu, Head of Research at Miggo Security, for responsibly disclosing CVE-2026-27876 through our bug bounty program.&lt;/p>&lt;p>CVE-2026-27880 was discovered internally by the Grafana Labs security team.&lt;/p>&lt;h2>Reporting security issues&lt;/h2>&lt;p>If you think you have found a security vulnerability, please go to our &lt;u>&lt;a href="/legal/report-a-security-issue/">Report a security issue&lt;/a>&lt;/u> page to learn how to send a security report.&lt;/p>&lt;p>Grafana Labs will send you a response indicating the next steps in handling your report. 
After the initial reply to your report, the security team will keep you informed of the progress towards a fix and full announcement, and may ask for additional information or guidance.&lt;/p>&lt;p>&lt;strong>Important:&lt;/strong> We ask you not to disclose the vulnerability before it has been fixed and announced, unless you have received a response from the Grafana Labs security team saying that you may do so.&lt;/p>&lt;p>You can also read more about our &lt;u>&lt;a href="https://app.intigriti.com/programs/grafanalabs/grafanaossbbp/">bug bounty program&lt;/a>&lt;/u> and have a look at our &lt;u>&lt;a href="/security/hall-of-fame/">Security Hall of Fame&lt;/a>&lt;/u>.&lt;/p>&lt;h2>Security announcements&lt;/h2>&lt;p>We maintain a &lt;u>&lt;a href="/security/security-advisories/">security advisories page&lt;/a>&lt;/u>, where we always post a summary, remediation, and mitigation details for any patch containing security fixes. You can also subscribe to our &lt;u>&lt;a href="/blog/index.xml">RSS feed&lt;/a>&lt;/u>.&lt;/p></description></item><item><title>From raw data to flame graphs: A deep dive into how the OpenTelemetry eBPF profiler symbolizes Go</title><link>https://grafana.com/blog/deep-dive-into-how-the-opentelemetry-ebpf-profiler-symbolizes-go/</link><pubDate>Wed, 25 Mar 2026 14:52:53</pubDate><author>Marc Sanmiquel</author><guid>https://grafana.com/blog/deep-dive-into-how-the-opentelemetry-ebpf-profiler-symbolizes-go/</guid><description>&lt;p>Imagine you're troubleshooting a production issue: your application is slow, the CPU is spiking, and users are complaining. You turn to your profiler for answers—after all, this is exactly what it's built for.&lt;/p>&lt;p>The profiler runs, collecting thousands of stack samples. 
eBPF profilers, including the &lt;u>&lt;a href="https://github.com/open-telemetry/opentelemetry-ebpf-profiler">OpenTelemetry eBPF profiler&lt;/a>&lt;/u>, operate at the kernel level, so they capture raw program counters: memory addresses pointing into your binary. Before these addresses reach &lt;u>&lt;a href="/oss/pyroscope/">Pyroscope&lt;/a>&lt;/u>, the open source continuous profiling database, they have to pass through a process called symbolization. &lt;/p>&lt;p>Here's what that data looks like &lt;em>before&lt;/em> symbolization:&lt;/p>&lt;p>Raw memory addresses. Long strings of hexadecimal with no obvious meaning. &lt;/p>&lt;p>Which function is actually consuming CPU? Where in your code should you even start looking? To make sense of this, you'd need to manually map each address back to your binary, assuming you have the exact version that’s running in production. In many cases, that’s slow, error-prone, or simply impossible.&lt;/p>&lt;p>Now, you look at the same profile with symbolization enabled:&lt;/p>&lt;p>Suddenly, everything clicks. You can see exactly what's consuming CPU: &lt;code>main.computeResult&lt;/code> is your bottleneck. You know which function to investigate, and can jump straight to the source code to start optimizing.&lt;/p>&lt;p>This transformation from useless hex addresses to actionable function names is symbolization. And for eBPF profilers, making this happen is far more complex than it might seem.&lt;/p>&lt;p>In this post, we’ll unpack that process step by step by following a single memory address through the entire symbolization pipeline, from a raw program counter all the way to a function name. We’ll focus specifically on Go programs, which have a unique advantage: they embed a &lt;code>.gopclntab&lt;/code> section that remains in the binary even when debug symbols are removed (stripped), enabling profilers to extract function names on-target. 
In contrast, most other native languages rely on server-side symbolization, which is why Go programs tend to produce better profiling data out of the box.&lt;/p>&lt;h2>What you'll learn&lt;/h2>&lt;p>Whether you're debugging missing symbols in production or wondering why your stripped Go binaries still profile correctly while C programs show hex addresses, this post will demystify Go symbolization in eBPF profilers from the ground up.&lt;/p>&lt;p>We'll explore:&lt;/p>&lt;ul>&lt;li>What symbols are and where they hide in your binaries (you might be surprised to learn they can represent a significant part of your binary's size)&lt;/li>&lt;li>The pipeline steps from raw address to function name, with real code from the OpenTelemetry eBPF profiler&lt;/li>&lt;li>Binary search and frame caching—the performance tricks that make symbolization fast enough for production&lt;/li>&lt;li>Practical commands (&lt;code>readelf&lt;/code>,&lt;code>nm&lt;/code>, &lt;code>file&lt;/code>) to inspect your own binaries&lt;/li>&lt;li>What happens when symbolization fails and how to debug it&lt;/li>&lt;/ul>&lt;p>By the end, you'll understand why Go programs profile better than other native languages even when stripped, how to debug symbol issues, and why &lt;code>gopclntab&lt;/code>—a compact data structure that maps every function's address range to its name and source location—makes Go uniquely suited for eBPF profiling.&lt;/p>&lt;h2>Why symbolization is a challenge with eBPF profilers&lt;/h2>&lt;p>Traditional profilers inject agents into your process, call runtime APIs, or even recompile your code with instrumentation. Need a function name? Just ask the running program.&lt;/p>&lt;p>eBPF profilers can't do any of that. They run in the kernel space, which, on one hand, gives them superpowers—they can profile any process, see through container boundaries, and capture kernel stacks without modification. 
But this comes with strict constraints:&lt;/p>&lt;p>What eBPF profilers can see:&lt;/p>&lt;ul>&lt;li>Which instruction is currently executing (a memory address)&lt;/li>&lt;li>The stack of return addresses (more memory addresses)&lt;/li>&lt;li>Process memory maps (which binary contains each address)&lt;/li>&lt;/ul>&lt;p>What eBPF profilers cannot do:&lt;/p>&lt;ul>&lt;li>Modify the running program&lt;/li>&lt;li>Call functions inside your application&lt;/li>&lt;li>Access language runtime APIs (Go's reflection, Python's introspection)&lt;/li>&lt;li>Load debugging agents or libraries into processes&lt;/li>&lt;/ul>&lt;p>When the profiler captures a stack trace, it gets this:&lt;/p>&lt;p>&lt;em>[0x00000000000f0318, 0x00000000000f0478, 0x0000000000050c08]&lt;/em>&lt;/p>&lt;p>Three addresses. No names, no context, no metadata. Everything must be figured out externally by analyzing binary files on disk, while maintaining sub-1% CPU overhead in production.&lt;/p>&lt;p>This constraint shapes the entire symbolization architecture:&lt;/p>&lt;ul>&lt;li>All symbol extraction happens outside the process: parsing &lt;u>&lt;a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELF&lt;/a>&lt;/u> files, &lt;u>&lt;a href="https://dwarfstd.org/">DWARF&lt;/a>&lt;/u> debug info, and language-specific sections like Go's &lt;code>gopclntab&lt;/code>&lt;/li>&lt;li>Performance is critical: with 20-100 samples/sec across hundreds of processes, the profiler needs microsecond lookups&lt;/li>&lt;li>Graceful degradation: production binaries are often stripped; the profiler needs fallback strategies&lt;/li>&lt;/ul>&lt;h2>Introducing our Go program example&lt;/h2>&lt;p>To make these concepts concrete, we’ll use a simple Go program throughout this post. Here's the complete code:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>package main
import (
"os"
"runtime/pprof"
"time"
)
func processRequest(n int) int {
data := fetchData(n)
return computeResult(data)
}
func fetchData(n int) int {
sum := 0
for i := 0; i &lt; n; i++ {
sum += i * i
}
return sum
}
func computeResult(data int) int {
result := 0
for i := 0; i &lt; data/1000; i++ {
result += i * 2
}
return result
}
func main() {
f, _ := os.Create("cpu.pprof")
defer f.Close()
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()
start := time.Now()
for time.Since(start) &lt; 10*time.Second {
processRequest(50000)
}
}&lt;/code>&lt;/pre>&lt;p>Clear call relationships: &lt;code>main&lt;/code> → &lt;code>processRequest&lt;/code> → &lt;code>fetchData&lt;/code> and &lt;code>computeResult&lt;/code>. When profiled, &lt;code>computeResult&lt;/code> dominates CPU time due to its larger loop.&lt;/p>&lt;p>Compile it:&lt;/p>&lt;pre>&lt;code># Disable optimizations to prevent inlining
go build -gcflags="all=-N -l" -o demo demo.go&lt;/code>&lt;/pre>&lt;p>This produces a ~2.6MB binary we’ll explore throughout this post.&lt;/p>&lt;h2>What is symbolization: a closer look&lt;/h2>&lt;p>Symbolization is the process of mapping memory addresses to function names. When our demo compiles, the compiler transforms source into machine instructions:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>func processRequest(n int) int {
data := fetchData(n)
return computeResult(data)
}
// Becomes machine code at address 0xf0310
// objdump -d demo | grep -A8 "00000000000f0310"
00000000000f0310 &lt;main.processRequest>:
f0310: ldr x16, [x28, #16]
f0314: cmp sp, x16
f0318: b.ls f0350
f031c: str x30, [sp, #-48]!
f0320: stur x29, [sp, #-8]
...
&lt;/code>&lt;/pre>&lt;p>The compiler knows &lt;code>main.processRequest&lt;/code> starts at address 0xf0310. Symbolization is the process of recovering that mapping when all you have is the address.&lt;/p>&lt;p>When the eBPF profiler samples your running application, it captures a stack trace of addresses:&lt;/p>&lt;p>&lt;em>0x00000000000f0318  ← CPU is here (inside &lt;/em>&lt;code>processRequest&lt;/code>&lt;em>)&lt;/em>&lt;/p>&lt;p>&lt;em>0x00000000000f0478  ← Called from here (inside &lt;/em>&lt;code>main.main&lt;/code>&lt;em>)&lt;/em>&lt;/p>&lt;p>&lt;em>0x0000000000050c08  ← Called from here (&lt;/em>&lt;code>runtime.main&lt;/code>&lt;em>)&lt;/em>&lt;/p>&lt;p>To transform these addresses into the flame graph you see in Pyroscope, the profiler must answer: "What function contains address 0xf0318?"&lt;/p>&lt;h3>The answer: symbol tables&lt;/h3>&lt;p>The compiler embeds this mapping in the binary’s symbol table. Here’s what &lt;code>nm&lt;/code> shows for our demo:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>nm demo | grep -E 'main\.(process|fetch|compute)|runtime\.main'
00000000000f03e0 T main.computeResult
00000000000f0370 T main.fetchData
00000000000f0310 T main.processRequest
00000000000f0470 T main.main
0000000000050c00 T runtime.main
&lt;/code>&lt;/pre>&lt;p>Each line maps an address to a name. Given address 0xf0318, the profiler searches this table, finds it falls between 0xf0310 (&lt;code>processRequest&lt;/code>) and 0xf0370 (&lt;code>fetchData&lt;/code>), and returns &lt;code>main.processRequest&lt;/code>.&lt;/p>&lt;p>&lt;strong>Note: &lt;/strong>Not all symbols appear in flame graphs—only functions where the profiler captured samples. If &lt;code>fetchData&lt;/code> runs too fast to be sampled, it won't appear, even though &lt;code>nm&lt;/code> shows it exists. Profilers show where time is spent, not what was called.&lt;/p>&lt;h3>The lookup challenge&lt;/h3>&lt;p>If symbolization were as simple as saying "read table and look up address," it would be trivial. But production profiling faces several challenges:&lt;/p>&lt;ul>&lt;li>Performance: Thousands of lookups per second across hundreds of processes&lt;/li>&lt;li>Missing symbols: Production binaries are often stripped to save space&lt;/li>&lt;li>Multiple formats: Go binaries may have &lt;code>gopclntab&lt;/code>, ELF symbol tables, or DWARF debug info.&lt;/li>&lt;li>Size constraints: Symbol information can represent 20-30% of binary size&lt;/li>&lt;li>Dynamic loading: Shared libraries load at different addresses each run&lt;/li>&lt;/ul>&lt;h2>What's inside a binary?&lt;/h2>&lt;p>Our compiled demo is 2.6 MB. Where does that space go? Let’s explore the sections:&lt;/p>&lt;p>&lt;code>readelf -S demo | grep -E 'Name|gopclntab|symtab|debug'&lt;/code>&lt;/p>&lt;p>This shows section headers, but sizes appear on the next line. To see everything clearly:&lt;/p>&lt;p>&lt;code>readelf -S demo | grep -A1 "\.text\|\.gopclntab\|\.debug_info\|\.debug_line"&lt;/code>&lt;/p>&lt;p>You'll see output like:&lt;/p>&lt;pre>&lt;code>[ 1] .text PROGBITS 0000000000011000 00001000
00000000000dfc04 0000000000000000 AX 0 0 16
[ 6] .gopclntab PROGBITS 00000000001426c0 001326c0
000000000008f848 0000000000000000 A 0 0 32
&lt;/code>&lt;/pre>&lt;p>The second line shows the size in hex. Converting these to human-readable format (you can use &lt;code>printf '%d\n' 0x8f848&lt;/code> or a calculator) will show:&lt;/p>&lt;table>&lt;thead>&lt;tr>&lt;th>Section&lt;/th>&lt;th>Hex size&lt;/th>&lt;th>Human size&lt;/th>&lt;th>Purpose&lt;/th>&lt;/tr>&lt;/thead>&lt;tbody>&lt;tr>&lt;td>&lt;code>.text&lt;/code>&lt;/td>&lt;td>0xdfc04&lt;/td>&lt;td>0.87 MB&lt;/td>&lt;td>Actual executable code&lt;/td>&lt;/tr>&lt;tr>&lt;td>&lt;code>.gopclntab&lt;/code>&lt;/td>&lt;td>0x8f848&lt;/td>&lt;td>0.56 MB&lt;/td>&lt;td>Go's PC-to-line table (22% of binary!)&lt;/td>&lt;/tr>&lt;tr>&lt;td>&lt;code>.debug_info&lt;/code>&lt;/td>&lt;td>0x3ddca&lt;/td>&lt;td>0.24 MB&lt;/td>&lt;td>DWARF debug information&lt;/td>&lt;/tr>&lt;tr>&lt;td>&lt;code>.debug_line&lt;/code>&lt;/td>&lt;td>0x1c00e&lt;/td>&lt;td>0.11 MB&lt;/td>&lt;td>DWARF line number mappings&lt;/td>&lt;/tr>&lt;/tbody>&lt;/table>&lt;p>Key insight: Symbol information (&lt;code>.gopclntab&lt;/code> + debug sections) represents ~35% of this binary's size.&lt;/p>&lt;h3>Finding functions with nm&lt;/h3>&lt;p>We can use &lt;code>nm&lt;/code> to list the symbols in our binary and confirm the address-to-function mapping:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>nm demo | grep -E 'processRequest|fetchData|computeResult'
00000000000f0310 T main.processRequest
00000000000f0370 T main.fetchData
00000000000f03e0 T main.computeResult&lt;/code>&lt;/pre>&lt;p>Format: address type name. The &lt;code>T&lt;/code> means "function in the text section." When the profiler sees address 0xf0318, it searches this table and finds it falls within &lt;code>main.processRequest&lt;/code> (which starts at 0xf0310).&lt;/p>&lt;h3>The stripped binary trade-off&lt;/h3>&lt;p>Production binaries are often stripped to save space:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>cp demo demo-stripped
strip demo-stripped
ls -lh demo demo-stripped&lt;/code>&lt;/pre>&lt;p>Output:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>-rwxr-xr-x  2.6M  demo
-rwxr-xr-x  1.9M  demo-stripped    # 27% smaller!&lt;/code>&lt;/pre>&lt;p>Quick way to check if a binary is stripped:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>file demo
# demo: ELF 64-bit LSB executable, ARM aarch64 ... not stripped
file demo-stripped
# demo-stripped: ELF 64-bit LSB executable, ARM aarch64 ... stripped&lt;/code>&lt;/pre>&lt;p>Check what happened to symbols:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>nm demo | wc -l           # 4,041 symbols
nm demo-stripped          # "no symbols"&lt;/code>&lt;/pre>&lt;p>But Go has a safety net—&lt;code>.gopclntab&lt;/code> survives stripping:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>readelf -S demo-stripped | grep gopclntab
[ 6] .gopclntab        PROGBITS         00000000001426c0  001326c0&lt;/code>&lt;/pre>&lt;p>This is why Go is special. When you strip a C or Rust binary, symbolization becomes impossible without separate debug files. When you strip a Go binary, &lt;code>gopclntab&lt;/code> remains embedded—it's required by Go's runtime for panic traces and reflection. The OpenTelemetry eBPF profiler can still extract every function name.&lt;/p>&lt;p>This asymmetry is why Go programs are particularly well-suited for eBPF profiling in production. You can strip binaries to save space without sacrificing observability, as the profiler continues to provide full function names.&lt;/p>&lt;h2>The symbolization pipeline&lt;/h2>&lt;p>When the eBPF profiler captures address 0xf0310 from our demo program, here's the journey to transform it into &lt;code>main.processRequest&lt;/code>:&lt;/p>&lt;p>Raw Address: 0x00000000000f0310&lt;/p>&lt;p>  ↓&lt;/p>&lt;p>[1] Find the binary&lt;/p>&lt;p>  ↓&lt;/p>&lt;p>[2] Load symbol information&lt;/p>&lt;p>  ↓&lt;/p>&lt;p>[3] Extract symbols from &lt;code>gopclntab&lt;/code>&lt;/p>&lt;p>  ↓&lt;/p>&lt;p>[4] Cache the result&lt;/p>&lt;p>  ↓&lt;/p>&lt;p>Result: &lt;code>main.processRequest&lt;/code>&lt;/p>&lt;h3>Step 1: Find the binary&lt;/h3>&lt;p>The profiler reads &lt;code>/proc/&lt;pid>/maps&lt;/code> to see all memory mappings for the process. Each line shows a memory region with its address range, permissions, and which file it maps to.&lt;/p>&lt;p>For our demo, one of those lines would show:&lt;/p>&lt;p>&lt;code>&lt;address-range>  r-xp  &lt;offset>  demo&lt;/code>&lt;/p>&lt;p>The profiler checks: does 0xf0310 fall within this range? Yes → it's in our demo binary. The profiler now knows which file to analyze.&lt;/p>&lt;h3>Step 2: Load symbol information&lt;/h3>&lt;p>The profiler opens the ELF file (&lt;code>libpf/pfelf/file.go:171-183 - Open()&lt;/code>) and looks for the &lt;code>.gopclntab&lt;/code> section, which is Go's primary symbol source. 
If &lt;code>gopclntab&lt;/code> is missing or corrupted (extremely rare), it falls back to standard ELF symbol tables.&lt;/p>&lt;h3>Step 3: Extract symbols from gopclntab&lt;/h3>&lt;p>This is where Go’s design shines. The profiler doesn't need to try multiple strategies or handle complex fallbacks—&lt;code>gopclntab&lt;/code> provides everything needed.&lt;/p>&lt;p>&lt;strong>What is gopclntab, exactly?&lt;/strong>&lt;/p>&lt;p>The &lt;code>.gopclntab&lt;/code> section (Go "program counter to line table") is a compact data structure that maps every function's address range to its name and source location. The Go compiler embeds this because the runtime needs it for:&lt;/p>&lt;ul>&lt;li>Stack traces in panic messages&lt;/li>&lt;li>Runtime reflection (&lt;code>runtime.FuncForPC&lt;/code>)&lt;/li>&lt;li>Profiler support (runtime/pprof)&lt;/li>&lt;/ul>&lt;p>Because it's required by the runtime, &lt;code>gopclntab&lt;/code> is always present, even in stripped binaries.&lt;/p>&lt;p>&lt;strong>The structure&lt;/strong>&lt;/p>&lt;p>Let's see what &lt;code>gopclntab&lt;/code> contains for our demo:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code># Extract gopclntab section to analyze it
readelf -S demo | grep -A1 gopclntab&lt;/code>&lt;/pre>&lt;p>Output:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>[ 6] .gopclntab        PROGBITS         00000000001426c0  001326c0
000000000008f848  0000000000000000   A       0     0     32&lt;/code>&lt;/pre>&lt;p>The section is 0x8f848 bytes (0.56 MB), or about 22% of our binary. It contains a header followed by a table of function entries. Each entry stores:&lt;/p>&lt;ul>&lt;li>Function start address (PC)&lt;/li>&lt;li>Function end address&lt;/li>&lt;li>Function name offset (points to string table)&lt;/li>&lt;li>Source file and line number information&lt;/li>&lt;/ul>&lt;p>&lt;strong>How the profiler uses it&lt;/strong>&lt;/p>&lt;p>When the profiler needs to symbolize address 0xf0318:&lt;/p>&lt;p>1. Load &lt;code>gopclntab&lt;/code>: The profiler reads the &lt;code>.gopclntab &lt;/code>section from the demo binary&lt;/p>&lt;p> (Code: &lt;code>nativeunwind/elfunwindinfo/elfgopclntab.go:388 - NewGopclntab()&lt;/code>)&lt;/p>&lt;p>2. Binary search: Find which function contains 0xf0318 by searching the sorted function table&lt;/p>&lt;ul>&lt;li>Searches entries until it finds: &lt;code>start=0xf0310&lt;/code>, &lt;code>end=0xf0370&lt;/code>, &lt;code>name="main.processRequest"&lt;/code>&lt;/li>&lt;/ul>&lt;p>3. Return result: The profiler now knows 0xf0318 is inside &lt;code>main.processRequest&lt;/code>&lt;/p>&lt;p>&lt;strong>Fallback strategy&lt;/strong>&lt;/p>&lt;p>If &lt;code>gopclntab&lt;/code> is somehow missing or corrupted (extremely rare), the profiler falls back to standard ELF symbol tables (&lt;code>.symtab&lt;/code>, &lt;code>.dynsym&lt;/code>). But in practice, every Go binary has a valid &lt;code>gopclntab&lt;/code>.&lt;/p>&lt;h3>Step 4: Cache the result&lt;/h3>&lt;p>Once resolved, the profiler caches 0xf0310 → &lt;code>main.processRequest&lt;/code>. If the next stack sample hits the same address, it returns instantly without re-parsing the binary. Unlike DWARF debug info (which is compressed and expensive to decode), &lt;code>gopclntab&lt;/code> is uncompressed and memory-mapped. 
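A minimal version of that cache is just a map in front of the expensive lookup. The sketch below is illustrative only: resolveSlow and its address check are stand-ins for the real gopclntab parse, and the actual frame cache adds LRU eviction on top.

```go
package main

import "fmt"

// parseCount tracks how often we pay the expensive-lookup cost.
var parseCount = 0

// resolveSlow stands in for a full gopclntab parse and binary search.
func resolveSlow(pc uint64) string {
	parseCount++
	if pc >= 0xf0310 { // hypothetical range from the demo binary
		return "main.processRequest"
	}
	return "runtime.main"
}

// cache memoizes resolved addresses so repeated stack samples that
// hit the same PC never re-parse the binary.
var cache = map[uint64]string{}

func resolve(pc uint64) string {
	if name, ok := cache[pc]; ok {
		return name // cache hit: returns instantly
	}
	name := resolveSlow(pc)
	cache[pc] = name
	return name
}

func main() {
	resolve(0xf0318)
	resolve(0xf0318) // the second lookup is served from the cache
	fmt.Println(parseCount)
}
```

Two lookups of the same address cost only one slow parse.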
This makes Go symbolization particularly fast—the profiler can parse &lt;code>gopclntab&lt;/code> once at process startup, then perform microsecond lookups for every subsequent address.&lt;/p>&lt;p>The frame cache (&lt;code>processmanager/manager.go:75-79&lt;/code>) stores the resolved frames with an LRU eviction policy, keeping hot functions instantly accessible.&lt;/p>&lt;h2>Performance and optimizations&lt;/h2>&lt;p>Symbolization must be fast. With profilers sampling at 20-100 Hz across potentially hundreds of processes, the profiler might need to resolve thousands of addresses per second. At that scale, even small inefficiencies compound into significant overhead.&lt;/p>&lt;h3>The speed requirements&lt;/h3>&lt;p>Consider a modest setup: 50 processes, 20 samples/second, 20 stack frames per sample. That's 20,000 address lookups per second. If each lookup takes 1 millisecond (linear scan), the profiler would consume an entire CPU core just for symbolization, which is unacceptable overhead. The profiler's target: under 1% CPU overhead, requiring lookups in the microsecond range.&lt;/p>&lt;h3>Binary search: O(log n) lookups&lt;/h3>&lt;p>The profiler needs to solve the reverse lookup problem: given an address, find the symbol name. Since &lt;code>gopclntab&lt;/code> stores functions as address ranges (each function spans multiple addresses), the profiler moves through the following phases:&lt;/p>&lt;p>1. Extraction phase (once per binary):&lt;/p>&lt;ul>&lt;li>Parses &lt;code>gopclntab&lt;/code> to extract all functions&lt;/li>&lt;li>Each entry contains: start address, function name, source file info&lt;/li>&lt;li>Functions are naturally sorted by address in &lt;code>gopclntab&lt;/code>&lt;/li>&lt;/ul>&lt;p>2. 
Lookup phase (for each stack address):&lt;/p>&lt;ul>&lt;li>Uses binary search to find which range contains the address&lt;/li>&lt;li>Example: address 0xf0318 → binary search → found in range starting at 0xf0310→ returns &lt;code>"main.processRequest"&lt;/code>&lt;/li>&lt;/ul>&lt;p>Complexity: &lt;em>O(log n)&lt;/em> where n is the number of functions. With 4,000 functions (like our demo), this means ~12 comparisons per lookup instead of 4,000 linear scans.&lt;/p>&lt;p>Code reference: &lt;code>nativeunwind/elfunwindinfo/elfgopclntab.go:544-556&lt;/code> uses Go’s &lt;code>sort.Search&lt;/code>&lt;/p>&lt;h3>Frame caching&lt;/h3>&lt;p>Once a frame is symbolized, the profiler caches the complete result—not just the function name, but the entire resolved frame including source file and line number information.&lt;/p>&lt;p>The frame cache (&lt;code>processmanager/manager.go:345-355&lt;/code>) uses an LRU eviction policy.&lt;/p>&lt;p>Configuration:&lt;/p>&lt;ul>&lt;li> Cache size: 16,384 entries&lt;/li>&lt;li> TTL: 5 minutes per entry&lt;/li>&lt;li> Refreshed on each hit to keep hot paths cached&lt;/li>&lt;/ul>&lt;p>Since &lt;code>gopclntab&lt;/code> is memory-mapped and uncompressed, even cache misses are fast (microseconds). 
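The binary search on the cache-miss path can be sketched with Go's sort.Search, over a hypothetical function table that mirrors the nm output shown earlier (the end addresses here are illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// funcEntry mirrors one gopclntab record: an address range and a name.
// (Field names are illustrative, not the profiler's actual types.)
type funcEntry struct {
	start uint64
	end   uint64
	name  string
}

// table must be sorted by start address, as gopclntab entries are.
var table = []funcEntry{
	{0x50c00, 0x50d00, "runtime.main"},
	{0xf0310, 0xf0370, "main.processRequest"},
	{0xf0370, 0xf03e0, "main.fetchData"},
	{0xf03e0, 0xf0470, "main.computeResult"},
	{0xf0470, 0xf04d0, "main.main"},
}

// lookup returns the function containing pc, via O(log n) binary search.
func lookup(pc uint64) (string, bool) {
	// Find the first entry whose start address exceeds pc; the
	// candidate containing range is the entry just before it.
	i := sort.Search(len(table), func(i int) bool { return table[i].start > pc })
	if i == 0 {
		return "", false // pc is below the first known function
	}
	e := table[i-1]
	if pc >= e.end {
		return "", false // pc falls in a gap between functions
	}
	return e.name, true
}

func main() {
	name, ok := lookup(0xf0318)
	fmt.Println(name, ok) // main.processRequest true
}
```

sort.Search returns the smallest index for which the predicate holds, so the containing function, if any, is always the entry immediately before that index.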
The cache primarily avoids repeated parsing of the same addresses across multiple stack samples.&lt;/p>&lt;h3>Real performance&lt;/h3>&lt;p>With these optimizations, the OpenTelemetry eBPF profiler achieves:&lt;/p>&lt;ul>&lt;li>Sub-microsecond symbol lookups (cached)&lt;/li>&lt;li>~100 microseconds for cache misses (disk read + parse)&lt;/li>&lt;li>&lt; 1% CPU overhead in production&lt;/li>&lt;/ul>&lt;p>This makes continuous profiling practical—you can run it 24/7 without noticing the performance impact.&lt;/p>&lt;h2>When symbolization fails&lt;/h2>&lt;p>Now that you know where symbols live, what happens when they're missing or incomplete?&lt;/p>&lt;h3>Missing functions despite having symbols&lt;/h3>&lt;p>If &lt;code>nm&lt;/code> doesn't show a function you know exists, the compiler likely inlined it—merged the function into its caller for optimization. This is common with small, frequently called functions.&lt;/p>&lt;p>For Go, prevent inlining during development:&lt;/p>&lt;p>&lt;code>go build -gcflags="all=-N -l" -o app main.go&lt;/code>&lt;/p>&lt;p>The &lt;code>-N&lt;/code> disables optimizations and &lt;code>-l&lt;/code> disables inlining. Don't use this for production—the performance cost is significant.&lt;/p>&lt;h3>CGO and C libraries&lt;/h3>&lt;p>For pure Go programs, symbolization "just works" and all your dependencies compile into a single binary with &lt;code>gopclntab&lt;/code> covering everything. But if your Go program uses &lt;u>&lt;a href="https://pkg.go.dev/cmd/cgo">CGO&lt;/a>&lt;/u> to call C libraries, those portions behave differently:&lt;/p>&lt;ul>&lt;li>Pure Go dependencies compile into your binary with &lt;code>gopclntab&lt;/code>, so all function calls are symbolized—whether it's your code or third-party Go packages.&lt;/li>&lt;li>For CGO/C libraries, functions may appear as hex addresses if the libraries are stripped. 
&lt;code>gopclntab&lt;/code> only covers Go code, not linked C binaries&lt;/li>&lt;/ul>&lt;p>In practice:&lt;/p>&lt;ul>&lt;li>If you see hex addresses in a Go program's profile, check for CGO usage&lt;/li>&lt;li>The Go portions always symbolize correctly&lt;/li>&lt;li>C library calls might show as addresses unless the shared libraries have debug symbols&lt;/li>&lt;/ul>&lt;h3>Quick diagnostic commands&lt;/h3>&lt;p>These four commands quickly tell you what symbol information is available before you start profiling.&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>file your-app # Stripped or not?
nm your-app | wc -l # How many symbols?
readelf -S your-app | grep gopclntab # Go binary check
readelf -S your-app | grep debug # Has debug info?
&lt;/code>&lt;/pre>&lt;h2>Wrapping up&lt;/h2>&lt;p>The next time you open Pyroscope and see function names in a flame graph for your Go application, you'll know the sophisticated machinery that made them appear. That &lt;code>main.processRequest&lt;/code> you're investigating? It started as raw address 0x00000000000f0310, was captured by eBPF from a running process the profiler couldn't modify, was then looked up in &lt;code>gopclntab&lt;/code> using binary search, and emerged as a readable name—all in microseconds, with minimal overhead.&lt;/p>&lt;p>Go's design makes this remarkably reliable. While other native languages lose all symbol information when stripped, Go's &lt;code>gopclntab&lt;/code> survives—the runtime needs it for panic traces, so it's always present. This single design decision means you can strip Go binaries to save 30% space in production while maintaining perfect symbolization. No separate debug files, no symbol servers, and no trade-offs.&lt;/p>&lt;p>The OpenTelemetry eBPF profiler leverages this by parsing &lt;code>gopclntab&lt;/code> directly, providing consistent symbolization whether your binary is fresh from development or stripped for production. This is why Go programs are particularly well-suited for continuous profiling—you get full observability without sacrificing binary size or runtime performance.&lt;/p>&lt;p>Symbolization is the invisible foundation of modern observability. Without it, profiling data would be nearly useless—just hexadecimal addresses with no meaning. 
To learn more, you can check out the &lt;u>&lt;a href="https://github.com/open-telemetry/opentelemetry-ebpf-profiler">OTel eBPF profiler on GitHub&lt;/a>&lt;/u> and our &lt;u>&lt;a href="/docs/pyroscope/latest/configure-client/opentelemetry/ebpf-profiler/">Pyroscope eBPF setup docs&lt;/a>&lt;/u>.&lt;/p>&lt;p>&lt;em>&lt;a href="/products/cloud/?pg=quickly-go-from-exploration-to-action-with-new-one-click-integrations-in-grafana-drilldown&amp;plcmt=footer-cta">Grafana Cloud&lt;/a>&lt;/em>&lt;em> is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. &lt;/em>&lt;em>&lt;a href="/auth/sign-up/create-user/?pg=quickly-go-from-exploration-to-action-with-new-one-click-integrations-in-grafana-drilldown&amp;plcmt=footer-cta">Sign up for free now!&lt;/a>&lt;/em>&lt;/p></description></item><item><title>How OpenRouter and Grafana Cloud bring observability to LLM-powered applications</title><link>https://grafana.com/blog/how-openrouter-and-grafana-cloud-bring-observability-to-llm-powered-applications/</link><pubDate>Tue, 24 Mar 2026 00:20:56</pubDate><author>Chris Watts</author><guid>https://grafana.com/blog/how-openrouter-and-grafana-cloud-bring-observability-to-llm-powered-applications/</guid><description>&lt;p>&lt;em>Chris Watts is Head of Enterprise Engineering at OpenRouter, building infrastructure for AI applications. Previously at Amazon and a startup founder.&lt;/em>&lt;/p>&lt;p>As large language models become core infrastructure for more and more applications, teams are discovering a familiar challenge in a new context: you can't improve what you can't see. 
Whether you're routing requests across multiple AI providers, managing costs across dozens of models, or debugging why a particular prompt is timing out in production, observability is no longer optional for LLM-powered systems.&lt;/p>&lt;p>At &lt;u>&lt;a href="https://openrouter.ai">OpenRouter&lt;/a>&lt;/u>, we provide a unified API that gives developers access to hundreds of models from providers like OpenAI, Anthropic, Google, and Meta through a single integration. We handle load balancing, provider fallbacks, and model routing so teams can focus on building their applications rather than managing multiple API keys and billing accounts.&lt;/p>&lt;p>But access to models is only half the story. When you're running AI workloads in production, you need to understand how those workloads are performing, what they're costing, and where they're failing. That's why we built &lt;u>&lt;a href="https://openrouter.ai/docs/guides/features/broadcast">Broadcast&lt;/a>&lt;/u>, a feature that automatically sends traces from your OpenRouter requests to observability platforms like  &lt;u>&lt;a href="/products/cloud/">Grafana Cloud&lt;/a>&lt;/u>, with no additional instrumentation required in your application code.&lt;/p>&lt;p>In this post, we'll walk through how Broadcast works with Grafana Cloud, and share some of the real-world use cases we're seeing.&lt;/p>&lt;h2>Why LLM observability is different&lt;/h2>&lt;p>Traditional application monitoring focuses on familiar signals: HTTP status codes, response times, and error rates. LLM applications use those same signals, but they also introduce entirely new dimensions that teams need to track:&lt;/p>&lt;ul>&lt;li>&lt;strong>Token usage and costs&lt;/strong>: Every request consumes tokens, and costs vary across models. A single prompt sent to GPT-4o vs. 
Claude 3.5 Haiku can differ dramatically.&lt;/li>&lt;li>&lt;strong>Model behavior variability&lt;/strong>: The same prompt can produce different results depending on which model or provider handles it. When you're using fallbacks or load balancing across providers, understanding which model actually served a request matters.&lt;/li>&lt;li>&lt;strong>Latency profiles&lt;/strong>: LLM latency isn't just about total response time. Time to first token, tokens per second, and total generation time each tell a different part of the story.&lt;/li>&lt;li>&lt;strong>Non-deterministic failures&lt;/strong>: LLM requests can fail in subtle ways, like hitting rate limits, receiving truncated outputs, or producing responses that technically succeed but don't meet quality expectations.&lt;/li>&lt;/ul>&lt;p>Most teams start by adding logging and metrics to their own application code, but this approach quickly becomes difficult to maintain, especially when you're using multiple models and providers. What you really want is observability that's built into the infrastructure layer, where the routing and model selection actually happen.&lt;/p>&lt;h2>How OpenRouter Broadcast works with Grafana Cloud&lt;/h2>&lt;p>OpenRouter Broadcast works by automatically generating &lt;u>&lt;a href="https://opentelemetry.io/">OpenTelemetry&lt;/a>&lt;/u> traces for every API request and sending them to your configured destinations. There's no SDK to install, no code to change, and no additional latency added to your requests. You configure it once in your OpenRouter dashboard, and every request flowing through your account is traced.&lt;/p>&lt;p>For Grafana Cloud, traces are sent via the standard OTLP HTTP/JSON endpoint directly to &lt;u>&lt;a href="/products/cloud/traces/">Grafana Cloud Traces&lt;/a>&lt;/u>, the cloud-based tracing backend powered by &lt;u>&lt;a href="/oss/tempo/">Tempo&lt;/a>&lt;/u> OSS. 
Each trace includes rich attributes following OpenTelemetry semantic conventions for generative AI:&lt;/p>&lt;ul>&lt;li>&lt;strong>Model information&lt;/strong>: Which model was requested, which model actually served the response, and which provider handled it&lt;/li>&lt;li>&lt;strong>Token usage&lt;/strong>: Input tokens, output tokens, and total tokens consumed&lt;/li>&lt;li>&lt;strong>Timing data&lt;/strong>: Total request duration, time to first token, and generation speed&lt;/li>&lt;li>&lt;strong>Cost data&lt;/strong>: The cost in USD for each request&lt;/li>&lt;li>&lt;strong>Status and errors&lt;/strong>: Whether the request succeeded, why generation ended, and any error details&lt;/li>&lt;li>&lt;strong>Custom metadata&lt;/strong>: Any application-specific context you attach to your requests, like user IDs, session IDs, or feature flags&lt;/li>&lt;/ul>&lt;p>Once traces are flowing into Grafana Cloud, you can query them using &lt;u>&lt;a href="/docs/tempo/latest/traceql/">TraceQL&lt;/a>&lt;/u>, build dashboards, and set up alerts, all using the same Grafana Cloud interface your team already knows.&lt;/p>&lt;p>You can see span rate, error rates, and duration for OpenRouter traces at a glance:&lt;/p>&lt;p>You can drill into a single LLM Generation trace to inspect timing and service details:&lt;/p>&lt;p>Full span attributes show the prompt, model, token count, and completion, all captured via OpenTelemetry:&lt;/p>&lt;h2>Real-world use cases&lt;/h2>&lt;p>Here are some of the ways teams are using OpenRouter Broadcast with Grafana Cloud today.&lt;/p>&lt;h3>Tracking costs across models and features&lt;/h3>&lt;p>One of the most immediate use cases is cost visibility. When you're routing requests across multiple models, it's easy to lose track of where your spend is going. 
With traces flowing into Grafana Cloud, teams build dashboards that break down costs by model, API key, user, or any custom metadata they attach to their requests.&lt;/p>&lt;p>For example, a team running both a customer-facing chatbot and an internal document processing pipeline can use separate API keys or custom metadata to attribute costs to each workload. A simple TraceQL query like this surfaces all requests from a specific environment:&lt;/p>&lt;pre>&lt;code>{ resource.service.name = "openrouter" &amp;&amp; span.trace.metadata.environment = "production" }&lt;/code>&lt;/pre>&lt;p>This kind of visibility lets engineering leads and finance teams answer questions like "How much did our AI features cost last week?" or "Which model is giving us the best cost-per-quality ratio?" without building custom analytics infrastructure.&lt;/p>&lt;h3>Monitoring latency and performance&lt;/h3>&lt;p>LLM latency directly impacts user experience. A chatbot that takes 8 seconds to start responding feels broken, even if the final output is excellent. With OpenRouter traces in Grafana Cloud, teams can monitor latency trends over time, set alerts for slow requests, and compare performance across models.&lt;/p>&lt;p>TraceQL makes it easy to find outliers:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>{ resource.service.name = "openrouter" &amp;&amp; duration > 5s }&lt;/code>&lt;/pre>&lt;p>Teams often build Grafana dashboards that show p50, p95, and p99 latency by model, which helps them make informed decisions about which models to use for latency-sensitive vs. batch workloads.&lt;/p>&lt;h3>Debugging errors and failed requests&lt;/h3>&lt;p>When something goes wrong in an LLM pipeline, the cause isn't always obvious. Was it a rate limit? A poorly created prompt? A provider outage? 
With distributed traces in Grafana Cloud, teams can quickly filter for errors and drill into individual requests to see exactly what happened:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>{ resource.service.name = "openrouter" &amp;&amp; status = error }&lt;/code>&lt;/pre>&lt;p>Each trace includes the model, provider, error details, and timing information, giving teams the context they need to diagnose issues without digging through application logs.&lt;/p>&lt;h3>Usage analytics and capacity planning&lt;/h3>&lt;p>As AI features grow, teams need to understand usage patterns to plan capacity and negotiate contracts with providers. Grafana Cloud dashboards built on OpenRouter traces can show request volume over time, token consumption trends, and model popularity, all without any additional instrumentation.&lt;/p>&lt;p>Teams use this data to track how usage is growing and answer questions like: "Are we approaching our rate limits?" or "Should we shift more traffic to a cheaper model for this use case?" &lt;/p>&lt;h2>Getting started&lt;/h2>&lt;p>Setting up the integration takes just a few minutes:&lt;/p>&lt;p>1. &lt;strong>Get your Grafana Cloud credentials&lt;/strong>: You'll need your OTLP gateway endpoint, instance ID, and an API token with &lt;code>traces:write&lt;/code> permissions from your &lt;u>&lt;a href="/auth/sign-in">Grafana Cloud portal&lt;/a>&lt;/u>.&lt;/p>&lt;p>2. &lt;strong>Enable Broadcast in OpenRouter&lt;/strong>: Navigate to &lt;u>&lt;a href="https://openrouter.ai/settings/broadcast">Settings > Observability&lt;/a>&lt;/u> in your OpenRouter dashboard and toggle Broadcast on.&lt;/p>&lt;p>3. &lt;strong>Configure Grafana Cloud as a destination&lt;/strong>: Enter your Grafana Cloud credentials and click &lt;strong>Test Connection&lt;/strong> to verify the setup.&lt;/p>&lt;p>4.&lt;strong> Start querying traces&lt;/strong>: Once configured, every OpenRouter request will generate a trace in Grafana Cloud. 
Navigate to Explore, select your Tempo data source, and run &lt;code>{ resource.service.name = "openrouter" }&lt;/code> to see your traces.&lt;/p>&lt;p>For detailed setup instructions, including how to find your OTLP endpoint and create API tokens, check out our &lt;u>&lt;a href="https://openrouter.ai/docs/guides/features/broadcast/grafana">Broadcast to Grafana Cloud documentation&lt;/a>&lt;/u>.&lt;/p>&lt;h3>Adding custom metadata&lt;/h3>&lt;p>To get the most out of the integration, we recommend attaching custom metadata to your OpenRouter requests. This metadata flows through to Grafana Cloud as span attributes, making it easy to filter and group traces by your own application context:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>{
"model": "openai/gpt-4o",
"messages": [{ "role": "user", "content": "Summarize this document..." }],
"user": "user_12345",
"session_id": "session_abc",
"trace": {
"trace_name": "Document Summary",
"environment": "production",
"feature": "summarization"
}
}
&lt;/code>&lt;/pre>&lt;p>You can then query these attributes in TraceQL:&lt;/p>&lt;p>&lt;/p>&lt;pre>&lt;code>{ resource.service.name = "openrouter" &amp;&amp; span.trace.metadata.feature = "summarization" }&lt;/code>&lt;/pre>&lt;h3>Privacy controls&lt;/h3>&lt;p>For teams working with sensitive data, Broadcast supports a &lt;u>&lt;a href="https://openrouter.ai/docs/guides/features/broadcast/overview#privacy-mode">Privacy Mode&lt;/a>&lt;/u> that excludes prompt and completion content from traces while still sending all operational data like token usage, costs, timing, and model information. This lets you get full observability without exposing the content of your LLM interactions.&lt;/p>&lt;h2>What's next&lt;/h2>&lt;p>We're continuing to invest in making LLM observability as seamless as possible. We're adding new integrations regularly and are working on richer trace data, including more granular timing breakdowns and quality signals that can help you build even more comprehensive observability dashboards.&lt;/p>&lt;p>If you're building with LLMs and want visibility into how your AI workloads are performing, give the &lt;u>&lt;a href="https://openrouter.ai/docs/guides/features/broadcast/grafana">OpenRouter and Grafana Cloud integration&lt;/a>&lt;/u> a try. You can get started with a &lt;u>&lt;a href="/auth/sign-up/create-user">free Grafana Cloud account&lt;/a>&lt;/u> and an &lt;u>&lt;a href="https://openrouter.ai">OpenRouter&lt;/a>&lt;/u> account in minutes.&lt;/p>&lt;p>&lt;em>To learn more about OpenRouter's Broadcast feature and all supported destinations, visit the &lt;/em>&lt;u>&lt;em>&lt;a href="https://openrouter.ai/docs/guides/features/broadcast">Broadcast documentation&lt;/a>&lt;/em>&lt;/u>&lt;em>. 
For questions or feedback, reach out to us at &lt;/em>&lt;u>&lt;em>&lt;a href="http://openrouter.ai">openrouter.ai&lt;/a>&lt;/em>&lt;/u>&lt;em>.&lt;/em>&lt;/p>&lt;p>&lt;/p></description></item><item><title>Instrument zero‑code observability for LLMs and agents on Kubernetes</title><link>https://grafana.com/blog/ai-observability-zero-code/</link><pubDate>Fri, 20 Mar 2026 17:20:53</pubDate><author>Ishan Jain</author><guid>https://grafana.com/blog/ai-observability-zero-code/</guid><description>&lt;p>&lt;strong>Note: &lt;/strong>The world is changing all around us thanks to AI. Today, anyone and everyone can be a developer, using LLMs to create LLM-powered applications, which users can then interact with by using even more LLMs. &lt;/p>&lt;p>Observability practitioners need to adapt and they need the right tools for the job. In this series, we'll show you how to use Grafana Cloud to monitor AI applications, including &lt;u>&lt;a href="/blog/ai-observability-llms-in-production">workloads in production&lt;/a>&lt;/u>, &lt;u>&lt;a href="/blog/ai-observability-ai-agents">AI agents&lt;/a>&lt;/u>, &lt;u>&lt;a href="/blog/ai-observability-MCP-servers">MCP servers&lt;/a>&lt;/u>, and zero-code LLMs (this post).&lt;/p>&lt;p>Building AI services with large language models and agentic frameworks often means running complex microservices on Kubernetes. Observability is vital, but instrumenting every pod in a distributed system can quickly become a maintenance nightmare. &lt;/p>&lt;p>&lt;strong>OpenLIT Operator&lt;/strong> solves this problem by automatically injecting OpenTelemetry instrumentation into your AI workloads—no code changes or image rebuilds required. 
When combined with &lt;a href="/docs/grafana-cloud/monitor-applications/ai-observability/">AI Observability&lt;/a> in Grafana Cloud, you can monitor costs, latency, token usage, and agent workflows across your entire cluster in minutes.&lt;/p>&lt;p>In this final post in our AI Observability series, we'll show you how to easily get started by combining OpenLIT Operator and Grafana Cloud to enable zero-code observability for your AI workloads.&lt;/p>&lt;h2>Why zero‑code instrumentation matters&lt;/h2>&lt;p>Traditional observability relies on developers adding instrumentation libraries to their application code. But in the fast‑moving world of generative AI, your stack might include multiple model providers, agent frameworks, vector databases, and custom tools. Keeping instrumentation up to date across all these components is a burden. &lt;/p>&lt;p>The &lt;u>&lt;a href="https://docs.openlit.io/latest/operator/overview">OpenLIT Operator&lt;/a>&lt;/u> brings zero‑code AI observability to Kubernetes. It automatically injects and configures OpenTelemetry instrumentation into your pods, producing distributed traces and metrics without any code changes. Because it is built on OpenTelemetry standards, it integrates with existing observability infrastructure and allows you to switch between providers (OpenLIT, OpenInference, OpenLLMetry, custom) without redeploying your applications.&lt;/p>&lt;p>This zero‑code approach is designed specifically for AI workloads. It provides seamless observability for LLMs, vector databases, and agentic frameworks running in Kubernetes. 
You can track token usage, monitor agent workflows, measure response times, and debug AI framework interactions—all without touching your code.&lt;/p>&lt;h3>Benefits of zero‑code observability in Grafana Cloud&lt;/h3>&lt;p>There are also multiple reasons why you should use zero-code observability in Grafana Cloud.&lt;/p>&lt;ul>&lt;li>&lt;strong>Rapid onboarding:&lt;/strong> Deploy the OpenLIT Operator once and instrument all your AI workloads without modifying a single line of code.&lt;/li>&lt;li>&lt;strong>Comprehensive coverage:&lt;/strong> The operator supports major LLM providers, vector databases, and agent frameworks, and can be extended to other providers through its plugin architecture.&lt;/li>&lt;li>&lt;strong>Vendor neutrality:&lt;/strong> Built on OpenTelemetry, the operator allows you to send telemetry to Grafana Cloud, a self‑hosted OpenTelemetry collector, or any OTLP‑compatible backend.&lt;/li>&lt;li>&lt;strong>Cost and performance insights:&lt;/strong> Distributed traces capture token usage, cost, latency, and agent step sequences, enabling you to optimise model selection and resource allocation.&lt;/li>&lt;/ul>&lt;h2>How to set up zero-code observability for AI applications in Grafana Cloud&lt;/h2>&lt;p>Now that we've covered why you should be using Grafana Cloud for zero-code observability, let's look at how you can make that happen, starting with a high-level explanation of the workflow, followed by step-by-step instructions for getting started quickly.&lt;/p>&lt;p>And if you get stuck anywhere along the way or need help with your own setup, click on the pulsar icon in the top-right corner of the Grafana Cloud UI to open a chat with &lt;u>&lt;a href="/docs/grafana-cloud/machine-learning/assistant/?pg=blog&amp;plcmt=body-txt">Grafana Assistant&lt;/a>&lt;/u>, our purpose-built LLM that can help troubleshoot incidents, manage dashboards, and answer product questions.&lt;/p>&lt;h3>Architecture overview&lt;/h3>&lt;p>AI applications like LLMs and 
agents run inside pods in your Kubernetes cluster. The OpenLIT Operator continuously monitors these pods and checks them against your instrumentation policies. When it finds a matching pod, it automatically injects an init container that sets up OpenTelemetry instrumentation, enabling observability without requiring manual changes to your application code.&lt;/p>&lt;p>Telemetry is sent to an OpenLIT collector or directly to Grafana Cloud’s OpenTelemetry Protocol (OTLP) gateway. The AI Observability dashboards in Grafana Cloud then visualize latency, cost, and quality metrics.&lt;/p>&lt;p>The workflow consists of four key pieces:&lt;/p>&lt;ol>&lt;li>&lt;strong>AI workloads&lt;/strong>: Pods running LLMs, vector DBs, or agent frameworks such as LangChain, CrewAI, or OpenAI Agents. The operator supports a wide range of LLM providers (OpenAI, Anthropic, Google, AWS Bedrock, Mistral) and frameworks (LangChain, LlamaIndex, CrewAI, Haystack, DSPy, and more).&lt;/li>&lt;li>&lt;strong>OpenLIT Operator&lt;/strong>: A Kubernetes operator that injects OpenTelemetry instrumentation into selected pods based on label selectors. The operator is OpenTelemetry‑native and allows you to switch providers without changing your application code.&lt;/li>&lt;li>&lt;strong>OpenLIT collector&lt;/strong>: Collects traces and metrics from instrumented pods. You can run it in‑cluster via Helm or send telemetry directly to Grafana Cloud’s OTLP endpoint.&lt;/li>&lt;li>&lt;strong>Grafana Cloud&lt;/strong>: Stores traces in &lt;u>&lt;a href="/docs/grafana-cloud/send-data/traces/?pg=blog&amp;plcmt=body-txt">Tempo&lt;/a>&lt;/u> and metrics in &lt;u>&lt;a href="/docs/grafana-cloud/send-data/metrics/metrics-prometheus/?pg=blog&amp;plcmt=body-txt">Prometheus&lt;/a>&lt;/u> through our fully managed platform. 
Our AI observability solution provides pre‑built dashboards for GenAI, vector DBs, agents, and Model Context Protocol (MCP), allowing you to explore latency percentiles, token and cost metrics, agent step sequences, and evaluation results.&lt;/li>&lt;/ol>&lt;h2>Step 1: Add the AI Observability integration&lt;/h2>&lt;p>Before instrumenting your cluster, add the AI Observability integration to your Grafana Cloud stack. This can be done by clicking on &lt;strong>Connections&lt;/strong> in the left-side menu and following the &lt;u>&lt;a href="/docs/grafana-cloud/monitor-applications/ai-observability/setup/?pg=blog&amp;plcmt=body-txt#install-the-ai-observability-integration">steps outlined in our documentation&lt;/a>&lt;/u>.&lt;/p>&lt;p>This provisions dashboards and sets up a managed OTLP gateway for receiving your traces and metrics. Once telemetry arrives, the dashboards populate automatically with request rates, latency distributions, and cost summaries.&lt;/p>&lt;h2>Step 2: Prepare your Kubernetes environment&lt;/h2>&lt;p>To follow this guide, you’ll need a Kubernetes cluster with cluster‑admin privileges, Helm, and &lt;code>kubectl&lt;/code> configured. If you don’t have a cluster, you can create one locally using &lt;u>&lt;a href="https://k3d.io/">k3d&lt;/a>&lt;/u> or &lt;u>&lt;a href="https://minikube.sigs.k8s.io/">minikube&lt;/a>&lt;/u>. For a quick test drive, create a cluster with:&lt;/p>&lt;pre>&lt;code>k3d cluster create openlit-demo
&lt;/code>&lt;/pre>&lt;h2>Step 3: Deploy OpenLIT Operator&lt;/h2>&lt;p>First add the OpenLIT Helm repository and update your charts:&lt;/p>&lt;pre>&lt;code>helm repo add openlit https://openlit.github.io/helm/
helm repo update
&lt;/code>&lt;/pre>&lt;p>Install the &lt;strong>OpenLIT Operator&lt;/strong> to enable zero‑code instrumentation:&lt;/p>&lt;pre>&lt;code>helm install openlit-operator openlit/openlit-operator
&lt;/code>&lt;/pre>&lt;p>Verify that the operator pod is running:&lt;/p>&lt;pre>&lt;code>kubectl get pods -n openlit -l app.kubernetes.io/name=openlit-operator
&lt;/code>&lt;/pre>&lt;p>You should see the operator in a &lt;code>Running&lt;/code> state.&lt;/p>&lt;h2>Step 4: Create an AutoInstrumentation resource&lt;/h2>&lt;p>The &lt;strong>AutoInstrumentation&lt;/strong> custom resource defines which pods to instrument and how to configure the injected instrumentation. It specifies label selectors to target your AI applications, the instrumentation provider (OpenLIT by default), and the OTLP endpoint to send telemetry.&lt;/p>&lt;p>Here is a minimal example that instruments pods labeled &lt;code>instrumentation=openlit&lt;/code> and sends data to Grafana Cloud:&lt;/p>&lt;pre>&lt;code>apiVersion: openlit.io/v1alpha1
kind: AutoInstrumentation
metadata:
  name: grafana-observability
  namespace: default
spec:
  selector:
    matchLabels:
      instrumentation: openlit
  python:
    instrumentation:
      enabled: true
  otlp:
    endpoint: "https://otlp-gateway-&lt;REGION>.grafana.net/otlp" # Grafana OTLP gateway
    headers:
      Authorization: "Basic &lt;BASE64>" # Replace with base64‑encoded instanceID:token
  resource:
    attributes:
      deployment.environment: "production"
      service.namespace: "ai-services"&lt;/code>&lt;/pre>&lt;p>Apply the manifest:&lt;/p>&lt;pre>&lt;code>kubectl apply -f autoinstrumentation.yaml
&lt;/code>&lt;/pre>&lt;p>Already have AI applications running? Restart the pods that match your selector to pick up the injected instrumentation:&lt;/p>&lt;pre>&lt;code>kubectl rollout restart deployment your-deployment-name
&lt;/code>&lt;/pre>&lt;p>When the pods restart, the OpenLIT Operator automatically injects an init container that configures Python instrumentation. The pods begin emitting distributed traces with LLM costs, token usage, and agent performance metrics.&lt;/p>&lt;h2>Step 5: Deploy your AI application (no code changes)&lt;/h2>&lt;p>You can now deploy or continue running your AI workloads normally. Whether you’re using OpenAI Agents SDK, CrewAI, LangChain, or a custom Python service, you don’t need to modify your code. The operator recognizes supported frameworks and model providers, and it instruments them transparently. &lt;/p>&lt;p>For example, a simple deployment of a CrewAI‑based chatbot can be launched via a Kubernetes &lt;code>Deployment&lt;/code>; the operator will detect and instrument all LLM and agent calls as soon as the pod starts. The instrumentation captures the sequence of agent steps, tool invocations, and model responses, along with latency and token metrics.&lt;/p>&lt;h2>Step 6: Visualize metrics and traces in Grafana&lt;/h2>&lt;p>With your pods instrumented and telemetry flowing to Grafana Cloud, open the AI Observability dashboards. &lt;/p>&lt;p>The &lt;strong>GenAI observability&lt;/strong> dashboard shows request rates, p95/p99 latencies, and cost metrics across different providers. The &lt;strong>AI Agents&lt;/strong> dashboard surfaces agent workflows, step durations, and tool success rates. The &lt;strong>Vector DB&lt;/strong> and &lt;strong>MCP&lt;/strong> dashboards provide context on database queries and protocol health. &lt;/p>&lt;p>Because OpenLIT’s traces include LLM costs and token counts, Grafana can also estimate costs and highlight expensive calls. 
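The cost estimates derived from OpenLIT's token counts boil down to simple arithmetic: tokens multiplied by per-token prices. A sketch with hypothetical per-million-token prices (the PRICES table is illustrative only, not actual provider pricing):

```python
# Hypothetical per-million-token prices in USD; real prices vary by model and provider.
PRICES = {
    "openai/gpt-4o": {"prompt": 2.50, "completion": 10.00},
    "openai/gpt-4o-mini": {"prompt": 0.15, "completion": 0.60},
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single LLM call from its token counts."""
    price = PRICES[model]
    return (prompt_tokens * price["prompt"]
            + completion_tokens * price["completion"]) / 1_000_000

# One call with 1,200 prompt tokens and 300 completion tokens.
call_cost = estimate_cost("openai/gpt-4o", 1200, 300)
```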
In the dashboard, you’ll see a service overview, individual traces for HTTP requests and OpenAI API calls, detailed spans with token usage, performance metrics (response times, error rates, throughput), and cost tracking.&lt;/p>&lt;p>Grafana’s alerting engine can trigger notifications when latency spikes, error rates increase, or token usage exceeds budget. Since the telemetry is OpenTelemetry‑native, you can build custom panels and alerts on top of Prometheus metrics and Tempo traces.&lt;/p>&lt;h2>Next steps&lt;/h2>&lt;p>You can learn more about Grafana Cloud AI Observability in the &lt;u>&lt;a href="/docs/grafana-cloud/monitor-applications/ai-observability/?pg=blog&amp;plcmt=body-txt">official docs&lt;/a>&lt;/u>, including setup instructions and dashboards. You can also check out the &lt;u>&lt;a href="/blog/ai-observability-llms-in-production">first post in this series&lt;/a>&lt;/u> for a full demo of monitoring AI workloads in Grafana Cloud, or browse our &lt;u>&lt;a href="/tags/ai-ml/?pg=blog&amp;plcmt=body-txt">other AI blogs&lt;/a>&lt;/u>, including posts about our own LLM: Grafana Assistant.&lt;/p>&lt;p>Taken collectively, these resources will help you move from a basic demo to a production-ready setup for your AI applications.&lt;/p></description></item><item><title>Observe your AI agents: End‑to‑end tracing with OpenLIT and Grafana Cloud</title><link>https://grafana.com/blog/ai-observability-ai-agents/</link><pubDate>Fri, 20 Mar 2026 17:20:49</pubDate><author>Ishan Jain</author><guid>https://grafana.com/blog/ai-observability-ai-agents/</guid><description>&lt;p>&lt;strong>Note: &lt;/strong>The world is changing all around us thanks to AI. Today, anyone and everyone can be a developer, using LLMs to create LLM-powered applications, which users can then interact with by using even more LLMs. 
&lt;/p>&lt;p>Observability practitioners need to adapt, and they need the right tools for the job. In this series, we'll show you how to use Grafana Cloud to monitor AI applications, including &lt;u>&lt;a href="/blog/ai-observability-llms-in-production">workloads in production&lt;/a>&lt;/u>, AI agents (this post), &lt;u>&lt;a href="/blog/ai-observability-MCP-servers">MCP servers&lt;/a>&lt;/u>, and &lt;u>&lt;a href="/blog/ai-observability-zero-code">zero-code LLMs&lt;/a>&lt;/u>.&lt;/p>&lt;p>In &lt;u>&lt;a href="/blog/ai-observability-llms-in-production">another post&lt;/a>&lt;/u> in this series, we discussed how to instrument large language model (LLM) calls. This can be a good starting point, but generative AI workloads increasingly rely on agents, which are systems that plan, call tools, reason, and act autonomously. &lt;/p>&lt;p>Their non‑deterministic behavior makes incidents harder to diagnose, in part because the same prompt can trigger different tool sequences and costs. &lt;/p>&lt;p>AI agents combine LLM reasoning with external tools and dynamic workflows, and observability data must serve as a feedback loop for continuous improvement. Without proper tracing, you end up guessing why an agent took a particular path. &lt;/p>&lt;p>In this guide, you'll learn how to use the &lt;u>&lt;a href="https://github.com/openlit/openlit">OpenLIT&lt;/a>&lt;/u> SDK to capture agent‑level telemetry and how to use Grafana Cloud to visualize every step.&lt;/p>&lt;h2>Why observability matters for agents&lt;/h2>&lt;p>Traditional APM covers infrastructure metrics and latency, but that's not enough to get a holistic view of your agents. &lt;a href="/docs/grafana-cloud/monitor-applications/ai-observability/">AI Observability&lt;/a> in Grafana Cloud uses the OpenLIT SDK to automatically generate distributed traces and metrics to provide insights into each agentic event. 
&lt;/p>&lt;p>AI Observability provides five prebuilt dashboards that analyze response times, error rates, throughput, token usage, and costs across your AI stack. Beyond raw metrics, OpenLIT captures agent names, actions, tool calls, token usage, and errors. This enables:&lt;/p>&lt;ul>&lt;li>&lt;strong>Full sequence visibility:&lt;/strong> Follow a request from the user query through planning, tool invocations, LLM calls, and final responses. Each span in the trace shows the prompt, selected tool, and reasoning chain.&lt;/li>&lt;li>&lt;strong>Cost and token tracking:&lt;/strong> For each step, you see token counts and API costs, so you can optimize tool choices and model selection.&lt;/li>&lt;li>&lt;strong>Behavioral troubleshooting:&lt;/strong> Agent traces reveal reasoning paths and tool usage. If the agent produces an incorrect answer, you can reconstruct the chain to find where it went wrong.&lt;/li>&lt;li>&lt;strong>Unified dashboards and alerting:&lt;/strong> Grafana Cloud combines fully managed versions of &lt;u>&lt;a href="/docs/grafana-cloud/send-data/metrics/metrics-prometheus/?pg=blog&amp;plcmt=body-txt">Prometheus&lt;/a>&lt;/u>, &lt;u>&lt;a href="/docs/grafana-cloud/send-data/traces/?pg=blog&amp;plcmt=body-txt">Tempo&lt;/a>&lt;/u>, and &lt;u>&lt;a href="/docs/grafana-cloud/send-data/logs/?pg=blog&amp;plcmt=body-txt">Loki&lt;/a>&lt;/u> to present metrics, traces, and logs in one place, with optional alerts on cost thresholds or latency spikes.&lt;/li>&lt;/ul>&lt;h2>Benefits of agent observability in Grafana Cloud&lt;/h2>&lt;p>Agent observability is more than just infrastructure monitoring. With OpenLIT and Grafana Cloud, you gain:&lt;/p>&lt;ul>&lt;li>&lt;strong>Predictable costs:&lt;/strong> Identify which agent step or tool call accounts for most of your spending and reroute simple queries to cheaper models.&lt;/li>&lt;li>&lt;strong>Performance optimization:&lt;/strong> Detect latency spikes at specific stages (e.g., search API vs. 
LLM) and adjust concurrency or caching accordingly.&lt;/li>&lt;li>&lt;strong>Quality assurance:&lt;/strong> Traces can be replayed to understand reasoning mistakes, while integrated evaluation tools in OpenLIT (such as hallucination detection and toxicity analysis) provide safety metrics.&lt;/li>&lt;li>&lt;strong>Faster debugging:&lt;/strong> When an agent fails, you have a single trace that links user input, internal reasoning, external calls, and the error, making root‑cause analysis straightforward.&lt;/li>&lt;li>&lt;strong>Future‑proof instrumentation:&lt;/strong> OpenTelemetry semantic conventions for AI agents are evolving; by using OpenLIT, you adopt these standards and avoid vendor lock‑in. Grafana Cloud’s integration ensures your telemetry remains compatible as conventions mature.&lt;/li>&lt;/ul>&lt;h2>How to monitor your AI agents with Grafana Cloud&lt;/h2>&lt;p>Now that you understand some of the nuances of observing AI agents, let's show you how you can use prebuilt capabilities in Grafana Cloud to start collecting and visualizing telemetry from your agents.&lt;/p>&lt;p>And if you get stuck anywhere along the way or need help with your own setup, click on the pulsar icon in the top-right corner of the Grafana Cloud UI to open a chat with &lt;u>&lt;a href="/docs/grafana-cloud/machine-learning/assistant/?pg=blog&amp;plcmt=body-txt">Grafana Assistant&lt;/a>&lt;/u>, our purpose-built LLM that can help troubleshoot incidents, manage dashboards, and answer product questions.&lt;/p>&lt;h3>Architecture overview&lt;/h3>&lt;p>AI agents orchestrate multiple actions: planning, calling external tools or models, and producing a response. OpenLIT instruments each of these steps and emits OpenTelemetry spans and metrics. You can send this data directly to Grafana Cloud or via an OpenTelemetry Collector. 
The following diagram shows how a user request flows through an agent orchestrator and is monitored:&lt;/p>&lt;p>The workflow consists of four key pieces:&lt;/p>&lt;ol>&lt;li>&lt;strong>User query&lt;/strong>: A customer sends a message to your agent.&lt;/li>&lt;li>&lt;strong>Agent orchestrator&lt;/strong>: Frameworks like CrewAI or the OpenAI Agents SDK break the task into sequential steps: plan, call a tool (e.g., a search API), call an LLM, and generate a result.&lt;/li>&lt;li>&lt;strong>OpenLIT instrumentation&lt;/strong>: A single &lt;code>openlit.init()&lt;/code> call instruments the entire agent pipeline. Each planning step, tool call, and model completion is captured as an OpenTelemetry span.&lt;/li>&lt;li>&lt;strong>Grafana Cloud&lt;/strong>: Metrics and traces flow into Grafana Cloud’s managed Prometheus and Tempo backends, where pre‑built AI dashboards visualize performance and costs.&lt;/li>&lt;/ol>&lt;h2>Step 1: Install the AI Observability integration&lt;/h2>&lt;p>Start by adding AI Observability to your Grafana Cloud stack. This can be done by clicking on &lt;strong>Connections&lt;/strong> in the left-side menu and following the &lt;u>&lt;a href="/docs/grafana-cloud/monitor-applications/ai-observability/setup/?pg=blog&amp;plcmt=body-txt#install-the-ai-observability-integration">steps outlined in our documentation&lt;/a>&lt;/u>.&lt;/p>&lt;p>This installs the five dashboards mentioned earlier (GenAI observability, GenAI evaluations, vector DB observability, MCP observability, and GPU monitoring). When metrics arrive, these dashboards automatically populate with latency histograms, token counts, cost summaries, and evaluation results.&lt;/p>&lt;h2>Step 2: Install OpenLIT&lt;/h2>&lt;p>OpenLIT is an OpenTelemetry‑native SDK for instrumenting GenAI workloads. 
Install it alongside your agent framework:&lt;/p>&lt;p>&lt;code>pip install openlit crewai&lt;/code>&lt;/p>&lt;p>OpenLIT supports dozens of frameworks, including CrewAI, OpenAI Agents, LangChain, AutoGen, and others. The SDK automatically instruments supported libraries; no manual span creation is required.&lt;/p>&lt;h2>Step 3: Instrument your agent&lt;/h2>&lt;p>OpenLIT can be added with a single line of code. Below is an example that uses CrewAI to build a simple agent with two tools: a search tool and a summarizer. &lt;/p>&lt;p>The agent plans its steps, uses the search tool to fetch content, and then summarizes the result. OpenLIT records each step, tool call, and model completion. You can swap CrewAI with the OpenAI Agents SDK—the instrumentation code remains the same.&lt;/p>&lt;pre>&lt;code>import os
import openlit  # instruments all supported frameworks when initialised
from crewai import Agent, Task, Crew  # CrewAI framework
from your_search_module import SearchTool  # hypothetical search tool
from your_summarise_module import SummariseTool  # hypothetical summariser

openlit.init()  # one line to enable OpenTelemetry tracing and metrics

# Define tools the agent can use
search_tool = SearchTool()
summarise_tool = SummariseTool()

# Compose an agent with a planning function and tool access
assistant = Agent(
    name="research_assistant",
    role="Find relevant sources and summarise them",
    tools=[search_tool, summarise_tool],
    planning=True,  # enable internal reasoning and tool selection
)

# Define a task for the agent
task = Task(
    description="Provide a concise summary of the latest developments in battery recycling.",
    expected_output="A two‑paragraph summary highlighting key advances",
)

# Create a crew to execute the task
crew = Crew(
    agents=[assistant],
    tasks=[task],
    verbose=True,
)

if __name__ == "__main__":
    result = crew.kickoff()  # kickoff() runs the crew's tasks
    print(result)
&lt;/code>&lt;/pre>&lt;p>When this script runs, OpenLIT automatically captures:&lt;/p>&lt;ul>&lt;li>&lt;strong>LLM prompts and completions&lt;/strong>: The prompts sent to the LLM and the responses returned&lt;/li>&lt;li>&lt;strong>Token usage and costs&lt;/strong>: Counts the tokens for each call and estimates API cost&lt;/li>&lt;li>&lt;strong>Agent names and actions&lt;/strong>: Identifies which agent or sub‑agent executed each step&lt;/li>&lt;li>&lt;strong>Tool usage&lt;/strong>: Records which tool was invoked and its parameters&lt;/li>&lt;li>&lt;strong>Errors&lt;/strong>: Surfaces exceptions such as API failures or tool errors&lt;/li>&lt;/ul>&lt;p>This information becomes distributed spans in Tempo and metrics in Prometheus. If you use the &lt;strong>OpenAI Agents SDK&lt;/strong>, the pattern is the same: Call &lt;code>openlit.init()&lt;/code> before constructing your agent, and every agent step will emit telemetry.&lt;/p>&lt;h2>Step 4: Forward telemetry to Grafana Cloud&lt;/h2>&lt;p>To send traces and metrics directly to Grafana Cloud, set the following environment variables before running your agent. Replace the values with your own service name, environment, and Grafana credentials:&lt;/p>&lt;pre>&lt;code># identify your service and environment
export OTEL_SERVICE_NAME="agent-demo"
export OTEL_DEPLOYMENT_ENVIRONMENT="production"
# Grafana Cloud OTLP endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp-gateway-&lt;region>.grafana.net/otlp"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic &lt;your-base64-credentials>"
# Set any API keys for your agent framework
export OPENAI_API_KEY="sk-..."
python agent_service.py
&lt;/code>&lt;/pre>&lt;h2>Step 5: Explore your agent traces and metrics&lt;/h2>&lt;p>With your agent running, open Grafana Cloud. Navigate to AI Observability and select the &lt;strong>AI Agents&lt;/strong> dashboard. Here you can:&lt;/p>&lt;ul>&lt;li>&lt;strong>View complete traces&lt;/strong>: Each user request produces a trace containing spans for planning, tool invocations, LLM calls, and response generation. The traces page in OpenLIT provides detailed span analysis and execution flow, and Grafana Cloud mirrors this via Tempo.&lt;/li>&lt;li>&lt;strong>Monitor metrics and costs&lt;/strong>: Custom dashboards can display throughput, latency, token usage, and cost metrics stored in Prometheus.&lt;/li>&lt;li>&lt;strong>Filter and investigate errors:&lt;/strong> The errors page surfaces traces with exceptions and allows filtering by time range or exception type.&lt;/li>&lt;li>&lt;strong>Correlate with infrastructure&lt;/strong>: Grafana Cloud unifies metrics, logs, and traces, so you can correlate an agent’s slow step with CPU spikes or external API rate limits.&lt;/li>&lt;/ul>&lt;p>Grafana Cloud’s AI dashboards are purpose-built for GenAI applications and include separate panels for LLM performance, agent performance, vector database operations, and GPU health. Because OpenLIT uses OpenTelemetry standards, you can extend these dashboards or forward the data to other observability tools if required.&lt;/p>&lt;h2>Next steps&lt;/h2>&lt;p>Want to go further? In the next blog in this series, we’ll show, step by step, how to enable this &lt;a href="/blog/ai-observability-MCP-servers">for an MCP client&lt;/a>.&lt;/p>&lt;p>You can also learn more about Grafana Cloud AI Observability in the &lt;u>&lt;a href="/docs/grafana-cloud/monitor-applications/ai-observability/?pg=blog&amp;plcmt=body-txt">official docs&lt;/a>&lt;/u>, including setup instructions and dashboards. 
These resources will help you move from a basic demo to a production-ready setup for your AI applications.&lt;/p></description></item><item><title>Monitor Model Context Protocol (MCP) servers with OpenLIT and Grafana Cloud</title><link>https://grafana.com/blog/ai-observability-MCP-servers/</link><pubDate>Fri, 20 Mar 2026 17:20:37</pubDate><author>Ishan Jain</author><guid>https://grafana.com/blog/ai-observability-MCP-servers/</guid><description>&lt;p>&lt;strong>Note: &lt;/strong>The world is changing all around us thanks to AI. Today, anyone and everyone can be a developer, using LLMs to create LLM-powered applications, which users can then interact with by using even more LLMs. &lt;/p>&lt;p>Observability practitioners need to adapt and they need the right tools for the job. In this series, we'll show you how to use Grafana Cloud to monitor AI applications, including &lt;u>&lt;a href="/blog/ai-observability-llms-in-production">workloads in production&lt;/a>&lt;/u>, &lt;u>&lt;a href="/blog/ai-observability-ai-agents">AI agents&lt;/a>&lt;/u>, MCP servers (this post), and &lt;u>&lt;a href="/blog/ai-observability-zero-code">zero-code LLMs&lt;/a>&lt;/u>.&lt;/p>&lt;p>Large language models don’t work in a vacuum. They often rely on Model Context Protocol (MCP) servers to fetch additional context from external tools or data sources. &lt;/p>&lt;p>MCP provides a standard way for AI agents to talk to tool servers, but this extra layer introduces complexity. Without visibility, an MCP server becomes a black box: you send a request and hope a tool answers. When something breaks, it’s hard to tell if the agent, the server or the downstream API failed. &lt;/p>&lt;p>In this guide, you'll learn how to instrument MCP servers using OpenLIT and how to analyze those servers in Grafana Cloud.&lt;/p>&lt;h2>Why MCP observability matters&lt;/h2>&lt;p>In an agentic system, an MCP server may route tool calls to multiple services. 
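When an MCP server routes tool calls to multiple services, the agent's context grows with every call. A minimal sketch of tracking cumulative context-window usage per call (the token counts and the 128K limit are hypothetical):

```python
CONTEXT_LIMIT = 128_000  # hypothetical context window size in tokens

def context_usage(call_token_counts, limit=CONTEXT_LIMIT):
    """Cumulative context consumed after each tool call, with an 80%-utilisation flag."""
    total, usage = 0, []
    for tokens in call_token_counts:
        total += tokens
        usage.append((total, total > 0.8 * limit))
    return usage

# Three tool calls with simulated token counts; the third pushes past 80% of the window.
usage = context_usage([40_000, 50_000, 30_000])
```

Production systems would read these numbers from span attributes rather than compute them by hand; the sketch only illustrates why context-window telemetry matters for right-sizing.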
Observability helps you answer critical questions about:&lt;/p>&lt;ul>&lt;li>&lt;strong>Latency spikes: &lt;/strong>When a tool is slow to respond, user experience suffers. By examining request throughput and the 95th/99th percentile latency distributions, you can determine whether a downstream API or the MCP layer is responsible. &lt;/li>&lt;li>&lt;strong>Silent failures&lt;/strong>: For example, a tool returning partial data or timing out often goes unnoticed without structured telemetry. End‑to‑end tracing across the agent, MCP server, and external tools provides the full context needed to diagnose these issues. &lt;/li>&lt;li>&lt;strong>Cross‑service visibility: &lt;/strong>This is important because MCP calls cross network and language boundaries. Fortunately, OpenTelemetry propagates context, so spans started in a Python client link seamlessly to spans in a Node.js tool server, producing a coherent trace across systems.&lt;/li>&lt;li>&lt;strong>Context window usage: &lt;/strong>Resource consumption can grow quickly as agents query more tools. By tracking context window usage and memory consumption, you can right‑size your MCP servers and avoid over‑allocating resources.&lt;/li>&lt;/ul>&lt;p>&lt;a href="/docs/grafana-cloud/monitor-applications/ai-observability/">AI Observability&lt;/a> in Grafana Cloud supports MCP out of the box. The solution includes pre‑built dashboards for tool performance, protocol health, resource usage, and error tracking.&lt;/p>&lt;h2>Benefits of MCP observability in Grafana Cloud&lt;/h2>&lt;p>Observing your MCP servers unlocks a range of advantages:&lt;/p>&lt;ul>&lt;li>&lt;strong>End‑to‑end tracing&lt;/strong> shows the entire path of a request—from the agent through the MCP server to each tool call—so you can pinpoint bottlenecks and failures. 
&lt;/li>&lt;li>&lt;strong>Detailed&lt;/strong> &lt;strong>performance metrics&lt;/strong> like &lt;code>tool_invocation_duration_ms&lt;/code> and invocation counts help you identify slow or overused tools and adjust resource allocation accordingly. &lt;/li>&lt;li>&lt;strong>Scalability and cost control &lt;/strong>are enabled through context‑window and memory usage telemetry, so you can right‑size servers and avoid over‑provisioning. Because OpenTelemetry uses an open, vendor‑neutral format, your instrumentation remains portable; you can route data to Grafana, a self‑hosted OTLP stack, or any other backend without code changes. &lt;/li>&lt;li>&lt;strong>Security and compliance&lt;/strong> are also strengthened by MCP monitoring, which lets you audit tool interactions and ensure protocols are used as intended.&lt;/li>&lt;/ul>&lt;h2>How to monitor your MCP server with Grafana Cloud&lt;/h2>&lt;p>Next, let's take a high-level look at how you can use Grafana Cloud to observe your MCP server, then we'll walk through the setup process so you can get up and running today.&lt;/p>&lt;p>And if you get stuck anywhere along the way or need help with your own setup, click on the pulsar icon in the top-right corner of the Grafana Cloud UI to open a chat with &lt;u>&lt;a href="/docs/grafana-cloud/machine-learning/assistant/?pg=blog&amp;plcmt=body-txt">Grafana Assistant&lt;/a>&lt;/u>, our purpose-built LLM that can help troubleshoot incidents, manage dashboards, and answer product questions.&lt;/p>&lt;h3>Architecture overview&lt;/h3>&lt;p>The diagram below illustrates how agent interactions with an MCP server are instrumented and visualized. &lt;/p>&lt;p>The agent or client calls the MCP server to execute tools. OpenLIT instruments both the client and the server, capturing spans for context management, tool selection, and tool execution. 
These traces and metrics are exported to Grafana Cloud, where pre‑built dashboards provide insight into performance and failures.&lt;/p>&lt;p>The workflow consists of five key components:&lt;/p>&lt;ol>&lt;li>&lt;strong>Agent/client&lt;/strong>: AI agents use the MCP protocol to invoke tools hosted on external servers.&lt;/li>&lt;li>&lt;strong>MCP server&lt;/strong>: Hosts one or more tools (e.g., search, database query). The server handles context loading, manages tool state, and responds to requests.&lt;/li>&lt;li>&lt;strong>External tools&lt;/strong>: Actual services (databases, APIs) that do the work. They may be local or remote.&lt;/li>&lt;li>&lt;strong>OpenLIT instrumentation&lt;/strong>: A single &lt;code>openlit.init()&lt;/code> call instruments both the client and the server; context interactions and tool executions generate OpenTelemetry spans.&lt;/li>&lt;li>&lt;strong>Grafana Cloud&lt;/strong>: Collected traces and metrics flow into Grafana Cloud’s fully managed &lt;u>&lt;a href="/docs/grafana-cloud/send-data/metrics/metrics-prometheus/?pg=blog&amp;plcmt=body-txt">Prometheus&lt;/a>&lt;/u> and &lt;u>&lt;a href="/docs/grafana-cloud/send-data/traces/?pg=blog&amp;plcmt=body-txt">Tempo&lt;/a>&lt;/u> backends, where specialized MCP dashboards offer visibility into protocol usage.&lt;/li>&lt;/ol>&lt;h2>Step 1: Install the AI Observability solution&lt;/h2>&lt;p>Start by adding the AI Observability integration to your Grafana Cloud stack. This can be done by clicking on &lt;strong>Connections&lt;/strong> in the left-side menu and following the remaining &lt;u>&lt;a href="/docs/grafana-cloud/monitor-applications/ai-observability/setup/?pg=blog&amp;plcmt=body-txt#install-the-ai-observability-integration">steps outlined in our documentation&lt;/a>&lt;/u>.&lt;/p>&lt;p>This provisions pre‑built dashboards, including one for &lt;strong>MCP observability&lt;/strong>, and configures a managed OpenTelemetry Protocol (OTLP) gateway to receive traces and metrics. 
Once telemetry flows in, the dashboards automatically populate with call rates, latency percentiles, and error counts.&lt;/p>&lt;h2>Step 2: Install OpenLIT and the MCP library&lt;/h2>&lt;p>OpenLIT provides auto‑instrumentation for MCP alongside LLMs, vector stores, and agent frameworks. Install OpenLIT and the &lt;code>mcp&lt;/code> library (which implements the client and server) via &lt;code>pip&lt;/code>:&lt;/p>&lt;pre>&lt;code>pip install openlit mcp
&lt;/code>&lt;/pre>&lt;p>After installation, a single call to &lt;code>openlit.init()&lt;/code> automatically instruments all MCP operations. If you choose to run your own telemetry collector instead of Grafana’s OTLP gateway, OpenLIT can be self‑hosted via Docker Compose or deployed to Kubernetes using the OpenLIT Operator.&lt;/p>&lt;h2>Step 3: Instrument your MCP application&lt;/h2>&lt;p>Instrumentation requires just two lines of code. Below is a simple example of an MCP server that exposes a &lt;code>search_documents&lt;/code> tool. OpenLIT instruments the server, capturing each tool invocation and context interaction:&lt;/p>&lt;pre>&lt;code>import openlit
from mcp import Server

openlit.init()  # enable OpenTelemetry tracing and metrics

# Create an MCP server instance
server = Server()

# Placeholder backend for this example; imagine it calls a search API or database
def document_search(query: str):
    return [{"title": "Example document", "query": query}]

# Define a tool to fetch documents
@server.tool("search_documents")
def search_documents(query: str):
    results = document_search(query)
    return results

# Run the MCP server on localhost
server.run(host="localhost", port=8080)

# When a client invokes search_documents, OpenLIT captures:
# * Context protocol interactions (e.g., context loading and management)
# * Tool usage metrics (latency and success rate)
# * Protocol handshake performance
# * Resource usage (context window size, memory)
# * Errors or exceptions
&lt;/code>&lt;/pre>&lt;p>To instrument an MCP client, use the &lt;code>Client&lt;/code> class from the &lt;code>mcp&lt;/code> library and call &lt;code>openlit.init()&lt;/code> before making requests:&lt;/p>&lt;pre>&lt;code>import openlit
from mcp import Client
openlit.init()

client = Client("http://localhost:8080")
tools = client.list_tools()  # Lists available tools
result = client.call_tool(
    "search_documents",
    {"query": "AI observability"},
)  # Invokes the tool

# All client operations are automatically instrumented
&lt;/code>&lt;/pre>&lt;p>OpenLIT supports zero‑code instrumentation via a CLI wrapper. To instrument an existing MCP service without code modifications, use:&lt;/p>&lt;pre>&lt;code>openlit-instrument \
  --service-name my-mcp-app \
  python your_mcp_app.py

# With custom settings:
openlit-instrument \
  --otlp-endpoint http://127.0.0.1:4318 \
  --service-name my-mcp-app \
  --environment production \
  python your_mcp_app.py
&lt;/code>&lt;/pre>&lt;h2>Step 4: Configure environment variables&lt;/h2>&lt;p>To send your traces and metrics to Grafana Cloud, you need OpenTelemetry credentials. Generate them from the Grafana Cloud portal and set the following variables in your environment:&lt;/p>&lt;pre>&lt;code>export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp-gateway-&lt;ZONE>.grafana.net/otlp"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic &lt;BASE64>"
export OTEL_SERVICE_NAME="mcp-server-demo"
export OTEL_DEPLOYMENT_ENVIRONMENT="production"
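As an aside, the Basic value in the Authorization header is the base64 encoding of your Grafana Cloud OTLP instance ID and API token joined by a colon. A hedged sketch with hypothetical credentials (your instance ID and token will differ):

```shell
# Hypothetical credentials -- substitute your own instance ID and API token
printf '%s' "123456:glc_abc123" | base64
```

Use the resulting string in place of the BASE64 placeholder above.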
&lt;/code>&lt;/pre>&lt;p>When you run your client or server with these variables, the OpenLIT SDK automatically sends telemetry data to Grafana Cloud.&lt;/p>&lt;h2>Step 5: Explore the MCP observability dashboard&lt;/h2>&lt;p>After you start your instrumented MCP client and server, open Grafana Cloud and navigate to &lt;strong>AI Observability &lt;/strong>→&lt;strong> MCP Observability&lt;/strong>. The dashboard provides:&lt;/p>&lt;ul>&lt;li>&lt;strong>Tool performance&lt;/strong>: Call latency histograms, success rates, and invocation counts per tool.&lt;/li>&lt;li>&lt;strong>Protocol health&lt;/strong>: Session stability and connection metrics to detect handshake issues.&lt;/li>&lt;li>&lt;strong>Resource usage&lt;/strong>: Context window size, memory, and data access patterns, helping you optimize server resources.&lt;/li>&lt;li>&lt;strong>Error tracking&lt;/strong>: Lists failed operations with trace IDs and detailed exception information to aid debugging&lt;/li>&lt;/ul>&lt;p>You can build custom dashboards by querying Prometheus metrics (e.g., tool invocation duration) and Tempo traces. Because OpenLIT uses OpenTelemetry, you’re not locked into a single backend. You can forward telemetry to any OTLP‑compatible observability stack.&lt;/p>&lt;h2>Next steps&lt;/h2>&lt;p>Ready to learn more? In the &lt;u>&lt;a href="/blog/ai-observability-zero-code">final blog in this series&lt;/a>&lt;/u>, we’ll show how to set this up, step by step, for a zero-code instrumentation approach to AI Observability.&lt;/p>&lt;p>You can also learn more about Grafana Cloud AI Observability in the &lt;u>&lt;a href="/docs/grafana-cloud/monitor-applications/ai-observability/?pg=blog&amp;plcmt=body-txt">official docs&lt;/a>&lt;/u>, including setup instructions and dashboards. These resources will help you move from a basic demo to a production-ready setup for your AI applications.&lt;/p></description></item></channel></rss>