Version 2.9 release notes

The Tempo team is pleased to announce the release of Tempo 2.9.

This release gives you:

  • New MCP (Model Context Protocol) server support for LLM integration with tracing data
  • Enhanced TraceQL metrics with sampling capabilities for improved performance
  • Significant operational improvements for multi-tenant environments
  • Numerous TraceQL performance and correctness improvements

These release notes highlight the most important features and bug fixes. For a complete list of changes, refer to the Tempo CHANGELOG.

Access tracing data with the Tempo MCP server

Note

Tempo MCP server is an experimental feature. Engineering and on-call support is not available. Documentation is either limited or not provided outside of code comments. No SLA is provided. Enable the feature toggle in Grafana to use this feature. Do not enable this feature in production environments.

Tempo 2.9 includes an MCP (Model Context Protocol) server that provides AI assistants and Large Language Models (LLMs) with direct access to distributed tracing data through TraceQL queries and other endpoints. (PR 5212)

MCP is a widely adopted protocol, developed by Anthropic, that standardizes how applications provide context to LLMs. By integrating MCP with Tempo, you can use LLM-powered tools like Claude Code or Cursor to analyze your tracing data. This helps you understand interactions between your services and diagnose issues faster.

Warning

Using this feature will likely cause tracing data to be passed to an LLM or LLM provider. Consider the content of your tracing data and organizational policies when enabling the MCP server.

The feature is disabled by default and can be enabled per tenant for specific use cases.

Refer to MCP server for configuration details and examples.

For more information, refer to LLM-powered insights into your tracing data: introducing MCP support in Grafana Cloud Traces.

TraceQL metrics sampling

TraceQL metrics sampling, using with(sample=true), dynamically and automatically chooses how to sample your tracing data to give you the highest quality signal while examining as little data as possible. (PR 5469)

This sampling method uses an adaptive probabilistic approach that responds to how common the spans and traces matching the query are. Sampling is applied at the storage layer, for example by inspecting only xx% of spans or xx% of traces, depending on the needs of the query.

When a lot of data matches, it lowers the sampling rate. When matches are rare, it keeps the sampling rate higher, possibly never going below 100%. The performance gain therefore depends on the query.

You can override this behavior with fixed span sampling using with(span_sample=0.xx) or fixed trace sampling using with(trace_sample=0.xx). (PR 5469)

Example:

```traceql
{} | rate() with(sample=true)
{} | rate() with(span_sample=0.1)
```

To learn more, refer to TraceQL metrics sampling.
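
The trade-off behind probabilistic sampling can be sketched in a few lines of Python. This is a simplified illustration, not Tempo's implementation: the function names, the `target_inspected` threshold, and the heuristic for picking a fraction are all assumptions made for this example. It shows the general idea that inspecting a fraction `p` of spans and scaling the result by `1/p` gives an estimate, and that a rare match set can be inspected in full.

```python
import random


def pick_sample_fraction(expected_matches: int, target_inspected: int = 10_000) -> float:
    """Hypothetical heuristic: aim to inspect roughly target_inspected spans."""
    if expected_matches <= target_inspected:
        return 1.0  # rare matches: inspect everything, no sampling
    return target_inspected / expected_matches


def estimate_span_count(total_spans: int, sample_fraction: float, seed: int = 0) -> tuple[float, int]:
    """Estimate a span count while inspecting only a random fraction of spans.

    Returns (estimated_count, spans_inspected).
    """
    if not 0.0 < sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Inspect each span independently with probability sample_fraction.
    inspected = sum(1 for _ in range(total_spans) if rng.random() < sample_fraction)
    # Each inspected span stands in for 1 / sample_fraction spans.
    return inspected / sample_fraction, inspected


fraction = pick_sample_fraction(expected_matches=100_000)
estimate, inspected = estimate_span_count(100_000, fraction)
print(f"fraction={fraction}, inspected={inspected}, estimate={estimate:.0f}")
```

With a fixed fraction, similar in spirit to with(span_sample=0.1), only about a tenth of the spans are read, at the cost of some estimation error. With few matching spans, the sketch falls back to a fraction of 1.0 and the result is exact.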

Adaptive sampling was featured in the September 2025 Tempo community call. Watch the recording starting at the 12:00 minute mark to learn more.

Features and enhancements

The most important features and enhancements in Tempo 2.9 are highlighted below.

Operational improvements

These improvements help operators running multi-tenant Tempo deployments improve trace quality and operational visibility.

SLO metrics improvements

Cached querier responses are now excluded from SLO metrics such as inspected bytes, providing more accurate measurements of actual storage and network I/O. This change ensures that performance metrics reflect real resource consumption rather than cached data access. (PR 5185)

Enhanced monitoring and observability

Several new metrics and histograms have been added to improve operational visibility:

  • Added counter query_frontend_bytes_inspected_total to track total bytes read from disk and object storage. (PR 5310, documentation)
  • Added histograms spans_distance_in_future_seconds / spans_distance_in_past_seconds that count spans with timestamps outside expected ranges. While spans in the future are accepted, they are invalid and may not be found using the Search API. (PR 4936, documentation)
  • Added support for scope in cost-attribution usage tracker, allowing more precise tracking with resource. or span. prefixed attributes. (PR 5646, documentation)
  • Logging and tracing in the write path now includes tenant information. The distributor.ConsumeTraces span status is now properly set to error when trace consumption fails. (PR 5436)

Ingress bytes monitoring

The new metric tempo_distributor_ingress_bytes_total measures bytes received before limits are applied, providing better visibility into incoming data volumes and helping with capacity planning. (PR 5601)

Metrics-generator improvements

The following improvements have been made to the metrics-generator:

  • The definition of tempo_metrics_generator_processor_service_graphs_expired_edges has been adjusted to exclude edges that are properly counted in the service graph. This provides more accurate metrics for monitoring service graph health and completeness. (PR 5319, documentation)

  • Invalid Prometheus label names are now automatically dropped in the span metrics processor, preventing metric ingestion errors and ensuring compatibility with Prometheus naming conventions. (PR 5122)

  • Added support for the new db.namespace attribute for service graph database nodes, aligning with newer OpenTelemetry semantic conventions. This provides better categorization and visualization of database interactions in service graphs. (PR 5602, documentation)

  • Service graph client virtual nodes now use peer attributes to determine client service names, providing more accurate representation of service-to-service communications in distributed systems. (PR 5381, documentation)
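
To make the label-name change concrete, the following sketch filters a set of labels against the classic Prometheus label-name pattern, `[a-zA-Z_][a-zA-Z0-9_]*`. It illustrates the general idea only. The function name and sample labels are hypothetical, and the span metrics processor may apply other sanitization steps that are not shown here.

```python
import re

# Classic Prometheus label-name rule: a letter or underscore first,
# then letters, digits, or underscores.
LABEL_NAME = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def drop_invalid_labels(labels: dict[str, str]) -> dict[str, str]:
    """Keep only labels whose names fully match the classic pattern."""
    return {name: value for name, value in labels.items() if LABEL_NAME.fullmatch(name)}


labels = {"service": "checkout", "9lives": "cat", "bad-name": "x", "_internal": "ok"}
print(drop_invalid_labels(labels))
```

In this sketch, `9lives` is dropped because it starts with a digit and `bad-name` is dropped because it contains a hyphen, while the remaining labels pass through unchanged.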

TraceQL correctness and performance improvements

TraceQL metrics have received several important updates that improve correctness and performance. Refer to Upgrade considerations for breaking changes that may affect your deployment.

  • Improved exemplar selection in quantile_over_time() function provides better representative samples (PR 5278)
  • Fixed issue preventing very small time steps in metrics queries (PR 5441)
  • Fixed incorrect TraceQL string comparison for strings starting with numbers (PR 5658)
  • Fixed incorrect results in TraceQL compare() function for spans with array attributes (PR 5519)
  • Fixed structural operator behavior with empty left-hand spansets (PR 5578)

Performance improvements

The following improvements have been made to TraceQL query performance:

  • Performance increase for basic TraceQL metrics queries through reduced overhead in batch processing (PR 5247)
  • General TraceQL search and metrics performance improvements (PR 5280)
  • TraceQL performance improvements for complex queries (PR 5218)
  • Optimized compare() function performance (PR 5419)
  • TraceQL attribute struct alignment for better memory performance (PR 5240)

Upgrade considerations

When upgrading to Tempo 2.9, be aware of these considerations and breaking changes.

RPM and DEB packages discontinued

Tempo no longer publishes RPM and DEB packages due to an internal change to the handling of signing keys. This can be restored if customers need these packages. (PR 5684)

Migrated Vulture and integration tests to OTLP exporter

In this release, we’ve migrated Tempo Vulture and the integration tests from the deprecated Jaeger agent/exporter to the standard OTLP exporter. Vulture now pushes traces to the Tempo OTLP gRPC endpoint. (PR 5058)

Bucket calculation changes

TraceQL metrics buckets are now calculated based on data in the past instead of the future, which aligns behavior with Prometheus. This resolves issues with empty last buckets that were previously encountered. (PR 5366)

This change may cause differences in existing dashboards and alerts that rely on TraceQL metrics bucket calculations. Review your monitoring and alerting configurations after upgrading.
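
The difference between the two alignments can be illustrated with a small sketch. It assumes fixed-width buckets of `step` seconds counted from `start`. The function names are hypothetical and the exact interval boundaries Tempo uses may differ, so treat this as a mental model rather than the implementation.

```python
import math


def label_looking_forward(ts: int, start: int, step: int) -> int:
    """Forward-looking: the bucket labeled T covers [T, T + step)."""
    return start + math.floor((ts - start) / step) * step


def label_looking_back(ts: int, start: int, step: int) -> int:
    """Prometheus-style: the bucket labeled T covers (T - step, T]."""
    return start + math.ceil((ts - start) / step) * step


# A span observed 90 seconds into a query with 60-second steps.
print(label_looking_forward(90, start=0, step=60))  # 60
print(label_looking_back(90, start=0, step=60))     # 120
```

Under the forward-looking scheme, the bucket labeled with the query end time would cover data after the end of the query range, which is one way an empty last bucket can arise. Under the backward-looking scheme, the same label covers the final step of the range. Dashboards that compare values at specific timestamps may therefore shift by up to one step.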

Series label handling improvements

Fixed incorrect TraceQL metrics results when series labels include both strings and integers with the same textual representation (such as "500" vs 500). The prom_labels field has been removed from TraceQL metrics responses as it was the source of these errors. (PR 5659)

There may be brief interruptions to TraceQL metrics queries during rollout while components run different versions. Plan for a coordinated rollout to minimize impact.
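
The class of bug described above can be reproduced with a generic sketch: when series are keyed only by the text of a label value, a string and an integer that print the same way collapse into one series. The code below is an illustration of that collision and is not Tempo's response format or internal data model. All names in it are hypothetical.

```python
from collections import Counter

# Label values as they might appear on spans: some strings, some integers.
values = ["500", 500, "500", 500, 500]


def key_by_text(value):
    """Lossy key: the string "500" and the integer 500 become identical."""
    return str(value)


def key_by_type_and_value(value):
    """Type-aware key: keeps strings and integers in separate series."""
    return (type(value).__name__, value)


def count_series(items, key_fn) -> Counter:
    """Count how many items fall into each series under the given key."""
    return Counter(key_fn(item) for item in items)


print(count_series(values, key_by_text))
print(count_series(values, key_by_type_and_value))
```

The first call reports a single series with five entries, while the second keeps two series apart, with two string values and three integer values.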

vParquet5 block format

Tempo 2.9 introduces a new experimental vParquet5 block format, designed to improve query performance and reduce storage requirements. vParquet5 has two previews: vParquet5-preview1, which adds low-resolution timestamp columns, and vParquet5-preview2, which adds dedicated integer columns.

Breaking changes are expected before the final release. (PR 5495, PR 5639)

Go version upgrade

Tempo 2.9 upgrades to Go 1.25.0, which may affect custom builds or deployments with specific Go version requirements. (PR 5548)

If you’re building Tempo from source or using custom Docker images, ensure your build environment supports Go 1.25.0.

Project Rhythm: New Tempo architecture

Project Rhythm is the codename for the new architecture project for Tempo. The main objective of this project is to address certain trade-offs of the current design that are limiting the ability of Tempo to grow and support new functionalities.

The goals of this project are to:

  • Eliminate the requirement for replication factor 3 (RF3) -> Support high availability and improve reliability of TraceQL metrics
  • Decouple the read and write path -> Scalability and reliability
  • Lay out the foundation to significantly reduce total cost of ownership (TCO) -> Room for growth

Project Rhythm has been discussed in the Tempo Community Calls for several months. Refer to the Tempo Community Calls on YouTube to learn more.

Refer to the Tempo Rearchitecture section in the Tempo 2.9 changelog for the list of pull requests that are part of this project.

Current problems

The current Tempo architecture has the following problems:

  • Lack of high availability for reads in TraceQL metrics queries: if a generator is down, the data that the generator holds becomes unavailable. This can be mitigated in certain cases, such as rolling updates during rollouts, but never fully solved.
  • Low write reliability: to reduce the impact that the generators have on the write path, generator ingestion is asynchronous and doesn’t return errors to the client. While this is important to protect the ingester’s write path, it makes generator ingestion less reliable.
  • High TCO: TraceQL metrics in the generators are based on flushing a new set of blocks of data, in addition to the ingester’s blocks. These two sets of blocks remain separate in compaction, making two copies the best-case scenario in long-term storage. Research shows the real-world figure is closer to 3 to 3.5 copies.

Architecture comparison

The current Tempo architecture includes distributors, ingesters, query frontend, queriers, compactors, object storage, and the metrics-generator.

Refer to Tempo architecture for more information.

The diagram illustrates the current Tempo architecture.

Current Tempo architecture overview

Project Rhythm works as follows:

  • Distributors write incoming requests to Kafka.
  • Ingesters are replaced by live-stores, a read-path component that replays from Kafka. The responsibility of live-stores is to serve trace queries.
  • A new component, the block-builder, is introduced to the write path. This component is tasked with replaying from Kafka to build blocks which are sent to long-term storage.

Project Rhythm architecture overview
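
The decoupling described in the list above can be pictured with a toy model: a shared append-only log with consumers that each track their own offset. This is a conceptual sketch only. The `Log` and `Consumer` classes are hypothetical stand-ins, there is no real Kafka client involved, and none of this reflects Tempo's actual code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Log:
    """In-memory stand-in for a Kafka partition: an append-only list."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the record just written


@dataclass
class Consumer:
    """Replays the log from its own offset, independent of other consumers."""
    name: str
    offset: int = 0
    seen: list = field(default_factory=list)

    def poll(self, log: Log, max_records: Optional[int] = None) -> list:
        end = len(log.records)
        if max_records is not None:
            end = min(end, self.offset + max_records)
        batch = log.records[self.offset:end]
        self.offset = end
        self.seen.extend(batch)
        return batch


log = Log()
for span in ["span-a", "span-b", "span-c"]:
    log.append(span)  # the distributor role: write once to the log

live_store = Consumer("live-store")        # read path: serves queries
block_builder = Consumer("block-builder")  # write path: builds blocks

live_store.poll(log)
block_builder.poll(log, max_records=2)  # lagging does not affect live_store
print(live_store.offset, block_builder.offset)
```

Because each consumer keeps its own position in the log, a slow or restarted block-builder in this model does not hold back the live-store, and either can catch up later by replaying from its last offset.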

Bug fixes

For a complete list, refer to the Tempo CHANGELOG.

  • Fixed Tempo configuration options that were always overridden by the configuration overrides section. (PR 5202)
  • Correctly apply trace idle period in ingesters and add the concept of trace live period. (PR 5346)
  • Fixed invalid YAML output from /status/runtime_config endpoint by adding document separator. (PR 5371)
  • Fixed panic in query_range HTTP handling that could be triggered by cancellations or other errors. (PR 5667)
  • Fixed cache collision for incomplete query in SearchTagValuesV2. (PR 5549)
  • Fixed deadlock on invalid query to api/v2/search/tags (SearchTagsV2). (PR 5607)
  • Fixed incorrect root span detection when spans have a child_of link but no parent. (PR 5557)
  • Prevent metrics-generator WAL deletion when tenant is empty. (PR 5586)
  • Fixed docker-compose port configuration for Alloy gRPC (4319 to 4317). (PR 5536)
  • Fixed panic error from empty span id. (PR 5464)
  • Return Bad Request from frontend if the provided tag is invalid in SearchTagValuesV2 endpoint. (PR 5493)