Grafana Tempo 1.1 released: New hedged requests reduce latency by 45%
Grafana Tempo 1.1 has been released, and as our major version suggests, there are no breaking changes. For the full details, please check out the release notes. But if you find that release notes can sometimes be difficult to decode, fret not! All the highlights are below.
There are a number of features and enhancements detailed in the docs, but here are a few you should be paying attention to if you are running a high-volume Tempo cluster.
First, hedged requests allow Tempo to issue a second request to your backend when the first one takes longer than a configurable threshold, and then use whichever response comes back first. If this threshold is set to your backend’s p99, then Tempo will hedge the slowest 1% of all backend requests. This can have an amazing impact on Tempo’s overall long tail:
As this graph shows, p99 went from ~4.5s to ~2.5s, roughly a 45% reduction in tail latency.
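To make the idea of a hedged request concrete, here is a minimal Python sketch. It is not Tempo's implementation (Tempo is written in Go), and the names `hedged_call`, `make_simulated_backend`, and `hedge_after_s` are made up for illustration. It assumes the backend call is idempotent, so sending it twice is safe.

```python
import concurrent.futures as cf
import threading
import time


def hedged_call(fetch, hedge_after_s, pool):
    """Call fetch(); if it is still running after hedge_after_s seconds,
    start a second attempt and return whichever attempt finishes first.

    Simplification: if the first attempt to finish raised an exception,
    that exception is propagated instead of waiting for the other attempt.
    """
    first = pool.submit(fetch)
    done, _ = cf.wait([first], timeout=hedge_after_s)
    if done:
        # Fast path: the request beat the threshold, so no hedge is sent.
        return first.result()

    # Slow path: the request exceeded the threshold, so hedge it.
    second = pool.submit(fetch)
    done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
    return next(iter(done)).result()


def make_simulated_backend(latencies_s):
    """Hypothetical backend: each call sleeps for the next latency in the
    list and then returns that latency."""
    remaining = iter(latencies_s)
    lock = threading.Lock()

    def fetch():
        with lock:
            delay = next(remaining)
        time.sleep(delay)
        return delay

    return fetch
```

With a threshold near the backend's p99, only about 1 in 100 requests takes the slow path, so the extra backend load stays small while the slowest requests get a second chance at a fast response.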
Next, the tenant index can both improve performance and reduce Tempo TCO. Previously, all queriers and compactors regularly polled the backend in order to maintain an up-to-date list of the backend blocks. Now only a small handful of compactors are responsible for this, while the rest of the components simply pull the pre-built index.
GCS GETs per second
Queries to the backend were cut by about 60%, a substantial cost savings.
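The division of labor behind the tenant index can be sketched roughly as follows. This is an illustration only: `FakeBackend`, `INDEX_KEY`, and `poll` are hypothetical names, and the JSON layout is an assumption rather than Tempo's actual index format.

```python
import json

INDEX_KEY = "index.json"  # hypothetical object name for the pre-built index


class FakeBackend:
    """In-memory stand-in for object storage that counts expensive list calls."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.objects = {}
        self.list_calls = 0

    def list_blocks(self):
        # Stands in for listing every block in the bucket (the costly part).
        self.list_calls += 1
        return list(self.blocks)

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


def poll(backend, is_builder):
    """Return the current blocklist.

    Builders list the backend and publish the index; every other component
    reads the published index with a single GET.
    """
    if is_builder:
        blocks = sorted(backend.list_blocks())
        backend.put(INDEX_KEY, json.dumps(blocks))
        return blocks
    return json.loads(backend.get(INDEX_KEY))
```

In this toy version, one builder plus any number of readers results in a single listing of the backend per polling cycle. A production design would also need to handle a missing or stale index, which the sketch leaves out.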
Finally, caching parameters have been added to give you more control over which elements of the block, and which blocks, to cache. Previously, all bloom filters were cached, and for larger installs this could require an intimidating amount of cache space. We intend to add more functionality here as we go forward to help operators more carefully control the amount of cache space they need.
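One way to picture this kind of control is a per-block predicate that decides whether a block's bloom filters are worth caching. The sketch below assumes two criteria, compaction level and block age, and the names `BlockMeta`, `should_cache_bloom`, `min_compaction_level`, and `max_block_age` are hypothetical; check the docs for the options Tempo actually exposes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BlockMeta:
    """Hypothetical subset of block metadata used for the caching decision."""
    compaction_level: int
    start_time: datetime


def should_cache_bloom(meta, now, min_compaction_level=0, max_block_age=None):
    """Return True if this block's bloom filters should go into the cache.

    A min_compaction_level of 0 or a max_block_age of None disables that check.
    """
    # Skip blocks that have not been compacted enough yet.
    if min_compaction_level > 0 and meta.compaction_level < min_compaction_level:
        return False
    # Skip blocks older than the configured maximum age.
    if max_block_age is not None and now - meta.start_time > max_block_age:
        return False
    return True
```

With both checks disabled, every block is cached, which matches the previous behavior described above. Tightening either limit trades cache hit rate for cache space.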
You may be shocked to learn that Tempo does have bugs, and with 1.1 we are happy to have at least two fewer!
First, queriers returned 404s immediately after startup. This was because a querier would connect to the query-frontend before it had finished a complete polling cycle. This has been fixed by requiring a polling cycle to complete before finishing startup.
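The shape of that fix can be shown with a toy example: the component does not report itself ready until its first poll has finished. The `Querier` class and its methods below are illustrative names, not Tempo's code.

```python
class Querier:
    """Toy querier that refuses to serve until it has a complete blocklist."""

    def __init__(self, poll_fn):
        self._poll_fn = poll_fn  # returns the list of known block IDs
        self._blocklist = set()
        self.ready = False

    def start(self):
        # Block startup on one full polling cycle. Only after this would the
        # querier connect to the query-frontend and accept work.
        self._blocklist = set(self._poll_fn())
        self.ready = True

    def find(self, block_id):
        if not self.ready:
            raise RuntimeError("querier has not finished its first polling cycle")
        return block_id in self._blocklist
```

Before the fix, the equivalent of `find` could run against an empty blocklist, which is why traces that existed in the backend came back as 404s.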
Also, unhealthy ring members sometimes got stuck if you were using memberlist. This issue has been with us for some time, and it is so nice to finally fix it. Additionally, memberlist has been improved and new defaults have been added to improve ring reliability and reduce CPU and memory requirements. Thanks to Peter Štibraný for fixing these issues in Cortex!
It’s time to say goodbye to some old block formats. In 1.1 we are deprecating v0 and v1 blocks, and support will be completely removed in 1.2. If you are on Tempo 0.7.0 or later, then you are already using v2 blocks and you have nothing to worry about.
However, if you are on 0.6.0 or older, please refer to the release notes for how to move forward. Hint: If you just upgrade to 1.1 now, then you will be fine when 1.2 is released.
The Tempo release cycle has been roughly once every two months. So far this has fit the project well, and I think it’s a good cadence to continue. I can’t say I know everything that will be in Tempo 1.2, but the first phase of native search (recent traces) will be merged, so get excited for that!
If you are interested in more Tempo news or search progress, please join us on the public Grafana Slack channel #tempo, post a question in the forums, reach out on Twitter, or join our monthly community call. See you there!