Cortex v1.1 released with improved reliability and performance

• 2020-05-21 • 3 min

After five years leading the development of Cortex, Grafana Labs is no longer contributing to this project. In March 2022, we launched Grafana Mimir, an open source long-term storage for Prometheus that lets you scale to 1 billion metrics and beyond. To learn more, please read the TSDB announcement blog and visit the Grafana Mimir page.

30 Mar 2022

Today we’re releasing Cortex 1.1, the first (minor) release since Cortex went GA in April, over 6 weeks ago. This release represents more than 140 commits by over 30 different authors from 9 different companies. In this post we’re going to give you some of the highlights of this release. For more details, check out the changelog.

More features and reliability

With this release, Cortex now supports the Prometheus /api/v1/metadata API. The Grafana Agent will send metric metadata to Cortex, allowing you to access your metric’s metadata (HELP, TYPE, and UNIT) within Grafana’s Explore view.
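To make the shape of that metadata concrete, here is a minimal Go sketch that decodes a response in the format the Prometheus /api/v1/metadata endpoint returns. The type and function names are hypothetical, the sample body is hand-written for illustration, and the URL prefix under which your Cortex deployment exposes the endpoint is not covered here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// metricMetadata mirrors one metadata entry for a metric (hypothetical type name).
type metricMetadata struct {
	Type string `json:"type"`
	Help string `json:"help"`
	Unit string `json:"unit"`
}

// metadataResponse is the response envelope: a status string plus a map
// from metric name to its metadata entries.
type metadataResponse struct {
	Status string                      `json:"status"`
	Data   map[string][]metricMetadata `json:"data"`
}

// parseMetadata decodes a response body and rejects non-success statuses.
func parseMetadata(body []byte) (map[string][]metricMetadata, error) {
	var resp metadataResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding metadata response: %w", err)
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("unexpected status %q", resp.Status)
	}
	return resp.Data, nil
}

func main() {
	// Hand-written sample body, not captured from a real Cortex cluster.
	sample := []byte(`{"status":"success","data":{"http_requests_total":[{"type":"counter","help":"Total HTTP requests.","unit":""}]}}`)

	md, err := parseMetadata(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	// Sort metric names so the output order is stable.
	names := make([]string, 0, len(md))
	for name := range md {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		for _, m := range md[name] {
			fmt.Printf("%s type=%s help=%q unit=%q\n", name, m.Type, m.Help, m.Unit)
		}
	}
}
```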

In the v1.0 release, we added an experimental write-ahead log (WAL) for samples that haven’t been committed to the chunk store yet. This ensures that, should a machine fail, those samples aren’t lost. After extensively testing the WAL and running it in production, we’ve marked the feature as production-ready in 1.1.
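The general idea behind a write-ahead log can be sketched in a few lines of Go: each sample is written to a durable log before it is held in memory, so the in-memory state can be rebuilt by replaying the log after a crash. This is a toy illustration only. The record format, the type names, and the in-memory buffer standing in for a file on disk are all assumptions, and none of it reflects how Cortex encodes its WAL.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// sample is a single data point (hypothetical type for this sketch).
type sample struct {
	Series string
	Value  float64
}

// ingester keeps recent samples in memory and logs each one first.
type ingester struct {
	wal    io.Writer
	memory []sample
}

// push writes the sample to the log before accepting it into memory.
func (i *ingester) push(s sample) error {
	if _, err := fmt.Fprintf(i.wal, "%s %g\n", s.Series, s.Value); err != nil {
		return err
	}
	i.memory = append(i.memory, s)
	return nil
}

// replay rebuilds the in-memory samples from the log, one record per line.
func replay(wal io.Reader) ([]sample, error) {
	var out []sample
	sc := bufio.NewScanner(wal)
	for sc.Scan() {
		var s sample
		if _, err := fmt.Sscanf(sc.Text(), "%s %g", &s.Series, &s.Value); err != nil {
			return nil, err
		}
		out = append(out, s)
	}
	return out, sc.Err()
}

// crashAndRecover pushes samples, discards the ingester, and replays the log.
func crashAndRecover(samples []sample) ([]sample, error) {
	var log bytes.Buffer // stands in for a file on disk
	ing := &ingester{wal: &log}
	for _, s := range samples {
		if err := ing.push(s); err != nil {
			return nil, err
		}
	}
	// Simulated crash: ing.memory is thrown away, only the log survives.
	return replay(&log)
}

func main() {
	recovered, err := crashAndRecover([]sample{{"up", 1}, {"http_requests_total", 42.5}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("recovered samples:", len(recovered))
}
```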

Even faster queries

This release features a novel optimization for a specific type of query: regular expression selectors with many chained OR cases, e.g., {foo=~"bar|baz|blip|..."}. These kinds of queries are commonly generated by Grafana dashboards using template variables. Inspired by a similar optimization in Prometheus’s TSDB, we have removed the need to use a regular expression when performing index lookups in the ingester. In certain cases, this can result in up to 100x improvement in query performance:

➜  cortex git:(regex-opt-ingester) ✗ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                                             old ns/op     new ns/op     delta
BenchmarkSetRegexLookup/select_all-8                  435911065     882075        -99.80%
BenchmarkSetRegexLookup/select_two-8                  247328056     23848         -99.99%
BenchmarkSetRegexLookup/select_half-8                 327012500     530910        -99.84%
BenchmarkSetRegexLookup/select_none-8                 231561666     24398         -99.99%
BenchmarkSetRegexLookup/equality_matcher-8            2474          2488          +0.57%
BenchmarkSetRegexLookup/regex_(non-set)_matcher-8     274129027     276701117     +0.94%
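The benchmark names hint at the idea: when a pattern is just a list of literals joined by "|", each literal can be looked up directly in the index instead of running a regular expression over every label value. Below is a minimal Go sketch of that idea. The function names and the map-based index are hypothetical, and the sketch assumes each series has exactly one value for the label. It is not the code that landed in Cortex.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// setMatches returns the literal alternatives of a pattern such as
// "bar|baz|blip", or nil when the pattern uses any other regex syntax
// and has to be evaluated as a real regular expression.
func setMatches(pattern string) []string {
	if pattern == "" {
		return nil
	}
	// Any metacharacter other than '|' means this is not a plain set.
	if strings.ContainsAny(pattern, `.+*?()[]{}^$\`) {
		return nil
	}
	return strings.Split(pattern, "|")
}

// lookup returns the sorted series IDs whose label value matches pattern.
// index maps a label value to the IDs of the series carrying that value.
func lookup(index map[string][]int, pattern string) ([]int, error) {
	var ids []int
	if set := setMatches(pattern); set != nil {
		// Fast path: one direct map lookup per alternative.
		seen := map[string]bool{}
		for _, v := range set {
			if seen[v] {
				continue // skip repeated alternatives such as "a|a"
			}
			seen[v] = true
			ids = append(ids, index[v]...)
		}
	} else {
		// Slow path: anchored regex evaluated against every label value.
		re, err := regexp.Compile("^(?:" + pattern + ")$")
		if err != nil {
			return nil, err
		}
		for v, series := range index {
			if re.MatchString(v) {
				ids = append(ids, series...)
			}
		}
	}
	sort.Ints(ids)
	return ids, nil
}

func main() {
	index := map[string][]int{"bar": {1, 4}, "baz": {2}, "qux": {3}}
	for _, p := range []string{"bar|baz", "ba."} {
		ids, err := lookup(index, p)
		fmt.Println(p, ids, err)
	}
}
```

The fast path costs one map lookup per alternative regardless of how many distinct label values exist, which is consistent with the large deltas for the set cases and the unchanged timings for the equality and non-set regex cases in the benchmark.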

We’ve also made a series of improvements to the Cassandra chunk storage backend, adding TLS host verification and adding the option to limit the concurrency on reads to prevent overwhelming the database.
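Limiting read concurrency is commonly done with a counting semaphore. The Go sketch below shows the pattern with a buffered channel. The type and function names are hypothetical, and this is not the Cortex Cassandra client code or its configuration option, only an illustration of how a cap on in-flight reads protects a backend.

```go
package main

import (
	"fmt"
	"sync"
)

// readLimiter bounds how many reads may run at the same time.
type readLimiter struct {
	slots chan struct{}
}

func newReadLimiter(max int) *readLimiter {
	return &readLimiter{slots: make(chan struct{}, max)}
}

// do blocks until a slot is free, runs read, then releases the slot.
func (l *readLimiter) do(read func()) {
	l.slots <- struct{}{}
	defer func() { <-l.slots }()
	read()
}

// peakConcurrency pushes n reads through a limiter and reports the
// highest number of reads that were in flight at once.
func peakConcurrency(limit, n int) int {
	l := newReadLimiter(limit)
	var mu sync.Mutex
	var wg sync.WaitGroup
	inFlight, peak := 0, 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.do(func() {
				mu.Lock()
				inFlight++
				if inFlight > peak {
					peak = inFlight
				}
				mu.Unlock()
				// A real database read would happen here.
				mu.Lock()
				inFlight--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak in-flight reads:", peakConcurrency(3, 50))
}
```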

Finally, we’ve embedded the query frontend component into the single-process Cortex deployment and simplified how to configure it, preventing common misconfigurations. The query frontend is the Cortex component that implements query sharding, parallelization, and caching, and it dramatically improves query performance.

Edging close to production-ready block storage

For the past ~6 months we’ve been working with the Thanos team to integrate their block-and-object-storage approach into Cortex. With this release, we’ve added caching of chunk data, improved the memory usage of the ingesters, and introduced blocks sharding via a new store-gateway service.

The store-gateway service sits between queriers and long-term storage and allows the blocks storage to horizontally scale the read path. When the store-gateway is enabled, blocks are sharded and replicated across all store-gateway instances, and then, at query time, the querier fetches relevant series from the minimum set of store-gateway instances holding the required blocks. Before this change, all blocks were loaded on every single querier, thus imposing a vertical scalability limit; with the introduction of the store-gateway, blocks are no longer loaded into queriers, and we can now horizontally scale querying block indexes.
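The sharding and query-time selection described above can be illustrated with a deliberately simplified Go sketch. It assigns each block to a fixed number of instances by hashing the block ID, then has the querier pick one owner per block, preferring instances it has already selected. The function names, the modulo-based placement, and the greedy selection are all assumptions made for illustration. The greedy pass keeps the set small but does not guarantee the true minimum, and this is not how Cortex itself places or selects blocks.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ownersOf returns the instances that hold a block. Simplification:
// hash the block ID onto the instance list and take consecutive entries.
func ownersOf(blockID string, instances []string, replication int) []string {
	if len(instances) == 0 {
		return nil
	}
	h := fnv.New32a()
	h.Write([]byte(blockID))
	start := int(h.Sum32() % uint32(len(instances)))
	owners := make([]string, 0, replication)
	for i := 0; i < replication && i < len(instances); i++ {
		owners = append(owners, instances[(start+i)%len(instances)])
	}
	return owners
}

// planQuery maps each selected instance to the blocks it should serve.
// Greedy rule: reuse an already selected instance when it owns the block.
func planQuery(blocks, instances []string, replication int) map[string][]string {
	plan := map[string][]string{}
	for _, b := range blocks {
		owners := ownersOf(b, instances, replication)
		if len(owners) == 0 {
			continue // no instance available for this block
		}
		chosen := owners[0]
		for _, o := range owners {
			if _, ok := plan[o]; ok {
				chosen = o
				break
			}
		}
		plan[chosen] = append(plan[chosen], b)
	}
	return plan
}

func main() {
	instances := []string{"store-gateway-0", "store-gateway-1", "store-gateway-2"}
	blocks := []string{"block-a", "block-b", "block-c", "block-d"}
	plan := planQuery(blocks, instances, 2)

	// Sort instance names so the output order is stable.
	names := make([]string, 0, len(plan))
	for name := range plan {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, "serves", plan[name])
	}
}
```

Because no querier needs to load every block, adding store-gateway instances spreads the blocks more thinly, which is the horizontal scaling property the paragraph describes.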

Cortex v1.1 and the future

Since the v1.0 release we’ve seen an uptick in interest in Cortex, matched by an increase in development velocity from the community. I would like to encourage you to try out this release and have a play with Cortex – and if that all looks a bit daunting to you, check out Grafana Cloud, where Cortex is used behind the scenes to power our Prometheus service.

For more on Cortex, you can also watch the on-demand recording of the recent Taking Prometheus to Scale with Cortex webinar I did with Goutham Veeramachaneni.
