Like all Norse gods and Marvel villains, every open source project has a good origin story.
The new episode of “Grafana’s Big Tent” podcast provides that behind-the-code look at building Grafana Mimir, the horizontally scalable, highly performant open source time series database that Grafana Labs debuted in March.
In our “Grafana Mimir: Maintainers tell all” episode, Big Tent hosts Tom Wilkie and Mat Ryer welcome Marco Pracucci and Cyril Tovena, two engineers from Grafana Labs who are also maintainers of Mimir, for a deep-dive discussion into the planning and production of our newest open source project.
In this episode, you’ll learn why we launched Mimir, how we scaled to 1 billion active series, all about the new features like the split-and-merge compactor, future features the team is already working on, and why we named the project Mimir (“I was actually personally lobbying for Mjölnir, Thor’s hammer,” says Tom).
Note: This transcript has been edited for length and clarity.
What is Grafana Mimir?
Marco: Mimir is a next-generation time series database for Prometheus and beyond. It’s a time series database we have built at Grafana Labs, which is highly available, horizontally scalable, supports multi-tenancy, has durable storage … and blazing fast query performance over long periods of time.
Tom: You mentioned high scalability. What does that mean? Where does that come from?
Marco: We are seeing an increasing number of customers needing massive scale when it comes to collecting and querying metrics. And we see this growing need across the industry to scale to a number of metrics that was unimaginable just a couple of years ago. When we are running Grafana Mimir, we are talking about more than a billion active time series for a single tenant, for a single customer.
Tom: A billion.
Marco: I didn’t even know how many zeros are in a billion.
Tom: This is an American billion, right? So it’s nine. It’s not a British billion, which has 12. … Cyril, what is the blazing fast performance bit of Mimir?
Cyril: There are more and more customers nowadays that want to query across multiple clusters, for instance. So this high cardinality of data within a single query exists, and Mimir has been designed to fulfill this need. You can query across a lot of series, across a lot of clusters, across a lot of namespaces, and you’re going to be able, for instance, to get information across your cluster or across multiple clusters.
Tom: I always quote, when people ask me, that the techniques we’ve used have made certain high cardinality queries up to 40 times faster.
Marco: It’s increasing every time we talk about it. [laughs]
Tom: Is it?
Marco: Yeah, we started with 10 then 20, 30, 40. It’s getting bigger.
Tom: This was months ago. In the results I saw, there was one query that was 40 times faster.
Marco: There are some edge cases that get that fast. But what we typically observe is about 10x faster with query sharding.
Mat: What’s query sharding? What’s going on there?
Cyril: The idea is that we will parallelize a query. Until now, we were actually parallelizing a query just by day or by time. But in the case, as Marco said, where there’s a billion active time series in a cluster, then by time is not enough anymore. So you want to be able to split by data, and we call that a shard. A shard is actually a set of series. So what we’re going to do is execute PromQL on a set of selected series. For instance, we’re going to split it 16 ways, and each worker will only work on 1/16th of the data. Then we’re going to recombine the data back on the front end, and you’re going to be able to speed up the query by 16x, or 40x apparently.
“We’re always trying to find ways to release [features] as open source. So one of the reasons we’ve done Mimir is to get the core technology that we’ve built out there and in front of more people.”
Scalability limits: Mimir vs. Cortex vs. Thanos
Marco: We have found different limits [in scalability]. First of all, to understand these limitations, we need to understand how the data is stored. Cortex and Thanos, and also Mimir, use Prometheus TSDB to store the metrics data in long-term storage. Basically, Prometheus TSDB partitions data by time, and for a given period of time, the data is stored in a data structure called a block. Inside the block, we have an index used to query the data, plus the actual time series data, which is stored as compressed chunks. Now, we have found some limitations. There are well-known limits in the TSDB index: the index can’t grow above 64 GB, and even inside the index there are multiple sections, some of which can’t be bigger than 4 GB.
In the end, it means that you have a limit on how many metrics you can store for a given tenant, or for a given customer. With Mimir, we have introduced a new compactor, which uses what we call the split-and-merge compaction algorithm, which basically allows us to overcome these limitations. Instead of having one single block for a given time period for a customer or a tenant, we shard the data into multiple blocks, and we can shard the data into as many blocks as we want, overcoming the single block limitation, or the single block index limitation. Now, another issue we have found, which again is well known, is that ingesting even a very large amount of data is not that complicated, but when it comes to querying that data back fast, things get trickier.
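The “split” stage Marco describes can be sketched as fanning series out across several output blocks for the same time range, so that no single block’s index can outgrow the TSDB limits. This is a minimal illustration, not Mimir’s actual compactor code; the series IDs and shard count are made up.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// splitSeries models the split stage of split-and-merge compaction: instead
// of one block per time range, series are assigned by hash to one of several
// shard blocks, each with its own (smaller) index.
func splitSeries(seriesIDs []string, shards int) map[int][]string {
	blocks := make(map[int][]string, shards)
	for _, s := range seriesIDs {
		h := fnv.New32a()
		h.Write([]byte(s))
		shard := int(h.Sum32() % uint32(shards)) // stable shard assignment
		blocks[shard] = append(blocks[shard], s)
	}
	return blocks
}

func main() {
	ids := []string{
		`up{job="a"}`, `up{job="b"}`, `up{job="c"}`, `up{job="d"}`,
	}
	blocks := splitSeries(ids, 4)
	fmt.Println(len(blocks)) // number of non-empty shard blocks
}
```

Because the assignment is deterministic, queries can later be directed at exactly the shard blocks that hold the series they need, and the merge stage can compact each shard independently.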
One major scalability limit we had was that the PromQL engine is single-threaded. So if you run a single query, a complex query, a query that is hitting a large amount of metrics, it doesn’t matter how big your machine is or how many CPUs you have. The engine will just use one single CPU core. We actually wanted to take full advantage of all the computation power we have, or our customers have. What we built is query sharding, which allows us to run a single query across multiple machines or across multiple CPU cores.
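Conceptually, the query-frontend rewrites a shardable aggregation into one partial query per shard, runs the legs in parallel, and sums the partial results. The rough Go sketch below is illustrative only; the `__query_shard__` matcher format is an assumption here, and Mimir’s real rewrite is considerably more involved.

```go
package main

import "fmt"

// shardedLegs rewrites sum(rate(metric[5m])) into n partial queries, one per
// shard. Each leg carries a shard selector so it only reads 1/n of the
// series; the frontend evaluates the legs in parallel and sums the partials.
func shardedLegs(metric string, n int) []string {
	legs := make([]string, 0, n)
	for i := 1; i <= n; i++ {
		legs = append(legs, fmt.Sprintf(
			`sum(rate(%s{__query_shard__="%d_of_%d"}[5m]))`, metric, i, n))
	}
	return legs
}

func main() {
	for _, q := range shardedLegs("http_requests_total", 4) {
		fmt.Println(q)
	}
}
```

This only works for aggregations that distribute over the data, such as sum or count; a non-shardable expression has to fall back to single-threaded evaluation.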
Tom: With the new split-and-merge compactor, what are the limits now? Like how large of a cluster could I build or a tenant could I have?
Marco: There’s no system which scales infinitely. There are always limitations in every system. We are still running Grafana Mimir with up to 1 billion active time series for a single tenant. A single tenant is key in this context, because if you have a billion time series, but across a thousand different tenants, each tenant then is pretty small, right? It’s just a million series per tenant. I’m mentioning a billion time series for a single tenant because that’s the single unit, which is harder to split and to shard in the context of Cortex, Thanos, or even Mimir.
Tom: So we’ve tested up to a billion, and we don’t think it’s infinite. It’s somewhere between a billion and infinite.
Cyril: And we tested it with Grafana k6, too. So we created an extension in k6 for sending Prometheus remote write data. All of this is open source, so anyone can try it at home and see how it scales. You just need a lot of CPU to reach the power of sending a billion series.
Marco: And memory. [laughs]
Tom: We haven’t mentioned Grafana k6 in this podcast yet. What is it?
Cyril: Grafana k6 is our load testing tool. There are two products. One is the k6 CLI, where you can define a plan; the CLI then load tests your application by running that plan. It could be a plan that sends metrics, but it could also be load testing your shop website. The other product is Grafana k6 Cloud, which takes the same plan but runs it in the cloud so you can actually scale your load test.
Mat: k6 uses testify, by the way, the popular Go package for testing.
Tom: Oh, and why’d you mention that here, Mat?
Mat: Just because Prometheus also uses it…. Oh, and I created it.
Tom: Oh, there we go.
Mat: Essentially, without me, no one writing Go code can be sure their code works. [laughs]
Tom: You reduced every if statement from three lines to one.
Mat: I’ve saved so many lines. Think of how much disk space.
Tom: All those curly braces that you’ve set free!
Why build a new open source project?
Tom: Why did we choose to build Mimir as a separate project? Why didn’t we just contribute these improvements back to Cortex?
Cyril: I think there are two answers. One is that the metrics space is very competitive, so we wanted to build a new project. We were the biggest maintainers and contributors of Cortex, and having our own project gives us more agility. It also helps us make sure that competitors are not taking advantage of our work for free. The other answer is that we had a lot of features that were closed source, and we wanted to make them open source to allow other people to use and experiment with those features.
Marco: It’s fair to say that it was a very difficult decision, and it’s all about trade-offs. It’s about finding a model that allows us to succeed commercially while at the same time keeping most of our code open. Launching Mimir as a new first-party project at Grafana Labs, we think, fits that trade-off quite well.
Tom: It’s worth saying that a lot of the features we’ve talked about adding to Mimir, a lot of the code we’ve built, we had done all of this before for our Grafana Enterprise products and for our Grafana Cloud product. A lot of these ideas were being built in closed source, and as a company, Grafana Labs really wants these things to be open source. We’re always trying to find ways to release them as open source. So one of the reasons we’ve done Mimir is to get the core technology that we’ve built out there and in front of more people.
Marco: When we launched Mimir, a recurring question we got from Cortex users and community users was, “Can I jump on Mimir? Can I upgrade? How stable is it?” And the cool thing is that the features we have released in Mimir have been running in production for months at Grafana Labs at every scale, from small clusters to very big clusters, including the 1 billion series cluster we mentioned before. That’s why we have confidence in the stability of the system.
Cyril: It’s battle-tested.
“We want to keep committing to our big tent philosophy … Right now, Mimir is focused on Prometheus metrics, but we want to support OpenTelemetry or Graphite or Datadog metrics in Mimir.”
— Marco Pracucci
Where did the name Mimir come from?
Tom: We’ve got this LGTM strategy, like logs, graphs, traces, and metrics. Loki, the logging system begins with L, Grafana the graphing system with G, Tempo, the tracing system with T … and Cortex, the metrics system with a C. So it had to begin with an M. And then we went back to our Scandinavian roots. We were looking for a word that began with M that came from mythology and that’s where we came up with Mimir. I was actually personally lobbying for Mjölnir, Thor’s hammer.
Mat: That’s cool. Difficult to spell though.
Tom: So Mimir is a figure in Norse mythology renowned for his knowledge and wisdom who was beheaded during a war. Odin carries around Mimir’s head and it recites secret knowledge and counsel to him.
Mat: And is he going to end up being a Marvel baddie as well?
New features coming to Grafana Mimir
Mat: What are the future plans? Can we talk and give people a bit of a sneak peek?
Marco: We have big plans. So first of all, I think we want to keep committing to our big tent philosophy, and we want Mimir to be a general purpose time series database. Right now, Mimir is focused on Prometheus metrics, but we want to support OpenTelemetry or Graphite or Datadog metrics in Mimir. That’s something we are already working on and will soon be available in Mimir.
Cyril: So I love query performance. That’s why I’m going to talk about two improvements that we want to make in the future. In Loki’s query language, LogQL, we split instant queries. If you have a range vector with a very large duration, for instance 30 days, that can be very slow, so you want to be able to split the instant query. We do that in Loki right now, because the amount of data for 30 days can be tremendous compared to the amount of samples you can have in metrics. We’re definitely going to port [the feature] back into Mimir.
When we worked on query sharding with Marco, we discovered a couple of problems and improvements that we wanted to make. We want to make the TSDB shard-aware: being able to actually request a specific shard from the index, or figure out the data for a specific shard from the beginning. Obviously, I think the best way to get that into Mimir will be to upstream it into Prometheus, so this is something that we can definitely try to do.
Marco: Another area where you may expect some big improvements in the near future is around Mimir’s operability. We already mentioned that we simplified a lot of the configuration; it’s very easy to get Mimir up and running. But like any distributed system, there’s still some effort required to maintain the system over time: think about version upgrades, scaling, or fine-tuning the limits for different tenants, stuff like this. So one of my big dreams for the project is to have a Mimir autopilot and get as close as possible to zero operations in order to run and maintain a Mimir cluster at scale.
Mat: Yes, please!
Marco: This is something which is very easy to say and very difficult to do, but we’ve got some ideas. There are some ongoing conversations at Grafana Labs on how we could significantly improve operations at scale.
Cyril: That’s super interesting. We currently have the Red Hat team, who actually became members of the Loki project, and they are working on an operator for Loki. I think they have the same goal as you just described, Marco. They want to make it super simple to operate Loki at scale, especially upgrading or maintaining the cluster. Maybe that’s something that could also be reused.
Marco: I think one of the very cool things about working at Grafana Labs is that there’s a lot of cross-team [collaboration]. Just to mention one: Cyril built query sharding into Loki before we built it into Mimir. Then he came to me and to other people on the Mimir team with this wealth of knowledge around how to successfully build query sharding. And then we built it into Mimir as well. We learn from each other, and the autopilot will be another great opportunity to learn from other teams.
Cyril: We actually built [query sharding in Mimir] in a better way than in Loki. So now I’m a bit jealous.
Tom: Well, you have to go and backport it back to Loki.
Cyril: Well, that’s what I’m doing right now.
Tom: And then when you figure out how to build it better in Loki than in Mimir, you’ve got to come back and bring your improvements back to Mimir.