
Mimir’s next-gen architecture—Kafka in the middle, object storage underneath, and a whole lot less coupling

2026-01-02 · 9 min

Sometimes the most important engineering work starts with a deceptively simple question. Not “What’s the best dashboard layout?” or “How many Ts are in Matt?” (still contested), but something much more fundamental:

What if the read path and the write path didn’t have to share the same fate?

In this episode of “Grafana’s Big Tent,” hosts Mat Ryer, Principal Software Engineer at Grafana Labs, and Tom Wilkie, Grafana Labs CTO, sit down with Marco Pracucci (Principal Software Engineer, Grafana Labs), Cyril Tovena (Principal Software Engineer, Grafana Labs—AI; formerly Loki), and Ryan Worl (Co-founder, WarpStream) to unpack the next-generation architecture for Mimir.

Along the way, the group covers a decade of architectural evolution, what it takes to run metrics at truly enormous scale, why multi-availability-zone costs can quietly become your biggest problem, and how a “Kafka, but on object storage” approach helped Grafana Cloud reduce Mimir’s total cost of ownership.

You can watch the full episode in the YouTube video below, or listen on Spotify or Apple Podcasts.

[Embedded video]

Note: The following are highlights from episode 3, season 3 of “Grafana’s Big Tent” podcast. This transcript has been edited for length and clarity.

Setting the stage: Mimir’s purpose—and the scale it runs at

Marco Pracucci: Mimir is an open source time series database. It's scalable, multi-tenant, and natively supports both OpenTelemetry and Prometheus metrics. Mimir has been designed to store massive volumes of metrics in a reliable and cost-effective way. Today it's what powers the hosted metrics solution in Grafana Cloud, and plenty of users are running our open source version as well.

Tom Wilkie: And just to give some perspective on the scale of Mimir now, it's doing, I think, billions of active series, right, across just Grafana Cloud. And if we talk about all the open source usage, then it's easily doing tens of billions of active series worldwide.

The old architecture: replication factor of 3, quorum, and ingesters as a shared failure point

Tom: So Mimir, the original architecture… this was inspired heavily by Cassandra, right? So it did replication factor three, it did quorum reads and writes… we had this big kind of tier of services that we call "ingesters" that stored everything three times for reliability.

Marco: Replication factor three is expensive. On the ingestion path, we have to keep three copies of the data. And when we query back the most recent data from the ingesters, we have to query at least two copies in order to guarantee consistency.

In what I now call the old architecture, ingesters are also the weak link between the read path and the write path. The typical issue we see is heavy queries issued by customers that overload the ingesters. And if the ingesters are overloaded, the ingestion path, the write path, is affected as well, so we may fail to ingest new metrics. In the worst-case scenario, we have an outage affecting both the write path and the read path at the same time.

Tom: Yeah, the ingesters are a big, distributed single point of failure.
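
To make the quorum arithmetic concrete, here is a minimal Go sketch (not Mimir code) of what replication factor three with quorum reads and writes implies: every sample is stored three times, and every write and every read of recent data must reach at least two of those three copies.

```go
package main

import "fmt"

// quorum returns the minimum number of replicas that must acknowledge a
// write (or answer a read) for the operation to be considered successful.
func quorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}

func main() {
	const rf = 3
	fmt.Printf("replication factor: %d\n", rf)         // three copies of every sample
	fmt.Printf("write quorum:       %d\n", quorum(rf)) // 2 of 3 ingesters must ack each write
	fmt.Printf("read quorum:        %d\n", quorum(rf)) // 2 of 3 ingesters must answer each query
}
```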

Goals: decouple, harden failure behavior, and lower TCO

Marco: So actually, I think we can summarize the three big goals of the new architecture. First, decoupling the read and the write path. I like to say that whatever happens on the read path should not affect the write path, and vice versa. Second, better resiliency to node failures. We will dive more into this later, but the core idea of the new architecture is to have a predictable partitioning scheme, so that if one node is unavailable, we know exactly which other node to query for the same data. And last, but not least, reducing the cost of running the Mimir infrastructure, reducing the Mimir TCO. One of the biggest cost reductions comes from the reduced replication factor and quorum in the new architecture: we don't have replication factor three and quorum two anymore.
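
As a rough illustration of that predictable partitioning idea, here is a hypothetical Go sketch: each series hashes to a fixed partition, and each partition has a known set of owners, so if one owner is unavailable the query path knows exactly where to find the same data. The hash function and the ownership layout below are invented for the example and are not Mimir's actual scheme.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numPartitions = 8 // illustrative; real deployments size this to the workload

// partitionFor deterministically maps a series (identified by its labels) to a partition.
func partitionFor(series string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(series))
	return h.Sum32() % numPartitions
}

// owners returns the hypothetical pair of ingesters that own a partition:
// one copy that is queried, and a second kept purely for availability.
func owners(partition uint32) [2]string {
	return [2]string{
		fmt.Sprintf("ingester-%d", partition*2),
		fmt.Sprintf("ingester-%d", partition*2+1),
	}
}

func main() {
	series := `http_requests_total{job="api", instance="10.0.0.1"}`
	p := partitionFor(series)
	o := owners(p)
	// If o[0] is down, the querier knows it can get identical data from o[1].
	fmt.Printf("series maps to partition %d, owned by %v\n", p, o)
}
```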

Mat Ryer: Yeah. So when you do these big kind of redesigns, is this something where you literally do it in docs, you sit down and just have to think through basically all of this at the same time, or do you try and kind of break it into bits which you can specialize in? And can you do it iteratively? How does it work?

Marco: My personal approach is to always start with a prototype to validate the idea. The prototype should be a quick hack. I think about prototypes like hackathons, something you should get done in a week. But in a week, you should get something working. Very hacky, but at least to prove the idea.

The architectural shift: Kafka between write and read—and 'gapless consumption' as the unlock

Marco: The main idea of the new architecture is to put Kafka between the write path and the read path. Essentially, a write path request, so a request to ingest metrics, completes as soon as the data has been committed to Kafka. That data is then asynchronously replayed by ingesters. Ingesters in the new architecture are pure read path components, used to serve the most recent data for queries.

So the core of the idea is to put Kafka between the write path and the read path and leverage the guarantees that the Kafka protocol gives us, for example full consistency on a given partition. One of the core differences between the old architecture and the new one is that ingesters consume from partitions, and that consumption is gapless. If you restart an ingester and the process is down for five minutes, at startup it will resume consuming from where it left off. There are no gaps in the data, which is very different from the old architecture.
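
Here is a minimal sketch of what gapless consumption looks like, assuming a simplified, hypothetical consumer rather than a real Kafka client: the ingester tracks the last offset it fully processed and, after a restart, resumes from the next one, so nothing in the partition is skipped.

```go
package main

import "fmt"

// Record is one batch of samples as it appears on a partition.
type Record struct {
	Offset int64
	Data   []byte
}

// offsetStore persists the last offset this ingester has fully processed.
// In a real system this would be durable state; here it's an in-memory stand-in.
type offsetStore struct{ committed int64 }

func (s *offsetStore) Commit(o int64) { s.committed = o }
func (s *offsetStore) Last() int64    { return s.committed }

// consumePartition replays a partition from the last committed offset onward.
// Because the partition is an ordered, durable log, resuming after a restart
// leaves no gaps in the data the ingester has seen.
func consumePartition(partition []Record, store *offsetStore) {
	for _, rec := range partition {
		if rec.Offset <= store.Last() {
			continue // already processed before the restart
		}
		fmt.Printf("ingesting offset %d (%d bytes)\n", rec.Offset, len(rec.Data))
		store.Commit(rec.Offset)
	}
}

func main() {
	partition := []Record{{0, []byte("a")}, {1, []byte("b")}, {2, []byte("c")}}
	store := &offsetStore{committed: -1}

	consumePartition(partition[:2], store) // process offsets 0 and 1, then "crash"
	consumePartition(partition, store)     // after restart: resumes at offset 2, no gaps
}
```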

That gapless consumption is what allowed us to have just quorum one on the read path: we only need to query each partition once. We don't need to query two copies of the data the way we did in the old architecture, where any single copy could have gaps. We still have replication on the ingesters for high availability, but instead of replication factor three, we have replication factor two. So we have two copies of the data, and we query just one of them. The second copy is there in case one of the ingesters becomes unhealthy or restarts; we still have another copy of the partition. That's the gist of the new architecture.
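
A small, hypothetical sketch of how a querier could take advantage of that: with two gapless copies per partition, it only needs to pick one healthy replica and query it, falling back to the other if the first is unavailable.

```go
package main

import (
	"errors"
	"fmt"
)

// Replica is one ingester holding a full, gapless copy of a partition.
type Replica struct {
	Addr    string
	Healthy bool
}

// pickReplica returns a single healthy replica to query for a partition.
// Because every replica consumed the partition gaplessly, one answer is enough;
// there is no need to merge two copies as in the old quorum-two design.
func pickReplica(replicas []Replica) (Replica, error) {
	for _, r := range replicas {
		if r.Healthy {
			return r, nil
		}
	}
	return Replica{}, errors.New("no healthy replica for partition")
}

func main() {
	partition := []Replica{
		{Addr: "ingester-4", Healthy: false}, // restarting
		{Addr: "ingester-5", Healthy: true},  // the second copy kept for availability
	}
	r, err := pickReplica(partition)
	if err != nil {
		panic(err)
	}
	fmt.Println("querying", r.Addr)
}
```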

Why WarpStream mattered: multi-AZ costs, object storage replication, and the latency trade-off

Marco: The main downside of going multi-AZ is that if you do a lot of cross-AZ data transfer, your TCO increases significantly. In both GCP and AWS you pay for cross-AZ data transfer, essentially two cents per gigabyte to move data from one AZ to another. That's when we started to look at WarpStream. If you run the new architecture on top of Kafka, you still pay for cross-AZ data transfer to replicate the data between AZs. The game changer for us was adopting WarpStream.
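
Some back-of-the-envelope math shows why this adds up. Only the roughly two cents per gigabyte figure comes from the conversation; the ingest volume below is made up for illustration, and "two extra copies" is a simplification of how a classic replication-factor-three Kafka cluster spread across three AZs behaves.

```go
package main

import "fmt"

func main() {
	const (
		costPerGB   = 0.02    // rough cross-AZ transfer price Marco cites, in $/GB
		ingestGBDay = 10000.0 // hypothetical ingest volume: 10 TB/day, purely illustrative
		extraCopies = 2       // simplification: RF=3 across three AZs sends ~2 extra copies over AZ boundaries
	)

	crossAZPerDay := ingestGBDay * extraCopies * costPerGB
	fmt.Printf("cross-AZ replication cost: $%.0f/day (~$%.0f/month)\n",
		crossAZPerDay, crossAZPerDay*30)
	// With WarpStream, replication happens inside object storage, so this
	// particular line item disappears (object storage costs apply instead).
}
```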

Ryan Worl: The way that WarpStream differs is that instead of replicating over the network directly between nodes, WarpStream writes data directly to object storage first. So we receive a bunch of concurrent writes from different clients, batch those writes together, and then write a file to object storage.

In AWS and GCP, basically everywhere there is object storage, the guarantee is that the data is replicated across some number of availability zones, effectively the equivalent of three. The data is replicated, and you can read it from any of those other zones without paying any cross-AZ data transfer costs. So by using object storage as both the storage layer and the network layer, you can bypass the cross-AZ data transfer costs in exchange for accepting the latency penalty that comes with writing the data to object storage first.
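
Here is a simplified Go sketch of that pattern, not WarpStream's actual implementation: concurrent writes are buffered together and flushed as a single object through a hypothetical object store interface, so replication is delegated to the object storage provider instead of broker-to-broker network traffic.

```go
package main

import (
	"bytes"
	"fmt"
	"time"
)

// objectStore stands in for S3/GCS: the provider replicates each object across
// zones, and any AZ can read it without cross-AZ transfer charges.
type objectStore interface {
	Put(key string, data []byte) error
}

// batcher accumulates records from many producers and flushes them as one object,
// trading a little latency (the flush interval) for far fewer PUTs and no
// inter-broker replication traffic.
type batcher struct {
	store objectStore
	buf   bytes.Buffer
	seq   int
}

func (b *batcher) Append(record []byte) {
	b.buf.Write(record)
	b.buf.WriteByte('\n')
}

func (b *batcher) Flush() error {
	if b.buf.Len() == 0 {
		return nil
	}
	key := fmt.Sprintf("segments/%d-%d", time.Now().UnixMilli(), b.seq)
	b.seq++
	err := b.store.Put(key, b.buf.Bytes())
	b.buf.Reset()
	return err
}

// printStore just prints what would be uploaded.
type printStore struct{}

func (printStore) Put(key string, data []byte) error {
	fmt.Printf("PUT %s (%d bytes)\n", key, len(data))
	return nil
}

func main() {
	b := &batcher{store: printStore{}}
	// Writes from many concurrent clients land in the same batch...
	b.Append([]byte(`{"series":"cpu","value":0.42}`))
	b.Append([]byte(`{"series":"mem","value":0.73}`))
	// ...and a single flush uploads one object that every AZ can read.
	if err := b.Flush(); err != nil {
		panic(err)
	}
}
```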

What this unlocks for Loki and experimentation

Ryan: How has using WarpStream changed the way you're thinking about new features? Has it opened up any possibilities? Cyril, you alluded to it a little bit, but what are the things that were on the to-do list of "what if we had an infinite amount of time to do this big new feature," but are maybe a little easier now that you have the ability to reconsume the data multiple times?

Cyril Tovena: For the Loki folks, it's definitely about being able to build another storage backend on the side while the current one is still working. That's kind of nice, because you can experiment. That was a change, I think, in the culture of the team: being able to experiment on the side by building another storage backend, or maybe several, and seeing which one works best. Those kinds of experiments weren't really possible before.

Results: 25% TCO reduction

Tom: As WarpStream and the new Kafka architecture allowed us to scale Mimir to bigger tenants, has it delivered on the kind of cost savings in a global sense? Has it delivered on the reliability improvements?

Marco: In Grafana Cloud, we reduced Mimir's TCO by 25% moving from the old architecture to the new one. And it's fair to say that we started from a position where we had already squeezed TCO down as much as possible in the old architecture. We had optimized everything that came to mind, and further reducing TCO required re-architecting part of the system.

What about scalability improvements? I wouldn't say the new architecture is more scalable than the old one, because I consider the old one very scalable as well. Even with the old architecture, we reached the point where there's nearly no size that scares us too much.

Tom: I'm going to quote you on that, Marco.

“Grafana’s Big Tent” podcast wants to hear from you. If you have a great story to share, want to join the conversation, or have any feedback, please contact the Big Tent team at bigtent@grafana.com.
