
Faster, more memory-efficient performance in Grafana Mimir: a closer look at Mimir Query Engine

2025-09-17 8 min

Until recently, Grafana Mimir — our open source, horizontally scalable, multi-tenant time series database (TSDB) — has exclusively used Prometheus’ PromQL engine to evaluate queries. While the PromQL engine works great, it sometimes needs a lot of memory to run, specifically in the Mimir querier component. To address this memory consumption issue, we recently introduced Mimir Query Engine (MQE).

Rolled out in Grafana Mimir 2.17, MQE is also a foundational feature for the upcoming Mimir 3.0 release, offering faster, more memory-efficient performance for our users.

In this post, we offer a peek under the hood of MQE, explaining how it works compared to the PromQL engine and how we developed it. You can also learn more about MQE, and other features planned for Grafana Mimir 3.0, in the GrafanaCON 2025 talk below.

Note: While MQE is the default query engine as of Grafana Mimir 2.17, we are not removing support for the Prometheus query engine. You can still switch back to it via a command-line argument or configuration setting.
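For example, reverting a querier to the Prometheus engine might look like the snippet below. The flag name is our reading of the Mimir configuration reference at the time of writing and may change between versions, so verify it against the docs for your release:

```shell
# Run the querier with Prometheus' PromQL engine instead of MQE.
# (Flag name assumed from the Mimir configuration reference; the YAML
# equivalent would be `querier.query_engine: prometheus`.)
mimir -target=querier -querier.query-engine=prometheus
```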

Mimir memory consumption and the PromQL engine: an overview  

To understand why we developed MQE, it’s helpful to first understand some basics about memory consumption in Mimir and the PromQL engine.

In Mimir, the querier is the component that evaluates queries. At Grafana Labs, we run a number of large Mimir clusters internally. We noticed that the queriers often consume varying amounts of memory because different queries load different numbers of samples. We have to set sufficient memory requests and limits to minimize the probability that a querier pod gets OOM-killed by Kubernetes when its actual memory utilization exceeds the memory limit.

A graph showing memory usage over time with annotations indicating memory limits, requests, and utilization spikes.

The visualization above shows querier memory utilization compared with the memory request and memory limit. Setting the memory limit and request on the high side can work, but it is inefficient if the querier is not using the memory it requested. Despite occasional utilization spikes, memory utilization can also drop far below the memory request at other times. Grafana Mimir operators are paying for the memory request at the dashed yellow line, but the actual utilization is often below that.

Now, let’s dig into how Prometheus’ engine evaluates queries to understand the memory utilization behavior described above.

Consider the query sum by(namespace) (http_requests_total{method="GET"}). Prometheus’ engine will break this down into an operation for the metric selector http_requests_total{method="GET"}, and an operation for the sum by aggregation operator that consumes the result of the first operation. First, the engine evaluates the selector, loading samples from it at each time step for every selected series into memory. Then it applies the sum by operator over this intermediate result and returns the final result to the user.
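This two-step evaluation can be sketched in Python. This is a minimal, hypothetical model with made-up series data; Mimir and Prometheus are written in Go, and the real engine works on timestamped samples and chunks:

```python
from collections import defaultdict

# Hypothetical input: per-series samples for http_requests_total{method="GET"},
# one value per evaluation time step (3 steps here), keyed by (namespace, pod).
series = {
    ("ns1", "pod-a"): [1, 2, 3],
    ("ns1", "pod-b"): [4, 5, 6],
    ("ns2", "pod-c"): [7, 8, 9],
}

def eval_eager(series):
    # Step 1: the selector loads every sample of every series into memory.
    loaded = {labels: list(samples) for labels, samples in series.items()}
    # Step 2: sum by(namespace) consumes the full intermediate result.
    result = defaultdict(lambda: [0, 0, 0])
    for (namespace, _pod), samples in loaded.items():
        for i, v in enumerate(samples):
            result[namespace][i] += v
    return dict(result)

print(eval_eager(series))  # {'ns1': [5, 7, 9], 'ns2': [7, 8, 9]}
```

The point to notice is that `loaded` holds every sample of every matched series before the aggregation even starts, which is where the peak memory goes.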

Table showing HTTP GET requests by namespace and pod, with totals summed by namespace.

We can see that the peak memory consumption of a query is proportional to the product of the number of series selected and the number of time steps being evaluated. If the selector matches 10 series, the engine loads those 10 into memory; if it matches 1 million series, it loads all of their samples into memory at once. This results in the varying memory utilization in the graph above. In a heavily used time series database, users run many different kinds of queries that may fetch anywhere from a few samples to a great many.

Initial improvements to Mimir memory consumption

Over the years, we’ve introduced a number of techniques in Mimir to reduce the peak memory consumption of queriers while evaluating queries. For example:

  • Time splitting: This approach reduces the number of time steps evaluated by a single querier. Range queries are split into multiple day-long queries that are evaluated independently by queriers and then stitched back together by query-frontends.
  • Sharding: This reduces the number of series selected by a single querier by splitting the query’s input series into multiple shards, each of which is evaluated independently by queriers and then combined by query-frontends.
  • Chunks streaming: This reduces the memory consumed by chunks as they are read from ingesters and store-gateways as needed in queriers.
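As a rough illustration of the time-splitting idea, here is a Python sketch that cuts a query range into day-long sub-ranges. The real query-frontend also aligns splits to interval boundaries and integrates with results caching, which this ignores:

```python
from datetime import datetime, timedelta

def split_by_day(start, end):
    """Split a range query's [start, end] window into day-long sub-ranges,
    roughly the way a query-frontend might before fanning out to queriers."""
    splits = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=1), end)
        splits.append((cur, nxt))
        cur = nxt
    return splits

start = datetime(2025, 9, 15, 6, 0)
end = datetime(2025, 9, 17, 18, 0)
for s, e in split_by_day(start, end):
    print(s, "->", e)  # three sub-ranges, evaluated independently
```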

To learn more about time splitting and query sharding, please check out this blog.

One point we haven’t mentioned here is limits, which can be set to block queries that load too many series or span too wide a time range. But limits are the last line of defense. We want to avoid users hitting them as much as possible because they can be frustrating, are often non-actionable, and prevent users from running the queries they expect to run.

That said, despite all the techniques above, these improvements still didn’t completely solve the memory utilization issues we were seeing. 

MQE: Streaming queries for greater efficiency 

We discovered that the best solution to the unpredictable memory consumption in Mimir with Prometheus’ engine is to process queries in a streaming manner, without loading all samples at once. This is the core premise of Mimir Query Engine.

The key way the engine achieves this is by not loading all the input series into memory at once, and instead loading them into memory only when needed.

Let’s revisit the example query from before: sum by(namespace) (http_requests_total{method="GET"}). MQE will execute the same selector http_requests_total{method="GET"}, but instead of materializing all the results at once, the engine will load the series one by one, computing the running sum and the grouping in a streaming model.

Diagram showing a table of HTTP GET requests by namespace and pod, with arrows pointing to a summarized table by namespace.

The inputs to a query executed by MQE are series with their samples that match one or more query selectors. These series are passed to zero or more operators. At its peak memory utilization, MQE in this example will only hold samples for one input series and samples for the sole output series at once, a significant reduction compared to Prometheus’ PromQL engine, particularly when the selector selects many series.
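A minimal Python sketch of this streaming model, with hypothetical data (the real engine is Go and streams chunked samples between operators):

```python
from collections import defaultdict

def series_stream():
    # Hypothetical selector result, yielded one series at a time instead of
    # being materialized all at once.
    yield ("ns1", "pod-a"), [1, 2, 3]
    yield ("ns1", "pod-b"), [4, 5, 6]
    yield ("ns2", "pod-c"), [7, 8, 9]

def eval_streaming(stream, steps=3):
    # sum by(namespace): only the running per-group totals stay in memory,
    # plus the samples of the single series currently being consumed.
    totals = defaultdict(lambda: [0] * steps)
    for (namespace, _pod), samples in stream:
        for i, v in enumerate(samples):
            totals[namespace][i] += v
        # this series' samples can be released before the next one is loaded
    return dict(totals)

print(eval_streaming(series_stream()))  # {'ns1': [5, 7, 9], 'ns2': [7, 8, 9]}
```

Peak memory here is one input series plus the per-group totals, rather than every matched series at once.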

Graph comparing data usage of Prometheus engine and MQE over time, with a notable drop at 11:00 labeled 'Toggle the engine.'

The graph above shows querier memory consumption as we toggled the engine from Prometheus to MQE after 11:00: consumption drops significantly. MQE has proven to reduce querier memory fluctuations.

MQE is also now used in the Mimir query-frontend, the component that helps to accelerate queries. Using MQE in the query-frontend brings similar improvements as it does to the querier, such as reduced memory consumption and latency. Furthermore, MQE’s common subexpression elimination is also applied in query-frontends, eliminating unnecessary, repeated work. Without going too deep into the details: common subexpression elimination already works for some common use cases, and we are improving it to handle more scenarios involving query sharding.
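As a toy illustration of common subexpression elimination, here is a Python sketch that memoizes subexpressions by their canonical string, so a query such as sum(rate(m[5m])) / count(rate(m[5m])) evaluates rate(m[5m]) only once. How MQE actually detects and shares subexpressions is considerably more involved:

```python
calls = []  # records which expressions were actually evaluated

def evaluate(expr, cache):
    """Evaluate `expr`, reusing a cached result when the same canonical
    subexpression was already computed (toy stand-in for real evaluation)."""
    if expr in cache:
        return cache[expr]
    calls.append(expr)
    result = f"result<{expr}>"  # placeholder for running the expression
    cache[expr] = result
    return result

cache = {}
# Both operands of `sum(rate(m[5m])) / count(rate(m[5m]))` need rate(m[5m]):
evaluate("rate(m[5m])", cache)
evaluate("rate(m[5m])", cache)
print(len(calls))  # 1 — the shared subexpression ran only once
```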

More detailed results

Beyond the obvious reduction in memory consumption, MQE also performs better at scale. For example, evaluating sum by (group) (metric{...}) with 100k input series and 10 distinct values of group sees 92% lower peak memory utilization and runs 39% faster than Prometheus’ PromQL engine: 75 MB instead of 954 MB, and 4.2 seconds instead of 7 seconds.

On top of this, the peak memory utilization of the engine scales much better with increasing numbers of input series. The graph below shows the peak memory consumption when evaluating sum by (group) (metric{...}) with different numbers of input series:

Line graph comparing memory utilization of Prometheus' engine and Mimir Query Engine across series counts, with Prometheus using more memory.

It’s important to note that MQE is 100% compatible with all stable PromQL features, which means we can still run all existing queries, dashboards, and alerts using the Mimir Query Engine while getting the benefit of faster execution and better memory utilization. 

Testing correctness

The most important property of a good query engine is correctness: its ability to produce results that are accurate and complete. After all, what’s the point of a faster, more memory-efficient query engine if it produces the wrong result?

While building MQE, we used the existing Prometheus PromQL engine as the reference for correctness. The result of each query sent to MQE must match the output from Prometheus’ engine.

We tested correctness in two ways. First, we had unit tests run the same query against both query engines and verify that the results were consistent. Prometheus provides a PromQL test scripting language, a very useful tool for setting up different time series and samples, which helped us test query engine correctness precisely.
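A test in that scripting language looks roughly like this (illustrative values; see Prometheus’ promqltest documentation for the exact syntax):

```
load 1m
  http_requests_total{method="GET", namespace="ns1", pod="pod-a"} 0+10x10
  http_requests_total{method="GET", namespace="ns1", pod="pod-b"} 0+20x10

eval instant at 5m sum by(namespace) (http_requests_total{method="GET"})
  {namespace="ns1"} 150
```

Here 0+10x10 expands to a series starting at 0 and increasing by 10 per step, so at 5m the two series are worth 50 and 100, and the expected sum is 150.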

Second, we ran a parallel query path in Grafana Cloud’s staging environment using Mimir query-tee to compare the results from real, live queries. This helped us catch possible correctness issues and edge cases that might be missed from the unit tests alone.

Flowchart showing a system architecture with 'query-tee' at the top, connected to two 'query-frontend' boxes, each linked to a 'querier.'

The diagram above shows how the parallel query path works. The same query arrives at query-tee, which fans it out to the Prometheus read path and the MQE read path. Each query-frontend forwards the request downstream and returns its result to query-tee, which compares the two and alerts us to discrepancies between the engines.
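The comparison flow can be sketched in Python, with hypothetical callables standing in for the two HTTP read paths (the real query-tee is a separate Mimir component with its own request handling and alerting):

```python
import concurrent.futures

def query_tee(query, backends, prefer="prometheus"):
    """Fan one query out to all backends, compare their answers, and return
    the preferred backend's result. Hypothetical sketch: backends are plain
    callables here rather than HTTP read paths."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        results = {name: f.result() for name, f in futures.items()}
    reference = results[prefer]
    mismatched = [name for name, r in results.items() if r != reference]
    if mismatched:
        print(f"discrepancy between engines: {mismatched}")  # real tee alerts here
    return reference

backends = {
    "prometheus": lambda q: {"ns1": 150},
    "mqe": lambda q: {"ns1": 150},
}
print(query_tee('sum by(namespace) (http_requests_total)', backends))  # {'ns1': 150}
```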

Learn more  

As mentioned above, Mimir Query Engine is available now and is the default engine as of Grafana Mimir 2.17. To learn more, please check out our technical docs and this talk at GrafanaCON 2025.

Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!
