<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Run Grafana Mimir in production on Grafana Labs</title><link>https://grafana.com/docs/mimir/v2.11.x/manage/run-production-environment/</link><description>Recent content in Run Grafana Mimir in production on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/mimir/v2.11.x/manage/run-production-environment/index.xml" rel="self" type="application/rss+xml"/><item><title>Planning Grafana Mimir capacity</title><link>https://grafana.com/docs/mimir/v2.11.x/manage/run-production-environment/planning-capacity/</link><pubDate>Tue, 17 Mar 2026 09:21:52 +0000</pubDate><guid>https://grafana.com/docs/mimir/v2.11.x/manage/run-production-environment/planning-capacity/</guid><content><![CDATA[&lt;h1 id=&#34;planning-grafana-mimir-capacity&#34;&gt;Planning Grafana Mimir capacity&lt;/h1&gt;
&lt;p&gt;The information that follows is an overview of the CPU, memory, and disk space that Grafana Mimir requires at scale.
It gives you a rough idea of the required resources, rather than a prescriptive recommendation about the exact amount of CPU, memory, and disk space.&lt;/p&gt;
&lt;p&gt;The resource utilization estimates are based on a general production workload, and assume
that Grafana Mimir is running with one tenant and the default configuration.
Your real resource utilization likely differs, because it depends on your actual data, configuration settings, and traffic patterns.
For example, the real resource utilization might differ based on the actual number
or length of series&amp;rsquo; labels, or the percentage of queries that reach the store-gateway.&lt;/p&gt;
&lt;p&gt;The estimated resource utilization represents the minimum requirements.
To gracefully handle traffic peaks, run Grafana Mimir with 50% extra capacity for memory and disk.&lt;/p&gt;
&lt;h2 id=&#34;monolithic-mode&#34;&gt;Monolithic mode&lt;/h2&gt;
&lt;p&gt;When Grafana Mimir is running in monolithic mode, you can estimate the required resources by summing up all of the requirements for each Grafana Mimir component.
For more information about per component requirements, refer to &lt;a href=&#34;#microservices-mode&#34;&gt;Microservices mode&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;microservices-mode&#34;&gt;Microservices mode&lt;/h2&gt;
&lt;p&gt;When Grafana Mimir is running in microservices mode, you can estimate the required resources of each component individually.&lt;/p&gt;
&lt;h3 id=&#34;distributor&#34;&gt;Distributor&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;../../../references/architecture/components/distributor/&#34;&gt;distributor&lt;/a&gt; component&amp;rsquo;s resource utilization is determined by the number of received samples per second.&lt;/p&gt;
&lt;p&gt;Estimated required CPU and memory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 25,000 samples per second.&lt;/li&gt;
&lt;li&gt;Memory: 1GB for every 25,000 samples per second.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to estimate the rate of samples per second:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Query the number of active series across all of your Prometheus servers:

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(prometheus_tsdb_head_series)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Check the &lt;a href=&#34;https://prometheus.io/docs/prometheus/latest/configuration/configuration/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;scrape_interval&lt;/a&gt; that you configured in Prometheus.&lt;/li&gt;
&lt;li&gt;Estimate the rate of samples per second by using the following formula:

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;estimated rate = (&amp;lt;active series&amp;gt; * (60 / &amp;lt;scrape interval in seconds&amp;gt;)) / 60&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
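&lt;p&gt;As a sketch, the steps above can be combined with the distributor sizing figures. The 10 million active series and 15-second scrape interval below are hypothetical example values:&lt;/p&gt;

```python
# Hypothetical inputs: 10 million active series, scraped every 15 seconds.
active_series = 10_000_000
scrape_interval_seconds = 15

# Formula from step 3: (active series * (60 / scrape interval in seconds)) / 60
samples_per_second = (active_series * (60 / scrape_interval_seconds)) / 60

# Distributor sizing: 1 CPU core and 1GB of memory for every 25,000 samples per second.
cpu_cores = samples_per_second / 25_000
memory_gb = samples_per_second / 25_000

print(f"{samples_per_second:,.0f} samples/s -> {cpu_cores:.1f} CPU cores, {memory_gb:.1f} GB of memory")
```

&lt;p&gt;With these inputs, the estimate works out to roughly 667,000 samples per second, or about 27 CPU cores and 27GB of memory across the distributors.&lt;/p&gt;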
&lt;h3 id=&#34;ingester&#34;&gt;Ingester&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;../../../references/architecture/components/ingester/&#34;&gt;ingester&lt;/a&gt; component&amp;rsquo;s resource utilization is determined by the number of series that are in memory.&lt;/p&gt;
&lt;p&gt;Estimated required CPU, memory, and disk space:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 300,000 series in memory&lt;/li&gt;
&lt;li&gt;Memory: 2.5GB for every 300,000 series in memory&lt;/li&gt;
&lt;li&gt;Disk space: 5GB for every 300,000 series in memory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to estimate the number of series in memory:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Query the number of active series across all your Prometheus servers:

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(prometheus_tsdb_head_series)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Check the configured &lt;code&gt;-ingester.ring.replication-factor&lt;/code&gt; (defaults to &lt;code&gt;3&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Estimate the total number of series in memory across all ingesters using the following formula:

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;total number of in-memory series = &amp;lt;active series&amp;gt; * &amp;lt;replication factor&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
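&lt;p&gt;The steps above can be sketched as a quick calculation. The 10 million active series figure is a hypothetical example; the replication factor is the default of 3:&lt;/p&gt;

```python
# Hypothetical input: 10 million active series before replication.
active_series = 10_000_000
replication_factor = 3  # the -ingester.ring.replication-factor default

# Formula from step 3: total in-memory series across all ingesters.
in_memory_series = active_series * replication_factor

# Ingester sizing: 1 CPU core, 2.5GB of memory, and 5GB of disk
# for every 300,000 in-memory series.
cpu_cores = in_memory_series / 300_000
memory_gb = in_memory_series / 300_000 * 2.5
disk_gb = in_memory_series / 300_000 * 5

print(f"{in_memory_series:,} in-memory series -> {cpu_cores:.0f} cores, "
      f"{memory_gb:.0f} GB of memory, {disk_gb:.0f} GB of disk")
```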
&lt;h3 id=&#34;query-frontend&#34;&gt;Query-frontend&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;../../../references/architecture/components/query-frontend/&#34;&gt;query-frontend&lt;/a&gt; component&amp;rsquo;s resource utilization is determined by the number of queries per second.&lt;/p&gt;
&lt;p&gt;Estimated required CPU and memory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 250 queries per second&lt;/li&gt;
&lt;li&gt;Memory: 1GB for every 250 queries per second&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;optional-query-scheduler&#34;&gt;(Optional) Query-scheduler&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;../../../references/architecture/components/query-scheduler/&#34;&gt;query-scheduler&lt;/a&gt; component&amp;rsquo;s resource utilization is determined by the number of queries per second.&lt;/p&gt;
&lt;p&gt;Estimated required CPU and memory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 500 queries per second&lt;/li&gt;
&lt;li&gt;Memory: 100MB for every 500 queries per second&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;querier&#34;&gt;Querier&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;../../../references/architecture/components/querier/&#34;&gt;querier&lt;/a&gt; component&amp;rsquo;s resource utilization is determined by the number of queries per second.&lt;/p&gt;
&lt;p&gt;Estimated required CPU and memory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 10 queries per second&lt;/li&gt;
&lt;li&gt;Memory: 1GB for every 10 queries per second&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The CPU and memory requirements are computed by estimating 1 CPU core and 1GB of memory per query, with an average query latency of 100ms.&lt;/p&gt;&lt;/blockquote&gt;
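&lt;p&gt;The sizing figures follow from the note: at 100ms average latency, 10 queries per second keep roughly one query in flight per core (Little&amp;rsquo;s law). A minimal sketch, using a hypothetical rate of 100 queries per second:&lt;/p&gt;

```python
# Hypothetical cluster-wide query rate.
queries_per_second = 100
avg_latency_seconds = 0.1  # 100ms average query latency, per the note above

# Little's law: average queries in flight = arrival rate * latency.
concurrent_queries = queries_per_second * avg_latency_seconds  # ~10 in flight

# Querier sizing: 1 CPU core and 1GB of memory for every 10 queries per second,
# that is, roughly 1 core and 1GB per concurrently running query.
cpu_cores = queries_per_second / 10
memory_gb = queries_per_second / 10
```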
&lt;h3 id=&#34;store-gateway&#34;&gt;Store-gateway&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;../../../references/architecture/components/store-gateway/&#34;&gt;store-gateway&lt;/a&gt; component&amp;rsquo;s resource utilization is determined by the number of queries per second and by the number of active series before ingester replication.&lt;/p&gt;
&lt;p&gt;Estimated required CPU, memory, and disk space:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core for every 10 queries per second&lt;/li&gt;
&lt;li&gt;Memory: 1GB for every 10 queries per second&lt;/li&gt;
&lt;li&gt;Disk: 13GB for every 1 million active series&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The CPU and memory requirements are computed by estimating 1 CPU core and 1GB per query, an average query latency of 1s when reaching the store-gateway, and only 10% of queries reaching the store-gateway.&lt;/p&gt;&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The disk requirement has been estimated assuming 2 bytes per sample for compacted blocks (both index and chunks), the index-header being 0.10% of a block size, a scrape interval of 15 seconds, a retention of 1 year, and a store-gateway replication factor of 3. The resulting estimated store-gateway disk space for one series is 13KB.&lt;/p&gt;&lt;/blockquote&gt;
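&lt;p&gt;The 13KB-per-series figure in the note above can be reproduced with a back-of-the-envelope calculation:&lt;/p&gt;

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 1 year of retention

scrape_interval_seconds = 15
bytes_per_sample = 2        # compacted blocks, index and chunks combined
index_header_ratio = 0.001  # the index-header is 0.10% of the block size
replication_factor = 3      # store-gateway replication factor

samples_per_series = SECONDS_PER_YEAR / scrape_interval_seconds  # ~2.1 million samples/year
block_bytes_per_series = samples_per_series * bytes_per_sample   # ~4.2MB of block data
disk_bytes_per_series = block_bytes_per_series * index_header_ratio * replication_factor

print(f"~{disk_bytes_per_series / 1000:.1f} KB of store-gateway disk per series")
```

&lt;p&gt;This yields about 12.6KB per series, which rounds to the 13KB (13GB per 1 million series) used above.&lt;/p&gt;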
&lt;p&gt;&lt;strong&gt;How to estimate the number of active series before ingester replication:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Query the number of active series across all your Prometheus servers:

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(prometheus_tsdb_head_series)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
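&lt;p&gt;Putting the store-gateway figures together, using a hypothetical 100 queries per second and 10 million active series:&lt;/p&gt;

```python
# Hypothetical inputs.
queries_per_second = 100    # cluster-wide query rate
active_series = 10_000_000  # before ingester replication

# Store-gateway sizing: 1 CPU core and 1GB of memory for every 10 queries
# per second, and 13GB of disk for every 1 million active series.
cpu_cores = queries_per_second / 10
memory_gb = queries_per_second / 10
disk_gb = active_series / 1_000_000 * 13
```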
&lt;h3 id=&#34;optional-ruler&#34;&gt;(Optional) Ruler&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;../../../references/architecture/components/ruler/&#34;&gt;ruler&lt;/a&gt; component&amp;rsquo;s resource utilization is determined by the number of rules evaluated per second.&lt;/p&gt;
&lt;p&gt;When the &lt;a href=&#34;../../../references/architecture/components/ruler/#internal&#34;&gt;internal&lt;/a&gt; operational mode is used (the default), rule evaluation is computationally equivalent to query execution, so the querier resource recommendations also apply to the ruler.&lt;/p&gt;
&lt;p&gt;When the &lt;a href=&#34;../../../references/architecture/components/ruler/#remote&#34;&gt;remote&lt;/a&gt; operational mode is used, most of the computational load shifts to the query-frontend and querier components, so scale those components to handle both the query and the rule evaluation workload.&lt;/p&gt;
&lt;h3 id=&#34;compactor&#34;&gt;Compactor&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;../../../references/architecture/components/compactor/&#34;&gt;compactor&lt;/a&gt; component&amp;rsquo;s resource utilization is determined by the number of active series.&lt;/p&gt;
&lt;p&gt;The compactor can scale horizontally both in Grafana Mimir clusters with one tenant and in clusters with multiple tenants.
We recommend running at least one compactor instance for every 20 million active series ingested in total in the Grafana Mimir cluster, calculated before ingester replication.&lt;/p&gt;
&lt;p&gt;Assuming you run one compactor instance for every 20 million active series, the estimated required CPU, memory, and disk for each compactor instance are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 core&lt;/li&gt;
&lt;li&gt;Memory: 4GB&lt;/li&gt;
&lt;li&gt;Disk: 300GB&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For more information about disk requirements, refer to &lt;a href=&#34;../../../references/architecture/components/compactor/#compactor-disk-utilization&#34;&gt;Compactor disk utilization&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To estimate the number of active series before ingester replication, query the number of active series across all Prometheus servers:&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(prometheus_tsdb_head_series)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
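&lt;p&gt;For example, the compactor count and resource totals can be derived as follows; the 50 million active series figure is hypothetical:&lt;/p&gt;

```python
import math

# Hypothetical input: 50 million active series before ingester replication.
active_series = 50_000_000

# Run at least one compactor instance for every 20 million active series.
instances = math.ceil(active_series / 20_000_000)

# Per-instance sizing: 1 CPU core, 4GB of memory, 300GB of disk.
total_cpu_cores = instances * 1
total_memory_gb = instances * 4
total_disk_gb = instances * 300

print(f"{instances} compactors -> {total_cpu_cores} cores, "
      f"{total_memory_gb} GB of memory, {total_disk_gb} GB of disk")
```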
&lt;h3 id=&#34;optional-alertmanager&#34;&gt;(Optional) Alertmanager&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&#34;../../../references/architecture/components/alertmanager/&#34;&gt;Alertmanager&lt;/a&gt; component&amp;rsquo;s resource utilization is determined by the number of alerts firing at the same time.&lt;/p&gt;
&lt;p&gt;Estimated required CPU and memory:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU: 1 CPU core for every 100 firing alert notifications per second&lt;/li&gt;
&lt;li&gt;Memory: 1GB for every 5,000 firing alerts&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To estimate the peak of firing alert notifications per second in the last 24 hours, run the following query across all Prometheus servers:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(max_over_time(rate(alertmanager_alerts_received_total[5m])[24h:5m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To estimate the maximum number of firing alerts in the last 24 hours, run the following query across all Prometheus servers:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(max_over_time(alertmanager_alerts[24h]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
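&lt;p&gt;With the results of the two queries above in hand, the Alertmanager sizing is a simple calculation; the query results below are hypothetical:&lt;/p&gt;

```python
# Hypothetical query results.
peak_notifications_per_second = 250  # peak firing alert notifications/s over 24h
max_firing_alerts = 20_000           # maximum firing alerts over 24h

# Alertmanager sizing: 1 CPU core for every 100 firing alert notifications
# per second, and 1GB of memory for every 5,000 firing alerts.
cpu_cores = peak_notifications_per_second / 100
memory_gb = max_firing_alerts / 5_000
```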
&lt;h3 id=&#34;optional-caches&#34;&gt;(Optional) Caches&lt;/h3&gt;
&lt;p&gt;Grafana Mimir supports caching in various stages of the read path:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;results cache to cache partial query results&lt;/li&gt;
&lt;li&gt;chunks cache to cache time series chunks from the object store&lt;/li&gt;
&lt;li&gt;index cache to accelerate series and label lookups&lt;/li&gt;
&lt;li&gt;metadata cache to accelerate lookups of individual time series blocks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A rule of thumb for scaling memcached deployments for these caches is to look at the rate of evictions. If the rate is 0 during
steady load, with only occasional spikes, then memcached is sufficiently scaled. If it is greater than 0 all the time, then
memcached needs to be scaled out.&lt;/p&gt;
&lt;p&gt;You can execute the following query to find out the rate of evictions:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum by(instance) (rate(memcached_items_evicted_total{}[5m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
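&lt;p&gt;The rule of thumb above can be expressed as a small helper. The 5% spike tolerance is an assumption, not a Mimir setting; tune it to what &amp;ldquo;occasional spikes&amp;rdquo; means for your workload:&lt;/p&gt;

```python
def memcached_needs_scale_out(eviction_rates, spike_tolerance=0.05):
    """Decide whether memcached is undersized, given per-interval eviction
    rates (items/s) sampled during steady load.

    Sustained evictions above zero mean the cache is evicting live items
    and should be scaled out; occasional spikes are tolerated.
    """
    if not eviction_rates:
        return False
    evicting = [rate for rate in eviction_rates if rate > 0]
    return len(evicting) / len(eviction_rates) > spike_tolerance
```

&lt;p&gt;For example, 20 samples that are all zero return &lt;code&gt;False&lt;/code&gt;, while 20 samples that are all above zero return &lt;code&gt;True&lt;/code&gt;.&lt;/p&gt;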
]]></content><description>&lt;h1 id="planning-grafana-mimir-capacity">Planning Grafana Mimir capacity&lt;/h1>
&lt;p>The information that follows is an overview about the CPU, memory, and disk space that Grafana Mimir requires at scale.
You can get a rough idea about the required resources, rather than a prescriptive recommendation about the exact amount of CPU, memory, and disk space.&lt;/p></description></item><item><title>Perform a rolling update to Grafana Mimir</title><link>https://grafana.com/docs/mimir/v2.11.x/manage/run-production-environment/perform-a-rolling-update/</link><pubDate>Tue, 17 Mar 2026 09:21:52 +0000</pubDate><guid>https://grafana.com/docs/mimir/v2.11.x/manage/run-production-environment/perform-a-rolling-update/</guid><content><![CDATA[&lt;h1 id=&#34;perform-a-rolling-update-to-grafana-mimir&#34;&gt;Perform a rolling update to Grafana Mimir&lt;/h1&gt;
&lt;p&gt;You can use a rolling update strategy to apply configuration changes to
Grafana Mimir, and to upgrade Grafana Mimir to a newer version. A
rolling update results in no downtime to Grafana Mimir.&lt;/p&gt;
&lt;h2 id=&#34;monolithic-mode&#34;&gt;Monolithic mode&lt;/h2&gt;
&lt;p&gt;When you run Grafana Mimir in monolithic mode, roll out changes to one instance at a time.
After you apply changes to an instance and the instance restarts, wait until its &lt;code&gt;/ready&lt;/code&gt; endpoint returns HTTP status code &lt;code&gt;200&lt;/code&gt; before you proceed with rolling out changes to another instance.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: When you run Grafana Mimir on Kubernetes, to roll out changes to one instance at a time, set the &lt;code&gt;Deployment&lt;/code&gt; or &lt;code&gt;StatefulSet&lt;/code&gt; update strategy to &lt;code&gt;RollingUpdate&lt;/code&gt; and set &lt;code&gt;maxUnavailable&lt;/code&gt; to &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;&lt;/blockquote&gt;
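&lt;p&gt;The wait-for-readiness step can be scripted. This is a sketch; the &lt;code&gt;probe&lt;/code&gt; callable and the timeout values are illustrative, not part of Mimir:&lt;/p&gt;

```python
import time

def wait_until_ready(probe, timeout=300, interval=5):
    """Poll an instance's /ready endpoint until it returns HTTP 200.

    probe: a callable returning the HTTP status code of GET /ready,
    for example via urllib against the instance's HTTP address.
    Returns True once the instance is ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe() == 200:
            return True
        time.sleep(interval)
    return False
```

&lt;p&gt;Only after &lt;code&gt;wait_until_ready&lt;/code&gt; returns &lt;code&gt;True&lt;/code&gt; should you roll out changes to the next instance.&lt;/p&gt;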
&lt;h2 id=&#34;microservices-mode&#34;&gt;Microservices mode&lt;/h2&gt;
&lt;p&gt;When you run Grafana Mimir in microservices mode, roll out changes to multiple instances of each stateless component at the same time.
You can also roll out multiple stateless components in parallel.
Stateful components have the following restrictions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Alertmanagers: Roll out changes to a maximum of two Alertmanagers at a time.&lt;/li&gt;
&lt;li&gt;Ingesters: Roll out changes to one ingester at a time.&lt;/li&gt;
&lt;li&gt;Store-gateways: Roll out changes to a maximum of two store-gateways at a time.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt; for a component, you can roll out changes to all component instances in the same zone at the same time.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;alertmanagers&#34;&gt;Alertmanagers&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;../../../references/architecture/components/alertmanager/&#34;&gt;Alertmanagers&lt;/a&gt; store alerts state in memory.
When an Alertmanager is restarted, the alerts stored on the Alertmanager are not available until the Alertmanager runs again.&lt;/p&gt;
&lt;p&gt;By default, Alertmanagers replicate each tenant&amp;rsquo;s alerts to three Alertmanagers.
Alert notification and visualization succeed when each tenant has at least one healthy Alertmanager in their shard.&lt;/p&gt;
&lt;p&gt;To ensure that no alert notification, reception, or visualization fails during a rolling update, roll out changes to a maximum of two Alertmanagers at a time.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt; for Alertmanager, you can roll out changes to all Alertmanagers in one zone at the same time.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;ingesters&#34;&gt;Ingesters&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;../../../references/architecture/components/ingester/&#34;&gt;Ingesters&lt;/a&gt; store recently received samples in memory.
When an ingester restarts, the samples stored in the restarting ingester are not available for querying until the ingester runs again.&lt;/p&gt;
&lt;p&gt;By default, ingesters run with a replication factor of &lt;code&gt;3&lt;/code&gt;.
With a replication factor of &lt;code&gt;3&lt;/code&gt;, a quorum of two instances is required to successfully query any series samples.
Because series are sharded across all ingesters, Grafana Mimir tolerates up to one unavailable ingester.&lt;/p&gt;
&lt;p&gt;To ensure no query fails during a rolling update, roll out changes to one ingester at a time.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt; for ingesters, you can roll out changes to all ingesters in one zone at the same time.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;store-gateways&#34;&gt;Store-gateways&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;../../../references/architecture/components/store-gateway/&#34;&gt;Store-gateways&lt;/a&gt; shard blocks among running instances.
By default, each block is replicated to three store-gateways.
Queries succeed when each required block is loaded by at least one store-gateway.&lt;/p&gt;
&lt;p&gt;To ensure no query fails during a rolling update, roll out changes to a maximum of two store-gateways at a time.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt; for store-gateways, you can roll out changes to all store-gateways in one zone at the same time.&lt;/p&gt;&lt;/blockquote&gt;
]]></content><description>&lt;h1 id="perform-a-rolling-update-to-grafana-mimir">Perform a rolling update to Grafana Mimir&lt;/h1>
&lt;p>You can use a rolling update strategy to apply configuration changes to
Grafana Mimir, and to upgrade Grafana Mimir to a newer version. A
rolling update results in no downtime to Grafana Mimir.&lt;/p></description></item><item><title>Scaling out Grafana Mimir</title><link>https://grafana.com/docs/mimir/v2.11.x/manage/run-production-environment/scaling-out/</link><pubDate>Tue, 17 Mar 2026 09:21:52 +0000</pubDate><guid>https://grafana.com/docs/mimir/v2.11.x/manage/run-production-environment/scaling-out/</guid><content><![CDATA[&lt;h1 id=&#34;scaling-out-grafana-mimir&#34;&gt;Scaling out Grafana Mimir&lt;/h1&gt;
&lt;p&gt;Grafana Mimir can horizontally scale every component.
Scaling out Grafana Mimir means increasing the number of replicas of each Grafana Mimir component to respond to increased load.&lt;/p&gt;
&lt;p&gt;We have designed Grafana Mimir to scale up quickly, safely, and with no manual intervention.
However, be careful when scaling down some of the stateful components, because these actions can result in write and read failures, or partial query results.&lt;/p&gt;
&lt;h2 id=&#34;monolithic-mode&#34;&gt;Monolithic mode&lt;/h2&gt;
&lt;p&gt;When running Grafana Mimir in monolithic mode, you can safely scale up to any number of instances.
To scale down the Grafana Mimir cluster, see &lt;a href=&#34;#scaling-down-ingesters&#34;&gt;Scaling down ingesters&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;read-write-mode&#34;&gt;Read-write mode&lt;/h2&gt;
&lt;p&gt;When running Grafana Mimir in read-write mode, you can safely scale up any of the three components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You can safely scale the Mimir read component up or down because it is stateless. You could also use an autoscaler.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You can safely scale down the Mimir backend component within one zone at a time. Because it contains the store-gateway, refer to &lt;a href=&#34;#scaling-down-store-gateways&#34;&gt;Scaling down store-gateways&lt;/a&gt; for more information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To scale down the Mimir write component, see &lt;a href=&#34;#scaling-down-ingesters&#34;&gt;Scaling down ingesters&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;microservices-mode&#34;&gt;Microservices mode&lt;/h2&gt;
&lt;p&gt;When running Grafana Mimir in microservices mode, you can safely scale up any component.
You can also safely scale down any stateless component.&lt;/p&gt;
&lt;p&gt;The following stateful components have limitations when scaling down:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Alertmanagers&lt;/li&gt;
&lt;li&gt;Ingesters&lt;/li&gt;
&lt;li&gt;Store-gateways&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;scaling-down-alertmanagers&#34;&gt;Scaling down Alertmanagers&lt;/h3&gt;
&lt;p&gt;Scaling down &lt;a href=&#34;../../../references/architecture/components/alertmanager/&#34;&gt;Alertmanagers&lt;/a&gt; can result in downtime.&lt;/p&gt;
&lt;p&gt;Consider the following guidelines when you scale down Alertmanagers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Scale down no more than two Alertmanagers at the same time.&lt;/li&gt;
&lt;li&gt;Ensure at least &lt;code&gt;-alertmanager.sharding-ring.replication-factor&lt;/code&gt; Alertmanager instances are running (three when running Grafana Mimir with the default configuration).&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt; for Alertmanagers, you can, in parallel, scale down any number of Alertmanager instances within one zone at a time.&lt;/p&gt;&lt;/blockquote&gt;
&lt;h3 id=&#34;scaling-down-ingesters&#34;&gt;Scaling down ingesters&lt;/h3&gt;
&lt;p&gt;&lt;a href=&#34;../../../references/architecture/components/ingester/&#34;&gt;Ingesters&lt;/a&gt; store recently received samples in memory.
When an ingester shuts down because of a scale-down operation, the samples stored in the ingester must not be discarded, in order to guarantee no data loss.&lt;/p&gt;
&lt;p&gt;You might experience the following challenges when you scale down ingesters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;By default, when an ingester shuts down, the samples stored in the ingester are not uploaded to the long-term storage, which causes data loss.&lt;/p&gt;
&lt;p&gt;Ingesters expose an API endpoint &lt;a href=&#34;../../../references/http-api/#shutdown&#34;&gt;&lt;code&gt;/ingester/shutdown&lt;/code&gt;&lt;/a&gt; that flushes in-memory time series data from ingester to the long-term storage and unregisters the ingester from the ring.&lt;/p&gt;
&lt;p&gt;After the &lt;code&gt;/ingester/shutdown&lt;/code&gt; API endpoint successfully returns, the ingester does not receive write or read requests, but the process does not exit.&lt;/p&gt;
&lt;p&gt;You can terminate the process by sending a &lt;code&gt;SIGINT&lt;/code&gt; or &lt;code&gt;SIGTERM&lt;/code&gt; signal after the shutdown endpoint returns.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To mitigate this challenge, ensure that the ingester blocks are uploaded to the long-term storage before shutting down.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When you scale down ingesters, the querier might temporarily return partial results.&lt;/p&gt;
&lt;p&gt;The blocks an ingester uploads to the long-term storage are not immediately available for querying.
It takes the &lt;a href=&#34;../../../references/architecture/components/querier/&#34;&gt;queriers&lt;/a&gt; and &lt;a href=&#34;../../../references/architecture/components/store-gateway/&#34;&gt;store-gateways&lt;/a&gt; some time before a newly uploaded block is available for querying.
If you scale down two or more ingesters in a short period of time, queries might return partial results.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
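&lt;p&gt;The safe scale-down sequence described above can be sketched as follows. The two callables are placeholders for your own HTTP client and process manager, and the check accepts any 2xx status because the exact success code can vary between Mimir versions:&lt;/p&gt;

```python
def flush_and_stop_ingester(post_shutdown, send_sigterm):
    """Safely remove one ingester during scale-down.

    post_shutdown: callable that issues POST /ingester/shutdown and
    returns the HTTP status code. The endpoint flushes in-memory series
    to long-term storage and unregisters the ingester from the ring.
    send_sigterm: callable that delivers SIGTERM (or SIGINT) to the
    ingester process, which does not exit on its own.
    """
    status = post_shutdown()
    if not 200 <= status < 300:
        # The flush did not complete; terminating now could lose data.
        raise RuntimeError(f"/ingester/shutdown failed with HTTP {status}")
    send_sigterm()
```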
&lt;h4 id=&#34;scaling-down-ingesters-deployed-in-a-single-zone-default&#34;&gt;Scaling down ingesters deployed in a single zone (default)&lt;/h4&gt;
&lt;p&gt;Complete the following steps to scale down ingesters deployed in a single zone.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Configure the Grafana Mimir cluster to discover and query new uploaded blocks as quickly as possible.&lt;/p&gt;
&lt;p&gt;a. Configure queriers and rulers to always query the long-term storage and to disable ingesters &lt;a href=&#34;../../../configure/configure-shuffle-sharding/&#34;&gt;shuffle sharding&lt;/a&gt; on the read path:&lt;/p&gt;

&lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;-querier.query-store-after=0s
-querier.shuffle-sharding-ingesters-enabled=false&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;b. Configure the compactors to frequently update the bucket index:&lt;/p&gt;

&lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;-compactor.cleanup-interval=5m&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;c. Configure the store-gateways to frequently refresh the bucket index and to immediately load all blocks:&lt;/p&gt;

&lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;-blocks-storage.bucket-store.sync-interval=5m
-blocks-storage.bucket-store.ignore-blocks-within=0s&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;d. Configure queriers, rulers and store-gateways with reduced TTLs for the metadata cache:&lt;/p&gt;

&lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;-blocks-storage.bucket-store.metadata-cache.bucket-index-content-ttl=1m
-blocks-storage.bucket-store.metadata-cache.tenants-list-ttl=1m
-blocks-storage.bucket-store.metadata-cache.tenant-blocks-list-ttl=1m
-blocks-storage.bucket-store.metadata-cache.metafile-doesnt-exist-ttl=1m&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scale down one ingester at a time:&lt;/p&gt;
&lt;p&gt;a. Invoke the &lt;code&gt;/ingester/shutdown&lt;/code&gt; API endpoint on the ingester to terminate.&lt;/p&gt;
&lt;p&gt;b. Wait until the API endpoint call has successfully returned and the ingester has logged &amp;ldquo;finished flushing and shipping TSDB blocks&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;c. Send a &lt;code&gt;SIGINT&lt;/code&gt; or &lt;code&gt;SIGTERM&lt;/code&gt; signal to the process of the ingester to terminate.&lt;/p&gt;
&lt;p&gt;d. Wait 10 minutes before proceeding with the next ingester. The temporarily applied configuration guarantees newly uploaded blocks are available for querying within 10 minutes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Wait until the originally configured &lt;code&gt;-querier.query-store-after&lt;/code&gt; period of time has elapsed since all ingesters were shut down.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Revert the temporary configuration changes made at the beginning of the scale-down procedure.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
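&lt;p&gt;The temporary overrides from step 1 can also be expressed in the YAML configuration. The following sketch maps each CLI flag to its YAML option under the default configuration layout; verify the option names against your Mimir version before applying:&lt;/p&gt;

```yaml
# Temporary settings while scaling down ingesters in a single zone.
# Revert these after the scale-down completes (step 4).
querier:
  query_store_after: 0s
  shuffle_sharding_ingesters_enabled: false
compactor:
  cleanup_interval: 5m
blocks_storage:
  bucket_store:
    sync_interval: 5m
    ignore_blocks_within: 0s
    metadata_cache:
      bucket_index_content_ttl: 1m
      tenants_list_ttl: 1m
      tenant_blocks_list_ttl: 1m
      metafile_doesnt_exist_ttl: 1m
```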
&lt;h4 id=&#34;scaling-down-ingesters-deployed-in-multiple-zones&#34;&gt;Scaling down ingesters deployed in multiple zones&lt;/h4&gt;
&lt;p&gt;Grafana Mimir can tolerate a full-zone outage when you deploy ingesters in &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;multiple zones&lt;/a&gt;.
A scale down of ingesters in one zone can be seen as a partial-zone outage.
To simplify the scale down process, you can leverage ingesters deployed in multiple zones.&lt;/p&gt;
&lt;p&gt;For each zone, complete the following steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Invoke the &lt;code&gt;/ingester/shutdown&lt;/code&gt; API endpoint on all ingesters that you want to terminate.&lt;/li&gt;
&lt;li&gt;Wait until the API endpoint calls have successfully returned and each ingester has logged &amp;ldquo;finished flushing and shipping TSDB blocks&amp;rdquo;.&lt;/li&gt;
&lt;li&gt;Send a &lt;code&gt;SIGINT&lt;/code&gt; or &lt;code&gt;SIGTERM&lt;/code&gt; signal to the processes of the ingesters that you want to terminate.&lt;/li&gt;
&lt;li&gt;Wait until the blocks uploaded by terminated ingesters are available for querying before proceeding with the next zone.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The required amount of time to wait depends on your configuration; it is the maximum of the following values:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The configured &lt;code&gt;-querier.query-store-after&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Two times the configured &lt;code&gt;-blocks-storage.bucket-store.sync-interval&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Two times the configured &lt;code&gt;-compactor.cleanup-interval&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
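&lt;p&gt;The wait-time rule above can be sketched as follows; the values passed in are illustrative, not defaults:&lt;/p&gt;

```python
from datetime import timedelta

def zone_scale_down_wait(query_store_after: timedelta,
                         bucket_store_sync_interval: timedelta,
                         compactor_cleanup_interval: timedelta) -> timedelta:
    """Time to wait before scaling down the next zone: the maximum of
    -querier.query-store-after, twice -blocks-storage.bucket-store.sync-interval,
    and twice -compactor.cleanup-interval."""
    return max(query_store_after,
               2 * bucket_store_sync_interval,
               2 * compactor_cleanup_interval)

# Example: with query-store-after disabled and 5-minute sync and
# cleanup intervals, wait 10 minutes between zones.
wait = zone_scale_down_wait(timedelta(seconds=0),
                            timedelta(minutes=5),
                            timedelta(minutes=5))
print(wait)  # 0:10:00
```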
&lt;h3 id=&#34;scaling-down-store-gateways&#34;&gt;Scaling down store-gateways&lt;/h3&gt;
&lt;p&gt;To guarantee no downtime when scaling down &lt;a href=&#34;../../../references/architecture/components/store-gateway/&#34;&gt;store-gateways&lt;/a&gt;, complete the following steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Ensure at least &lt;code&gt;-store-gateway.sharding-ring.replication-factor&lt;/code&gt; store-gateway instances are running (three when running Grafana Mimir with the default configuration).&lt;/li&gt;
&lt;li&gt;Scale down no more than two store-gateways at the same time.
If you enabled &lt;a href=&#34;../../../configure/configure-zone-aware-replication/&#34;&gt;zone-aware replication&lt;/a&gt;
for store-gateways, you can in parallel scale down any number of store-gateway instances in one zone at a time.
Zone-aware replication is enabled by default in the &lt;code&gt;mimir-distributed&lt;/code&gt; Helm chart.&lt;/li&gt;
&lt;li&gt;Stop the store-gateway instances you want to scale down.&lt;/li&gt;
&lt;li&gt;If you have set the value of &lt;code&gt;-store-gateway.sharding-ring.unregister-on-shutdown&lt;/code&gt; to &lt;code&gt;false&lt;/code&gt;, then remove the stopped instances from the store-gateway ring:
&lt;ol&gt;
&lt;li&gt;In a browser, go to the &lt;code&gt;GET /store-gateway/ring&lt;/code&gt; page that store-gateways expose on their HTTP port.&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Forget&lt;/strong&gt; on the instances that you scaled down.
Alternatively, wait for the duration of the value of &lt;code&gt;-store-gateway.sharding-ring.heartbeat-timeout&lt;/code&gt; times 10.
The default value of &lt;code&gt;-store-gateway.sharding-ring.heartbeat-timeout&lt;/code&gt; is one minute.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Proceed with the next two store-gateway replicas. If you are using zone-aware replication, proceed with the next zone.&lt;/li&gt;
&lt;/ol&gt;
]]></content><description>&lt;h1 id="scaling-out-grafana-mimir">Scaling out Grafana Mimir&lt;/h1>
&lt;p>Grafana Mimir can horizontally scale every component.
Scaling out Grafana Mimir means that to respond to increased load, you can increase the number of replicas of each Grafana Mimir component.&lt;/p></description></item><item><title>Grafana Mimir production tips</title><link>https://grafana.com/docs/mimir/v2.11.x/manage/run-production-environment/production-tips/</link><pubDate>Tue, 17 Mar 2026 09:21:52 +0000</pubDate><guid>https://grafana.com/docs/mimir/v2.11.x/manage/run-production-environment/production-tips/</guid><content><![CDATA[&lt;h1 id=&#34;grafana-mimir-production-tips&#34;&gt;Grafana Mimir production tips&lt;/h1&gt;
&lt;p&gt;This topic provides tips and techniques for you to consider when setting up a production Grafana Mimir cluster.&lt;/p&gt;
&lt;h2 id=&#34;ingester&#34;&gt;Ingester&lt;/h2&gt;
&lt;h3 id=&#34;ensure-a-high-maximum-number-of-open-file-descriptors&#34;&gt;Ensure a high maximum number of open file descriptors&lt;/h3&gt;
&lt;p&gt;The ingester receives samples from the distributors, and appends the received samples to the specific per-tenant TSDB that is stored on the ingester&amp;rsquo;s local disk.
The per-tenant TSDB is composed of several files, and the ingester keeps a file descriptor open for each TSDB file.
The total number of file descriptors used to load TSDB files increases linearly with the number of tenants in the Grafana Mimir cluster and with the configured &lt;code&gt;-blocks-storage.tsdb.retention-period&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We recommend fine-tuning the following settings to avoid reaching the maximum number of open file descriptors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configure the system&amp;rsquo;s &lt;code&gt;file-max&lt;/code&gt; ulimit to at least &lt;code&gt;65536&lt;/code&gt;. Increase the limit to &lt;code&gt;1048576&lt;/code&gt; when running a Grafana Mimir cluster with more than a thousand tenants.&lt;/li&gt;
&lt;li&gt;Enable ingesters &lt;a href=&#34;../../../configure/configure-shuffle-sharding/&#34;&gt;shuffle sharding&lt;/a&gt; to reduce the number of tenants per ingester.&lt;/li&gt;
&lt;/ol&gt;
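&lt;p&gt;For example, on a systemd-based host you might raise the limit with a drop-in unit override; the unit name and file path below are illustrative, not part of the Mimir distribution:&lt;/p&gt;

```ini
# /etc/systemd/system/mimir.service.d/limits.conf (example path)
[Service]
# 65536 is the recommended minimum; use 1048576 for clusters
# with more than a thousand tenants.
LimitNOFILE=1048576
```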
&lt;h3 id=&#34;ingester-disk-space-requirements&#34;&gt;Ingester disk space requirements&lt;/h3&gt;
&lt;p&gt;The ingester writes received samples to a write-ahead log (WAL) and by default, compacts them into a new block every two hours.
Both the WAL and blocks are temporarily stored on the local disk.
The required disk space depends on the number of time series stored in the ingester and the configured &lt;code&gt;-blocks-storage.tsdb.retention-period&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For more information about estimating ingester disk space requirements, refer to &lt;a href=&#34;../planning-capacity/#ingester&#34;&gt;Planning capacity&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;ingester-disk-iops&#34;&gt;Ingester disk IOPS&lt;/h3&gt;
&lt;p&gt;The IOPS (input/output operations per second) and latency of the ingester disks can affect both write and read requests.
On the write path, the ingester writes to the write-ahead log (WAL) on disk.
On the read path, the ingester reads from the series whose chunks have already been written to disk.&lt;/p&gt;
&lt;p&gt;For these reasons, run the ingesters on disks such as SSDs that have fast disk speed.&lt;/p&gt;
&lt;h2 id=&#34;querier&#34;&gt;Querier&lt;/h2&gt;
&lt;h3 id=&#34;ensure-caching-is-enabled&#34;&gt;Ensure caching is enabled&lt;/h3&gt;
&lt;p&gt;The querier supports caching to reduce the number of API requests to the long-term storage.&lt;/p&gt;
&lt;p&gt;We recommend enabling caching in the querier.
For more information about configuring the cache, refer to &lt;a href=&#34;../../../references/architecture/components/querier/&#34;&gt;querier&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;avoid-querying-non-compacted-blocks&#34;&gt;Avoid querying non-compacted blocks&lt;/h3&gt;
&lt;p&gt;When running Grafana Mimir at scale, querying non-compacted blocks might be inefficient for the following reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Non-compacted blocks contain duplicated samples, as a result of the ingesters&amp;rsquo; replication.&lt;/li&gt;
&lt;li&gt;Querying many small TSDB indexes is slower than querying a few compacted TSDB indexes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The default values for &lt;code&gt;-querier.query-store-after&lt;/code&gt;, &lt;code&gt;-querier.query-ingesters-within&lt;/code&gt;, and &lt;code&gt;-blocks-storage.bucket-store.ignore-blocks-within&lt;/code&gt; are set such that only compacted blocks are queried. In most cases, no additional configuration is required.&lt;/p&gt;
&lt;p&gt;Configure Grafana Mimir so large tenants are parallelized by the compactor:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configure compactor&amp;rsquo;s &lt;code&gt;-compactor.split-and-merge-shards&lt;/code&gt; and &lt;code&gt;-compactor.split-groups&lt;/code&gt; for every tenant with more than 20 million active time series. For more information about configuring the compactor&amp;rsquo;s split and merge shards, refer to &lt;a href=&#34;../../../references/architecture/components/compactor/&#34;&gt;compactor&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
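&lt;p&gt;For example, the per-tenant limits can be set through the runtime overrides file; the tenant ID and shard counts below are illustrative:&lt;/p&gt;

```yaml
# Runtime configuration file (loaded via -runtime-config.file).
overrides:
  tenant-a: # example tenant with more than 20 million active series
    compactor_split_and_merge_shards: 4
    compactor_split_groups: 4
```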
&lt;h4 id=&#34;how-to-estimate--querierquery-store-after&#34;&gt;How to estimate &lt;code&gt;-querier.query-store-after&lt;/code&gt;&lt;/h4&gt;
&lt;p&gt;If you are not using the defaults, set &lt;code&gt;-querier.query-store-after&lt;/code&gt; to a duration that is large enough to give the compactor enough time to compact newly uploaded blocks, and to give the queriers and store-gateways enough time to discover and synchronize the newly compacted blocks.&lt;/p&gt;
&lt;p&gt;The following diagram shows all of the timings involved in the estimation. This diagram should be used only as a template and you can modify the assumptions based on real measurements in your Mimir cluster. The example makes the following assumptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;An ingester takes up to 30 minutes to upload a block to the storage&lt;/li&gt;
&lt;li&gt;The compactor takes up to three hours to compact two-hour blocks shipped from all ingesters&lt;/li&gt;
&lt;li&gt;Querier and store-gateways take up to 15 minutes to discover and load a new compacted block&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Based on these assumptions, in the worst-case scenario, it takes up to six hours and 45 minutes from when a sample is ingested until that sample has been appended to a block flushed to the storage and the block is &lt;a href=&#34;../../../references/architecture/components/compactor/&#34;&gt;vertically compacted&lt;/a&gt; with all other overlapping two-hour blocks shipped from ingesters.&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;avoid-querying-non-compacted-blocks.png&#34;
  alt=&#34;Avoid querying non compacted blocks&#34;/&gt;&lt;/p&gt;
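&lt;p&gt;The worst-case arithmetic can be sketched as follows. The three-hour first term is an assumption about TSDB head compaction: a two-hour block is cut only once the head covers roughly 1.5 times the block range, so the oldest sample in the block can be about three hours old at cut time.&lt;/p&gt;

```python
from datetime import timedelta

# Worst-case delay before a sample is queryable from a compacted block,
# using the assumptions listed above.
oldest_sample_age_at_block_cut = timedelta(hours=3)   # assumed 1.5x the 2h block range
ingester_upload = timedelta(minutes=30)               # upload to long-term storage
compaction = timedelta(hours=3)                       # compacting all 2h blocks
discovery_and_load = timedelta(minutes=15)            # querier/store-gateway sync

total = (oldest_sample_age_at_block_cut + ingester_upload
         + compaction + discovery_and_load)
print(total)  # 6:45:00
```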
&lt;h2 id=&#34;store-gateway&#34;&gt;Store-gateway&lt;/h2&gt;
&lt;h3 id=&#34;ensure-caching-is-enabled-1&#34;&gt;Ensure caching is enabled&lt;/h3&gt;
&lt;p&gt;The store-gateway supports caching that reduces the number of API calls to the long-term storage and improves query performance.&lt;/p&gt;
&lt;p&gt;We recommend enabling caching in the store-gateway.
For more information about configuring the cache, refer to &lt;a href=&#34;../../../references/architecture/components/store-gateway/&#34;&gt;store-gateway&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;ensure-a-high-number-of-maximum-open-file-descriptors&#34;&gt;Ensure a high number of maximum open file descriptors&lt;/h3&gt;
&lt;p&gt;The store-gateway stores each block’s index-header on the local disk and loads it via memory mapping.
The store-gateway keeps a file descriptor open for each index-header loaded at a given time.
The total number of file descriptors used to load index-headers linearly increases with the number of blocks owned by the store-gateway instance.&lt;/p&gt;
&lt;p&gt;We recommend configuring the system&amp;rsquo;s &lt;code&gt;file-max&lt;/code&gt; ulimit at least to &lt;code&gt;65536&lt;/code&gt; to avoid reaching the maximum number of open file descriptors.&lt;/p&gt;
&lt;h3 id=&#34;store-gateway-disk-iops&#34;&gt;Store-gateway disk IOPS&lt;/h3&gt;
&lt;p&gt;The IOPS and latency of the store-gateway disk can affect queries.
The store-gateway downloads the block’s &lt;a href=&#34;../../../references/architecture/binary-index-header/&#34;&gt;index-headers&lt;/a&gt; onto local disk, and reads them for each query that needs to fetch data from the long-term storage.&lt;/p&gt;
&lt;p&gt;For these reasons, run the store-gateways on disks such as SSDs that have fast disk speed.&lt;/p&gt;
&lt;h2 id=&#34;compactor&#34;&gt;Compactor&lt;/h2&gt;
&lt;h3 id=&#34;ensure-the-compactor-has-enough-disk-space&#34;&gt;Ensure the compactor has enough disk space&lt;/h3&gt;
&lt;p&gt;The compactor requires a lot of disk space to download source blocks from the long-term storage and temporarily store the compacted block before uploading it to the storage.
For more information about required disk space, refer to &lt;a href=&#34;../../../references/architecture/components/compactor/#compactor-disk-utilization&#34;&gt;Compactor disk utilization&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;caching&#34;&gt;Caching&lt;/h2&gt;
&lt;h3 id=&#34;ensure-memcached-is-properly-scaled&#34;&gt;Ensure Memcached is properly scaled&lt;/h3&gt;
&lt;p&gt;We recommend ensuring Memcached evictions happen infrequently.
Grafana Mimir query performance might be negatively affected if your Memcached cluster evicts items frequently.
We recommend increasing your Memcached cluster replicas to add more memory to the cluster and reduce evictions.&lt;/p&gt;
&lt;p&gt;We also recommend running a dedicated Memcached cluster for each type of cache: query results, metadata, index, and chunks.
Running a dedicated Memcached cluster for each cache type is not required, but recommended so that each cache is isolated from the others.&lt;/p&gt;
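&lt;p&gt;For example, each cache type can point at its own Memcached cluster; the addresses below are illustrative Kubernetes service names, and you should verify the option paths against your Mimir version:&lt;/p&gt;

```yaml
frontend:
  results_cache:
    backend: memcached
    memcached:
      addresses: dns+memcached-results.mimir.svc.cluster.local:11211
blocks_storage:
  bucket_store:
    index_cache:
      backend: memcached
      memcached:
        addresses: dns+memcached-index.mimir.svc.cluster.local:11211
    chunks_cache:
      backend: memcached
      memcached:
        addresses: dns+memcached-chunks.mimir.svc.cluster.local:11211
    metadata_cache:
      backend: memcached
      memcached:
        addresses: dns+memcached-metadata.mimir.svc.cluster.local:11211
```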
&lt;h2 id=&#34;security&#34;&gt;Security&lt;/h2&gt;
&lt;p&gt;We recommend securing the Grafana Mimir cluster.
For more information about securing a Mimir cluster, refer to &lt;a href=&#34;../../secure/&#34;&gt;Secure Grafana Mimir&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;network&#34;&gt;Network&lt;/h2&gt;
&lt;p&gt;Most of the communication between Mimir components occurs over gRPC. The gRPC
connection does not use any compression by default.&lt;/p&gt;
&lt;p&gt;If network throughput is a concern or network costs are high, you can enable compression on the gRPC connections between
components. Compression reduces network throughput at the cost of increased CPU usage. You can choose between gzip and
snappy: gzip provides a better compression ratio than snappy, at the cost of more CPU usage.&lt;/p&gt;
&lt;p&gt;You can use the &lt;a href=&#34;http://quixdb.github.io/squash-benchmark/#results-table&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Squash Compression Benchmark&lt;/a&gt; to choose between snappy and gzip.
For protobuf data, snappy achieves a compression ratio of 5 with compression speeds of
around 400 MiB/s. For the same data, gzip achieves a ratio between 6 and 8 with speeds between 50 MiB/s and 135 MiB/s.&lt;/p&gt;
&lt;p&gt;To configure gRPC compression, use the following CLI flags or their YAML equivalents. The accepted values are
&lt;code&gt;snappy&lt;/code&gt; and &lt;code&gt;gzip&lt;/code&gt;. If you set the flag to an empty string (&lt;code&gt;&#39;&#39;&lt;/code&gt;), it explicitly disables compression.&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;CLI flag&lt;/th&gt;
              &lt;th&gt;YAML option&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-query-frontend.grpc-client-config.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;frontend.grpc_client_config.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-query-scheduler.grpc-client-config.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;query_scheduler.grpc_client_config.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-ruler.client.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;ruler.ruler_client.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-ruler.query-frontend.grpc-client-config.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;ruler.query_frontend.grpc_client_config.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-alertmanager.alertmanager-client.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;alertmanager.alertmanager_client.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-ingester.client.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;ingester_client.grpc_client_config.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;-querier.frontend-client.grpc-compression&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;frontend_worker.grpc_client_config.grpc_compression&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
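&lt;p&gt;For example, to enable snappy compression on the distributor-to-ingester connection in YAML; the same pattern applies to the other clients in the table:&lt;/p&gt;

```yaml
ingester_client:
  grpc_client_config:
    grpc_compression: snappy
```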
&lt;/section&gt;]]></content><description>&lt;h1 id="grafana-mimir-production-tips">Grafana Mimir production tips&lt;/h1>
&lt;p>This topic provides tips and techniques for you to consider when setting up a production Grafana Mimir cluster.&lt;/p>
&lt;h2 id="ingester">Ingester&lt;/h2>
&lt;h3 id="ensure-a-high-maximum-number-of-open-file-descriptors">Ensure a high maximum number of open file descriptors&lt;/h3>
&lt;p>The ingester receives samples from distributor, and appends the received samples to the specific per-tenant TSDB that is stored on the ingester local disk.
The per-tenant TSDB is composed of several files and the ingester keeps a file descriptor open for each TSDB file.
The total number of file descriptors, used to load TSDB files, linearly increases with the number of tenants in the Grafana Mimir cluster and the configured &lt;code>-blocks-storage.tsdb.retention-period&lt;/code>.&lt;/p></description></item></channel></rss>