Scaling out Grafana Mimir

Grafana Mimir can horizontally scale every component. Scaling out Grafana Mimir means that, to respond to increased load, you increase the number of replicas of each component.

We have designed Grafana Mimir to scale up quickly, safely, and with no manual intervention. However, be careful when scaling down some of the stateful components, because doing so can result in write and read failures or partial query results.

Monolithic mode

When running Grafana Mimir in monolithic mode, you can safely scale up to any number of instances. To scale down the Grafana Mimir cluster, see Scaling down ingesters.

Microservices mode

When running Grafana Mimir in microservices mode, you can safely scale up any component. You can also safely scale down any stateless component.

The following stateful components have limitations when scaling down:

  • Alertmanagers
  • Ingesters
  • Store-gateways

Scaling down Alertmanagers

Scaling down Alertmanagers can result in downtime.

Consider the following guidelines when you scale down Alertmanagers:

  • Scale down no more than two Alertmanagers at the same time.
  • Ensure at least -alertmanager.sharding-ring.replication-factor Alertmanager instances are running (three when running Grafana Mimir with the default configuration).

Note

If you enabled zone-aware replication for Alertmanagers, you can scale down any number of Alertmanager instances in parallel within one zone at a time.
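
As a rough guard based on these guidelines, the following minimal Python sketch plans a scale-down in batches. The replica counts are example values, not values read from your cluster:

```python
# Illustrative check of the Alertmanager scale-down guidelines; counts are example values.
current_replicas = 5
desired_replicas = 3
replication_factor = 3  # -alertmanager.sharding-ring.replication-factor (default: 3)

if desired_replicas < replication_factor:
    raise SystemExit("refusing: fewer Alertmanagers than the replication factor would remain")

# Scale down no more than two Alertmanagers at the same time.
remaining = current_replicas
while remaining > desired_replicas:
    batch = min(2, remaining - desired_replicas)
    print(f"scale down {batch} Alertmanager(s): {remaining} -> {remaining - batch}")
    remaining -= batch
```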

Scaling down ingesters in ingest storage architecture

Note

This guidance applies to ingest storage architecture. For more information about the supported architectures in Grafana Mimir, refer to Grafana Mimir architecture.

When running Grafana Mimir with ingest storage architecture, scaling down ingesters triggers the reassignment of ingestion partitions instead of transferring in-memory series ownership between ingesters.

The ingestion layer durably stores each partition and can reassign it to a new ingester without data loss. When you terminate or scale down an ingester, it stops writing to its assigned partitions. Other ingesters continue consuming active partitions as normal according to the partition lifecycle.

For details about how partitions are created, reassigned, and transitioned between states, refer to Grafana Mimir hash rings.

Because the system writes ingestion data to Kafka and persists it in object storage, scaling down ingesters in the ingest storage architecture doesn’t require draining in-memory series or handoff operations.

In production environments, scaling down typically happens automatically through the rollout-operator, which coordinates ingesters across zones. The rollout-operator prepares ingesters for shutdown by moving their partitions from ACTIVE to INACTIVE and removes the ingesters after a defined period.

The rollout-operator is deployed by the Grafana Mimir Helm chart and is the recommended way to manage partitioned ingesters and scaling operations for ingest storage.

If you’re managing ingesters manually, you can use GET, POST, or DELETE on the HTTP API endpoint /ingester/prepare-partition-downscale to prepare ingesters for downscaling instead of relying on the rollout-operator.
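
For example, the following is a minimal sketch of driving this endpoint manually with Python's standard library. The ingester hostname and port are placeholders for your deployment:

```python
# Sketch: manually preparing one ingester's partition for downscale.
# The hostname and port are placeholders; point this at the ingester you want to remove.
import urllib.request

ENDPOINT = "http://ingester-zone-a-5.example:8080/ingester/prepare-partition-downscale"

def call(method: str) -> str:
    req = urllib.request.Request(ENDPOINT, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

call("POST")        # prepare the ingester's partition for downscaling
print(call("GET"))  # check the current downscale state
# call("DELETE")    # revert the preparation if you decide not to scale down
```

In most deployments, prefer letting the rollout-operator drive this endpoint for you.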

Scaling down ingesters in classic architecture

Note

This guidance applies to classic architecture. For more information about the supported architectures in Grafana Mimir, refer to Grafana Mimir architecture.

Ingesters store recently received samples in memory. To guarantee no data loss when you scale down an ingester, the samples stored in that ingester must not be discarded.

You might experience the following challenges when you scale down ingesters:

  • By default, when an ingester shuts down, it does not upload its in-memory samples to long-term storage, which causes data loss.

    To mitigate this challenge, ingesters expose the /ingester/shutdown API endpoint, which flushes in-memory time series data from the ingester to long-term storage and unregisters the ingester from the ring.

    After the /ingester/shutdown API endpoint successfully returns, the ingester doesn’t receive write or read requests, but the process doesn’t exit.

    You can terminate the process by sending a SIGINT or SIGTERM signal after the shutdown endpoint returns.

  • When you scale down ingesters, the querier might temporarily return partial results.

    The blocks an ingester uploads to the long-term storage are not immediately available for querying. It takes the queriers and store-gateways some time before a newly uploaded block is available for querying. If you scale down two or more ingesters in a short period of time, queries might return partial results.

Complete the following steps to scale down ingesters in any zone. A sketch that strings these steps together follows the procedure.

  1. Send a POST request to the /ingester/prepare-instance-ring-downscale API endpoint on each ingester to place it into read-only mode.

  2. Wait until the blocks uploaded by read-only ingesters are available for querying before proceeding. The required amount of time to wait depends on your configuration and is the maximum of the following values:

    • The configured -querier.query-store-after setting
    • Two times the configured -blocks-storage.bucket-store.sync-interval setting
    • Two times the configured -compactor.cleanup-interval setting
  3. Scale down each ingester:

    1. Send a POST request to the /ingester/shutdown API endpoint on the ingester that you want to terminate.

    2. Wait until the API endpoint call has successfully returned and the ingester has logged “finished flushing and shipping TSDB blocks”.

    3. Send a SIGINT or SIGTERM signal to the ingester process to terminate it.
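
The following sketch strings these steps together for a single ingester using Python's standard library. The hostname, port, and durations are placeholders: substitute the values of -querier.query-store-after, -blocks-storage.bucket-store.sync-interval, and -compactor.cleanup-interval from your configuration, and send the final signal through your orchestration system rather than from the script.

```python
# Sketch of the ingester scale-down procedure above, for one ingester.
# The hostname, port, and durations are placeholders for your deployment.
import time
import urllib.request

INGESTER = "http://ingester-zone-a-5.example:8080"

# Placeholder durations in seconds; use your configured values.
query_store_after = 12 * 3600         # -querier.query-store-after
bucket_store_sync_interval = 15 * 60  # -blocks-storage.bucket-store.sync-interval
compactor_cleanup_interval = 15 * 60  # -compactor.cleanup-interval

def post(path: str) -> None:
    req = urllib.request.Request(INGESTER + path, method="POST")
    with urllib.request.urlopen(req) as resp:
        print(path, resp.status)

# Step 1: place the ingester into read-only mode.
post("/ingester/prepare-instance-ring-downscale")

# Step 2: wait until the blocks uploaded by the read-only ingester are queryable.
wait_seconds = max(
    query_store_after,
    2 * bucket_store_sync_interval,
    2 * compactor_cleanup_interval,
)
print(f"waiting {wait_seconds} seconds before shutting down the ingester")
time.sleep(wait_seconds)

# Step 3: flush and ship TSDB blocks, then unregister the ingester from the ring.
post("/ingester/shutdown")

# After the call returns and the ingester has logged
# "finished flushing and shipping TSDB blocks", terminate the process
# with SIGINT or SIGTERM (for example, through your orchestrator).
```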

Scaling down store-gateways

To guarantee no downtime when scaling down store-gateways, complete the following steps:

  1. Ensure at least -store-gateway.sharding-ring.replication-factor store-gateway instances are running (three when running Grafana Mimir with the default configuration).
  2. Scale down no more than two store-gateways at the same time. If you enabled zone-aware replication for store-gateways, you can scale down any number of store-gateway instances in parallel within one zone at a time. Zone-aware replication is enabled by default in the mimir-distributed Helm chart.
  3. Stop the store-gateway instances you want to scale down.
  4. If you have set the value of -store-gateway.sharding-ring.unregister-on-shutdown to false, then remove the stopped instances from the store-gateway ring:
    1. In a browser, go to the GET /store-gateway/ring page that store-gateways expose on their HTTP port.
    2. Click Forget on the instances that you scaled down. Alternatively, wait for 10 times the value of -store-gateway.sharding-ring.heartbeat-timeout (the default value is one minute), as shown in the sketch after these steps.
  5. Proceed with the next two store-gateway replicas. If you are using zone-aware replication, then proceed with the next zone.
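
The following is a minimal sketch of the wait-based alternative in step 4, using Python's standard library. The hostname and the heartbeat timeout are placeholders for your deployment:

```python
# Sketch of the wait-based alternative to clicking Forget (step 4 above).
# The hostname and timeout are placeholders for your deployment.
import time
import urllib.request

STORE_GATEWAY = "http://store-gateway-zone-a-0.example:8080"
heartbeat_timeout = 60  # -store-gateway.sharding-ring.heartbeat-timeout, in seconds (default: 1m)

# The ring page that store-gateways expose on their HTTP port lists the registered instances.
with urllib.request.urlopen(STORE_GATEWAY + "/store-gateway/ring") as resp:
    print(resp.read().decode()[:500])  # inspect the beginning of the (HTML) ring page

# Instead of clicking Forget, wait 10 times the heartbeat timeout so the stopped
# instances are removed from the ring.
time.sleep(10 * heartbeat_timeout)
```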