Grafana Mimir version 2.1 release notes
Grafana Labs is excited to announce version 2.1 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
Below we highlight the top features, enhancements, and bug fixes in this release, as well as relevant callouts for those upgrading from Grafana Mimir 2.0. The complete list of changes is recorded in the Changelog.
Features and enhancements
Mimir on ARM: We now publish Docker images for both amd64 and arm64, making it easier for those on ARM-based machines to develop and run Mimir. Multi-platform images are available from the Mimir Docker registry. Note that our existing integration test suite only uses the amd64 images, which means we cannot make any functional or performance guarantees about the arm64 images.
Remote ruler mode for improved rule evaluation performance: We’ve added a remote mode for the Grafana Mimir ruler, in which the ruler delegates rule evaluation to the query-frontend rather than evaluating rules directly within the ruler process itself. This allows recording and alerting rules to benefit from the query parallelization techniques implemented in the query-frontend (like query sharding).
Remote mode is considered experimental and is off by default. To enable it, see remote ruler.
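As a rough illustration, remote mode is switched on by pointing the ruler at the query-frontend's gRPC address. The sketch below assumes the ruler.query_frontend.address YAML parameter (CLI flag -ruler.query-frontend.address) described in the Mimir configuration reference. The hostname and port are placeholders, so check the remote ruler documentation for your version before relying on it.

```yaml
# Ruler configuration sketch: delegate rule evaluation to the query-frontend.
# The address is a placeholder for your own query-frontend service.
ruler:
  query_frontend:
    # A dns:/// prefix lets the gRPC client balance load across replicas.
    address: dns:///query-frontend.mimir.svc.cluster.local:9095
```

Leaving the address unset keeps the default behavior, where rules are evaluated inside the ruler process.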
Per-tenant custom trackers for monitoring cardinality: In Grafana Mimir 2.0, we introduced a custom tracker feature that allows you to track the count of active series over time that match a specific label matcher. In Grafana Mimir 2.1, we’ve made it possible to configure custom trackers via the runtime configuration file. This means you can now define different trackers for each tenant in your cluster and modify those trackers without an ingester restart.
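A per-tenant definition in the runtime configuration file might look like the following sketch. It assumes the usual overrides block keyed by tenant ID and a map of tracker name to label matcher. The key name is copied from these notes and may differ in your version, so check the configuration reference. The tenant IDs, tracker names, and matchers are made up for illustration.

```yaml
# Runtime configuration sketch (hypothetical tenants, trackers, and matchers).
overrides:
  tenant-a:
    active_series_custom_trackers_config:
      # tracker name: label matcher
      dev: '{namespace=~"dev-.*"}'
      prod: '{namespace=~"prod-.*"}'
  tenant-b:
    active_series_custom_trackers_config:
      billing: '{service="billing"}'
```

Because the runtime configuration file is reloaded periodically, edits like these take effect without restarting the ingesters.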
Reduce cardinality of Grafana Mimir’s /metrics endpoint: While Grafana Mimir does a good job of exposing a relatively small number of series about its own state, this number can tick up when running Grafana Mimir clusters with high tenant counts or high active series counts. To reduce this number (and the accompanying cost of scraping and storing these time series), we made several optimizations which decreased series count on the /metrics endpoint by more than 10%.
We’ve updated the default values for 2 parameters in Grafana Mimir to give users better out-of-the-box performance:
We’ve changed the default for the TSDB isolation flag to false. We’ve marked this flag as deprecated and will remove it completely in 2 releases. TSDB isolation is a feature inherited from Prometheus that didn’t provide any benefit given Grafana Mimir’s distributed architecture, and in our 1 billion series load test we found that it actually hurt performance. Disabling it reduced our ingester 99th percentile latency by 90%.
The store-gateway attributes cache is now enabled by default (achieved by updating the default of the corresponding cache size setting to 50000). This in-memory cache makes it faster to look up object attributes for chunk data. We’ve been running this optional cache internally for a while and, upon a recent configuration audit, realized it made sense to enable it for all users. The increase in store-gateway memory utilization from enabling this cache is negligible and easily justified given the performance gains.
As part of the enhancement to make active series custom trackers configurable on a per-tenant basis, we’ve moved where they get configured. Users of this feature should migrate from defining active_series_custom_trackers_config in the ingester section of Grafana Mimir’s YAML configuration to defining it in the limits section. Both options are supported for now to give users time to make the change, but configuring active_series_custom_trackers_config in the ingester section has been marked deprecated and will be removed in Mimir 2.3. Note: Configuring active_series_custom_trackers_config in both the limits and ingester sections will cause Grafana Mimir to fail on startup.
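The migration can be sketched as follows. The key name and the two section names are taken from the note above. The tracker name and matcher are hypothetical, and the map-of-name-to-matcher value format is an assumption to verify against the configuration reference for your version.

```yaml
# Before (deprecated, to be removed in Mimir 2.3):
# ingester:
#   active_series_custom_trackers_config:
#     dev: '{namespace=~"dev-.*"}'

# After: the same tracker defined under limits.
limits:
  active_series_custom_trackers_config:
    dev: '{namespace=~"dev-.*"}'
```

The old block is shown commented out on purpose: leaving both sections populated triggers the startup failure mentioned in the note.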
2.1.0 bug fixes
- PR 1704: Fixed a bug where duplicate metric names caused Grafana Mimir to crash on startup when running in monolithic mode with the results cache enabled.
- PR 1835: Fixed a bug that caused Grafana Mimir to crash when an invalid Alertmanager configuration was set even though the Alertmanager component was disabled. After this fix, the Alertmanager configuration is only validated if the Alertmanager component is loaded.
- PR 1836: The ability to run Alertmanager with local storage broke in Grafana Mimir 2.0 when we removed the ability to run the Alertmanager without sharding. With this bug fix, we’ve made it possible to again run Alertmanager with local storage. However, for production use, we still recommend using an external store, since this is needed to persist Alertmanager state (e.g. silences) between replicas.
- PR 1715: Restored Grafana Mimir’s ability to use CNAME DNS records to reach memcached servers. The bug was inherited from an upstream change to Thanos; we contributed a fix to Thanos and subsequently updated our Thanos version.