Grafana Mimir version 2.0 release notes
Grafana Labs is excited to announce the first release of Grafana Mimir, the most scalable, most performant open source time series database in the world. In customer tests, we’ve shown that a single cluster can support more than one billion active time series.
Besides massive scale, Grafana Mimir offers a host of other benefits, including easy deployment, native multi-tenancy, high availability, durable long-term storage, and exceptional query performance on even the highest-cardinality queries.
For more information about Grafana Mimir features and capabilities, watch the Introducing Grafana Mimir video.
We’re launching Grafana Mimir with a 2.0 version number to signal our respect for Cortex, the project from which Grafana Mimir was forked. The choice of 2.0 also represents our conviction that Grafana Mimir is real-world-tested, production-ready software. It has served as the backbone of our Grafana Cloud Metrics and Grafana Enterprise Metrics products since their inception.
For more on the story of why we created Grafana Mimir, please see the following blog posts:
- Announcing Grafana Mimir, the most scalable open source TSDB in the world
- Grafana Mimir Q&A with Grafana Labs CEO Raj Dutt
Grafana Mimir builds on Cortex 1.10.0, adding features we developed to run Grafana Enterprise Metrics and Grafana Cloud Metrics at massive scale. As a result, these listed features and enhancements, migration considerations, and bug fixes highlight changes between Cortex 1.10.0 and Grafana Mimir 2.0.
The complete list of changes is recorded in the Changelog.
Features and enhancements
These features and enhancements distinguish Grafana Mimir from Cortex 1.10.0:
Simplified deployment experience: These changes improve the experience of setting up and maintaining Grafana Mimir:
We revamped Cortex’s single process mode, renaming it monolithic mode. Monolithic mode now runs the required compactor component, and has better configuration defaults, so that it works with fewer changes out-of-the-box.
We switched to
memberlistas the default store for Grafana Mimir’s hash rings. With this, users no longer have to run Consul or etcd as external dependencies. We made performance optimizations to
memberlistto reduce its CPU utilization, which ensures that
memberlistruns smoothly on Grafana Mimir clusters with lots of active series.
We’ve included our own internal best practice dashboards, mixins, and alerts for monitoring Grafana Mimir. While installing monitoring best practices such as these has typically required use of Jsonnet, we’ve eliminated this requirement. We include dashboards as JSON and alerting and recording rules as YAML which can be directly imported into your Grafana and Prometheus deployments. The alerts are accompanied by runbooks distilled from our own internal operations.
Configuration parameter reduction and classification: We removed 36% of the configuration parameters in Grafana Mimir. All remaining configuration parameters have been classified as basic, advanced, or experimental. This is meant to make Grafana Mimir’s configuration more approachable for new users. In a default installation, you can focus exclusively on basic configuration. As you become more familiar with Grafana Mimir and want to push your clusters further, you can choose to tune advanced parameters or use experimental parameters. Refer to parameter categories to learn more.
More-scalable metrics compaction: Grafana Mimir’s new compactor uses a split-and-merge compaction algorithm. This new compactor parallelizes the compaction of overlapping blocks across multiple machines, and splits blocks in such a way as to overcome Prometheus’ 64 GB index size limit. This solves a problem that we’ve seen with tenants with greater than 30M time series in Cortex - namely, that the count of uncompacted blocks grew indefinitely because the compactor couldn’t work fast enough. With its new compactor, a single Grafana Mimir cluster can support over one billion active time series. Refer to compactor to learn more.
Query sharding for improved query speeds: Grafana Mimir introduces query sharding to accelerate the execution of high-cardinality or CPU-intensive queries. Query sharding distributes the execution of a single query across multiple machines, to significantly reduce query execution time. We have seen speedups of 10 to 30x in our Grafana Cloud Metrics clusters. Refer to query sharding to learn more.
Federated rule groups: Grafana Mimir makes it possible to write alerting and recording rules that use metrics data from multiple tenants. For example, a user can now create a recording rule that adds metricA from tenantA to metricB from tenantB and writes the result to tenantC. This feature is experimental. For more information on how to use it, refer to federated rule groups.
Understand your metrics cardinality: Grafana Mimir adds two API endpoints to help users identify high-cardinality metrics. The
label_namesendpoint takes a metric name and returns all label names applied to that metric, as well as the count of values for each label name. When run without a metric name, it returns the highest-cardinality label names. The
label_valuesendpoint returns the highest-cardinality metrics, and can be used to get a count of how many series have a given label-value pair applied. The new custom tracker feature takes this one step further by allowing you to track the count of active series over time that match a specific label matcher.
These features are no longer considered experimental; they are stable features:
Cross tenant query federation is a feature that allows you to issue queries that aggregate series across multiple tenants, providing a way to get a global view of your data. Refer to
tenant-federation.enabledin the configuration parameters block to learn more.
Zone-aware replication protects against data loss due to an outage in a specific zone by ensuring data is replicated across multiple zones. In addition to promoting this feature to stable, we’ve also released a Kubernetes operator that makes it easier for those running Grafana Mimir on Kubernetes to manage multi-zone rollouts. Refer to Configuring zone-aware replication to learn more.
Azure Blob storage and OpenStack Swift support: Two additional object storage options are available. Grafana Mimir now supports AWS S3, GCS, Azure Blob Storage, OpenStack Swift, as well as any S3-compatible object storage.
Horizontally scalable Alertmanager: The Alertmanager can distribute its workload among multiple Alertmanager replicas, increasing the maximum number of tenants a cluster can comfortably handle. The Alertmanager uses the tenant ID to determine which replica(s) own which tenant. The replication factor is configurable, so alerts are replicated across multiple replicas, reducing the likelihood of a dropped alert.
Query-scheduler component: The query-scheduler component solves the scalability limitations of the query-frontend component. We strongly encourage users operating larger Grafana Mimir clusters to run this optional component. Refer to the query-scheduler to learn more about this component.
This information is relevant only to users migrating from Cortex to Grafana Mimir. Those who are not currently using Cortex today can follow our tutorial to Get Mimir up and running in 10 minutes.
Grafana Mimir is 100% compatible with Cortex, and the migration from Cortex to Mimir can be done without any data loss or downtime.
Grafana Mimir 2.0 is a major version upgrade, and therefore it introduces several breaking changes for those migrating from Cortex. To make this upgrade as easy as possible, we’ve added functionality to
mimirtool to automate the required configuration changes. We’ve paired this tooling with detailed migration guides that walk through all required changes.
For a demonstration that illustrates how easy this migration can be, please see the following video:
2.0.0 bug fixes
Version 2.0.0 fixes these bugs:
PR 424: Fixed a bug where cached results for step-aligned queries were used to respond to unaligned queries, causing the user to receive incorrect results. The new configuration parameter
query-frontend.cache-unaligned-requestssafely enables the caching of query results, even when the query step is not aligned.
PR 224: Fixed a race condition in the Alertmanager, which sometimes resulted in a tenant’s configuration being replaced with the blank fallback configuration.
PR 551: Fixed a bug in
memberlistthat caused large messages to be corrupted. This bug fix, in addition to the memberlist performance optimizations, enabled us to switch to
memberlistas the default store for Grafana Mimir’s hash rings.