Prometheus v2.11 Released
Since graduating within the CNCF last August, Prometheus has adopted a six-week release schedule. The latest release, v2.11, arrived on July 9. It includes a new option to compress WAL records using Snappy, query performance improvements, the option to use Alertmanager API v2, and more.
What’s New in Prometheus v2.11
- Change in Metric Names
- Option to use Alertmanager API v2
- InitContainers in Kubernetes Service Discovery
- Compression of WAL records
- Query performance improvement
- Allocations in PromQL aggregations reduced
- Remote-write allocation improvements
- Regexp matchers for set lookups
- Fix unsafe snapshots
You can download the latest version here.
Change in Metric Names
prometheus_tsdb_wal_reader_corruption_errors is now renamed to
Additionally, a new metric called prometheus_http_requests_total was added. It is a counter for HTTP requests made to the Prometheus API.
Option to Use Alertmanager API v2
For users who have upgraded to the latest version of Alertmanager, this patch adds a configuration option for Prometheus to send alerts to the new API v2 endpoint instead of the default v1 endpoint.
InitContainers in Kubernetes Service Discovery
The new release adds Kubernetes service discovery for initContainers. The original feature request argued that scraping them would be useful because they may expose troubleshooting information, help users detect problems, and show how long initialization takes.
Compression of WAL Records
This feature, built by Chris Marchbanks, provides the option to compress WAL records using Snappy, which significantly reduces the size of the WAL.
Query Performance Improvement
I worked on this enhancement for efficient iteration and search in
HashWithoutLabels functions. It avoids copying slice data while iterating over labels and uses a faster algorithm. We use the same query engine inside Cortex and noticed that these functions were taking longer than they should. Once I dug into and optimized them, some of our heavy queries took 30-40% less time. This is a good example of Cortex development flowing upstream as enhancements!
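To give a feel for the idea, here is a minimal sketch of hashing a label set while skipping some label names. It is not the Prometheus implementation: the Label and Labels types, the hashWithout function, and the use of FNV from the standard library are all assumptions made for illustration. It assumes both the labels and the excluded names are sorted, so one forward pass over each slice replaces a lookup per label, and it indexes into the slice rather than ranging by value so no Label struct is copied.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Label is a hypothetical name/value pair used only for this sketch.
type Label struct {
	Name, Value string
}

// Labels is assumed to be sorted by Name.
type Labels []Label

// sep separates names and values in the hashed byte stream.
const sep = '\xff'

// hashWithout hashes every label whose name is not listed in names.
// names must be sorted. Both slices are walked once with two indexes,
// so no lookup map is built and no Label struct is copied.
func hashWithout(ls Labels, names []string) uint64 {
	b := make([]byte, 0, 1024)
	j := 0
	for i := range ls {
		// Advance past excluded names that sort before the current label.
		for j < len(names) && names[j] < ls[i].Name {
			j++
		}
		if j < len(names) && names[j] == ls[i].Name {
			continue // this label is excluded from the hash
		}
		b = append(b, ls[i].Name...)
		b = append(b, sep)
		b = append(b, ls[i].Value...)
		b = append(b, sep)
	}
	h := fnv.New64a()
	h.Write(b)
	return h.Sum64()
}

func main() {
	a := Labels{{Name: "instance", Value: "a"}, {Name: "job", Value: "api"}}
	b := Labels{{Name: "instance", Value: "b"}, {Name: "job", Value: "api"}}
	// Ignoring "instance" puts both series in the same aggregation group.
	fmt.Println(hashWithout(a, []string{"instance"}) == hashWithout(b, []string{"instance"}))
}
```

An aggregation such as `sum without (instance)` needs a grouping key of this kind for every series it touches, which is why small savings per call add up on heavy queries.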
Allocations in PromQL Aggregations Reduced
Thomas Jackson, who worked on this enhancement, reports that his benchmarks on aggregation showed that this reduced allocations by ~5%, with a ~10% time improvement.
Remote-Write Allocation Improvements
Chris Marchbanks improved allocations for remote write by removing an unneeded temp variable, only allocating pendingSamples once per shard, allocating the send slice far fewer times, using a mask rather than always copying samples into a temporary slice, and allocating a Snappy buffer per shard rather than creating one per request.
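The "allocate once per shard" pattern can be sketched as follows. This is an illustration under stated assumptions, not the remote-write code itself: the sample and shard types and the add and flush methods are hypothetical names. The point it shows is that truncating a slice to length zero keeps its backing array, so every later batch reuses the same memory.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for a remote-write sample.
type sample struct {
	ts    int64
	value float64
}

// shard owns a single pending buffer for its whole lifetime.
type shard struct {
	pending []sample
	sent    int // number of samples handed off, tracked for demonstration
}

// newShard allocates the pending buffer once, sized to one batch.
func newShard(maxPerSend int) *shard {
	return &shard{pending: make([]sample, 0, maxPerSend)}
}

// add appends a sample and flushes as soon as the buffer is full,
// so append never has to grow the backing array.
func (s *shard) add(x sample) {
	s.pending = append(s.pending, x)
	if len(s.pending) == cap(s.pending) {
		s.flush()
	}
}

// flush stands in for sending the batch, then truncates the slice.
// Truncation keeps the backing array for the next batch.
func (s *shard) flush() {
	s.sent += len(s.pending)
	s.pending = s.pending[:0]
}

func main() {
	s := newShard(4)
	for i := 0; i < 10; i++ {
		s.add(sample{ts: int64(i), value: float64(i)})
	}
	fmt.Println(s.sent, len(s.pending), cap(s.pending))
}
```

The alternative, creating a fresh slice for every batch, produces garbage proportional to the number of requests rather than the number of shards.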
Regexp Matchers for Set Lookups
Goutham Veeramachaneni and I are mentoring a Google Summer of Code student, Alec Wang, who worked on this optimization of queries that use a regexp for set lookups. This speeds up queries from Grafana dashboards, which often send regexp matchers that are really just a list of literal values.
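As a rough illustration of the technique, the sketch below treats a pattern such as `api|web` as a set of literal values and resolves it with direct lookups, falling back to a regexp scan only when the pattern contains real regexp syntax. The function names setMatches and matchingValues are hypothetical, and the metacharacter check is deliberately simplified (for example, it does not handle escaped characters), so this should be read as a sketch of the idea rather than the code that shipped.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// regexMeta lists characters with special meaning in a regexp,
// except '|', which is treated as the separator between alternatives.
const regexMeta = `.+*?()[]{}^$\`

// setMatches returns the literal alternatives when pattern is a plain
// alternation such as "a|b|c", and nil otherwise.
func setMatches(pattern string) []string {
	if pattern == "" || strings.ContainsAny(pattern, regexMeta) {
		return nil
	}
	return strings.Split(pattern, "|")
}

// matchingValues returns the known label values selected by pattern.
func matchingValues(values map[string]struct{}, pattern string) ([]string, error) {
	var out []string
	if set := setMatches(pattern); set != nil {
		// Fast path: one map lookup per alternative.
		for _, v := range set {
			if _, ok := values[v]; ok {
				out = append(out, v)
			}
		}
		return out, nil
	}
	// Slow path: test every known value against the anchored regexp.
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, err
	}
	for v := range values {
		if re.MatchString(v) {
			out = append(out, v)
		}
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	values := map[string]struct{}{"api": {}, "db": {}, "web": {}, "worker": {}}
	fast, _ := matchingValues(values, "api|web|missing")
	slow, _ := matchingValues(values, "w.*")
	fmt.Println(fast, slow)
}
```

The fast path costs one lookup per alternative regardless of how many label values exist, while the slow path has to test the regexp against every value.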
Fix Unsafe Snapshots
Check out the release notes for a complete list of new features, enhancements, and bug fixes.
Head to the download page for download links and instructions.