Prometheus v2.11 Released

Published: 11 Jul 2019

Since graduating within the CNCF last August, Prometheus has adopted a six-week release schedule. The latest release, v2.11, arrived on July 9. Prometheus 2.11 includes a new option to compress WAL records using Snappy, query performance improvements, the option to use Alertmanager API v2, and more.

What’s New in Prometheus v2.11

You can download the latest version here.

Change in Metric Names

prometheus_tsdb_wal_reader_corruption_errors has been renamed to prometheus_tsdb_wal_reader_corruption_errors_total.

Additionally, a new metric called prometheus_http_requests_total was added. It is a counter for HTTP requests made to the Prometheus API.

Option to Use Alertmanager API v2

For users who have upgraded to the latest version of Alertmanager, this patch adds a configuration option for Prometheus to send alerts to the new API v2 endpoint instead of the default v1 endpoint.

InitContainers in Kubernetes Service Discovery

The new release adds Kubernetes service discovery for initContainers. The original feature request argued that scraping them would be useful because they may contain troubleshooting information, can help users detect problems, and can show how long initialization takes.

Compression of WAL Records

This feature, built by Chris Marchbanks, provides the option to compress WAL records using Snappy, which reduces the size of WAL significantly.

Query Performance Improvement

I worked on this enhancement for efficient iteration and search in the HashForLabels and HashWithoutLabels functions. It avoids copying slice data while iterating over labels, and uses a faster algorithm. We use the same query engine inside Cortex and noticed that these functions were taking longer than they should. Once I dug in and optimized them, we saw some of our heavy queries take 30-40% less time. This is a good example of Cortex development flowing upstream as enhancements!
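To make the idea concrete, here is a minimal Go sketch. It is not the actual Prometheus implementation. It assumes that both the label set and the list of names are sorted, walks the two in lockstep, and folds strings into an FNV-1a hash byte by byte, so no intermediate slice or map is built. The Label type and all function names here are hypothetical stand-ins.

```go
package main

import "fmt"

// Label is a hypothetical name/value pair used only for this sketch.
type Label struct {
	Name, Value string
}

// 64-bit FNV-1a constants.
const (
	offset64 uint64 = 14695981039346656037
	prime64  uint64 = 1099511628211
)

// hashAdd folds the bytes of s into h without converting s to a []byte.
func hashAdd(h uint64, s string) uint64 {
	for i := 0; i < len(s); i++ {
		h ^= uint64(s[i])
		h *= prime64
	}
	return h
}

// hashAddByte folds a single separator byte into h.
func hashAddByte(h uint64, b byte) uint64 {
	h ^= uint64(b)
	h *= prime64
	return h
}

// hashForLabels hashes only the labels whose names appear in names.
// Both ls and names must be sorted by name. The two slices are merged
// with two indexes, so each element is visited at most once.
func hashForLabels(ls []Label, names []string) uint64 {
	h := offset64
	i, j := 0, 0
	for i < len(ls) && j < len(names) {
		switch {
		case names[j] < ls[i].Name:
			j++ // requested name is not present in the label set
		case ls[i].Name < names[j]:
			i++ // label was not requested
		default:
			h = hashAdd(h, ls[i].Name)
			h = hashAddByte(h, 0xff) // separator that cannot occur in valid UTF-8
			h = hashAdd(h, ls[i].Value)
			h = hashAddByte(h, 0xff)
			i++
			j++
		}
	}
	return h
}

func main() {
	a := []Label{{Name: "instance", Value: "a:9090"}, {Name: "job", Value: "api"}}
	b := []Label{{Name: "instance", Value: "b:9090"}, {Name: "job", Value: "api"}}
	names := []string{"job"}
	// Series that agree on the selected labels land in the same group.
	fmt.Println(hashForLabels(a, names) == hashForLabels(b, names))
}
```

A "without" variant would follow the same merge loop, hashing the labels that do not match a name instead of those that do.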

Allocations in PromQL Aggregations Reduced

Thomas Jackson, who worked on this enhancement, reports that his benchmarks on aggregation showed that this reduced allocations by ~5%, with a ~10% time improvement.

Remote-Write Allocation Improvements

Chris Marchbanks improved allocations for remote write by removing an unneeded temp variable, only allocating pendingSamples once per shard, allocating the send slice far fewer times, using a mask rather than always copying samples into a temporary slice, and allocating a Snappy buffer per shard rather than creating one per request.

[Figure: Impact on CPU fraction of Go GC]
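The "one buffer per shard rather than one per request" idea can be sketched in Go as follows. This is only an illustration of buffer reuse under simplifying assumptions, not the remote-write code itself: the shard type and its encode method are hypothetical, and a plain text encoding stands in for the Snappy-compressed payload.

```go
package main

import (
	"fmt"
	"strconv"
)

// shard is a hypothetical stand-in for a remote-write shard.
type shard struct {
	buf []byte // owned by the shard and reused across requests
}

// encode writes one request's samples into the shard's buffer.
// The length is reset to zero but the capacity is kept, so once the
// buffer has grown to a typical request size, later calls stop allocating.
// The returned slice is only valid until the next call to encode.
func (s *shard) encode(samples []float64) []byte {
	s.buf = s.buf[:0]
	for _, v := range samples {
		s.buf = strconv.AppendFloat(s.buf, v, 'g', -1, 64)
		s.buf = append(s.buf, '\n')
	}
	return s.buf
}

func main() {
	s := &shard{}
	first := s.encode([]float64{1.5, 2})
	firstPtr := &first[0]
	second := s.encode([]float64{1.5, 2})
	// The second request reuses the backing array from the first.
	fmt.Println(firstPtr == &second[0])
}
```

The trade-off is that the caller must finish with the returned slice before the shard encodes its next request, which fits a model where each shard sends one request at a time.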

Regexp Matchers for Set Lookups

Goutham Veeramachaneni and I are mentoring a Google Summer of Code student, Alec Wang, who worked on this optimization for queries that use a regexp for set lookups. It makes queries issued by Grafana dashboards faster because of how Grafana dashboards send queries to Prometheus.
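As a rough sketch of the idea in Go: a matcher such as job=~"api|web|db" (the label and values are made up for illustration) is really a set of literal values, so it can be answered with exact lookups instead of running a regexp against every candidate. The function below is a simplified, hypothetical version and not the Prometheus implementation. It is deliberately conservative and gives up on any alternative that contains a regexp metacharacter, including escaped ones.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// setMatches returns the literal alternatives of pattern if it has the
// form "a|b|c" with no regexp metacharacters, and nil otherwise.
// It assumes the pattern is fully anchored, as Prometheus matchers are.
func setMatches(pattern string) []string {
	alts := strings.Split(pattern, "|")
	for _, alt := range alts {
		// QuoteMeta leaves a string unchanged only if it contains
		// no regexp metacharacters.
		if regexp.QuoteMeta(alt) != alt {
			return nil
		}
	}
	return alts
}

func main() {
	fmt.Println(setMatches("api|web|db")) // literal set: use exact lookups
	fmt.Println(setMatches("api.*"))      // real regexp: fall back to matching
}
```

When setMatches returns a non-nil slice, a caller could look up each value directly in an index and merge the results, falling back to the regexp path otherwise.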

Fix Unsafe Snapshots

A collaborative effort by Krasi Georgiev and Bartek Płotka, this change fixes a bug in taking Prometheus snapshots that include the head block.

Release Notes

Check out the release notes for a complete list of new features, enhancements, and bug fixes.

Download

Head to the download page for download links and instructions.
