We’re making Prometheus use less memory and restart faster
A few months ago, I blogged about memory-mapping of full chunks of the head block from disk. The feature, which was introduced in Prometheus v2.19.0, brings down memory usage and restart time.
There’s also another Prometheus feature in progress that snapshots in-memory data during shutdown for faster restarts; it’s expected to cut restart times substantially.
These two features formed the core of my KubeCon + CloudNativeCon EU talk, entitled Make Prometheus Use Less Memory and Restart Faster. Have a look at the talk to learn more!
Here’s a brief recap of the features:
Memory-mapping full chunks of the head block regularly offloads those chunks to disk and loads them into memory only when required, which reduces memory usage. And because the chunks are already on disk, we can skip a lot of samples from the WAL during replay, which reduces the restart time. If you are interested in the low-level details of this feature, you can have a look at this blog post.
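To make the idea concrete, here is a minimal sketch in Python of keeping only small references in memory while the chunk bytes live in a memory-mapped file. It is a toy model and not Prometheus's actual code or on-disk format: the length-prefixed layout and the names `write_chunks`, `MappedChunks`, and `demo` are all assumptions made up for illustration.

```python
import mmap
import os
import struct
import tempfile

# Toy layout (an assumption for illustration, not Prometheus's on-disk format):
# each chunk is a 4-byte big-endian length followed by that many payload bytes.
HEADER = struct.Struct(">I")


def write_chunks(path, chunks):
    """Write full chunks to a file and return (offset, length) of each payload."""
    refs = []
    with open(path, "wb") as f:
        for payload in chunks:
            f.write(HEADER.pack(len(payload)))
            refs.append((f.tell(), len(payload)))
            f.write(payload)
    return refs


class MappedChunks:
    """Keeps only small references in memory; payload bytes stay on disk."""

    def __init__(self, path, refs):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.refs = refs

    def read(self, index):
        offset, length = self.refs[index]
        # Slicing the map copies out just this chunk; the OS pages it in on demand.
        return self._map[offset:offset + length]

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    """Hypothetical end-to-end run: flush three chunks, then read them back lazily."""
    path = os.path.join(tempfile.mkdtemp(), "chunks")
    refs = write_chunks(path, [b"chunk-a", b"chunk-bb", b"chunk-ccc"])
    mapped = MappedChunks(path, refs)
    try:
        return [mapped.read(i) for i in range(len(refs))]
    finally:
        mapped.close()
```

Calling `demo()` writes the chunks once and then serves each read from the mapped file. The point of the design is that the process holds an offset and a length per chunk rather than the chunk itself, and the operating system decides which pages stay resident.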
After memory-mapping, we still have some data in memory. With snapshotting, we dump this remaining in-memory data to disk during shutdown. During startup, we restore the in-memory state from this snapshot and the memory-mapped chunks, skipping the WAL entirely. This brings down the restart time substantially, as WAL replay is the slowest part of the restart.
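The difference between the two startup paths can be sketched with a toy model in Python. This is only an illustration under stated assumptions, not how Prometheus implements it: the JSON file formats, the function names, and the fallback to WAL replay when no snapshot exists are all hypothetical choices made for the example.

```python
import json
import os
import tempfile


def replay_wal(wal_path):
    """Rebuild state by applying every WAL record in order (the slow path)."""
    state = {}
    with open(wal_path) as f:
        for line in f:
            record = json.loads(line)
            state.setdefault(record["series"], []).append(
                (record["t"], record["v"]))
    return state


def write_snapshot(state, snapshot_path):
    """Dump the in-memory state in one go, as would happen at shutdown."""
    tmp = snapshot_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, snapshot_path)  # rename into place so a partial file is never read


def restore(snapshot_path, wal_path):
    """Prefer the snapshot; fall back to WAL replay (an assumption of this sketch)."""
    if os.path.exists(snapshot_path):
        with open(snapshot_path) as f:
            raw = json.load(f)
        # JSON stores samples as lists; convert them back to (t, v) tuples.
        return {k: [tuple(s) for s in v] for k, v in raw.items()}, "snapshot"
    return replay_wal(wal_path), "wal"


def demo():
    """Hypothetical run: first start replays the WAL, second start uses the snapshot."""
    tmpdir = tempfile.mkdtemp()
    wal = os.path.join(tmpdir, "wal.jsonl")
    snap = os.path.join(tmpdir, "snapshot.json")
    records = [
        {"series": "up", "t": 1, "v": 1.0},
        {"series": "up", "t": 2, "v": 0.0},
        {"series": "load", "t": 1, "v": 0.5},
    ]
    with open(wal, "w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
    first, first_source = restore(snap, wal)    # no snapshot yet, so the WAL is replayed
    write_snapshot(first, snap)                 # simulated shutdown
    second, second_source = restore(snap, wal)  # simulated startup, WAL is never opened
    return first, first_source, second, second_source
```

In this model, replay has to process every record ever written, while the snapshot path reads the final state directly. That is the intuition behind skipping the WAL, even though the real data structures are far more involved.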
This work is still in progress and some details might change. We will be blogging about this as soon as the feature is in, so don’t forget to check back for updates!