How Cortex uses the Prometheus Write-Ahead Log (WAL) to prevent data loss

Published: 29 Apr 2020

Since the beginning of the Cortex project, the ingester service had a flaw. The ingester holds incoming series data in memory for a while before writing it to a long-term storage backend, so if an ingester crashed, it lost all the data it was holding. Replication mitigates this, but a bug that crashes one ingester can crash all the others too, and all the data in that time frame would be lost until we fixed the bug or rolled back to an older version.

This was fixed in January of this year, when we introduced a Write-Ahead Log (WAL) similar to the one in Prometheus’ TSDB. With the WAL, whenever an ingester gets a write request, it logs the event to a file on disk in addition to storing it in memory. If the ingester then crashes, it can replay these events from disk and restore the in-memory state it had before the crash. We use the Prometheus WAL package to manage writing and reading these events on disk.
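To make the log-then-replay idea concrete, here is a minimal sketch in Python. It is not the Cortex ingester and it does not use the Prometheus WAL package or its record format. The `ToyIngester` class, the JSON-lines file layout, and the `demo` helper are all assumptions made up for illustration.

```python
import json
import os
import tempfile


class ToyIngester:
    """Hypothetical ingester: keeps samples in memory and logs every write."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.series = {}  # series name -> list of (timestamp, value)
        self._replay()    # rebuild in-memory state from any existing log
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _apply(self, name, ts, value):
        self.series.setdefault(name, []).append((ts, value))

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as f:
            for line in f:
                rec = json.loads(line)
                self._apply(rec["series"], rec["ts"], rec["value"])

    def write(self, name, ts, value):
        # Log the event and force it to disk before touching memory,
        # so an acknowledged write can always be replayed.
        rec = {"series": name, "ts": ts, "value": value}
        self._wal.write(json.dumps(rec) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        self._apply(name, ts, value)

    def close(self):
        self._wal.close()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "wal.log")
        ing = ToyIngester(path)
        ing.write("cpu", 1, 0.5)
        ing.write("cpu", 2, 0.7)
        ing.close()  # stand-in for a crash: the in-memory state is gone

        restored = ToyIngester(path)  # a fresh process replays the log
        result = restored.series
        restored.close()
        return result
```

Calling `demo()` writes two samples, drops the first ingester, and shows that a new one rebuilds the same in-memory series purely from the file. Note that replay here applies one record at a time, which is the cost the next section addresses.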

On heavily loaded ingesters, WAL replay is usually slower than Cortex requires. So along with the WAL, the ingester writes a snapshot of all the data in its memory to disk at regular intervals, called a “checkpoint.” The snapshot is stored as chunks, each of which holds multiple samples compressed into a single blob. A checkpoint is faster to replay because it restores a chunk (up to 6h of data) at a time, while WAL replay restores one sample at a time. With checkpoints in place, a replay consists of loading the latest checkpoint and then replaying the remaining WAL that is not included in it.
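The checkpoint-plus-tail replay described above can be sketched roughly as follows. Again, this is a toy under stated assumptions and not how Cortex lays out its files. The helper names, the JSON checkpoint file, and the `covered` counter that records how many WAL records the checkpoint already includes are all hypothetical. Plain lists stand in for compressed chunks.

```python
import json
import os
import tempfile


def append_wal(wal_path, record):
    with open(wal_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_wal(wal_path):
    if not os.path.exists(wal_path):
        return []
    with open(wal_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


def write_checkpoint(ckpt_path, series, wal_records_covered):
    # Write to a temp file and rename, so a crash mid-write
    # never leaves a half-written checkpoint behind.
    tmp = ckpt_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"covered": wal_records_covered, "series": series}, f)
    os.replace(tmp, ckpt_path)


def replay(ckpt_path, wal_path):
    """Load the checkpoint in bulk, then apply only the WAL tail."""
    series, covered = {}, 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path, encoding="utf-8") as f:
            ckpt = json.load(f)
        series, covered = ckpt["series"], ckpt["covered"]
    tail = read_wal(wal_path)[covered:]
    for rec in tail:
        series.setdefault(rec["series"], []).append([rec["ts"], rec["value"]])
    return series, len(tail)


def demo():
    with tempfile.TemporaryDirectory() as d:
        wal = os.path.join(d, "wal.log")
        ckpt = os.path.join(d, "checkpoint.json")
        memory = {}
        for ts in range(1, 6):
            append_wal(wal, {"series": "cpu", "ts": ts, "value": ts * 1.0})
            memory.setdefault("cpu", []).append([ts, ts * 1.0])
            if ts == 3:  # checkpoint after the third write
                write_checkpoint(ckpt, memory, wal_records_covered=3)
        return replay(ckpt, wal)
```

In `demo()`, five samples are written and a checkpoint is taken after the third. On replay, the first three samples come back in one step from the checkpoint and only the last two are replayed from the WAL individually.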

When this feature was added, it was marked as experimental because it was not yet battle-tested. After rigorous testing and some enhancements, such as spreading out checkpoint writes so that disk writes are evenly distributed and switching to the Prometheus TSDB WAL record format, it is ready to shed its experimental tag. The tag will be officially removed in a couple of weeks.

If you would like to migrate to using the WAL (or are just starting with Cortex :)), have a look at the production guide on WAL for more information.

Interested in finding out more about Cortex?

Check out the on-demand recording of Grafana Labs’ recent Taking Prometheus to Scale with Cortex webinar featuring Cortex co-creator Tom Wilkie and maintainer Goutham Veeramachaneni.
