New feature in Loki 2.4: no more ordering constraint

• 3 Dec, 2021 • 8 min

A new version of Loki was released back in November, and I’m here to talk about one of its most exciting features. Loki 2.4 finally removed the requirement that all data must be ingested in timestamp-ascending order. Instead, Loki now allows out-of-order logs up to a configurable validity window (more to come on that). In this post, I’ll walk through what all this means and why we’re thrilled about it.

If you’re interested in learning more and want to see a demo from the Loki team, check out our ObservabilityCON session.

Brief background

Loki stores compressed logs in object storage, which helps maintain a very low-cost operating model. (There’s a ton of work done behind the scenes to ensure we’re still fast without optimized schemas and memory-hungry storage nodes, but that’s a topic for another post.) Rather than immediately writing every log line to storage as it’s received, we buffer it in memory and flush it in larger chunks later. This is called write deamplification, and it reduces the number of writes we need to make. This is particularly helpful as a cost-reduction technique when your storage is billed by operation instead of just size. Loki buffers these logs in a subcomponent called the ingester.
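To make the write deamplification idea concrete, here is a minimal sketch of a buffer that only touches storage once enough bytes have accumulated. The class name, the byte threshold, and the write counter are all hypothetical and chosen for illustration; this is not how the ingester is actually implemented.

```python
class BufferedWriter:
    """Hypothetical buffer: collects lines and flushes them in batches."""

    def __init__(self, flush_bytes):
        self.flush_bytes = flush_bytes  # illustrative threshold, not a Loki setting
        self.pending = []
        self.pending_bytes = 0
        self.storage_writes = 0  # stands in for billed storage operations

    def push(self, line):
        self.pending.append(line)
        self.pending_bytes += len(line.encode("utf-8"))
        if self.pending_bytes >= self.flush_bytes:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # A real system would compress and upload the batch here.
        self.storage_writes += 1
        self.pending.clear()
        self.pending_bytes = 0


writer = BufferedWriter(flush_bytes=1024)
for i in range(1000):
    writer.push(f"log line {i:04d}")
writer.flush()  # flush whatever is left at shutdown

print(writer.storage_writes)  # far fewer than the 1000 lines pushed
```

The trade-off is that buffered lines live only in memory until the flush, which is part of why the size of the in-memory representation matters so much later in this post.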

Before we look at the inner workings of the ingester and the design choices we’ve made to enable unordered writes in Loki, it’s important to understand a bit about Loki’s data model.

In Loki, logs are grouped into “streams,” each of which is identified by a unique combination of label names and values. For example, this is a unique set of labels that determines a “stream.” It’s generally analogous to a log file:

{application="api", cluster="us-central", environment="prod"} =>

And this is the collection of logs associated with this stream. (Note they’re in increasing timestamp order!)

[
(timestamp_0, "line 0"),
(timestamp_1, "line 1"),
(timestamp_4, "line 2"),
]

Trying to append a log line with a previous timestamp, such as (timestamp_2, "oops old log line"), would cause an out_of_order error, and the line would be discarded instead of stored. This is the behavior that’s changing in Loki 2.4.
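The old behavior can be sketched in a few lines. This is a simplified model, not Loki’s code: the Stream class, the OutOfOrderError exception, and the integer timestamps are assumptions made for the example.

```python
from dataclasses import dataclass, field


class OutOfOrderError(Exception):
    """Raised when an entry is older than the newest entry in the stream."""


@dataclass
class Stream:
    # Hypothetical model: a label set plus its (timestamp, line) entries.
    labels: dict
    entries: list = field(default_factory=list)

    def append(self, ts, line):
        # Strict ordering: reject anything older than the last accepted entry.
        if self.entries and ts < self.entries[-1][0]:
            raise OutOfOrderError(
                f"entry at {ts} is older than {self.entries[-1][0]}"
            )
        self.entries.append((ts, line))


stream = Stream(
    {"application": "api", "cluster": "us-central", "environment": "prod"}
)
for ts, line in [(0, "line 0"), (1, "line 1"), (4, "line 2")]:
    stream.append(ts, line)

try:
    stream.append(2, "oops old log line")
    rejected = False
except OutOfOrderError:
    rejected = True  # the line is discarded instead of stored
```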

And now, onto how it works.

Our constraints

So what are we really out to solve by relaxing the ordering constraint — and who benefits? While removing constraints by definition makes things easier, I’m going to detail a few specific ways in which this change addresses issues faced by end users as well as third-party developers.

End users: Requiring in-order logs may be easy for simple cases like tailing a log file that is always in order, but it can get tricky when utilizing an aggregation layer. These sorts of fan-in deployments are fairly common, where a “dumb” agent may forward raw log files to a centralized aggregator that can apply specialized mapping, etc., and in turn forward the files to Loki. Suddenly, the post-aggregated logs may no longer be in order, and Loki has unintentionally foisted the responsibility of maintaining these complex and error-prone configurations onto this intermediate layer.

Third-party developers integrating with Loki: The Loki team maintains the promtail agent, but there are many others which can send logs to Loki. Sometimes they’ll add optimizations like batching and retries. When those are under load, they can cause a newer batch to be accepted before an older batch, resulting in Loki rejecting the older batch. This, too, is a difficult and annoying problem to debug.

I should also mention the problematic intersection of ordering and cardinality (how many distinct log streams exist). Sometimes, ingesting high-cardinality data sources can be incredibly inefficient, like in the case of introducing a label with a high or unbounded number of values, such as request_id. Trying to fix this often ends up with out-of-order logs, overwritten timestamps, or cardinality problems. I won’t go too far down this rabbit hole, but check out my previous post which inspired this example, as well as Ed Welch’s fantastic guide to labels for more in-depth reading.

How we got here

This new Loki feature started more than a year ago as a thought experiment between me and Ed (@slim-bean) over a few beers after work. We wondered, what if we could remove or relax the ordering constraint in Loki? It would make Loki drastically simpler to use and integrate with — but how would it work?

I’ve mentioned that the ingester component buffers logs before they’re flushed to storage. To prevent data loss, Loki replicates the data across different ingesters, so that if one fails, other copies are available. As a result, there is on average more than one copy of the data in storage, each grouped into a different “chunk” (collection) of compressed logs. Loki, in turn, is already capable of handling overlapping or duplicated logs when querying storage.

Getting back to the ordering problem, this means that as long as we re-order the logs into ordered “chunks” before sending them to storage, the other subcomponents in Loki can handle reading them just fine.
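As a rough illustration of why overlapping, duplicated chunks are not a problem at read time, here is a sketch that merges already-sorted chunks and drops exact duplicates. The function name and the tuple representation are assumptions for the example; Loki’s query path is considerably more involved.

```python
import heapq


def merge_chunks(*chunks):
    """Merge sorted lists of (timestamp, line), dropping exact duplicates."""
    merged = []
    # heapq.merge streams the inputs in sorted order without re-sorting them,
    # so identical entries from different replicas end up adjacent.
    for entry in heapq.merge(*chunks):
        if not merged or merged[-1] != entry:
            merged.append(entry)
    return merged


# Two replicas flushed overlapping copies of the same stream.
replica_a = [(0, "line 0"), (1, "line 1"), (4, "line 2")]
replica_b = [(1, "line 1"), (4, "line 2"), (5, "line 3")]

result = merge_chunks(replica_a, replica_b)
```

The only requirement this places on the write path is the one stated above: each chunk must be internally ordered by the time it reaches storage.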

This isolated our problem to just the ingesters. So how do we efficiently allow unordered writes there? We need to account for two principles:

1. Memory pressure (space) Ingesters buffer logs in memory, which means they’re incredibly sensitive to how big the in-memory representation is. Logs are accumulated into smaller blocks and individually compressed, then added together to create a chunk, which is sent to storage. Whichever way we insert logs in memory, we’ll want to keep space in mind as memory is expensive.

Historically, writes were accepted in monotonically increasing timestamp order to a headBlock, which is occasionally “cut” into a compressed, immutable block. In turn, these blocks are combined into a chunk and persisted to storage.

    Data while being buffered in Ingester          |                                Chunk in storage
                                                   |
    Blocks                    Head                 |       ---------------------------------------------------------------------
                                                   |       |   ts0   ts1    ts2   ts3    ts4   ts5    ts6   ts7    ts8    ts9  |
--------------           ----------------          |       |   ---------    ---------    ---------    ---------    ---------   |
|    blocks  |--         |  head block  |          |       |   |block 0|    |block 1|    |block 2|    |block 3|    |block 4|   |
|(compressed)| |         |(uncompressed)|          |       |   |       |    |       |    |       |    |       |    |       |   |
|            | | ------> |              |          |       |   ---------    ---------    ---------    ---------    ---------   |
|            | |         |              |          |       |                                                                   |
-------------- |         ----------------          |       ---------------------------------------------------------------------
  |            |                                   |
  --------------                                   |
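The head block and cut cycle in the diagram can be sketched roughly as follows. The ChunkBuffer name, the entry-count limit, and the use of zlib are assumptions for illustration (the sketch also assumes in-order appends, as in the historical behavior); they are not Loki’s actual types or encodings.

```python
import zlib


class ChunkBuffer:
    """Hypothetical in-memory buffer: one mutable head plus compressed blocks."""

    def __init__(self, head_limit):
        self.head_limit = head_limit  # illustrative: cut after this many entries
        self.head = []    # uncompressed (timestamp, line) pairs
        self.blocks = []  # immutable (min_ts, max_ts, compressed_bytes) tuples

    def append(self, ts, line):
        self.head.append((ts, line))
        if len(self.head) >= self.head_limit:
            self.cut()

    def cut(self):
        """Compress the head into an immutable block and start a new head."""
        if not self.head:
            return
        payload = "\n".join(f"{ts}\t{line}" for ts, line in self.head)
        compressed = zlib.compress(payload.encode("utf-8"))
        # With in-order appends, the first and last entries bound the block.
        self.blocks.append((self.head[0][0], self.head[-1][0], compressed))
        self.head = []


chunk = ChunkBuffer(head_limit=4)
for ts in range(10):
    chunk.append(ts, f"line {ts}")

# Two blocks have been cut; the last two entries are still in the head.
block_ranges = [(min_ts, max_ts) for min_ts, max_ts, _ in chunk.blocks]
```

Only the head holds uncompressed data, which is what keeps the memory footprint small.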

Due to Loki’s ordering constraint, these blocks maintain a monotonically increasing timestamp (abbreviated ts) order where:

start       end
ts0         ts1          ts2        ts3
--------------           --------------
|            |           |            |
|            | --------> |            |
|            |           |            |
|            |           |            |
--------------           --------------

This allows us to store much more data in memory because each block is compressed after being cut from a headBlock. We still use this pattern after the removal of the ordering constraint to reduce memory pressure, although inter-block and inter-chunk ordering are no longer guaranteed. So how can we still ensure unordered writes and reads are fast?

2. Speed (time) When ingesters only accepted logs in order, it was easy to iterate over them in memory. We’ll need to keep this in mind as well to ensure our read latencies don’t suffer.

We looked at quite a few data structures which could be inserted in arbitrary order, but efficiently traversed in order for queries. This eventually led us to a choice between skip list- and balanced binary tree-based approaches. The skip list proved less attractive due to its space amplification costs. (Remember, we value minimizing memory.)

Ultimately we chose the tree — specifically a one-dimensional range tree, which benchmarked better. However, any approach would be less efficient than the append-only benefits that the ordering constraint previously guaranteed. The questions became: How much were we going to pay in performance costs for this increased functionality? And what can we do to minimize it?
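To show the interface such a structure has to offer (insert in any order, read back in timestamp order), here is a deliberately simple stand-in. It uses a sorted Python list with bisect rather than a range tree, so inserts are linear time instead of logarithmic; the class and method names are hypothetical.

```python
import bisect


class UnorderedHead:
    """Stand-in for an unordered head block, backed by a sorted list."""

    def __init__(self):
        self._entries = []  # kept sorted by (timestamp, line)

    def insert(self, ts, line):
        # insort finds the position by binary search, then shifts the tail.
        bisect.insort(self._entries, (ts, line))

    def range(self, start, end):
        """Return entries with start <= timestamp < end, in order."""
        # A 1-tuple sorts before any 2-tuple with the same first element,
        # so (start,) lands on the first entry at or after `start`.
        lo = bisect.bisect_left(self._entries, (start,))
        hi = bisect.bisect_left(self._entries, (end,))
        return self._entries[lo:hi]


head = UnorderedHead()
for ts, line in [(0, "line 0"), (1, "line 1"), (4, "line 2")]:
    head.insert(ts, line)
head.insert(2, "oops old log line")  # no longer an error

window = head.range(1, 4)
everything = head.range(0, 5)
```

A tree-based structure keeps the same two operations while avoiding the cost of shifting elements on every out-of-order insert.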

Remember, Loki serializes writes into compressed blocks before grouping them into chunks. When inter-block ordering is guaranteed (as with the ordering constraint), unneeded blocks can be skipped while querying. For data that arrives in order, Loki still applies this optimization. Furthermore, chunks composed of ordered blocks don’t need to be re-ordered before flushing to storage. These two benefits allow Loki to continue operating on ordered data with nearly no loss of performance. Highly unordered data will need to check more blocks during queries and be re-ordered prior to flushing to storage, but this hasn’t yet been noticeable in any of our workloads, from development through production.
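The block-skipping optimization comes down to a range-overlap check on each block’s minimum and maximum timestamps. The sketch below is an assumption-laden illustration (the function name and the half-open query range are mine), but it shows why ordered data stays cheap to query while unordered data forces more blocks to be opened.

```python
def blocks_to_read(blocks, start, end):
    """Return indices of blocks whose [min_ts, max_ts] overlaps [start, end)."""
    return [
        i
        for i, (min_ts, max_ts) in enumerate(blocks)
        if min_ts < end and max_ts >= start
    ]


# (min_ts, max_ts) per block. Ordered data yields non-overlapping ranges.
ordered = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
# Out-of-order writes widen the ranges, so they overlap one another.
unordered = [(0, 5), (2, 3), (1, 9), (6, 7), (4, 8)]

ordered_hits = blocks_to_read(ordered, 6, 8)      # a single block
unordered_hits = blocks_to_read(unordered, 6, 8)  # three blocks to decompress
```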

Exposure

Unordered writes have become the default mode of operation in Loki 2.4, meaning no additional configuration or upgrade procedure is required. This immediately simplifies many of the edge cases around ingesting data into Loki and makes for an easier, more enjoyable experience. The behavior can be overridden on a per-tenant basis with the unordered_writes setting (unordered_writes: <bool> | default = true).

Loki will accept out-of-order logs as far back as highest_timestamp_written - ingester.max-chunk-age / 2, also known as the validity window. This means that for a max_chunk_age of 2h, Loki will accept out-of-order writes up to 1h earlier than the highest written timestamp for that stream. This limit prevents unexpected storage costs from patterns like writing one line every hour for the past week, which would require Loki to create many chunks with a single log line each.
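The validity window arithmetic is easy to check by hand. The helper names below are hypothetical, and treating the boundary itself as accepted is an assumption of this sketch rather than documented behavior.

```python
from datetime import datetime, timedelta


def oldest_accepted(highest_written, max_chunk_age):
    """Lower bound of the validity window for a stream."""
    return highest_written - max_chunk_age / 2


def is_accepted(ts, highest_written, max_chunk_age):
    # Assumption: an entry exactly on the boundary is accepted.
    return ts >= oldest_accepted(highest_written, max_chunk_age)


highest = datetime(2021, 12, 3, 12, 0)  # newest timestamp written so far
max_chunk_age = timedelta(hours=2)

boundary = oldest_accepted(highest, max_chunk_age)  # 11:00 on the same day
half_hour_late = is_accepted(datetime(2021, 12, 3, 11, 30), highest, max_chunk_age)
too_old = is_accepted(datetime(2021, 12, 3, 10, 59), highest, max_chunk_age)
```

Because the window is relative to the highest timestamp written for each stream, a single entry far in the future would move the window forward for that stream.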

This new feature has been tremendously exciting for us to design and implement, and it has enabled a host of new use cases while removing barriers to entry for existing ones. The turn-key nature of this feature will improve the Loki experience, which is something we’re committed to, especially as the project grows.

We hope you enjoy Loki 2.4! Please keep in touch and let us know what you think, whether it’s on GitHub, our #loki channel on the Grafana Labs community Slack, or in our community calls!

The easiest way to get started with Grafana Loki is Grafana Cloud, with free and paid plans to suit every use case. If you’re not already using Grafana Cloud, sign up for free today.
