Avoid dropped logs due to out-of-order timestamps with a new Loki feature
This blog post was written with Senior Technical Writer Karen Miller.
Dropped log lines due to out-of-order timestamps can be a thing of the past! Allowing out-of-order writes has been one of the most-requested features for Loki, and we’re happy to announce that in the upcoming v2.4 release, the requirement that log lines arrive in order by timestamp will be lifted. In Loki v2.4, a simple configuration change enables out-of-order writes. If you’re using Grafana Cloud, our fully managed platform integrating metrics, logs, traces, and dashboards, out-of-order writes are already accepted.
The restriction and its effects
Since its inception as a memory-efficient log aggregation tool, Loki has required that log entries within a stream arrive in time order. Log entries that violate this ordering requirement are dropped. For example, consider these log lines from a single stream:
- timestamp="10/02/2021 07:00:06 -0700" app="widget" env="prod" cluster="eu-west2"
- timestamp="10/02/2021 07:00:08 -0700" app="widget" env="prod" cluster="eu-west2"
- timestamp="10/02/2021 07:00:05 -0700" app="widget" env="prod" cluster="eu-west2"
- timestamp="10/02/2021 07:00:10 -0700" app="widget" env="prod" cluster="eu-west2"
If these log lines arrive in the given order, the third line is out of order. Its timestamp, 07:00:05, is three seconds earlier than the second line's timestamp, 07:00:08. Loki drops that out-of-order log line.
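To make the rule concrete, here is a minimal Python sketch of a strict in-order check applied to the four example lines above. It assumes a single stream and a hypothetical `accept_in_order` helper that remembers the newest accepted timestamp. It illustrates the rule only and is not Loki's ingester code.

```python
from datetime import datetime

# Timestamps from the example stream, listed in arrival order.
ARRIVALS = [
    "10/02/2021 07:00:06 -0700",
    "10/02/2021 07:00:08 -0700",
    "10/02/2021 07:00:05 -0700",
    "10/02/2021 07:00:10 -0700",
]

# Matches the month/day/year layout and the -0700 offset used above.
FORMAT = "%m/%d/%Y %H:%M:%S %z"


def accept_in_order(arrivals):
    """Split arrivals into (accepted, dropped) under a strict ordering rule.

    Hypothetical helper: an entry is dropped when its timestamp is earlier
    than the newest timestamp already accepted for the stream. Equal
    timestamps are accepted in this sketch.
    """
    accepted, dropped = [], []
    newest = None
    for raw in arrivals:
        ts = datetime.strptime(raw, FORMAT)
        if newest is not None and ts < newest:
            dropped.append(raw)  # earlier than an entry we already accepted
        else:
            accepted.append(raw)
            newest = ts
    return accepted, dropped


if __name__ == "__main__":
    accepted, dropped = accept_in_order(ARRIVALS)
    print("accepted:", accepted)
    print("dropped:", dropped)
```

Only the 07:00:05 entry is dropped. The 07:00:10 entry is still accepted because it is later than 07:00:08, the newest timestamp accepted so far.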
Requiring in-order arrival of log lines is manageable when Promtail is the agent tailing the logs of a standalone app. But once you run multiple instances of the same app or add complexity to the system, out-of-order log lines become an issue.
Here are examples of problematic setups:
- Agents other than Promtail may not account for ordering constraints.
- In systems under high load, or in systems with retries configured, it’s common for data to be sent slightly out of order. The skew may be only a few seconds, but that is still enough for log lines to arrive out of order.
- In processing pipelines where a set of source collectors fans into log aggregators or processors, data is often sent slightly out of order unless the pipeline is very carefully architected.
- Lambdas (ephemeral jobs) are another great example. Their short lifetimes and high parallelism make it hard to work within an ordering constraint, so implementations are caught between a rock and a hard place. They can add labels that bloat the index, carefully construct a set of Promtail agents, or accept the consequences of dropped log lines. I wrote a blog post detailing the issues and solutions.
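The fan-in and retry cases from the list can be reproduced with a small simulation. It assumes two hypothetical collectors, each sending its own entries in increasing time order, and a hypothetical aggregator that forwards batches in the order they arrive. The timestamps and arrival times are made-up values chosen for illustration, not measurements.

```python
from typing import List, Tuple

# A batch is (arrival_time_ms, [entry timestamps in seconds]).
# Illustrative values only. Within each collector, entry timestamps increase.
Batch = Tuple[int, List[int]]

COLLECTOR_A: List[Batch] = [(100, [5, 6]), (300, [9, 10])]
COLLECTOR_B: List[Batch] = [(200, [3, 4])]  # delayed, e.g. by a retry


def forward_by_arrival(*collectors: List[Batch]) -> List[int]:
    """Hypothetical aggregator: forward batches in the order they arrive."""
    batches = sorted(
        (batch for collector in collectors for batch in collector),
        key=lambda batch: batch[0],
    )
    # Flatten the batches into the single stream that reaches the backend.
    return [ts for _, entries in batches for ts in entries]


def is_ordered(timestamps: List[int]) -> bool:
    """True if every timestamp is no earlier than the one before it."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


if __name__ == "__main__":
    merged = forward_by_arrival(COLLECTOR_A, COLLECTOR_B)
    print(merged)
    print("ordered:", is_ordered(merged))
```

Each collector is ordered on its own. Because collector B's batch arrives between two batches from collector A, the merged stream is 5, 6, 3, 4, 9, 10. Under a strict ordering rule, entries 3 and 4 would be dropped.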
Configuring out-of-order writes
With Loki v2.4, you can lift the restriction on ordering with simple configuration. Loki will accept unordered data with negligible performance cost and continue to apply optimizations to pre-ordered data where possible.
You can find the configuration details in the documentation. The restriction can be lifted for an entire Loki cluster, or it can be lifted on a tenant-by-tenant basis. A configurable time window allows you to define how distant from the current time an out-of-order log line can be and still be accepted.
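With out-of-order writes enabled, the acceptance check changes from "not earlier than the newest entry" to "inside the configured time window." The sketch below follows the description above and measures the window back from a reference time, such as the current time. The `accept_within_window` name is hypothetical. The one-hour default mirrors the Grafana Cloud limit mentioned later in this post. Loki's actual configuration keys and window semantics are defined in the documentation and design document, not here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical default: a one-hour acceptance window.
DEFAULT_WINDOW = timedelta(hours=1)


def accept_within_window(
    entry_ts: datetime,
    reference: datetime,
    window: timedelta = DEFAULT_WINDOW,
) -> bool:
    """Accept an entry, in order or not, if it is no older than `window`
    relative to `reference`. Sketch only; it does not check future
    timestamps and is not Loki's implementation."""
    return entry_ts >= reference - window


if __name__ == "__main__":
    now = datetime(2021, 10, 2, 7, 0, 10, tzinfo=timezone(timedelta(hours=-7)))
    late = now - timedelta(seconds=5)  # out of order, but recent
    stale = now - timedelta(hours=2)   # older than the window
    print(accept_within_window(late, now))   # True
    print(accept_within_window(stale, now))  # False
```

In this sketch, the 07:00:05 line from the earlier example would be accepted because it falls well inside the window. Only entries older than the window are still rejected.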
For even more detailed information, read the design document.
Try it out now!
Right now you can try out this functionality in Grafana Cloud, where out-of-order writes up to one hour old are accepted. We have free and paid Grafana Cloud plans to suit every use case, so sign up for free now.