Loki 1.6.0 released: Metric query performance up to 10x faster, push logs from any client to Promtail, query language and LogCLI enhancements, and more!

Published: 13 Aug 2020

Things have been busy with the Loki project! Once again, we waited too long between releases, and there are so many new things I won’t be able to list them all. But that won’t stop me from trying, so let’s get to it.

For a change of pace, instead of listing interesting PRs, I’m going to talk through Loki’s components and mention the changes in more of a paragraph style. Let’s see how this goes.

But first, congratulations to @adityacs, who is the newest member of the Loki team! Aditya has been regularly contributing to the Loki project for the past year, with each contribution better than the last. Many of the items on the following list were thanks to his hard work. Thank you, Aditya, and welcome to the team!

Digging into the long list of features, starting with Loki, let’s talk about additions to the query language. PR 2150 introduces bytes_rate, which calculates the per-second byte rate of a log stream, and bytes_over_time, which counts the bytes used by a log stream over a given range. PR 2182 introduces a long list of comparison operators, which let you write queries like count_over_time({foo="bar"}[1m]) > 10. Check out the PR for a more detailed description.

New query function bytes_rate and binary operators like > 1000
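To make the semantics of these functions concrete, here is a minimal Python sketch that computes them over an in-memory stream. It is not how Loki implements them: the Entry type, the helper names, the sample data, and the left-open window boundary are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    ts: float   # timestamp in seconds (hypothetical unit for this sketch)
    line: str   # raw log line


def in_window(entries, end, window):
    """Entries with end - window < ts <= end (boundary choice is an assumption)."""
    return [e for e in entries if end - window < e.ts <= end]


def count_over_time(entries, end, window):
    """Number of log lines in the window."""
    return len(in_window(entries, end, window))


def bytes_over_time(entries, end, window):
    """Total size in bytes of the log lines in the window."""
    return sum(len(e.line.encode("utf-8")) for e in in_window(entries, end, window))


def bytes_rate(entries, end, window):
    """Bytes per second over the window."""
    return bytes_over_time(entries, end, window) / window


# Hypothetical stream {foo="bar"}: one 20-byte line every 5 seconds for a minute.
stream = [Entry(ts=5.0 * i, line="x" * 20) for i in range(1, 13)]

count = count_over_time(stream, end=60.0, window=60.0)
total_bytes = bytes_over_time(stream, end=60.0, window=60.0)
rate = bytes_rate(stream, end=60.0, window=60.0)

# Rough equivalent of count_over_time({foo="bar"}[1m]) > 10:
# the comparison decides whether this stream's value is kept.
exceeds_threshold = count > 10
```

In this sketch the comparison is a plain boolean; in a real query it acts on every stream the selector matches.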

Many performance improvements have been made and can be summarized into two categories. The first are improvements to Loki itself:

PR 2216, PR 2218, and PR 2219 all improve how memory is allocated and reused for queries.

PR 2239 is a huge improvement for certain cases in which a query covers a large number of streams that all overlap in time. Overlapping data is now internally cached while Loki works to sort all the streams into the proper time order.
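For readers who want a picture of what "sorting overlapping streams into time order" means, here is a small Python sketch of a k-way merge. It only illustrates the ordering problem; it does not show the caching added in the PR, and the streams and their contents are made up.

```python
import heapq

# Three hypothetical streams, each already sorted by timestamp,
# whose time ranges overlap one another.
stream_a = [(1, "a1"), (4, "a2"), (7, "a3")]
stream_b = [(2, "b1"), (3, "b2"), (8, "b3")]
stream_c = [(5, "c1"), (6, "c2")]

# heapq.merge lazily interleaves the sorted inputs into one sorted sequence.
merged = list(heapq.merge(stream_a, stream_b, stream_c, key=lambda e: e[0]))
timestamps = [ts for ts, _ in merged]
```

The more streams overlap, the more entries have to be held and compared at once, which is the case the PR targets.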

Saving the best for last, PR 2293 was a big refactor to how Loki internally processes log queries vs. metric queries, creating separate code paths to further optimize metric queries. Metric query performance is now 2 to 10 times faster.

The second class of performance improvements applies to the query frontend. Anyone running it will see benefits from PR 2441, which improves how label queries can be split and queried in parallel; PR 2123, which allows queries to the series API to be split by time and parallelized; and last but most significant, PR 1927, which allows for a much larger range of queries to be sharded and performed in parallel. Query sharding is a topic in itself, but as a rough summary, this type of sharding is not time dependent and leverages how data is already stored by Loki to be able to split queries up into 16 separate pieces to be queried at the same time.
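The two ideas in this paragraph, splitting by time and sharding independent of time, can be sketched roughly in Python. This is an illustration only: the hash-based shard assignment, the function names, and the sample streams are assumptions, not Loki's actual storage layout or frontend code.

```python
from zlib import crc32

NUM_SHARDS = 16  # the number of pieces mentioned above


def split_by_interval(start, end, interval):
    """Split [start, end) into consecutive sub-ranges of at most `interval`."""
    ranges = []
    cur = start
    while cur < end:
        nxt = min(cur + interval, end)
        ranges.append((cur, nxt))
        cur = nxt
    return ranges


def shard_of(labels):
    """Hypothetical shard assignment: a stable hash of the sorted label set."""
    key = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return crc32(key.encode("utf-8")) % NUM_SHARDS


def count_lines(streams):
    """Stand-in for a metric query: total number of lines across streams."""
    return sum(len(lines) for _, lines in streams)


# 40 made-up streams; stream i holds i + 1 lines.
streams = [({"app": f"svc-{i}"}, ["line"] * (i + 1)) for i in range(40)]

# Each stream lands in exactly one shard, regardless of time.
shards = [[] for _ in range(NUM_SHARDS)]
for labels, lines in streams:
    shards[shard_of(labels)].append((labels, lines))

# Each shard could be evaluated by a different querier at the same time;
# the partial results are then combined.
partial_counts = [count_lines(s) for s in shards]
sharded_total = sum(partial_counts)
unsharded_total = count_lines(streams)

time_ranges = split_by_interval(0, 10, 4)
```

The point of the sketch is that a sum over shards gives the same answer as the unsharded query, which is what makes this kind of query safe to run in pieces.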

A couple of other notable mentions: PR 2453 improves the error messages when a query times out, as Context Deadline Exceeded wasn’t the most intuitive. PR 2336 provides two new flags that will print the entire Loki config object at startup. Be warned there are a lot of config options, and many won’t apply to your setup (such as storage configs you aren’t using), but this can be a really useful tool when troubleshooting. Sticking with the theme of best for last, PR 2224 and PR 2288 improve support for running Loki with a shared Ring using memberlist while not requiring Consul or Etcd. We need to follow up soon with some better documentation or a blog post on this!

Not to be outdone by Loki, Promtail has a long list of exciting new features. To me the most exciting is PR 2296, which allows Promtail to expose the Loki Push API. With this, you can push from any client to Promtail as if it were Loki, and Promtail can then forward those logs to another Promtail or to Loki. There are some good use cases for this with the Loki Docker Logging Driver; if you want an easier way to configure pipelines or expose metrics collection, point your Docker drivers at a Promtail instance. Another use case would be combining this feature with PR 2282, which contains an example Amazon Lambda where you can use a fan-in approach and ingestion timestamping in Promtail to work around out of order issues with multiple Lambdas processing the same log stream. This is one way to get logs from a high-cardinality source without adding a high-cardinality label.
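As a rough idea of what "push from any client" looks like, here is a Python sketch that builds a request in the JSON format of the Loki Push API. The host name, port, and labels are placeholders I made up, and I am assuming Promtail serves the same /loki/api/v1/push path as Loki; check your own Promtail configuration before relying on it.

```python
import json
import time
import urllib.request


def build_push_payload(labels, lines, now_ns=None):
    """Build a push body: one stream, each value is [ns timestamp as string, line]."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Give each line a distinct, increasing timestamp.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


def build_push_request(base_url, payload):
    """Prepare (but do not send) the HTTP POST for the push endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/loki/api/v1/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = build_push_payload(
    {"job": "lambda", "source": "demo"},          # hypothetical labels
    ["first line", "second line"],
    now_ns=1597276800000000000,
)
# Placeholder address; substitute your own Promtail (or Loki) instance.
request = build_push_request("http://promtail.example:3500", payload)

# urllib.request.urlopen(request) would send it; it is left out here so the
# sketch runs without a server.
```

Because the body is the same either way, a client written like this does not need to know whether it is talking to Promtail or to Loki directly.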

The Promtail Pipeline also has several very useful new stages. PR 2060 introduces the Replace stage, which lets you find and replace or remove text inside a log line. Combined with PR 2422 and PR 2480, you can now find sensitive data in a log line, like a password or email address, and replace it with ****, or hash the value to prevent readability while still being able to trace the value through your logs. Last on the list of pipeline additions, PR 2496 adds a Drop pipeline stage, which lets you drop log lines based on several criteria, including regex matching on content, line length, or the age of the log line. The last two are useful to avoid sending logs to Loki that you know would be rejected based on limits configured in the Loki server.
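To show the behavior these stages aim for, here is a small Python sketch of masking, hashing, and dropping log lines. It is not Promtail configuration or Promtail code: the regular expressions, function names, and thresholds are all made up for the example.

```python
import hashlib
import re

# Deliberately simple patterns, good enough for the sample line below.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PASSWORD = re.compile(r"(password=)\S+")


def mask_password(line):
    """Replace the password value with ****, keeping the key."""
    return PASSWORD.sub(r"\1****", line)


def hash_emails(line):
    """Replace each email with a short SHA-256 digest so it stays traceable."""
    def _hash(match):
        return hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()[:12]
    return EMAIL.sub(_hash, line)


def should_drop(line, ts, now, pattern=None, max_len=None, max_age=None):
    """Drop on any configured criterion: regex match, line length, or age."""
    if pattern is not None and re.search(pattern, line):
        return True
    if max_len is not None and len(line.encode("utf-8")) > max_len:
        return True
    if max_age is not None and now - ts > max_age:
        return True
    return False


line = "login user=ann@example.com password=hunter2"
masked = hash_emails(mask_password(line))
```

The same email always hashes to the same value, so you can still follow one user through the logs without being able to read the address.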

Other client/agent news: PR 1822 added a Logstash output plugin for Loki. If you have an existing Logstash install, you can now use this plugin to send your logs to Loki to make it easier to try out, or use Loki alongside an existing logging installation.

Moving on to the Loki Canary: I won’t go into too much detail because this will be explained in a future blog post, but the canaries are now much more aggressive about checking for data integrity, including spot checking for logs over a longer time window and running metric queries to verify count_over_time accuracy. Details can be found in PR 2344.

Loki-canary testing more logs and exporting more metrics for better visibility

The last things I would like to call attention to are some really slick additions to the command line query tool, LogCLI. PR 2470 allows you to color code your log lines based on their stream labels for a nice visual indicator of streams.

LogCLI supports color coding labels for different log streams

PR 2497 expands on the series API query to Loki with the --analyze-labels flag, which can show you a detailed breakdown of your label key and value combinations. This is very useful for finding improper label usage in Loki or labels with high cardinality. Rounding out this release is a very useful capability added in PR 2482, in which LogCLI will automatically batch requests to Loki to allow making queries with a --limit= far larger than the server side limit defined in Loki. LogCLI will dispatch the request in a series of queries configured by the --batch= parameter (which defaults to 1000) until the requested limit is reached!
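The batching idea can be sketched in a few lines of Python. This is not LogCLI's code: fake_query is a hypothetical stand-in for a single request to Loki, and the sketch assumes unique timestamps so that paging by "older than the last entry seen" never skips or repeats a line.

```python
def fake_query(store, limit, before=None):
    """Hypothetical single query: newest entries first, at most `limit`,
    strictly older than `before` when it is given."""
    candidates = [e for e in store if before is None or e[0] < before]
    return sorted(candidates, reverse=True)[:limit]


def batched_query(store, limit, batch=1000):
    """Collect up to `limit` entries by issuing queries of at most `batch` each."""
    results = []
    before = None
    while len(results) < limit:
        want = min(batch, limit - len(results))
        page = fake_query(store, want, before)
        if not page:
            break  # nothing left to fetch
        results.extend(page)
        before = page[-1][0]  # continue from the oldest entry seen so far
    return results


# 5000 made-up entries with timestamps 0..4999.
store = [(ts, f"line {ts}") for ts in range(5000)]
got = batched_query(store, limit=2500, batch=1000)
```

Here a limit of 2500 with a batch size of 1000 turns into three queries of 1000, 1000, and 500 entries.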

I think this is a good place to stop summarizing, but there were many more awesome things added: In total, 187 PRs made it into this release! Thank you everyone!

For the full list please check out the CHANGELOG.

One final note: Please take a moment to read the upgrade guide. We continue to work very hard to make operating Loki a smooth experience. However, as the page says, software is hard, and sometimes we need your help to keep Loki from bearing the burden of mistakes and improvements, which are sometimes difficult to make seamlessly.
