I’m very excited to tell you all about the latest Grafana Loki installment, 2.5! A huge amount of work, nearly 500 PRs, has gone into Loki between v2.4 and now. The major themes for this release are improved performance, continuing ease of operations, and more ways to ingest your logs.
We’ve also released a new version of our self-hosted, Enterprise-ready version of Loki, Grafana Enterprise Logs (GEL). GEL 1.4 is built off of Loki 2.5, which means it inherits all of the features, enhancements, and bug fixes that I’ll talk about below. For simplicity, I’ll refer to Loki 2.5 throughout the post, but anything I refer to for Loki 2.5 also applies for GEL 1.4.
I usually find myself the most excited about performance improvements, so let’s start there.
Better performance in Grafana Loki
After debugging some high CPU usage on queries with regular expressions, Bryan Boreham took a deep dive into the Go regular expression package and came back with a number of optimizations to improve Go regex performance. Bryan gave a great talk on this work at FOSDEM 2022 if you’d like to learn more!
Something that has been nagging at us for a while: Why did binary operations in Loki seem so slow? Something as simple as dividing a query by 1,000 could make it return noticeably more slowly. We hadn’t prioritized looking into this until recently, when Owen Diehl investigated and found a number of issues and optimizations that have significantly improved the performance of any query with a binary operation.
Hedged requests were originally introduced in Grafana Tempo to curtail the long-tail latency of queries, and we’ve now added this functionality to Grafana Loki too. It’s currently disabled by default, but you can enable hedged requests in configuration.
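To make the idea concrete, here is a minimal sketch of request hedging in Python. It is an illustration only, not Loki's implementation: the function `hedged_call`, its parameters `hedge_after` and `up_to`, and the simulated backend `flaky_fetch` are all hypothetical names chosen for this example. The sketch starts one request, and if no answer has arrived after a short delay, it fires a backup request and returns whichever finishes first.

```python
import concurrent.futures as cf
import time


def hedged_call(fn, hedge_after, up_to=2):
    """Call fn(attempt) and return the first result that comes back.

    A backup attempt is started each time `hedge_after` seconds pass
    without any result, until `up_to` attempts are in flight.
    """
    pool = cf.ThreadPoolExecutor(max_workers=up_to)
    futures = [pool.submit(fn, 0)]
    try:
        while True:
            # Stop using a timeout once no more backup attempts are allowed.
            timeout = hedge_after if len(futures) < up_to else None
            done, _ = cf.wait(futures, timeout=timeout,
                              return_when=cf.FIRST_COMPLETED)
            if done:
                # First attempt to finish wins; its exception, if any, propagates.
                return next(iter(done)).result()
            futures.append(pool.submit(fn, len(futures)))
    finally:
        # Do not block on the losing attempts; they finish in the background.
        pool.shutdown(wait=False)


def flaky_fetch(attempt):
    """Hypothetical backend: the first attempt is slow, the backup is fast."""
    time.sleep(0.5 if attempt == 0 else 0.01)
    return f"chunk from attempt {attempt}"


if __name__ == "__main__":
    print(hedged_call(flaky_fetch, hedge_after=0.05))
```

The trade-off is extra load on the backend in exchange for a shorter tail, which is presumably why a hedging delay and a cap on attempts are worth tuning rather than hedging every request immediately.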
A new storage schema for S3
We’ve seen and heard about issues with S3 rate limits, so to better support high-volume queries against S3 object stores, Grafana Loki 2.5 introduces a new V12 storage schema. Storage schemas control how Loki stores data in object stores. In short, the V12 schema spreads chunks across more “prefixes,” which allows higher throughput against S3 storage while staying under the per-prefix rate limits imposed by S3.
From our testing, we were able to push read requests beyond the 5,500 requests per second limit.
Check out the documentation for information on changing schemas.
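The effect of using more prefixes can be sketched with a toy example. The two key layouts below are hypothetical and only illustrate the principle; they are not a statement of Loki's actual V11 or V12 key formats. The sketch counts how many distinct prefixes a set of chunk keys falls under when a per-stream identifier is part of the object name versus when it becomes its own path segment.

```python
from collections import Counter


def flat_key(tenant, fingerprint, start, end):
    """Hypothetical layout: everything after the tenant is one object name."""
    return f"{tenant}/{fingerprint:x}:{start:x}:{end:x}"


def sharded_key(tenant, fingerprint, start, end):
    """Hypothetical layout: the stream fingerprint becomes its own path segment."""
    return f"{tenant}/{fingerprint:x}/{start:x}:{end:x}"


def prefix_counts(keys):
    """Count objects per prefix, where the prefix is everything before the last '/'."""
    return Counter(key.rsplit("/", 1)[0] for key in keys)


# Four made-up streams, three one-hour chunks each, all for one tenant.
chunks = [
    ("tenant-a", fingerprint, hour * 3600, (hour + 1) * 3600)
    for fingerprint in (0xA1, 0xB2, 0xC3, 0xD4)
    for hour in range(3)
]

flat = prefix_counts(flat_key(*chunk) for chunk in chunks)
sharded = prefix_counts(sharded_key(*chunk) for chunk in chunks)

if __name__ == "__main__":
    print("flat layout prefixes:   ", dict(flat))
    print("sharded layout prefixes:", dict(sharded))
```

With the flat layout every chunk for the tenant sits under a single prefix, so all reads compete for one per-prefix limit. With the sharded layout the same chunks are spread across one prefix per stream, so the read load is divided among them.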
Ingesting more logs in Loki
Grafana Loki 2.5 adds several more ways to get logs into Loki via the Promtail agent:
Service discovery and API tailing directly from the Docker Daemon
This can be used in a few ways. You can do service discovery with the file target in Promtail, similar to how we typically do Kubernetes tailing: the Docker daemon gives you the unique container IDs, which can be used to construct the path to the on-disk log files.
However, it’s also possible to configure a new scrape target which can pull logs directly from the Docker Daemon’s API, which removes any need to directly access files on disk and can make it a lot easier to configure and run Promtail in Docker.
Fetching logs directly from Cloudflare
Receiving Graylog Extended Log Format (GELF) directly in Promtail
Promtail can now expose a port to receive GELF messages directly via UDP. This is useful for anyone who already has GELF infrastructure in place, or who uses third-party services that offer GELF as an export format but don’t natively support Loki (yet).
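For readers who have not worked with GELF before, the following sketch shows roughly what a sender does. It builds a GELF 1.1 JSON payload and sends it as a single uncompressed UDP datagram. The helper names `build_gelf_message` and `send_gelf` are hypothetical, and the sketch assumes the receiver accepts uncompressed, unchunked messages; check the Promtail documentation for the listen address and formats it actually supports.

```python
import json
import socket
import time


def build_gelf_message(host, short_message, level=6, **extra):
    """Encode a GELF 1.1 payload; additional fields get the required '_' prefix."""
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
        "timestamp": time.time(),  # seconds since the epoch, with decimals
        "level": level,            # syslog severity, 6 = informational
    }
    payload.update({f"_{name}": value for name, value in extra.items()})
    return json.dumps(payload).encode("utf-8")


def send_gelf(message, address):
    """Send one uncompressed, unchunked GELF datagram to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, address)
```

A call such as `send_gelf(build_gelf_message("web-1", "user logged in", service="auth"), ("127.0.0.1", 12201))` would send one message to a listener on the conventional GELF port 12201; the host name, message, and `service` field here are made up for illustration.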
Simple Scalable Deployment (SSD) improvements and new Helm charts
In Grafana Loki 2.4, we introduced the SSD mode of operation, which is a middle ground between running the single binary Loki and full-fledged microservices. We are extremely happy with the SSD mode, and it’s really looking like the future of Loki operations, making it easier to run Loki at larger scales, both in cloud native Kubernetes installations and directly on machines, no Kubernetes required.
For those of you who are in Kubernetes and want to try it out, there are new Helm Charts for the SSD mode. For our Enterprise users, we’ve also added a new Helm Chart for deploying Grafana Enterprise Logs in SSD mode.
If you want to see how to run Loki and GEL in SSD mode in Docker, check out this blog post about Grafana Loki and Grafana Enterprise Logs by Trevor Whitney.
Grafana Loki 2.5 includes code we added to report anonymous usage statistics back to Grafana Labs. An issue was created to outline the intent of this addition, and what went into the final implementation can be seen here in the source.
Usage reporting helps provide anonymous information on how people use Loki and what the Loki team should focus on for features and documentation. No private information is collected, and all reports are completely anonymous.
If possible, we ask you to leave the usage reporting feature enabled and help us understand more about Loki! We are also working on figuring out how we can share this info with the community so everyone can watch Loki grow.
If you would rather not participate in usage stats reporting, the feature can be disabled in config:
analytics:
  reporting_enabled: false
Thanks to all the Grafana Loki users and contributors out there who continue to help grow the project. Our team is growing fast to keep up with the demand, and we’re all incredibly excited about the future of Loki. In fact, we already have big plans to further improve Loki’s storage, user experience, and operating experience to make sure it stays at the top of the object storage logging solutions.
If you run Grafana Enterprise Logs, be sure to check out the release notes and documentation for GEL 1.4, which is based off of Loki 2.5, so you can enjoy all the amazing changes made to the upstream OSS project!
For a full demo of the latest features in Grafana Loki 2.5, join us for our “Getting started with logging and Loki” webinar on April 20. Register for free today!