Loki’s Path to GA: Live Tailing
Launched at KubeCon North America last December, Loki is a Prometheus-inspired service that optimizes storage, search, and aggregation while making logs easy to explore natively in Grafana. Loki is designed to work easily both as microservices and as monoliths, and correlates logs and metrics to save users money.
Less than a year later, Loki has almost 6,500 stars on GitHub and is now quickly approaching GA. At Grafana Labs, we’ve been working hard on developing key features to make that possible. In the coming weeks, we’ll be highlighting some of these features. This post will focus on Loki’s live tailing feature.
How It Works
Live tailing is one of the most requested features for Loki. It lets you see logs as they come into the system, which is valuable at every level of a log aggregation system. One common use case is aggregating and replaying logs from multiple services as they are pushed to Loki, which gives you the live state of the services and helps you catch bugs quickly as they appear.
Ingesters in Loki receive pushed logs, write them to storage, and index them. A live tailing request opens a websocket with one of the queriers, which in turn tails logs from all the ingesters. In a microservice-based system, you have to assume that some services will fail, and in this case that includes ingesters. Live tailing is fail-safe: it can recover from a failing ingester and makes sure that it keeps tailing all the ingesters all the time.
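To make the recovery idea concrete, here is a minimal sketch of a tailer that keeps track of which ingesters it has a stream to and retries the ones it has lost. This is not Loki's actual code: the Tailer class, the connect callable, and the ingester names are all hypothetical and only illustrate the behavior described above.

```python
class Tailer:
    """Hypothetical tailer that tries to hold one stream per ingester."""

    def __init__(self, connect, ingesters):
        # connect is an assumed callable: address -> stream object.
        # It raises ConnectionError when the ingester is unreachable.
        self.connect = connect
        self.wanted = set(ingesters)
        self.streams = {}

    def mark_failed(self, addr):
        # Called when a stream breaks; the ingester becomes "missing".
        self.streams.pop(addr, None)

    def ensure_connected(self):
        """(Re)open a stream to every ingester without one.

        Returns the addresses that are still unreachable, so the caller
        can retry them on the next tick.
        """
        still_missing = []
        for addr in sorted(self.wanted - set(self.streams)):
            try:
                self.streams[addr] = self.connect(addr)
            except ConnectionError:
                still_missing.append(addr)
        return still_missing


# Simulated cluster in which one ingester is temporarily down.
down = {"ingester-2"}

def fake_connect(addr):
    if addr in down:
        raise ConnectionError(addr)
    return f"stream-to-{addr}"

tailer = Tailer(fake_connect, ["ingester-1", "ingester-2", "ingester-3"])
first = tailer.ensure_connected()    # ingester-2 cannot be reached yet
down.clear()                         # ingester-2 comes back
second = tailer.ensure_connected()   # the missing stream is re-opened
tailer.mark_failed("ingester-1")     # a stream breaks mid-tail
third = tailer.ensure_connected()    # and is re-established on the next check
```

Calling ensure_connected on a timer is one simple way to converge back to tailing every ingester; a real implementation would also need to handle ingesters joining and leaving.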
We use gRPC streams for tailing live logs from all the ingesters. gRPC streams use HTTP/2 streams, which are lightweight and multiplexed over a single HTTP connection. That makes communication between queriers and ingesters efficient.
Live tailing supports all the filters and expressions of a normal query, so you can see only the logs that you’re interested in: for example, everything from the same namespace, or anything marked “error.” Unlike a query, though, it keeps pushing logs until you cancel the request.
Loki also lets you start tailing current logs while seeing some historic logs for more context. You can request the logs for, say, the hour before the current time along with all incoming logs, in the same live tailing request.
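As an illustration, a tail request that combines a filter with an hour of history could be built as follows. This is a sketch under assumptions: the /loki/api/v1/tail path and the query, start, limit, and delay_for parameter names follow the HTTP API documented for later Loki releases, and may differ in the version you run. The build_tail_url helper, the host, and the label selector are made up for the example.

```python
import time
from urllib.parse import urlencode

NANOS_PER_SECOND = 1_000_000_000

def build_tail_url(base, query, lookback_seconds=0, delay_for=0,
                   limit=100, now_ns=None):
    """Build a websocket URL for a tail request (hypothetical helper).

    The endpoint path and parameter names are assumptions; check the
    API documentation for your Loki version.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    params = {"query": query, "limit": limit}
    if lookback_seconds:
        # Start in the past so historic lines arrive before live ones.
        params["start"] = now_ns - lookback_seconds * NANOS_PER_SECOND
    if delay_for:
        params["delay_for"] = delay_for
    return f"{base}/loki/api/v1/tail?{urlencode(params)}"


# Tail error lines from one namespace, with one hour of context.
url = build_tail_url(
    "ws://localhost:3100",
    '{namespace="prod"} |= "error"',
    lookback_seconds=3600,
    now_ns=1_700_000_000_000_000_000,  # fixed timestamp for a repeatable example
)
print(url)
```

A websocket client connected to that URL would first receive the matching lines from the past hour and then new lines as they arrive, until the connection is closed.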
There’s also Log CLI, a command-line tool that you can use to query and tail logs. You can do cool things with it, like getting logs in JSON format or piping them to another command or a file.
By default, live tailing delivers logs as they come in, which can result in unordered logs between different streams. For instance, if you have two services sending logs to Loki, and one of them lags behind in pushing its logs, then you may see logs out of order between those streams. You can accumulate logs for up to 5 seconds with the delay-for CLI flag, which buffers logs for the given duration and re-orders the logs from all the streams so that they can be seen in time order. Support for it will likely be added in Grafana Explore.
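The effect of delaying output to re-order streams can be sketched with a small buffer that holds each entry for a fixed window and then releases entries oldest first. This is only an illustration of the idea, not how Loki implements delay-for; the DelayBuffer class and the sample log lines are hypothetical.

```python
import heapq

class DelayBuffer:
    """Hold entries for `delay` seconds, then release them in time order."""

    def __init__(self, delay):
        self.delay = delay
        self._heap = []   # min-heap of (timestamp, arrival order, line)
        self._seq = 0     # tie-breaker that keeps arrival order stable

    def push(self, timestamp, line):
        heapq.heappush(self._heap, (timestamp, self._seq, line))
        self._seq += 1

    def pop_ready(self, now):
        """Return entries at least `delay` seconds old, oldest first."""
        ready = []
        while self._heap and self._heap[0][0] <= now - self.delay:
            timestamp, _, line = heapq.heappop(self._heap)
            ready.append((timestamp, line))
        return ready


buf = DelayBuffer(delay=5)
buf.push(12, "service-b: request done")    # arrives first
buf.push(10, "service-a: request start")   # lagging stream, older timestamp
buf.push(11, "service-a: handling")

early = buf.pop_ready(now=14)     # nothing is 5 seconds old yet
ordered = buf.pop_ready(now=17)   # all three released, sorted by timestamp
```

The trade-off is latency: every line is shown up to the delay later, in exchange for a better chance that lines from a lagging stream land in the right place.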
To view your logs live in Grafana as they are added, choose “Live” from the refresh dropdown, and you should see your logs displayed in real time.
Note that live tailing relies on two websocket connections: one between the browser and the Grafana server, and another between the Grafana server and the Loki server. If you run any reverse proxies, please configure them to pass websocket connections through on both hops.
More about Loki
In other blog posts, we focus on key Loki features, including loki-canary early detection for missing logs, the Docker logging driver plugin and support for systemd, adding structure to unstructured logs with the pipeline stage, and query optimization. Be sure to check back for more content about Loki.