The concise guide to labels in Loki

Published: 27 Aug 2020

A few months ago, I wrote an in-depth article describing how labels work in Loki. Here, I’m consolidating that information into a more digestible “cheat sheet.”

There are some big differences in how Loki works compared to other logging systems, and they require a different way of thinking. This is my attempt to convey those differences as well as map out our thought process behind them.

As a Loki user or operator, your goal should be to use the fewest labels possible to store your logs.

Fewer labels means a smaller index which leads to better performance.

I think this is worth repeating: Fewer labels = better performance.

This likely sounds counterintuitive. I know my experience with databases has taught me that if you want it to be fast, you need to index it. Loki is built and optimized in the exact opposite way. The design goals around Loki are to keep operating costs and complexity low, which is accomplished by keeping a very small index and leveraging commodity hardware and parallelization.

So as a user or operator of Loki, always think twice before adding labels.

Examples

ts=2020-08-25T16:55:42.986960888Z caller=spanlogger.go:53 org_id=29 traceID=2612c3ff044b7d02 method=Store.lookupIdsByMetricNameMatcher level=debug matcher="pod=\"loki-canary-25f2k\"" queries=16

How can I query all my logs for a given traceID?

You might think, “I should extract traceID as a label,” and then query like this:

{cluster="ops-cluster-1",namespace="loki-dev", traceID=”2612c3ff044b7d02”}

Never do this! Avoid extracting content from your logs into labels. If you want to find high cardinality data in your logs, use filter expressions like this:

{cluster="ops-cluster-1",namespace="loki-dev"} |= “traceID=2612c3ff044b7d02”

But what if the label is low cardinality? What if you extracted the log level into a label, and there are only five possible values for the logging level?

{cluster="ops-cluster-1",namespace="loki-dev", level=”debug”}

Be careful here! Remember labels have a multiplicative effect on the index and storage. What started as one log stream has now turned into as many as five streams. Then consider if you add another label; even if it only has a few values, things can quickly get out of control:

[Figure: Multiplicative streams in Loki]
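The arithmetic here is simple: every unique combination of label values is its own stream, so the stream count is the product of each label’s cardinality. With some illustrative cardinalities (the exact values below are assumptions for the sake of the example):

```python
# Each unique combination of label values is a separate stream, so the
# stream count is the product of each label's cardinality. The
# cardinalities below are illustrative assumptions.
level_values = 5    # debug, info, warn, error, fatal
status_values = 4   # e.g. 200, 400, 404, 500
path_values = 10    # distinct API paths

streams = 1 * level_values * status_values * path_values
print(streams)  # 200 streams from what began as a single stream
```

And every one of those streams carries its own index entries and its own chunks in storage.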

Instead, use filter expressions:

{cluster="ops-cluster-1",namespace="loki-dev"} |= “level=debug” |= “status=200” |= “path=/api/v1/query”

But if I want to write a metric query and I want to add a sum by (path), how can I do this if path isn’t a label?

Ah, you got me there! Currently you can only aggregate on labels. However, that won’t be true for much longer! Coming soon in v2 of Loki’s query language, LogQL, we will support extracting log content into query-time labels, which can then be used in aggregations. Have a look at a preview here.
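Conceptually, query-time label extraction means parsing fields like path out of each line while the query runs, then aggregating on them, with no index entries ever created. A rough Python sketch of the idea (this is an illustration of the concept, not Loki’s actual implementation):

```python
from collections import Counter

# logfmt-style lines: fields like path only become labels at query time.
lines = [
    "level=debug status=200 path=/api/v1/query",
    "level=debug status=200 path=/api/v1/query",
    "level=info status=200 path=/api/v1/label",
]

def parse_logfmt(line):
    # Minimal logfmt parsing (no quoted-value support), enough for the sketch.
    return dict(pair.split("=", 1) for pair in line.split())

# The moral equivalent of `sum by (path)` over the extracted labels.
counts = Counter(parse_logfmt(line)["path"] for line in lines)
print(counts)  # /api/v1/query seen twice, /api/v1/label once
```

Because the extraction happens at query time, path never touches the index, so its cardinality costs you nothing at ingest.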

Is it ever ok to extract log content into labels?

Yes, but please think very carefully before you do so. Labels describe your logs. They help you narrow down the search. They are not intended to hold log content itself and they are never intended to be used to locate an individual line. Another way to think of labels is that they describe your environment or the topology of your applications and servers (i.e. where your logs came from).

But when I send logs from my Lambda or function, I get “out of order” errors unless I include a request ID or invocation ID as a label. What should I do?

This is currently a tough use case for Loki. We are working hard to remove the limitation on ordered entries, but it is a tricky problem. For now, this kind of environment requires limiting how much parallelism your functions have. Another workaround is a fan-in approach: send your function logs to an intermediate Promtail instance, which can apply ingestion timestamping.

This is an area where Loki needs improvement, and we are actively working on this.

Summary

Loki leverages horizontal scaling and query time brute force to find your data. Is this as fast as a fully indexed solution? No, it’s probably not! But it’s a heck of a lot easier to run (and still very fast)!

Let’s look at some actual data from one of Grafana Labs’ Loki clusters. In the last seven days, it ingested 14TB of data. The corresponding index usage for that time period is about 500MB; the index for 14TB of logs could fit in the RAM of a Raspberry Pi.

This is why we focus on keeping the label set small. Maybe your labels can only narrow down your search to 100GB of log data. No worries! It’s a lot cheaper to run 20 queriers that can parallelize searching that 100GB of data at 30GB/s than it is to maintain a 14TB index that can tell you exactly where to look, especially when you consider you can turn the queriers off when you are done.
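As a back-of-envelope check on those numbers (all figures taken from the paragraphs above):

```python
# Index size relative to the data it covers (figures from this post).
data_gb = 14 * 1024               # ~14TB ingested over seven days
index_gb = 0.5                    # ~500MB of index for that data
index_ratio = index_gb / data_gb
print(f"index is {index_ratio:.5%} of the data")  # a tiny fraction of a percent

# Time to brute-force scan what the labels couldn't narrow further.
search_gb = 100                   # labels narrowed the query to 100GB
aggregate_gbps = 30               # 20 queriers scanning in parallel
print(f"~{search_gb / aggregate_gbps:.1f}s to scan")
```

A few seconds of parallel brute force against an index that rounds to zero: that is the trade Loki is making.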

So once more, with feeling: Fewer labels = better performance.
