Loki’s Path to GA: Adding Structure to Unstructured Logs

Published: 25 Jul 2019 by Ed Welch

Launched at KubeCon North America last December, Loki is a Prometheus-inspired service that optimizes storage, search, and aggregation while making logs easy to explore natively in Grafana. Loki is designed to work easily both as microservices and as monoliths, and correlates logs and metrics to save users money.

Less than a year later, Loki has almost 6,500 stars on GitHub and is now quickly approaching GA. At Grafana Labs, we’ve been working hard on developing key features to make that possible. In the coming weeks, we’ll be highlighting some of these features. This post will focus on Loki’s pipeline stage.

From the beginning, some of the earliest and probably most requested features for Loki have shared a common thread: manipulating log lines. There are many use cases for this, including extracting labels, extracting metrics, setting a timestamp from the log content, or manipulating the log line before it is sent to Loki.

In this post we will talk about our approach to solving this problem.

How It Works

There are currently two stages for extracting information from logs, json and regex, both of which put the extracted data into a metadata map. From there, several other stages use that data to set the timestamp, change the output, set labels, and/or extract metrics.

A good simple example would be processing Docker log lines. In the JSON format, Docker log lines look like this:

{"log":"some log message","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z"}

We found it’s not generally useful to store this message in this raw JSON format, as the component pieces of the format can better be used directly. Here is an example of how you could process this line more effectively:

pipeline_stages:
  - json:
      expressions:
        output: log
        stream: stream
        timestamp: time
  - labels:
      stream: ''
  - timestamp:
      source: timestamp
      format: RFC3339Nano
  - output:
      source: output

Breaking this config down: first, the json stage extracts the log, stream, and time JSON values into the extracted map under the keys output, stream, and timestamp. Then the stream value is set as a label named stream, and the timestamp is parsed according to the provided format and used as the log timestamp. Lastly, the output stage changes the log entry sent to Loki to just the content of the log key in the original JSON.
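To make the effect on the example line concrete, here is a minimal sketch in Go of the same three steps. It is not Promtail's implementation; processDockerLine and dockerLine are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// dockerLine mirrors the three keys of a Docker JSON log line.
type dockerLine struct {
	Log    string `json:"log"`
	Stream string `json:"stream"`
	Time   string `json:"time"`
}

// processDockerLine is a hypothetical helper (not part of Promtail). It returns
// the labels, timestamp, and output line that the pipeline above aims to produce.
func processDockerLine(raw string) (map[string]string, time.Time, string, error) {
	var l dockerLine
	if err := json.Unmarshal([]byte(raw), &l); err != nil {
		return nil, time.Time{}, "", err
	}
	// Equivalent of the timestamp stage with format RFC3339Nano.
	ts, err := time.Parse(time.RFC3339Nano, l.Time)
	if err != nil {
		return nil, time.Time{}, "", err
	}
	// Equivalent of the labels stage (stream) and the output stage (log).
	return map[string]string{"stream": l.Stream}, ts, l.Log, nil
}

func main() {
	raw := `{"log":"some log message","stream":"stderr","time":"2019-04-30T02:12:41.8443515Z"}`
	labels, ts, out, err := processDockerLine(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(labels, ts.Format(time.RFC3339Nano), out)
}
```

The stored line shrinks to the message itself, while the stream becomes queryable as a label and the timestamp comes from the log content rather than the time of ingestion.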

Labels are really powerful in Loki, as they serve as the index to your log data. When you query your logs, you can use labels to improve query performance, because they reduce the amount of data that has to be fetched to answer the query.

Keep in mind there are cardinality limits on labels: you don’t want your labels to have a lot of different values. A good candidate is the log level, which indicates how important a line is: info or debug for routine output, warn or error for a problem. The level has a finite number of values, so you can query for, say, all the logs where the level equals error and get the pertinent information around errors.

Detailed explanations of all the log stages and some more examples can be found in the docs.

Using these building blocks, you can do some fairly elaborate log parsing. Here is an example from one of our environments:

  pipeline_stages:
  - docker: {}
  - regex:
      expression: (level|lvl|severity)=(?P<level>\w+)
  - labels:
      level: ""
  - regex:
      expression: (?P<panic>panic:)
  - metrics:
      panic_total:
        config:
          action: inc
        description: 'total count of panic: found in log lines'
        source: panic
        type: Counter
  - match:
      selector: '{app="metrictank"}'
      stages:
      - regex:
          expression: \[(?P<component>Macaron)\]
      - template:
          source: component
          template: '{{ .Value | ToLower }}'
      - labels:
          component: ""
      - match:
          selector: '{component=""}'
          stages:
          - regex:
              expression: \[(?P<component>Sarama)\]
          - template:
              source: component
              template: '{{ .Value | ToLower }}'
          - labels:
              component: ""
      - match:
          selector: '{component=""}'
          stages:
          - regex:
              expression: '(?P<component>memberlist):'
          - labels:
              component: ""
      - match:
          selector: '{component=""}'
          stages:
          - template:
              source: component
              template: metrictank
          - labels:
              component: ""

This config uses the match stage in some fallthrough logic to sort the Metrictank logs into one of several components. The component is then set as a label, allowing queries by a specific component, e.g. {app="metrictank",component="memberlist"}.
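The fallthrough logic can be easier to follow as ordinary code. The sketch below is a rough Go equivalent of the nested match stages, assuming the line already belongs to the metrictank app; componentFor is a hypothetical function and not how Promtail evaluates the config.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// The same expressions as in the config above, tried in the same order.
var componentRes = []*regexp.Regexp{
	regexp.MustCompile(`\[(?P<component>Macaron)\]`),
	regexp.MustCompile(`\[(?P<component>Sarama)\]`),
	regexp.MustCompile(`(?P<component>memberlist):`),
}

// componentFor is a hypothetical stand-in for the nested match stages.
// The first expression that matches wins, which is what the
// {component=""} selectors express in the config. The match is lowercased,
// as the template stages do (a no-op for memberlist).
func componentFor(line string) string {
	for _, re := range componentRes {
		if m := re.FindStringSubmatch(line); m != nil {
			return strings.ToLower(m[re.SubexpIndex("component")])
		}
	}
	// Final fallthrough: no expression matched.
	return "metrictank"
}

func main() {
	for _, line := range []string{
		"[Macaron] Completed GET /render",
		"[Sarama] client connected",
		"memberlist: suspect node has failed",
		"starting up",
	} {
		fmt.Printf("%-40q component=%s\n", line, componentFor(line))
	}
}
```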

Also in the config above, we match all log lines that have a key=value style log level and set the level as a label, and we look for instances of Go panics to increment a counter.
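As a rough illustration of those two regex stages, the following Go sketch applies the same expressions to a few made-up log lines. The helpers levelOf and countPanicLines are hypothetical, and counting once per matching line is an assumption about how the inc action behaves.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Same expressions as in the config above.
	levelRe = regexp.MustCompile(`(level|lvl|severity)=(?P<level>\w+)`)
	panicRe = regexp.MustCompile(`(?P<panic>panic:)`)
)

// levelOf is a hypothetical helper that returns the value captured by the
// named group "level", and false when the line carries no key=value level.
func levelOf(line string) (string, bool) {
	m := levelRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return m[levelRe.SubexpIndex("level")], true
}

// countPanicLines is a hypothetical helper that counts the lines containing
// "panic:", incrementing once per matching line (an assumption).
func countPanicLines(lines []string) int {
	n := 0
	for _, l := range lines {
		if panicRe.MatchString(l) {
			n++
		}
	}
	return n
}

func main() {
	lines := []string{
		`level=error msg="request failed"`,
		`lvl=warn msg="cache miss"`,
		`panic: runtime error: index out of range`,
	}
	for _, l := range lines {
		if lvl, ok := levelOf(l); ok {
			fmt.Println("level label:", lvl)
		}
	}
	fmt.Println("panic lines:", countPanicLines(lines))
}
```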

The Log Processing Pipeline

Under the hood of the YAML examples above is a log processing pipeline. Realizing we would need a way to chain together different tools for processing log lines, we started with a common interface that allows processing stages to be chained together into a pipeline:

type Stage interface {
    Process(labels model.LabelSet, extracted map[string]interface{}, time *time.Time, entry *string)
}

The idea is that any pipeline stage can manipulate the labels, timestamp, and/or output of the log line. We also realized there was a need for some ephemeral metadata during processing, for which we created the extracted map. This is a way for pipeline stages to communicate data to each other and keep a small amount of state during log processing.
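To show how stages can cooperate through the extracted map, here is a small self-contained sketch in Go. It swaps model.LabelSet for a local stand-in type so it has no external dependencies, and the three stages (splitStage, labelFromExtracted, outputFromExtracted) are hypothetical examples, not stages that ship with Promtail.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// LabelSet is a local stand-in for model.LabelSet (an assumption made to
// keep this sketch dependency-free).
type LabelSet map[string]string

// Stage mirrors the interface above, using the stand-in LabelSet.
type Stage interface {
	Process(labels LabelSet, extracted map[string]interface{}, t *time.Time, entry *string)
}

// splitStage (hypothetical) parses "<stream> <message>" into the extracted map.
type splitStage struct{}

func (splitStage) Process(_ LabelSet, extracted map[string]interface{}, _ *time.Time, entry *string) {
	parts := strings.SplitN(*entry, " ", 2)
	if len(parts) != 2 {
		return
	}
	extracted["stream"] = parts[0]
	extracted["output"] = parts[1]
}

// labelFromExtracted (hypothetical) copies one extracted value into a label.
type labelFromExtracted struct{ key string }

func (s labelFromExtracted) Process(labels LabelSet, extracted map[string]interface{}, _ *time.Time, _ *string) {
	if v, ok := extracted[s.key].(string); ok {
		labels[s.key] = v
	}
}

// outputFromExtracted (hypothetical) replaces the log line with an extracted value.
type outputFromExtracted struct{ key string }

func (s outputFromExtracted) Process(_ LabelSet, extracted map[string]interface{}, _ *time.Time, entry *string) {
	if v, ok := extracted[s.key].(string); ok {
		*entry = v
	}
}

// example runs the three stages in order over a single made-up line.
func example() (LabelSet, string) {
	labels := LabelSet{}
	extracted := map[string]interface{}{}
	ts := time.Now()
	entry := "stderr some log message"
	stages := []Stage{splitStage{}, labelFromExtracted{key: "stream"}, outputFromExtracted{key: "output"}}
	for _, s := range stages {
		s.Process(labels, extracted, &ts, &entry)
	}
	return labels, entry
}

func main() {
	labels, entry := example()
	fmt.Println(labels, entry)
}
```

Only the first stage knows how to parse the line; the later stages read what it left in the extracted map, which is discarded once the line has been processed.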

This interface allowed us to create many different types of stages and has so far handled almost all of the use cases we have seen, but like any good design, it has room for some iteration.

One use case we are discussing is splitting a single log line into many. Take this label, which has a complex value:

"group": "syslog,sshd,invalid_login,authentication_failed,",

It can be stored as is, and you can use regex pattern matching to search for authentication_failed, as an example:

{group=~".+authentication_failed.+"}

However, what if that log line were replicated, with each copy stored under a group label holding just one of the individual comma-separated values? While this would create some log line duplication, it would also simplify the group label and its use, saving people from having to write regexes and making the auto-population of label values in Explore a lot more intuitive. This may be desirable, especially when you consider Loki’s compression will handle the duplication fairly efficiently.
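A minimal sketch of that replication step in Go, assuming labels are held in a plain map. splitGroup is a hypothetical helper; nothing like it exists in the pipeline today.

```go
package main

import (
	"fmt"
	"strings"
)

// splitGroup is a hypothetical helper that returns one copy of the label set
// per non-empty comma-separated value in the "group" label. A label set
// without a group label is returned unchanged as the only element.
func splitGroup(labels map[string]string) []map[string]string {
	raw, ok := labels["group"]
	if !ok {
		return []map[string]string{labels}
	}
	var out []map[string]string
	for _, v := range strings.Split(raw, ",") {
		v = strings.TrimSpace(v)
		if v == "" {
			continue // skips the empty value after the trailing comma
		}
		cp := make(map[string]string, len(labels))
		for k, val := range labels {
			cp[k] = val
		}
		cp["group"] = v
		out = append(out, cp)
	}
	return out
}

func main() {
	labels := map[string]string{
		"job":   "ossec",
		"group": "syslog,sshd,invalid_login,authentication_failed,",
	}
	for _, ls := range splitGroup(labels) {
		fmt.Println(ls)
	}
}
```

With the label split this way, the earlier regex query could become a plain equality match such as {group="authentication_failed"}.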

Or another use case for splitting a log line into many might be storing the log line in its original unaltered format, but also extracting just the most commonly used piece of it separately and using labels to identify those:

{source="original"}
{source="abbreviated"}

The Stage interface, as currently implemented, does not gracefully handle such log line splitting, so we are considering alternatives for this; perhaps the four parameters in the interface could be put into a struct, and then we pass an array of these structs as the parameter to Process.
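One possible shape for that idea is sketched below in Go. This is speculative: Entry, SplittingStage, and originalAndAbbreviated are hypothetical names, LabelSet is a local stand-in for model.LabelSet, and returning a new slice is just one of several ways the signature could go.

```go
package main

import (
	"fmt"
	"time"
)

// LabelSet is a local stand-in for model.LabelSet (an assumption).
type LabelSet map[string]string

// Entry bundles the four parameters of the current Process method.
type Entry struct {
	Labels    LabelSet
	Extracted map[string]interface{}
	Timestamp time.Time
	Line      string
}

// SplittingStage is a hypothetical variant of Stage that can return more
// entries than it receives.
type SplittingStage interface {
	Process(entries []Entry) []Entry
}

// originalAndAbbreviated (hypothetical) keeps each line as is and, when the
// extracted map holds a short form under key, emits a second, shorter entry.
type originalAndAbbreviated struct{ key string }

func (s originalAndAbbreviated) Process(entries []Entry) []Entry {
	out := make([]Entry, 0, 2*len(entries))
	for _, e := range entries {
		orig := e
		orig.Labels = withLabel(e.Labels, "source", "original")
		out = append(out, orig)
		if short, ok := e.Extracted[s.key].(string); ok {
			abbr := e // the extracted map is shared between both copies
			abbr.Labels = withLabel(e.Labels, "source", "abbreviated")
			abbr.Line = short
			out = append(out, abbr)
		}
	}
	return out
}

// withLabel returns a copy of ls with k set to v, leaving ls untouched.
func withLabel(ls LabelSet, k, v string) LabelSet {
	out := make(LabelSet, len(ls)+1)
	for key, val := range ls {
		out[key] = val
	}
	out[k] = v
	return out
}

func main() {
	var stage SplittingStage = originalAndAbbreviated{key: "msg"}
	in := []Entry{{
		Labels:    LabelSet{"app": "sshd"},
		Extracted: map[string]interface{}{"msg": "invalid login"},
		Timestamp: time.Now(),
		Line:      "a long raw line that contains the text invalid login and more",
	}}
	for _, e := range stage.Process(in) {
		fmt.Println(e.Labels, e.Line)
	}
}
```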

What’s Next?

We currently have a number of pre-built stages in the pipeline that cover most normal use cases and we ship a couple of preconfigured pipelines for working with Docker or CRI. The pipeline is generic in the sense that stages can be added for anything, and we are always on the lookout for new pipeline stages both internally and from the community to cover as many use cases as possible.

We are also constantly re-evaluating the pipeline design itself to add additional functionality, like the previously mentioned log line splitting, or to improve debugging (which can certainly be a little challenging at the moment).

Lastly we are also exploring and entertaining ideas on how people can share their log processing pipeline configurations, whether that be a curated list in the Loki repo or a more elaborate setup where you can configure Promtail to point at a “repository” of some sort to download shared configurations.

Check out the docs to see how you can get started manipulating and extracting data from your log lines.

More about Loki

In other blog posts, we focus on key Loki features, including loki-canary early detection for missing logs, the Docker logging driver plugin and support for systemd, and query optimization. Be sure to check back for more content about Loki.