
New in Loki 2.3: LogQL pattern parser makes it easier to extract data from unstructured logs

Cyril Tovena, 9 Aug 2021


Writing LogQL queries to access Loki’s log data just got easier, thanks to the new pattern parser released with Loki 2.3. It makes writing queries for unstructured log formats simple. And the pattern parser can be an order of magnitude faster than the regular expression parser. Let’s take a closer look.

Log-parsing woes

Loki 2.0 introduced new LogQL parsers that handle JSON, logfmt, and regex. While the JSON and logfmt parsers are fast and easy to use, the regex parser is neither.

Consider the parsing of NGINX logs to extract labels and values. To find the rate of requests by method and status, the query is scary and cumbersome. The regex is highlighted within this example query:
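As a rough sketch, a regex-based version of this query can look like the following. The stream selector {container="nginx"}, the one-minute range, and the regex itself are illustrative assumptions, not the exact query from this example:

sum by (method, status) (
  rate(
    {container="nginx"}
      | regexp `^\S+ - \S+ \[[^\]]+\] "(?P<method>\S+) \S+ \S+" (?P<status>\d{3})`
      [1m]
  )
)

Even this trimmed-down regex only describes the line up to the status code, and it already takes careful reading to verify.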

We can make it simple and intuitive to parse common log formats. After all, these unstructured log entries are tokens separated by literals and spaces.

Introducing the pattern parser

Loki v2.3.0 introduces the pattern parser. It is both simple to use and super efficient at extracting data from unstructured logs.

Spoiler alert! Here’s that same query, written using the pattern parser:
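A sketch of that query, built on the pattern expression that is broken down later in this post. The stream selector {container="nginx"} and the one-minute range are assumptions made for illustration:

sum by (method, status) (
  rate(
    {container="nginx"}
      | pattern `<_> - - <_> "<method> <_> <_>" <status> <_> <_> "<_>" <_>`
      [1m]
  )
)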

There is quite a big difference between this pattern expression and the regular expression. I know this from my own experience generating the regex for this example. It took me several iterations and 20 minutes to get it right. On top of that, the pattern parser parses log lines faster than a regex parser.

Pattern parser syntax and semantics

Invoke the pattern parser within a LogQL query by specifying:

| pattern "<pattern-expression>"

or 

| pattern `<pattern-expression>`

<pattern-expression> specifies the structure of a log line. It is composed of captures and literals.

A capture defines a field name and is delimited by the < and > characters. In the example, <status> defines the field name status. The unnamed capture <_> skips and ignores matched content within the log line.

A capture matches from the beginning of the line, or from the previous set of literals, up to the next set of literals or the end of the line. If a capture does not match, the pattern parser stops processing the log line. By default, pattern expressions are anchored at the beginning of the log line. If you want to change this behavior, start your expression with an unnamed capture, <_>.
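For instance, an expression that only cares about the quoted request and the status code can open with <_> rather than spelling out the fields that come first. A minimal sketch for NGINX-style access log lines, with a hypothetical {container="nginx"} stream selector:

{container="nginx"} | pattern `<_> "<method> <path> <_>" <status> <_>`

The leading <_> consumes everything before the quoted request, so the expression no longer has to describe the address and timestamp fields.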

Pattern parser examples

The example pattern parser expression operates on an NGINX log line. Here are three sample NGINX log lines:

192.0.2.0 - - [04/Aug/2021:21:12:04 +0000] "GET /api/plugins/versioncheck?slugIn=&grafanaVersion=6.3.5 HTTP/1.1" 200 2 "-" "Go-http-client/2.0" "220.248.51.226, 34.120.177.193" "TLSv1.2" "CN" "CN31"
198.51.100.0 - - [04/Aug/2021:21:12:04 +0000] "GET /ws/?EIO=3&transport=polling&t=NiJ0b8H HTTP/1.1" 200 103 "https://grafana.com/grafana/download?platform=mac" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15" "2001:240:168:3400::1:87, 2600:1901:0:b3ea::" "TLSv1.3" "JP" "JP13"
203.0.113.0 - - [04/Aug/2021:21:12:04 +0000] "GET /healthz HTTP/1.1" 200 15 "-" "GoogleHC/1.0" "-" "-" "-" "-"

Here is the example <pattern-expression>:

<_> - - <_> "<method> <_> <_>" <status> <_> <_> "<_>" <_>

This table matches the fields of the third sample NGINX log line with the corresponding portions of the pattern expression. Note that the last <_> in the <pattern-expression> consumes the final four fields of the log line, because a trailing capture keeps matching until it reaches the end of the line.

NGINX log line field      NGINX sample                    <pattern-expression>
$remote_addr              203.0.113.0                     <_>
-                         -                               -
$remote_user              -                               -
[$time_local]             [04/Aug/2021:21:12:04 +0000]    <_>
"$request"                "GET /healthz HTTP/1.1"         "<method> <_> <_>"
$status                   200                             <status>
$bytes_sent               15                              <_>
"$http_referer"           "-"                             <_>
"$http_user_agent"        "GoogleHC/1.0"                  "<_>"
(remaining four fields)   "-" "-" "-" "-"                 <_>

This example log line defines method="GET" and status=200.
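Once extracted, these fields can be used like any other label later in the query. As a minimal sketch, assuming a hypothetical {container="nginx"} stream selector and a one-minute range, the following returns the per-second rate of 5xx responses by method:

sum by (method) (
  rate(
    {container="nginx"}
      | pattern `<_> - - <_> "<method> <_> <_>" <status> <_> <_> "<_>" <_>`
      | status >= 500
      [1m]
  )
)

The comparison status >= 500 relies on LogQL converting the extracted label value to a number before comparing.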

A second example uses the pattern parser for an Envoy proxy in a Kubernetes environment. It is a metric query that returns the 99th percentile latency per path and method, given in seconds.

quantile_over_time(0.99,
  {container="envoy"}
    | pattern `[<_>] "<method> <path> <_>" <_> <_> <_> <_> <latency>`
    | unwrap latency [$__interval]
) by (method, path) / 1e3

For matching log lines, this example defines method, path, and latency.

Learn more

To showcase how readable and simple pattern expressions are, we created some query examples in our GitHub repository. You can read more about the pattern parser in our documentation.

We hope you make use of this fast and easy log parsing. Try the pattern parser yourself for free on Grafana Cloud, the easiest way to get started with metrics, logs, traces, and dashboards. Sign up for a free account here.