<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Design documents on Grafana Labs</title><link>https://grafana.com/docs/loki/v3.7.x/community/design-documents/</link><description>Recent content in Design documents on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/loki/v3.7.x/community/design-documents/index.xml" rel="self" type="application/rss+xml"/><item><title>Labels</title><link>https://grafana.com/docs/loki/v3.7.x/community/design-documents/labels/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/community/design-documents/labels/</guid><content><![CDATA[&lt;h1 id=&#34;labels&#34;&gt;Labels&lt;/h1&gt;
&lt;p&gt;Author: Ed Welch
Date: February 2019&lt;/p&gt;
&lt;p&gt;This is the official version of this doc as of 2019/04/03; the original discussion took place in a &lt;a href=&#34;https://docs.google.com/document/d/16y_XFux4h2oQkJdfQgMjqu3PUxMBAq71FoKC_SkHzvk/edit?usp=sharing&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Google doc&lt;/a&gt;, which is being kept for posterity but will not be updated going forward.&lt;/p&gt;
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;We should be able to filter logs by labels extracted from log content.&lt;/p&gt;
&lt;p&gt;Keeping in mind:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Loki is not a log search tool, and we need to discourage the use of log labels as an attempt to recreate log search functionality. Having a label on “order number” would be bad; however, having a label on “orderType=plant” and then filtering the results in a time window by an order number would be fine (think: grep “plant” | grep “12324134”).&lt;/li&gt;
&lt;li&gt;Loki as a grep replacement, log tailing, or log scrolling tool is highly desirable; log labels will be useful in reducing query results and improving query performance, combined with LogQL to narrow down results.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;use-cases&#34;&gt;Use Cases&lt;/h2&gt;
&lt;p&gt;As defined for Prometheus: “Use labels to differentiate the characteristics of the thing that is being measured.” There are common cases where someone would want to search for all logs with a level of “Error”, for a certain HTTP path (possibly too high cardinality), or for a certain order or event type.
Examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Log levels.&lt;/li&gt;
&lt;li&gt;HTTP Status codes.&lt;/li&gt;
&lt;li&gt;Event type.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;challenges&#34;&gt;Challenges&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Logs are often unstructured data; it can be very difficult to extract reliable data from some unstructured formats, often requiring the use of complicated regular expressions.&lt;/li&gt;
&lt;li&gt;Easy to abuse. It is easy to create a label with high cardinality, possibly even by accident with a rogue regular expression.&lt;/li&gt;
&lt;li&gt;Where do we extract metrics and labels: at the client (Promtail or another agent) or at the server (Loki)? Extraction at the server (Loki) side has some pros and cons. Can we do both? At least with labels, we could define a set of expected labels, and if Loki doesn’t receive them they could be extracted.
&lt;ul&gt;
&lt;li&gt;Server-side extraction would improve interoperability at the expense of increased server workload and cost.&lt;/li&gt;
&lt;li&gt;Are there discoverability questions/concerns with metrics exposed via Loki vs the agent? Maybe this is better/easier to manage?&lt;/li&gt;
&lt;li&gt;Potentially more difficult to manage configuration, with the server side having to match configs to incoming log streams.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;existing-solutions&#34;&gt;Existing Solutions&lt;/h2&gt;
&lt;p&gt;Solutions already exist for processing unstructured log data and extracting metrics from it; however, they will not quite work for extracting labels without some effort, and neither supports easy inclusion as a library. It’s worth noting and understanding how they work to get the best features into our solution.
mtail
&lt;a href=&#34;https://github.com/google/mtail&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;https://github.com/google/mtail&lt;/a&gt;
1,721 GitHub stars; a large number of commits, releases, and contributors; a Google project&lt;/p&gt;
&lt;p&gt;All Go; it uses Go’s RE2 regular expressions, which will be more performant than grok_exporter below, which uses a full regex implementation allowing the backtracking and lookahead required to be compliant with Grok but which is also slower.&lt;/p&gt;
&lt;p&gt;grok_exporter
&lt;a href=&#34;https://github.com/fstab/grok_exporter&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;https://github.com/fstab/grok_exporter&lt;/a&gt;
278 GitHub stars; a mature, active project&lt;/p&gt;
&lt;p&gt;If you are familiar with Grok, this would be more comfortable; many people use ELK stacks and would likely be familiar with, or already have, Grok strings for their logs, making it easy to use grok_exporter to extract metrics.&lt;/p&gt;
&lt;p&gt;One caveat is the dependency on the oniguruma C library, which parses the regular expressions.&lt;/p&gt;
&lt;h2 id=&#34;implementation&#34;&gt;Implementation&lt;/h2&gt;
&lt;h3 id=&#34;details&#34;&gt;Details&lt;/h3&gt;
&lt;p&gt;As mentioned previously in the challenges of working with unstructured data, there isn’t a good one-size-fits-all solution for extracting structured data.&lt;/p&gt;
&lt;p&gt;The Docker log format is an example where multiple levels of processing may be required: the Docker log is JSON, but it also contains the log message field, which itself could be embedded JSON or a log message that needs regex parsing.&lt;/p&gt;
&lt;p&gt;A pipelined approach should allow for handling these more challenging scenarios.&lt;/p&gt;
&lt;p&gt;There are 2 interfaces within Promtail already that should support constructing a pipeline:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-go&#34;&gt;type EntryMiddleware interface {
    Wrap(next EntryHandler) EntryHandler
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-go&#34;&gt;type EntryHandler interface {
    Handle(labels model.LabelSet, time time.Time, entry string) error
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Essentially, every entry in the pipeline will Wrap the log line with another EntryHandler, which can add to the LabelSet, set the timestamp, and mutate (or not) the log line before it gets handed to the next stage in the pipeline.&lt;/p&gt;
&lt;h3 id=&#34;example&#34;&gt;Example&lt;/h3&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-json&#34;&gt;{
  &amp;#34;log&amp;#34;: &amp;#34;level=info msg=\&amp;#34;some log message\&amp;#34;\n&amp;#34;,
  &amp;#34;stream&amp;#34;: &amp;#34;stderr&amp;#34;,
  &amp;#34;time&amp;#34;: &amp;#34;2012-11-01T22:08:41&amp;#43;00:00&amp;#34;
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This is a Docker-format log line: it is JSON, but it also contains a log message which has some key-value pairs.&lt;/p&gt;
&lt;p&gt;Our pipelined config might look like this:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;scrape_configs:
- job_name: system
  pipeline_stages:
    - json:
        timestamp:
          source: time
          format: RFC3339
        labels:
          stream:
            source: json_key_name.json_sub_key_name
        output: log
    - regex:
        expr: &amp;#39;.*level=(?P&amp;lt;level&amp;gt;[a-zA-Z]&amp;#43;).*&amp;#39;
        labels:
          level:
    - regex:
        expr: &amp;#39;.*msg=(?P&amp;lt;message&amp;gt;[a-zA-Z]&amp;#43;).*&amp;#39;
        output: message&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Looking at this a little closer:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;     - json:
        timestamp:
          source: time
          format: TODO                               ①
        labels:
          stream:
            source: json_key_name.json_sub_key_name  ②
        output: log                                  ③&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;① The format key will likely be a format string for Go’s time.Parse or a format string for strptime; this still needs to be decided. The idea is to specify a format string used to extract the timestamp data; for the regex parser, there would also need to be an expr key used to extract the timestamp.
② One of the JSON elements was “stream”, so we extract that as a label. If the JSON key matches the desired label name, it should only be required to specify the label name as a key; if some mapping is required, you can optionally provide a “source” key to specify where to find the label in the document. (Note: the use of &lt;code&gt;json_key_name.json_sub_key_name&lt;/code&gt; is just an example here and doesn&amp;rsquo;t match our example log.)
③ Tell the pipeline which element from the JSON to send to the next stage.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;    - regex:
        expr: &amp;#39;.*level=(?P&amp;lt;level&amp;gt;[a-zA-Z]&amp;#43;).*&amp;#39;  ①
        labels:
          level:                                ②&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;① Define the Go RE2 regex, making sure to use a named capture group.
② Extract labels using the named capture group names.&lt;/p&gt;
&lt;p&gt;Notice there was no output section defined here; omitting the output key instructs the parser to pass the incoming log message to the next stage unchanged.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;    - regex:
        expr: &amp;#39;.*msg=(?P&amp;lt;message&amp;gt;[a-zA-Z]&amp;#43;).*&amp;#39;
        output: message                          ①&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;① Send the log message as the output of the last stage in the pipeline; this will be what you want Loki to store as the log message.&lt;/p&gt;
&lt;p&gt;There is an alternative configuration that could be used here to accomplish the same result:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;scrape_configs:
- job_name: system
  pipeline_stages:
    - json:
        timestamp:
          source: time
          format: FIXME
        labels:
          stream:
        output: log
    - regex:
        expr: &amp;#39;.*level=(?P&amp;lt;level&amp;gt;[a-zA-Z]&amp;#43;).*msg=(?P&amp;lt;message&amp;gt;[a-zA-Z]&amp;#43;).*&amp;#39;
        labels:
          level:                                                             ①
          log:
            source: message                                                  ②
        output: message&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;① Similar to the JSON parser, if your label name matches the regex named capture group, you need only specify the label name as a YAML key.
② If you have a use case for specifying a label name different from the regex group name, you can optionally provide the &lt;code&gt;source&lt;/code&gt; key with the value matching the named capture group.&lt;/p&gt;
&lt;p&gt;You can define a more complicated regular expression with multiple capture groups to extract many labels and/or the output log message in one entry parser. This has the advantage of being more performant; however, the regular expression also becomes much more complicated.&lt;/p&gt;
&lt;p&gt;Note that the regex for &lt;code&gt;message&lt;/code&gt; is incomplete and would do a terrible job of matching any standard log message, which might contain spaces or non-alpha characters.&lt;/p&gt;
&lt;h3 id=&#34;concerns&#34;&gt;Concerns&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Debugging, especially if a pipeline stage is mutating the log entry.&lt;/li&gt;
&lt;li&gt;Clashing labels and how to handle them (two stages trying to set the same label).&lt;/li&gt;
&lt;li&gt;Performance vs. ease of writing and use: if every label is extracted one at a time and there are many labels on a long line, the line must be read many times; contrast this with a really long, complicated regex which reads the line only once but is difficult to write, change, and maintain.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;further-improvements&#34;&gt;Further improvements&lt;/h3&gt;
&lt;p&gt;There are some basic building blocks for our pipeline which will use the EntryMiddleware interface; the two most commonly used will likely be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Regex Parser&lt;/li&gt;
&lt;li&gt;JSON Parser&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, we don’t want to ask people to copy and paste basic configs over and over for very common use cases, so it would make sense to add some additional parsers which would really be supersets of the base parsers above.&lt;/p&gt;
&lt;p&gt;For example, the config above might be simplified to:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;scrape_configs:
- job_name: system
  pipeline_stages:
    - docker:&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;or&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;scrape_configs:
- job_name: system
  pipeline_stages:
    - cri:&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Which could still easily be extended to extract additional labels:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;scrape_configs:
- job_name: system
  pipeline_stages:
    - docker:
    - regex:
        expr: &amp;#39;.*level=(?P&amp;lt;level&amp;gt;[a-zA-Z]&amp;#43;).*&amp;#39;
        labels:
          level:&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;auto-detection&#34;&gt;Auto Detection?&lt;/h3&gt;
&lt;p&gt;An even further simplification would be to attempt to autodetect the log format (a PR for this work has been submitted); then the config could be as simple as:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;scrape_configs:
- job_name: system
  pipeline_stages:
    - auto:&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This certainly has some advantages for people first adopting and testing Loki, allowing them to point it at their logs and at least get the timestamp and log message extracted properly for the common formats like Docker and CRI.&lt;/p&gt;
&lt;p&gt;There are also some challenges with auto detection and edge cases. Most people are going to want to augment the basic config with additional labels, though, so maybe it makes sense to default to auto but suggest that when people start writing configs they choose the correct parser?&lt;/p&gt;
&lt;h2 id=&#34;other-thoughts-and-considerations&#34;&gt;Other Thoughts and Considerations&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;We should have a standalone client in some fashion which allows for testing of log parsing at the command line, allowing users to validate regular expressions or configurations to see what information is extracted.&lt;/li&gt;
&lt;li&gt;Other input formats to the pipeline which do not read from log files, such as the containerd gRPC API, stdin, or Unix pipes.&lt;/li&gt;
&lt;li&gt;It would be nice if at some point we could support loading code into the pipeline stages for even more advanced/powerful parsing capabilities.&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="labels">Labels&lt;/h1>
&lt;p>Author: Ed Welch
Date: February 2019&lt;/p>
&lt;p>This is the official version of this doc as of 2019/04/03, the original discussion was had via a &lt;a href="https://docs.google.com/document/d/16y_XFux4h2oQkJdfQgMjqu3PUxMBAq71FoKC_SkHzvk/edit?usp=sharing" target="_blank" rel="noopener noreferrer">Google doc&lt;/a>, which is being kept for posterity but will not be updated moving forward.&lt;/p></description></item><item><title>Promtail Push API</title><link>https://grafana.com/docs/loki/v3.7.x/community/design-documents/2020-02-promtail-push-api/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/community/design-documents/2020-02-promtail-push-api/</guid><content><![CDATA[&lt;h1 id=&#34;promtail-push-api&#34;&gt;Promtail Push API&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Author: Robert Fratto (@rfratto)&lt;/li&gt;
&lt;li&gt;Date: Feb 4 2020&lt;/li&gt;
&lt;li&gt;Status: DRAFT&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Despite being an optional piece of software, Promtail provides half the power
of Loki&amp;rsquo;s story: log transformations, service discovery, metrics from logs,
and context switching between your existing metrics and logs. Today, Promtail
can only consume logs from very specific sources: files, the journal,
or syslog. If users want to write custom tooling to ship logs, that tooling
has to bypass Promtail and push directly to Loki. This can lead users to
reimplement functionality Promtail already provides, including its retry
and batching code.&lt;/p&gt;
&lt;p&gt;This document proposes a Push API for Promtail. The preferred implementation
is to copy the existing Loki Push API and implement it for Promtail. By
being compatible with the Loki Push API, the Promtail Push API can allow batches
of logs to be processed at once for optimized performance. Matching the
Loki API also allows users to transparently switch the push URLs in
their existing tooling. Finally, a series of alternative solutions is detailed.&lt;/p&gt;
&lt;h2 id=&#34;configuration&#34;&gt;Configuration&lt;/h2&gt;
&lt;p&gt;Promtail will have a new target called HTTPTarget, configurable in the
&lt;code&gt;scrape_config&lt;/code&gt; array with the following schema:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Defines an HTTP target, which exposes an endpoint against the Promtail
# HTTP server to accept log traffic.
http:
  # Defines the base URL for the push path, adding a prefix to the
  # exposed endpoint. The final endpoint path is
  # &amp;lt;base_url&amp;gt;loki/api/v1/push. If omitted, defaults to /.
  #
  # Multiple http targets with the same base_url must not exist.
  base_url: /

  # Map of labels to add to every log line passed through to the target.
  labels: {}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;considerations&#34;&gt;Considerations&lt;/h3&gt;
&lt;p&gt;Users will be able to define multiple &lt;code&gt;http&lt;/code&gt; scrape configs, but the base URL
value must be different for each instance. This allows pipelines to be cleanly
separated through different push endpoints.&lt;/p&gt;
&lt;p&gt;Users must also be aware of problems with running Promtail with an HTTP
target behind a load balancer: if payloads are load balanced between multiple
Promtail instances, the ordering of logs in Loki will be disrupted, leading to
rejected pushes. Users are recommended to do one of the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Have a dedicated Promtail instance for receiving pushes. This also applies to
using the syslog target.&lt;/li&gt;
&lt;li&gt;Have a separate Kubernetes service that always resolves to the same Promtail pod,
bypassing the load balancing issue.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;implementation&#34;&gt;Implementation&lt;/h2&gt;
&lt;p&gt;As discussed in this document, this feature will be implemented by copying the
existing 
    &lt;a href=&#34;/docs/loki/v3.7.x/api/#post-lokiapiv1push&#34;&gt;Loki Push API&lt;/a&gt;
and exposing it via Promtail.&lt;/p&gt;
&lt;h2 id=&#34;considered-alternatives&#34;&gt;Considered Alternatives&lt;/h2&gt;
&lt;p&gt;Using the existing API was chosen for its simplicity and its ability to support
interesting configurations (e.g., chaining Promtails together). The
other options below were considered but rejected as not the best solution to the
problem being solved.&lt;/p&gt;
&lt;p&gt;Note that Option 3 has value and may be implemented separately from this
feature.&lt;/p&gt;
&lt;h3 id=&#34;option-1-json--protobuf-payload&#34;&gt;Option 1: JSON / Protobuf Payload&lt;/h3&gt;
&lt;p&gt;A new JSON and Protobuf payload format could be designed instead of using the
existing Loki push payload. Both formats would have to be exposed to support clients that
either can&amp;rsquo;t or won&amp;rsquo;t use protobuf marshalling.&lt;/p&gt;
&lt;p&gt;The primary benefit of this approach is to allow us to tweak the payload schema
independently of Loki&amp;rsquo;s existing schema, but otherwise may not be very useful
and is essentially just code duplication.&lt;/p&gt;
&lt;h3 id=&#34;option-2-grpc-service&#34;&gt;Option 2: gRPC Service&lt;/h3&gt;
&lt;p&gt;The
&lt;a href=&#34;https://github.com/grafana/loki/blob/f7ee1c753c76ef63338d53cfba782188a165144d/pkg/logproto/logproto.proto#L8-L10&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;logproto.Pusher&lt;/a&gt;
service could be exposed through Promtail. This would enable client stubs to be
generated for languages that have gRPC support, and, for HTTP/1 support, a
&lt;a href=&#34;https://github.com/grpc-ecosystem/grpc-gateway&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;gRPC gateway&lt;/a&gt; would be embedded
in Promtail itself.&lt;/p&gt;
&lt;p&gt;This implementation option is similar to the original proposed solution, but
uses the gRPC gateway to handle HTTP/1 traffic instead of the HTTP/1 shim that
Loki uses. There are some concerns with this approach:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The gRPC Gateway reverse proxy will need to play nice with the existing HTTP
mux used in Promtail.&lt;/li&gt;
&lt;li&gt;We couldn&amp;rsquo;t control the HTTP and Protobuf formats separately as Loki can.&lt;/li&gt;
&lt;li&gt;Log lines will be double-encoded thanks to the reverse proxy.&lt;/li&gt;
&lt;li&gt;A small overhead of using a reverse proxy in-process will be introduced.&lt;/li&gt;
&lt;li&gt;This breaks our normal pattern of writing our own shim functions; it may add
some cognitive overhead of having to deal with the gRPC gateway as an outlier
in the code.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;option-3-plaintext-payload&#34;&gt;Option 3: Plaintext Payload&lt;/h3&gt;
&lt;p&gt;Prometheus&amp;rsquo; &lt;a href=&#34;https://github.com/prometheus/pushgateway#command-line&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Push Gateway API&lt;/a&gt;
is cleverly designed and we should consider implementing our API in the same
format: users would push to &lt;code&gt;http://promtail-url/push/label1/value1?timestamp=now&lt;/code&gt;
with a plaintext POST body. For example:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;curl -X POST http://promtail.default/push/foo/bar/fizz/buzz -d &#34;hello, world!&#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This approach may be slightly faster than a non-plaintext payload, as no
unmarshaling needs to be performed. The URL path and timestamp still need to
be parsed, but this will generally be faster than the reflection required to
unmarshal JSON.&lt;/p&gt;
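To illustrate why parsing the path avoids JSON's reflection cost, here is a minimal Go sketch; `parsePushPath`, its error handling, and the exact path scheme are hypothetical, not Promtail's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// parsePushPath is a hypothetical sketch of the path parsing described
// above: it splits /push/label1/value1/label2/value2 into label pairs
// with simple string operations and no unmarshaling.
func parsePushPath(path string) (map[string]string, error) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	if len(parts) < 3 || parts[0] != "push" || len(parts)%2 == 0 {
		return nil, fmt.Errorf("expected /push/<label>/<value> pairs, got %q", path)
	}
	labels := make(map[string]string)
	for i := 1; i < len(parts); i += 2 {
		labels[parts[i]] = parts[i+1]
	}
	return labels, nil
}

func main() {
	labels, err := parsePushPath("/push/foo/bar/fizz/buzz")
	if err != nil {
		panic(err)
	}
	fmt.Println(labels) // map[fizz:buzz foo:bar]
}
```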
&lt;p&gt;However, note that this API limits Promtail to accepting one line at a time and
may cause performance issues when trying to handle large volumes of traffic. As
an alternative, this API could also be implemented by external tooling and be
built on top of any of the other implementation options.&lt;/p&gt;
&lt;p&gt;An &lt;a href=&#34;https://github.com/grafana/loki/pull/1270&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;example implementation&lt;/a&gt; was
created and has received positive support for its simplicity and ease of
integration.&lt;/p&gt;
]]></content><description>&lt;h1 id="promtail-push-api">Promtail Push API&lt;/h1>
&lt;ul>
&lt;li>Author: Robert Fratto (@rfratto)&lt;/li>
&lt;li>Date: Feb 4 2020&lt;/li>
&lt;li>Status: DRAFT&lt;/li>
&lt;/ul>
&lt;p>Despite being an optional piece of software, Promtail provides half the power
of Loki&amp;rsquo;s story: log transformations, service discovery, metrics from logs,
and context switching between your existing metrics and logs. Today, Promtail
can only be operated to consume logs from very specific sources: files, journal,
or syslog. If users wanted to write custom tooling to ship logs, the tooling
has to bypass Promtail and push directly to Loki. This can lead users to
reimplement functionality Promtail already provides, including its error retries
and batching code.&lt;/p></description></item><item><title>Write-Ahead Logs</title><link>https://grafana.com/docs/loki/v3.7.x/community/design-documents/2020-09-write-ahead-log/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/community/design-documents/2020-09-write-ahead-log/</guid><content><![CDATA[&lt;h2 id=&#34;write-ahead-logs&#34;&gt;Write-Ahead Logs&lt;/h2&gt;
&lt;p&gt;Author: Owen Diehl - &lt;a href=&#34;https://github.com/owen-d&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;owen-d&lt;/a&gt; (&lt;a href=&#34;/&#34;&gt;Grafana Labs&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Date: 30/09/2020&lt;/p&gt;
&lt;h2 id=&#34;impetus&#34;&gt;Impetus&lt;/h2&gt;
&lt;p&gt;Loki already takes numerous steps to ensure the persistence of log data, most notably the use of a configurable replication factor (redundancy) in the ingesters. However, this still leaves much to be desired in persistence guarantees, especially for single binary deployments. This proposal outlines a write ahead log (WAL) in order to complement existing measures by allowing storage/replay of incoming writes via local disk on the ingester components.&lt;/p&gt;
&lt;h2 id=&#34;strategy&#34;&gt;Strategy&lt;/h2&gt;
&lt;p&gt;We suggest a two pass WAL implementation which includes an initial recording of accepted writes (&lt;code&gt;segments&lt;/code&gt;) and a subsequent checkpointing (&lt;code&gt;checkpoints&lt;/code&gt;) which coalesces the first pass into more efficient representations to speed up replaying.&lt;/p&gt;
&lt;h3 id=&#34;segments&#34;&gt;Segments&lt;/h3&gt;
&lt;p&gt;Segments are the first pass and most basic WAL. They store individual records of incoming writes that have been accepted and can be used to reconstruct the in-memory state of an ingester without any external input. Each segment is some multiple of 32kB; upon filling one segment, a new segment is created. Initially Loki will try 256kB segment sizes, readjusting as necessary. Segments are sequentially named on disk and are automatically created when the target size is hit, as follows:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;data
└── wal
   ├── 000000
   ├── 000001
   └── 000002&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;truncation&#34;&gt;Truncation&lt;/h3&gt;
&lt;p&gt;In order to prevent unbounded growth and remove operations which have been flushed to storage from the WAL, it is regularly truncated and all but the last segment (which is currently active) are deleted at a configurable interval (&lt;code&gt;ingester.checkpoint-duration&lt;/code&gt;). This is where checkpoints come into the picture.&lt;/p&gt;
&lt;h3 id=&#34;checkpoints&#34;&gt;Checkpoints&lt;/h3&gt;
&lt;p&gt;Before truncating the WAL, we advance the WAL segments by one in order to ensure we don&amp;rsquo;t delete the currently writing segment. The directory will look like:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;data
└── wal
   ├── 000000
   ├── 000001
   ├── 000002 &amp;lt;- likely not full, no matter
   └── 000003 &amp;lt;- newly written, empty&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Each in-memory stream is iterated across an interval, calculated as &lt;code&gt;checkpoint_duration / in_memory_streams&lt;/code&gt;, and written to the checkpoint. After the checkpoint completes, it is moved from its temp directory to the &lt;code&gt;ingester.wal-dir&lt;/code&gt;, taking the name of the last segment before it started (&lt;code&gt;checkpoint.000002&lt;/code&gt;); then all applicable segments (&lt;code&gt;000000&lt;/code&gt;, &lt;code&gt;000001&lt;/code&gt;, &lt;code&gt;000002&lt;/code&gt;) and any previous checkpoint are deleted.&lt;/p&gt;
&lt;p&gt;Afterwards, it will look like:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;data
└── wal
   ├── checkpoint.000002 &amp;lt;- completed checkpoint
   └── 000003 &amp;lt;- currently active wal segment&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h4 id=&#34;queueing-checkpoint-operations&#34;&gt;Queueing Checkpoint operations&lt;/h4&gt;
&lt;p&gt;It’s possible that one checkpoint operation will start at the same time another is running. In this case, the existing checkpoint operation should disregard its internal ticker and flush its series as fast as possible. Afterwards, the next checkpoint operation can begin. This will likely create a localized spike in IOPS before the amortization of the following checkpoint operation takes over and is another important reason to run the WAL on an isolated disk in order to mitigate noisy neighbor problems. After we&amp;rsquo;ve written/moved the current checkpoint, we reap the old one.&lt;/p&gt;
&lt;h3 id=&#34;wal-record-types&#34;&gt;WAL Record Types&lt;/h3&gt;
&lt;h4 id=&#34;streams&#34;&gt;Streams&lt;/h4&gt;
&lt;p&gt;A &lt;code&gt;Stream&lt;/code&gt; record type is written when an ingester receives a push for a series it doesn&amp;rsquo;t yet have in memory. At a high level, this will contain&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;golang&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-golang&#34;&gt;type SeriesRecord struct {
	UserID      string
	Labels      labels.Labels
	Fingerprint uint64 // label fingerprint
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h4 id=&#34;logs&#34;&gt;Logs&lt;/h4&gt;
&lt;p&gt;A &lt;code&gt;Logs&lt;/code&gt; record type is written when an ingester receives a push, containing the fingerprint of the series it refers to and a list of &lt;code&gt;(timestamp, log_line)&lt;/code&gt; tuples, &lt;em&gt;after&lt;/em&gt; a &lt;code&gt;Stream&lt;/code&gt; record type is written, if applicable.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;golang&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-golang&#34;&gt;type LogsRecord struct {
	UserID      string
	Fingerprint uint64 // label fingerprint for the series these logs refer to
	Entries     []logproto.Entry
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;restoration&#34;&gt;Restoration&lt;/h3&gt;
&lt;p&gt;Replaying a WAL is done by loading any available checkpoints into memory and then replaying any operations from successively named segments on top (&lt;code&gt;checkpoint.000003&lt;/code&gt; -&amp;gt; &lt;code&gt;000004&lt;/code&gt; -&amp;gt; &lt;code&gt;000005&lt;/code&gt;, etc). It&amp;rsquo;s likely some of these operations will fail because they&amp;rsquo;re already included in the checkpoint (due to delay introduced in our amortizations), but this is ok &amp;ndash; we won&amp;rsquo;t &lt;em&gt;lose&lt;/em&gt; any data, only try to write some data twice, which will be ignored.&lt;/p&gt;
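The replay order and duplicate tolerance described above can be sketched in Go; the record strings, the `apply` callback, and `ErrDuplicateEntry` are hypothetical stand-ins for the real WAL types:

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ErrDuplicateEntry stands in for the "already included in the checkpoint"
// case described above; names and interfaces here are illustrative.
var ErrDuplicateEntry = errors.New("entry already present")

// replayWAL sketches the restoration order: apply the checkpoint first,
// then replay successively named segments on top, tolerating duplicates.
func replayWAL(checkpoint []string, segments map[string][]string, apply func(rec string) error) error {
	for _, rec := range checkpoint {
		if err := apply(rec); err != nil && !errors.Is(err, ErrDuplicateEntry) {
			return err
		}
	}
	// Segments are sequentially named (000004, 000005, ...), so sorting
	// their names yields replay order.
	names := make([]string, 0, len(segments))
	for name := range segments {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		for _, rec := range segments[name] {
			if err := apply(rec); err != nil && !errors.Is(err, ErrDuplicateEntry) {
				return err
			}
		}
	}
	return nil
}

func main() {
	seen := map[string]bool{}
	apply := func(rec string) error {
		if seen[rec] {
			return ErrDuplicateEntry // duplicate writes are ignored, not lost
		}
		seen[rec] = true
		return nil
	}
	err := replayWAL(
		[]string{"a", "b"},
		map[string][]string{"000005": {"c"}, "000004": {"b"}}, // "b" duplicates the checkpoint
		apply,
	)
	fmt.Println(err, len(seen)) // <nil> 3
}
```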
&lt;h3 id=&#34;deployment&#34;&gt;Deployment&lt;/h3&gt;
&lt;p&gt;Introduction of the WAL requires that ingesters have persistent disks which are reconnected across restarts (this is a good fit for StatefulSets in Kubernetes). Additionally, it&amp;rsquo;s recommended that the WAL uses an independent disk such that it&amp;rsquo;s isolated from being affected by or causing noisy neighbor problems, especially during any IOPS spike(s).&lt;/p&gt;
&lt;h3 id=&#34;implementation-goals&#34;&gt;Implementation goals&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Use the underlying Prometheus WAL package where possible, for consistency and to avoid undifferentiated heavy lifting. Its interfaces handle page alignment and operate on &lt;code&gt;[]byte&lt;/code&gt;.
&lt;ul&gt;
&lt;li&gt;Ensure this package handles arbitrarily long records (log lines in Loki’s case).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Ensure our in memory representations can be efficiently moved to/from &lt;code&gt;[]byte&lt;/code&gt; in order to generate conversions for fast/efficient loading from checkpoints.&lt;/li&gt;
&lt;li&gt;Ensure chunks which have already been flushed to storage are kept around for &lt;code&gt;ingester.retain-period&lt;/code&gt;, even after a WAL replay.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;alternatives&#34;&gt;Alternatives&lt;/h3&gt;
&lt;h4 id=&#34;use-the-cortex-wal&#34;&gt;Use the Cortex WAL&lt;/h4&gt;
&lt;p&gt;Since we&amp;rsquo;re not checkpointing from the WAL records but instead doing a memory dump, this isn&amp;rsquo;t bottlenecked by throughput but rather memory size. Therefore we can start with checkpointing by duration rather than accounting for throughput as well. This makes the proposed solution nearly identical to the Cortex WAL approach. The one caveat is that WAL segments will accrue between checkpoint operations and may constitute a large amount of data (log throughput varies). We may eventually consider other routes to handle this if duration-based checkpointing proves insufficient.&lt;/p&gt;
&lt;h4 id=&#34;dont-build-checkpoints-from-memory-instead-write-new-wal-elements&#34;&gt;Don&amp;rsquo;t build checkpoints from memory, instead write new WAL elements&lt;/h4&gt;
&lt;p&gt;Instead of building checkpoints from memory, this would build the same efficiencies into two distinct WAL Record types: &lt;code&gt;Blocks&lt;/code&gt; and &lt;code&gt;FlushedChunks&lt;/code&gt;. The former is a record type which will contain an entire compressed block after it&amp;rsquo;s cut and the latter will contain an entire chunk &#43; the sequence of blocks it holds when it&amp;rsquo;s flushed. This may offer good enough amortization of writes because block cuts are assumed to be evenly distributed and chunk flushes have the same property and use jitter for synchronization.&lt;/p&gt;
&lt;p&gt;This could be used to drop WAL records which have already elapsed the &lt;code&gt;ingester.retain-period&lt;/code&gt;, allowing for faster WAL replays and more efficient loading.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;golang&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-golang&#34;&gt;type FlushRecord struct {
  Fingerprint uint64 // labels
  FlushedAt uint64 // timestamp when it was flushed, can be used with `ingester.retain-period` to either keep or discard records on replay
  LastEntry logproto.Entry // last entry included in the flushed chunk
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;It would also allow building checkpoints without relying on an ingester&amp;rsquo;s internal state, but would likely require multiple WALs, partitioned by record type in order to be able to iterate all &lt;code&gt;FlushedChunks&lt;/code&gt; -&amp;gt; &lt;code&gt;Blocks&lt;/code&gt; -&amp;gt; &lt;code&gt;Series&lt;/code&gt; -&amp;gt; &lt;code&gt;Samples&lt;/code&gt; such that we could no-op the later (lesser priority) types that are superseded by the former types. The benefits do not seem worth the cost here, especially considering the simpler suggested alternative and the extensibility costs if we need to add new record types if/when the ingester changes internally.&lt;/p&gt;
]]></content><description>&lt;h2 id="write-ahead-logs">Write-Ahead Logs&lt;/h2>
&lt;p>Author: Owen Diehl - &lt;a href="https://github.com/owen-d" target="_blank" rel="noopener noreferrer">owen-d&lt;/a> (&lt;a href="/">Grafana Labs&lt;/a>)&lt;/p>
&lt;p>Date: 30/09/2020&lt;/p>
&lt;h2 id="impetus">Impetus&lt;/h2>
&lt;p>Loki already takes numerous steps to ensure the persistence of log data, most notably the use of a configurable replication factor (redundancy) in the ingesters. However, this still leaves much to be desired in persistence guarantees, especially for single binary deployments. This proposal outlines a write ahead log (WAL) in order to complement existing measures by allowing storage/replay of incoming writes via local disk on the ingester components.&lt;/p></description></item><item><title>Ordering Constraint Removal</title><link>https://grafana.com/docs/loki/v3.7.x/community/design-documents/2021-01-ordering-constraint-removal/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/community/design-documents/2021-01-ordering-constraint-removal/</guid><content><![CDATA[&lt;h2 id=&#34;ordering-constraint-removal&#34;&gt;Ordering Constraint Removal&lt;/h2&gt;
&lt;p&gt;Author: Owen Diehl - &lt;a href=&#34;https://github.com/owen-d&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;owen-d&lt;/a&gt; (&lt;a href=&#34;/&#34;&gt;Grafana Labs&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Date: 28/01/2021&lt;/p&gt;
&lt;h2 id=&#34;problem&#34;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Loki imposes an ordering constraint on ingested data; that is to say incoming data must have monotonically increasing timestamps, partitioned by stream. This has historical inertia from our parent project, Cortex, but presents unintended consequences specific to log ingestion. In contrast to metric scraping, Loki has reasonable use cases where the ordering constraint poses a problem, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ingesting logs from a cloud function without feeling pressured to add high cardinality labels like invocation_id to avoid out of order errors.&lt;/li&gt;
&lt;li&gt;Ingesting logs from other agents/mechanisms that don’t take into account Loki’s ordering constraint. For instance, fluent{d,bit} variants may batch and retry writes independently of other batches, causing unpredictable log loss via out of order errors.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Many of these illustrate the adversity between &lt;em&gt;ordering&lt;/em&gt; and &lt;em&gt;cardinality&lt;/em&gt;. In addition to enabling some previously difficult/impossible use cases, removing the ordering constraint lets us avoid potential conflict between these two concepts and helps incentivize good practice in the form of fewer useful labels.&lt;/p&gt;
&lt;h3 id=&#34;requirements&#34;&gt;Requirements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Enable out of order writes&lt;/li&gt;
&lt;li&gt;Maintain query interface parity&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;bonuses&#34;&gt;Bonuses&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Optimize for in order writes&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;alternatives&#34;&gt;Alternatives&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Implement order-agnostic blocks &#43; increase memory usage by compression_ratio: Deemed unacceptable due to TCO (total cost of ownership).&lt;/li&gt;
&lt;li&gt;Implement order-agnostic blocks &#43; scale horizontally (reduce per-ingester streams): Deemed unacceptable due to TCO and increasing ring pressure.&lt;/li&gt;
&lt;li&gt;Implement order-agnostic blocks &#43; flush chunks more frequently: Deemed unacceptable due to bloating the index and increasing the number of chunks requiring merge during reads.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;design&#34;&gt;Design&lt;/h2&gt;
&lt;h3 id=&#34;background&#34;&gt;Background&lt;/h3&gt;
&lt;p&gt;I suggest allowing a stream&amp;rsquo;s head block to accept unordered writes and later re-order cut blocks similar to merge-sort before flushing them to storage. Currently, writes are accepted in monotonically increasing timestamp order to a &lt;em&gt;headBlock&lt;/em&gt;, which is occasionally &amp;ldquo;cut&amp;rdquo; into a compressed, immutable &lt;em&gt;block&lt;/em&gt;. In turn, these &lt;em&gt;blocks&lt;/em&gt; are combined into a &lt;em&gt;chunk&lt;/em&gt; and persisted to storage.&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;Figure 1

    Data while being buffered in Ingester          |                                Chunk in storage
                                                   |
    Blocks                    Head                 |       ---------------------------------------------------------------------
                                                   |       |   ts0   ts1    ts2   ts3    ts4   ts5    ts6   ts7    ts8    ts9  |
--------------           ----------------          |       |   ---------    ---------    ---------    ---------    ---------   |
|    blocks  |--         |  head block  |          |       |   |block 0|    |block 1|    |block 2|    |block 4|    |block 5|   |
|(compressed)| |         |(uncompressed)|          |       |   |       |    |       |    |       |    |       |    |       |   |
|            | | ------&amp;gt; |              |          |       |   ---------    ---------    ---------    ---------    ---------   |
|            | |         |              |          |       |                                                                   |
-------------- |         ----------------          |       ---------------------------------------------------------------------
  |            |                                   |
  --------------                                   |&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Historically because of Loki&amp;rsquo;s ordering constraint, these blocks maintain a monotonically increasing timestamp (abbreviated &lt;code&gt;ts&lt;/code&gt;) order where&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;Figure 2

start       end
ts0         ts1          ts2        ts3
--------------           --------------
|            |           |            |
|            | --------&amp;gt; |            |
|            |           |            |
|            |           |            |
--------------           --------------&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This affords us two optimizations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;We can store much more data in memory because each block is compressed after being cut from a head block.&lt;/li&gt;
&lt;li&gt;We can check the block&amp;rsquo;s metadata, such as &lt;code&gt;ts0&lt;/code&gt; and &lt;code&gt;ts1&lt;/code&gt;, and skip querying the block entirely when those timestamps fall outside a request&amp;rsquo;s bounds.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;unordered-head-blocks&#34;&gt;Unordered Head Blocks&lt;/h3&gt;
&lt;p&gt;The head block&amp;rsquo;s internal structure will be replaced with a tree structure, enabling logarithmic inserts/lookups and &lt;code&gt;n log(n)&lt;/code&gt; scans. &lt;em&gt;Cutting&lt;/em&gt; a block from the head block will iterate through this tree, creating a sorted block identical to the ones currently in use. However, because we&amp;rsquo;ll be accepting arbitrarily-ordered writes, there will no longer be any guaranteed inter-block order. In contrast to figure 2, blocks may have overlapping data:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;Figure 3

start       end
ts1         ts3          ts0        ts2
--------------           --------------
|            |           |            |
|            | --------&amp;gt; |            |
|            |           |            |
|            |           |            |
--------------           --------------&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
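As an illustration of the unordered head block, here is a minimal Go sketch. The design above proposes a tree for logarithmic inserts; this stand-in uses a binary-searched sorted slice (logarithmic lookup, linear insertion) purely to show the ordering behavior, and all names are illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

type entry struct {
	ts   int64
	line string
}

// headBlock keeps entries sorted by timestamp regardless of arrival order,
// so cutting a block still yields a sorted, compressible sequence.
type headBlock struct {
	entries []entry
}

// append inserts an entry at its sorted position via binary search.
func (h *headBlock) append(ts int64, line string) {
	i := sort.Search(len(h.entries), func(i int) bool { return h.entries[i].ts > ts })
	h.entries = append(h.entries, entry{})
	copy(h.entries[i+1:], h.entries[i:])
	h.entries[i] = entry{ts, line}
}

// cut returns the entries in timestamp order, like cutting a sorted block.
func (h *headBlock) cut() []entry { return h.entries }

func main() {
	var h headBlock
	for _, ts := range []int64{5, 1, 3} { // out-of-order writes are accepted
		h.append(ts, fmt.Sprintf("line-%d", ts))
	}
	for _, e := range h.cut() {
		fmt.Println(e.ts, e.line)
	}
}
```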
&lt;p&gt;Thus &lt;em&gt;all&lt;/em&gt; blocks must have their metadata checked against a query. In this example, a query for the bounds &lt;code&gt;[ts1, ts2]&lt;/code&gt; would need to decompress and scan the &lt;code&gt;[ts1, ts2]&lt;/code&gt; range across both of them, but a query against &lt;code&gt;[ts3, ts4]&lt;/code&gt; would only decompress and scan &lt;em&gt;one&lt;/em&gt; block.&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;Figure 4

     chunk1
-------------------
                chunk2
         ---------------------
         query range requiring both
         ----------
                             query range requiring chunk2 only
                             -----------
ts0     ts1      ts2       ts3        ts4 (not in any block)
------------------------------
|        |        |          |
|        |        |          |
|        |        |          |
|        |        |          |
------------------------------&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
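The per-block bounds check this implies can be sketched as follows; the field and function names are illustrative, not Loki's actual ones:

```go
package main

import "fmt"

// blockBounds holds the per-block timestamp metadata described above.
type blockBounds struct{ minTS, maxTS int64 }

// needsScan reports whether a block's bounds overlap the query range, i.e.
// whether the block must be decompressed and scanned. With unordered
// writes, this check must run against every block rather than stopping at
// the first out-of-range one.
func needsScan(b blockBounds, qStart, qEnd int64) bool {
	return b.minTS <= qEnd && b.maxTS >= qStart
}

func main() {
	blocks := []blockBounds{{1, 3}, {0, 2}} // overlapping, as in Figure 3
	for _, b := range blocks {
		fmt.Println(needsScan(b, 1, 2)) // both overlap [ts1, ts2]
	}
	fmt.Println(needsScan(blocks[1], 3, 4)) // [ts3, ts4] skips block [ts0, ts2]
}
```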
&lt;p&gt;The performance losses against the current approach include:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Appending a log line is now performed in logarithmic time instead of amortized (due to array resizing) constant time.&lt;/li&gt;
&lt;li&gt;Blocks may contain overlapping data (although ordering is still guaranteed within each block).&lt;/li&gt;
&lt;li&gt;Head block scans are now &lt;code&gt;O(n log(n))&lt;/code&gt; instead of &lt;code&gt;O(n)&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;flushing-and-chunk-creation&#34;&gt;Flushing and Chunk Creation&lt;/h3&gt;
&lt;p&gt;Loki regularly combines multiple blocks into a chunk and &amp;ldquo;flushes&amp;rdquo; it to storage. In order to ensure that reads over flushed chunks remain as performant as possible, we will re-order a possibly-overlapping set of blocks into a set of blocks that maintain monotonically increasing order between them. From the perspective of the rest of Loki’s components (queriers/rulers fetching chunks from storage), nothing has changed.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;&lt;strong&gt;In the case that data for a stream is ingested in order, this is effectively a no-op, making it well optimized for in-order writes (which is both the requirement and default in Loki currently). Thus, this should have little performance impact on ordered data while enabling Loki to ingest unordered data.&lt;/strong&gt;&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;
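A minimal sketch of this flush-time re-ordering, using a stable sort over the collected entries as a stand-in for a true k-way merge of block iterators; the function name and plain `int64` timestamps are illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// mergeBlocks sketches the flush-time re-ordering described above: a set of
// individually sorted, possibly overlapping blocks is merged into one
// globally ordered sequence before chunk creation. When the input blocks
// are already in order and non-overlapping, the output equals the
// concatenated input, which is why in-order ingestion is effectively a
// no-op.
func mergeBlocks(blocks [][]int64) []int64 {
	var out []int64
	for _, b := range blocks {
		out = append(out, b...)
	}
	// A stable sort over nearly sorted data stands in for a real k-way
	// merge over the block iterators.
	sort.SliceStable(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Overlapping blocks like Figure 3: [ts1, ts3] and [ts0, ts2].
	fmt.Println(mergeBlocks([][]int64{{1, 3}, {0, 2}})) // [0 1 2 3]
}
```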

&lt;h4 id=&#34;chunk-durations&#34;&gt;Chunk Durations&lt;/h4&gt;
&lt;p&gt;When &lt;code&gt;--validation.reject-old-samples&lt;/code&gt; is enabled, Loki accepts incoming timestamps within the range&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;[now() - `--validation.reject-old-samples.max-age`, now() &amp;#43; `--validation.create-grace-period`]&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For most of our clusters, this would mean the range of acceptable data is one week long. In contrast, our max chunk age is &lt;code&gt;2h&lt;/code&gt;. Allowing unordered writes would mean that ingesters would willingly receive data spanning 168h, or up to 84 distinct &lt;code&gt;2h&lt;/code&gt; chunk windows. This presents a problem: a malicious user could be writing to many (84 in this case) distinct chunks simultaneously, flooding Loki with underutilized chunks which bloat the index.&lt;/p&gt;
&lt;p&gt;In order to mitigate this, there are a few options (not mutually exclusive):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Lower the valid acceptance range&lt;/li&gt;
&lt;li&gt;Create an &lt;em&gt;active&lt;/em&gt; validity window, such as &lt;code&gt;[most_recent_sample-max_chunk_age, now() &#43; creation_grace_period]&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first option is simple, already available, and likely somewhat reasonable.
The second is simple to implement and an effective way to ensure Loki can ingest unordered logs while maintaining a sliding validity window. I expect this to cover nearly all reasonable use cases and to effectively mitigate bad actors.&lt;/p&gt;
&lt;h4 id=&#34;chunk-synchronization&#34;&gt;Chunk Synchronization&lt;/h4&gt;
&lt;p&gt;We also cut chunks according to the &lt;code&gt;sync_period&lt;/code&gt;. The first timestamp ingested past this bound will trigger a cut. This process aids in increasing chunk determinism and therefore our deduplication ratio in object storage because chunks are &lt;a href=&#34;https://en.wikipedia.org/wiki/Content-addressable_storage&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;content addressed&lt;/a&gt;. With the removal of our ordering constraint, it&amp;rsquo;s possible that in some cases the synchronization method will not be as effective, such as during concurrent writes to the same stream across this bound.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;&lt;strong&gt;It&amp;rsquo;s important to mention that this is possible today with the current ordering constraint, but we&amp;rsquo;ll be increasing the likelihood by removing it.&lt;/strong&gt;&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;


&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;Figure 5

       Concurrent Writes over threshold
                   ^ ^
                   | |
                   | |
-----------------|-----------------
                 |
                 v
             Sync Marker&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To mitigate this problem and preserve the benefits of chunk deduplication, we&amp;rsquo;ll need to make chunk synchronization less susceptible to non-determinism during concurrent writes. To do this, we can move the synchronization trigger from the &lt;code&gt;Append&lt;/code&gt; code path to the asynchronous &lt;code&gt;FlushLoop&lt;/code&gt;. Note that the semantics for &lt;em&gt;when&lt;/em&gt; a chunk is cut will not change: it is still the first timestamp crossing the synchronization bound. However, &lt;em&gt;cutting&lt;/em&gt; the chunks for synchronization on the flush path reduces the likelihood of &lt;em&gt;different&lt;/em&gt; chunks being cut. To produce multiple chunks with different hashes, appends would then need to cross this boundary at the same time the flush loop checks the stream, which should be very unlikely.&lt;/p&gt;
&lt;h3 id=&#34;future-opportunities&#34;&gt;Future Opportunities&lt;/h3&gt;
&lt;p&gt;This ends the initial design portion of this document. Below, I&amp;rsquo;ll describe some possible changes we can address in the future, should they become warranted.&lt;/p&gt;
&lt;h4 id=&#34;variance-budget&#34;&gt;Variance Budget&lt;/h4&gt;
&lt;p&gt;The intended approach of a &amp;ldquo;sliding validity&amp;rdquo; window for each stream is simple and effective at preventing misuse and at keeping bad actors from writing across the entire acceptable range for incoming timestamps. However, we may in the future wish to take a more sophisticated approach, introducing per-tenant &amp;ldquo;variance&amp;rdquo; budgets, likely derived from the stream limit. This ingester limit could, for example, use an incremental (online) standard deviation/variance algorithm such as &lt;a href=&#34;https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Welford&amp;rsquo;s&lt;/a&gt;, which would allow writing to larger ranges than option (2) in the &lt;em&gt;Chunk Durations&lt;/em&gt; section.&lt;/p&gt;
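&lt;p&gt;For reference, Welford&amp;rsquo;s algorithm maintains a running count, mean, and sum of squared deviations in constant space, so an ingester could track timestamp variance per stream without retaining samples. A minimal sketch (the &lt;code&gt;welford&lt;/code&gt; type is illustrative, not an existing Loki structure):&lt;/p&gt;

```go
package main

import "fmt"

// welford maintains the running count, mean, and M2 (sum of squared
// deviations from the mean) per Welford's online algorithm.
type welford struct {
	count float64
	mean  float64
	m2    float64
}

// add folds one observation into the running statistics in O(1) space.
func (w *welford) add(x float64) {
	w.count++
	delta := x - w.mean
	w.mean += delta / w.count
	w.m2 += delta * (x - w.mean)
}

// variance returns the population variance of the observations seen so far.
func (w *welford) variance() float64 {
	if w.count < 2 {
		return 0
	}
	return w.m2 / w.count
}

func main() {
	var w welford
	for _, ts := range []float64{1, 2, 3, 4, 5} {
		w.add(ts)
	}
	fmt.Printf("%.3f %.3f\n", w.mean, w.variance()) // prints "3.000 2.000"
}
```

&lt;p&gt;A budget check could then reject a sample whose timestamp deviates from the running mean by more than some multiple of the running standard deviation.&lt;/p&gt;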
&lt;h4 id=&#34;lsm-tree&#34;&gt;LSM Tree&lt;/h4&gt;
&lt;p&gt;Much of the proposed approach mirrors an &lt;a href=&#34;http://www.benstopford.com/2015/02/14/log-structured-merge-trees/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;LSM-Tree&lt;/a&gt; (Log Structured Merge Tree), albeit in memory instead of on disk. This may seem a strange choice &amp;ndash; LSM-Trees are designed to use disk effectively, so why not go that route? We currently have no wish to add extra disk dependencies to Loki where we can avoid them, but below I will outline what an LSM-Tree approach would look like. Ultimately, using disk would enable buffering more data in the ingester before flushing,&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Allowing us to&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Flush more efficiently utilized chunks (in some cases)&lt;/li&gt;
&lt;li&gt;Keep open a wider validity window for incoming logs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;At the cost of&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;being susceptible to disk-related complexity and problems&lt;/li&gt;
&lt;/ul&gt;
&lt;h5 id=&#34;memtable-head-block&#34;&gt;MemTable (head block)&lt;/h5&gt;
&lt;p&gt;Writes in an LSM-Tree are first accepted into an in-memory structure called a &lt;em&gt;memtable&lt;/em&gt; (generally a self-balancing tree such as a red-black tree) until the memtable hits a preconfigured size. In Loki, this corresponds to the stream’s head block, which is uncompressed.&lt;/p&gt;
&lt;h5 id=&#34;sstables-blocks&#34;&gt;SSTables (blocks)&lt;/h5&gt;
&lt;p&gt;Once a Memtable (head block) in an LSM-Tree hits a predefined size, it is flushed to disk as an immutable sorted structure called an SSTable (sorted strings table). In Loki, we can use either the pre-existing MemChunk format, which is ordered, compact, and contains a block index within it, or the pre-existing block format directly. These are stored on disk to lessen memory pressure and loaded for queries when necessary.&lt;/p&gt;
&lt;h5 id=&#34;block-index&#34;&gt;Block Index&lt;/h5&gt;
&lt;p&gt;Incoming reads in an LSM-Tree may need access to SSTable entries in addition to the currently active memtable (head block). To speed up these reads, we may cache each SSTable&amp;rsquo;s (block || MemChunk) metadata, including block offsets and start and end timestamps, in memory to avoid unnecessary lookups, seeks, and data loads from disk.&lt;/p&gt;
&lt;h5 id=&#34;compaction-flushing&#34;&gt;Compaction (flushing)&lt;/h5&gt;
&lt;p&gt;Compaction in an LSM-Tree combines and reorders multiple SSTables (blocks || MemChunks). This is mainly covered in the &lt;em&gt;Flushing&lt;/em&gt; section of the in-memory approach; for our case, &lt;em&gt;compaction&lt;/em&gt; is equivalent to &lt;em&gt;flushing&lt;/em&gt;. That is, merge multiple SSTables on disk together using an algorithm reminiscent of merge sort, then flush the result to storage in our ordered chunk format.&lt;/p&gt;
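&lt;p&gt;The merge step can be sketched as below. This is a toy two-way merge with hypothetical types (&lt;code&gt;entry&lt;/code&gt;, &lt;code&gt;mergeSorted&lt;/code&gt;); a real compaction would k-way merge many tables, typically with a heap:&lt;/p&gt;

```go
package main

import "fmt"

// entry is a minimal stand-in for a log line with a timestamp.
type entry struct {
	ts   int64
	line string
}

// mergeSorted merges two timestamp-ordered blocks into one ordered block:
// the core step of compacting SSTables (blocks) into an ordered chunk.
func mergeSorted(a, b []entry) []entry {
	out := make([]entry, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if a[i].ts <= b[j].ts {
			out = append(out, a[i])
			i++
		} else {
			out = append(out, b[j])
			j++
		}
	}
	// One of these is already empty; append whichever has leftovers.
	out = append(out, a[i:]...)
	out = append(out, b[j:]...)
	return out
}

func main() {
	older := []entry{{1, "a"}, {4, "d"}}
	newer := []entry{{2, "b"}, {3, "c"}}
	for _, e := range mergeSorted(older, newer) {
		fmt.Print(e.line)
	}
	fmt.Println() // prints "abcd"
}
```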
]]></content><description>&lt;h2 id="ordering-constraint-removal">Ordering Constraint Removal&lt;/h2>
&lt;p>Author: Owen Diehl - &lt;a href="https://github.com/owen-d" target="_blank" rel="noopener noreferrer">owen-d&lt;/a> (&lt;a href="/">Grafana Labs&lt;/a>)&lt;/p>
&lt;p>Date: 28/01/2021&lt;/p>
&lt;h2 id="problem">Problem&lt;/h2>
&lt;p>Loki imposes an ordering constraint on ingested data; that is to say incoming data must have monotonically increasing timestamps, partitioned by stream. This has historical inertia from our parent project, Cortex, but presents unintended consequences specific to log ingestion. In contrast to metric scraping, Loki has reasonable use cases where the ordering constraint poses a problem, including:&lt;/p></description></item></channel></rss>