<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Manage Loki on Grafana Labs</title><link>https://grafana.com/docs/loki/v3.7.x/operations/</link><description>Recent content in Manage Loki on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/loki/v3.7.x/operations/index.xml" rel="self" type="application/rss+xml"/><item><title>Audit data propagation latency and correctness using Loki Canary</title><link>https://grafana.com/docs/loki/v3.7.x/operations/loki-canary/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/loki-canary/</guid><content><![CDATA[&lt;h1 id=&#34;audit-data-propagation-latency-and-correctness-using-loki-canary&#34;&gt;Audit data propagation latency and correctness using Loki Canary&lt;/h1&gt;
&lt;p&gt;Loki Canary is a standalone app that audits the log-capturing performance of a Grafana Loki cluster.&lt;br /&gt;
This component emits and periodically queries for logs, making sure that Loki is ingesting logs without any data loss.
When something is wrong with Loki, the Canary often provides the first indication.&lt;/p&gt;
&lt;p&gt;Loki Canary generates artificial log lines and sends them to the Loki cluster.
It then reads those same lines back from Loki and uses the results to measure
how well the cluster ingests and serves logs.
These measurements are exposed as Prometheus time series metrics.&lt;/p&gt;
&lt;figure
    class=&#34;figure-wrapper figure-wrapper__lightbox w-100p &#34;
    style=&#34;max-width: 75%;&#34;
  &gt;&lt;div class=&#34;img-wrapper w-100p h-auto&#34;&gt;&lt;img
          src=&#34;./loki-canary-block.png&#34;
          alt=&#34;Loki canary&#34;/&gt;&lt;/div&gt;&lt;/figure&gt;
&lt;p&gt;Loki Canary writes a log to standard output and stores the timestamp in an internal
array. The contents look something like this:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;nohighlight&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-nohighlight&#34;&gt;1557935669096040040 ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The relevant part of the log entry is the timestamp; the &lt;code&gt;p&lt;/code&gt;s are just filler
bytes to make the size of the log configurable.&lt;/p&gt;
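&lt;p&gt;As a rough illustration of this format, the following Python sketch builds and parses a canary-style line. The function names are hypothetical, and the sketch assumes the &lt;code&gt;p&lt;/code&gt; filler pads the whole line to &lt;code&gt;-size&lt;/code&gt; bytes (default 100). The real canary may size the filler differently.&lt;/p&gt;

```python
import time
from typing import Optional


def make_canary_line(size: int, ts_ns: Optional[int] = None) -> str:
    """Build a canary-style line: nanosecond Unix timestamp, a space, then 'p' filler.

    Assumption: the filler pads the whole line to `size` characters (the -size flag).
    """
    if ts_ns is None:
        ts_ns = time.time_ns()
    prefix = f"{ts_ns} "
    return prefix + "p" * max(size - len(prefix), 0)


def parse_canary_timestamp(line: str) -> int:
    """Return the timestamp, the only part of the line that is compared on receipt."""
    return int(line.split(" ", 1)[0])


if __name__ == "__main__":
    line = make_canary_line(100)
    print(line)
    print(parse_canary_timestamp(line))
```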
&lt;p&gt;Loki Canary&amp;rsquo;s standard output should be captured and written to a file. An agent (like Grafana Alloy) should be configured to read the log file and ship it to Loki.&lt;/p&gt;
&lt;p&gt;Meanwhile, Loki Canary opens a WebSocket connection to Loki and tails
the logs it creates. When a log line arrives on the WebSocket, the canary compares
the timestamp in the line against the internal array.&lt;/p&gt;
&lt;p&gt;If the received log is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The next entry in the array to be received, it is removed from the array and the
(current time - log timestamp) is recorded in the &lt;code&gt;response_latency&lt;/code&gt;
histogram. This is the expected behavior for well-behaved logs.&lt;/li&gt;
&lt;li&gt;Not the next in the array to be received, it is removed from the array, the
response time is recorded in the &lt;code&gt;response_latency&lt;/code&gt; histogram, and the
&lt;code&gt;out_of_order_entries&lt;/code&gt; counter is incremented.&lt;/li&gt;
&lt;li&gt;Not in the array at all, it is checked against a separate list of received
logs to either increment the &lt;code&gt;duplicate_entries&lt;/code&gt; counter or the
&lt;code&gt;unexpected_entries&lt;/code&gt; counter.&lt;/li&gt;
&lt;/ul&gt;
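&lt;p&gt;The matching rules in the list above can be modeled with a small Python sketch. It is a simplified model, not the canary&amp;rsquo;s Go code: &lt;code&gt;ReceiveTracker&lt;/code&gt; and its methods are hypothetical, a plain list stands in for the internal array, and a list of latencies stands in for the &lt;code&gt;response_latency&lt;/code&gt; histogram.&lt;/p&gt;

```python
from collections import Counter


class ReceiveTracker:
    """Hypothetical model of how received timestamps are classified."""

    def __init__(self):
        self.pending = []        # timestamps written but not yet received, oldest first
        self.received = set()    # timestamps already received at least once
        self.latencies_ns = []   # stand-in for the response_latency histogram
        self.counters = Counter()

    def sent(self, ts_ns):
        self.pending.append(ts_ns)

    def on_receive(self, ts_ns, now_ns):
        if self.pending and self.pending[0] == ts_ns:
            # Expected case: the next entry in the array arrived.
            self.pending.pop(0)
            self.latencies_ns.append(now_ns - ts_ns)
            outcome = "in_order"
        elif ts_ns in self.pending:
            # Still outstanding, but not the next one expected.
            self.pending.remove(ts_ns)
            self.latencies_ns.append(now_ns - ts_ns)
            self.counters["out_of_order_entries"] += 1
            outcome = "out_of_order"
        elif ts_ns in self.received:
            # Not in the array, but seen before.
            self.counters["duplicate_entries"] += 1
            return "duplicate"
        else:
            # Never written by this canary, or already pruned.
            self.counters["unexpected_entries"] += 1
            return "unexpected"
        self.received.add(ts_ns)
        return outcome
```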
&lt;p&gt;In the background, Loki Canary also runs a timer which iterates through all of
the entries in the internal array. If any of the entries are older than the
duration specified by the &lt;code&gt;-wait&lt;/code&gt; flag (defaulting to 60s), they are removed
from the array and the &lt;code&gt;websocket_missing_entries&lt;/code&gt; counter is incremented. An
additional query is then made directly to Loki for any missing entries to
determine if they are truly missing or only missing from the WebSocket. If
missing entries are not found in the direct query, the &lt;code&gt;missing_entries&lt;/code&gt; counter
is incremented.&lt;/p&gt;
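&lt;p&gt;The background prune step can be sketched the same way. In this hypothetical Python helper, &lt;code&gt;query_loki&lt;/code&gt; stands in for the direct query to Loki and returns the subset of the given timestamps that Loki actually stores. The counter names mirror the ones described above.&lt;/p&gt;

```python
from typing import Callable, Dict, Iterable, List, Set


def prune(pending: List[int], now_ns: int, wait_ns: int,
          query_loki: Callable[[Iterable[int]], Set[int]],
          counters: Dict[str, int]) -> List[int]:
    """Drop entries older than -wait, count them, and return those still pending."""
    expired = [ts for ts in pending if now_ns - ts > wait_ns]
    still_pending = [ts for ts in pending if now_ns - ts <= wait_ns]
    if expired:
        # Not seen on the WebSocket within -wait.
        counters["websocket_missing_entries"] = (
            counters.get("websocket_missing_entries", 0) + len(expired))
        # Ask Loki directly whether the entries were stored at all.
        found = query_loki(expired)
        truly_missing = [ts for ts in expired if ts not in found]
        counters["missing_entries"] = (
            counters.get("missing_entries", 0) + len(truly_missing))
    return still_pending
```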
&lt;h3 id=&#34;additional-queries&#34;&gt;Additional Queries&lt;/h3&gt;
&lt;h4 id=&#34;spot-check&#34;&gt;Spot Check&lt;/h4&gt;
&lt;p&gt;Starting with version 1.6.0, the canary spot checks certain entries over time
to make sure they are still present in Loki. This helps test the transition
of in-memory logs from the ingester to the store and confirms that nothing is lost.&lt;/p&gt;
&lt;p&gt;Use &lt;code&gt;-spot-check-interval&lt;/code&gt; and &lt;code&gt;-spot-check-max&lt;/code&gt; to tune this feature.
Every &lt;code&gt;-spot-check-interval&lt;/code&gt;, the canary saves one log entry from the stream
in a separate list, and it keeps each saved entry until the entry is older than &lt;code&gt;-spot-check-max&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Every &lt;code&gt;-spot-check-query-rate&lt;/code&gt;, the canary queries Loki for each entry in this list and
increments &lt;code&gt;loki_canary_spot_check_entries_total&lt;/code&gt;. If a result
is missing, it also increments &lt;code&gt;loki_canary_spot_check_missing_entries_total&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;With the defaults of &lt;code&gt;15m&lt;/code&gt; for &lt;code&gt;-spot-check-interval&lt;/code&gt; and &lt;code&gt;4h&lt;/code&gt; for &lt;code&gt;-spot-check-max&lt;/code&gt;,
after 4 hours of running the canary holds a list of 16 entries (4h / 15m)
that it queries every minute (the default &lt;code&gt;-spot-check-query-rate&lt;/code&gt; is 1m).
Be aware of the query load this can put on Loki if you run many canaries.&lt;/p&gt;
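&lt;p&gt;The spot check sizing arithmetic can be checked with a short Python sketch. The helper names are hypothetical, and the sketch assumes one query per saved entry on every &lt;code&gt;-spot-check-query-rate&lt;/code&gt; tick, as described above.&lt;/p&gt;

```python
from datetime import timedelta


def spot_check_list_size(interval: timedelta, max_age: timedelta) -> int:
    """Steady-state list size: one entry saved per interval, dropped after max_age."""
    return int(max_age / interval)


def spot_check_queries_per_minute(interval: timedelta, max_age: timedelta,
                                  query_rate: timedelta, canaries: int) -> float:
    """Approximate spot check queries per minute sent to Loki by all canaries."""
    entries = spot_check_list_size(interval, max_age)
    runs_per_minute = timedelta(minutes=1) / query_rate
    return entries * runs_per_minute * canaries


if __name__ == "__main__":
    print(spot_check_queries_per_minute(
        timedelta(minutes=15), timedelta(hours=4), timedelta(minutes=1), canaries=50))
```

&lt;p&gt;With the defaults, 50 canaries would send about 800 spot check queries per minute (16 entries x 1 query per minute x 50 canaries).&lt;/p&gt;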
&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; If you use &lt;code&gt;-out-of-order-percentage&lt;/code&gt; to test ingestion of out-of-order
log lines, do not set the two out-of-order time range flags (&lt;code&gt;-out-of-order-min&lt;/code&gt; and &lt;code&gt;-out-of-order-max&lt;/code&gt;) too far in the past.
The defaults are sufficient to test this functionality, and setting them
too far in the past can cause problems with the spot check test.&lt;/p&gt;
&lt;p&gt;When using &lt;code&gt;-out-of-order-percentage&lt;/code&gt;, you also need pipeline stages
in your Alloy configuration that set each entry&amp;rsquo;s timestamp from the timestamp in the log line as the logs are pushed
to Loki. The &lt;a href=&#34;/docs/alloy/latest/reference/components/loki/loki.process/&#34;&gt;Alloy &lt;code&gt;loki.process&lt;/code&gt;&lt;/a&gt; docs have examples of how to do this.&lt;/p&gt;
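&lt;p&gt;To see why the timestamp must come from the log line, consider how an out-of-order entry could be produced. The following Python sketch is an assumption, not the canary&amp;rsquo;s actual code: with probability &lt;code&gt;-out-of-order-percentage&lt;/code&gt; / 100, it moves the embedded timestamp back by a whole number of seconds between &lt;code&gt;-out-of-order-min&lt;/code&gt; and &lt;code&gt;-out-of-order-max&lt;/code&gt; (defaults 30s and 1m). Such a line is still written to the file at the current time, so unless the Alloy pipeline extracts the embedded timestamp, the entry is pushed with the time it was read rather than the intended, older timestamp.&lt;/p&gt;

```python
import random
from typing import Optional


def pick_entry_timestamp(now_ns: int, out_of_order_percentage: int,
                         min_back_s: int = 30, max_back_s: int = 60,
                         rng: Optional[random.Random] = None) -> int:
    """Choose the timestamp embedded in a log line (hypothetical model).

    With probability out_of_order_percentage / 100, shift the timestamp back by
    min_back_s..max_back_s whole seconds, mirroring -out-of-order-min/-max.
    """
    rng = rng or random.Random()
    if rng.randrange(100) < out_of_order_percentage:
        back_s = rng.randint(min_back_s, max_back_s)
        return now_ns - back_s * 1_000_000_000
    return now_ns
```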
&lt;h4 id=&#34;metric-test&#34;&gt;Metric Test&lt;/h4&gt;
&lt;p&gt;Loki Canary runs a &lt;code&gt;count_over_time&lt;/code&gt; metric query to
verify that the rate at which logs are stored in Loki matches the rate at which
Loki Canary creates them.&lt;/p&gt;
&lt;p&gt;Use &lt;code&gt;-metric-test-interval&lt;/code&gt; and &lt;code&gt;-metric-test-range&lt;/code&gt; to tune this feature. By
default, the canary runs a &lt;code&gt;count_over_time&lt;/code&gt; instant query against Loki every &lt;code&gt;1h&lt;/code&gt;
over a range of &lt;code&gt;24h&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If the canary has been running for less than &lt;code&gt;-metric-test-range&lt;/code&gt; (&lt;code&gt;24h&lt;/code&gt;), the query range is shortened
to the time the canary has been running, so that the rate is calculated
from when the canary started.&lt;/p&gt;
&lt;p&gt;The canary calculates the expected count of logs for the range
(also adjusted for the canary&amp;rsquo;s runtime) and compares it with
the actual result returned from Loki. The &lt;em&gt;difference&lt;/em&gt; is stored as the value of
the gauge &lt;code&gt;loki_canary_metric_test_deviation&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Some deviation is expected. Estimating the expected count from the rate at which
the canary writes logs, and comparing it with the actual query data, is imperfect
and typically leads to a deviation of a few log entries.&lt;/p&gt;
&lt;p&gt;A deviation of more than 3-4 log entries is not expected.&lt;/p&gt;
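&lt;p&gt;The deviation calculation can be sketched in Python under two assumptions this page does not confirm: the expected count is the effective range divided by &lt;code&gt;-interval&lt;/code&gt;, and the stored difference is expected minus actual. The function name is hypothetical.&lt;/p&gt;

```python
def metric_test_deviation(test_range_s: float, uptime_s: float,
                          write_interval_s: float, actual_count: float) -> float:
    """Expected minus actual log count for the count_over_time metric test."""
    # The range is truncated to the canary's uptime until -metric-test-range is reached.
    effective_range_s = min(test_range_s, uptime_s)
    # Assumption: one line is written every -interval.
    expected = effective_range_s / write_interval_s
    return expected - actual_count
```

&lt;p&gt;For example, a canary that writes one line per second and has been up for one hour expects 3600 lines. If Loki returns 3598, the deviation is 2, which is within the normal range described above.&lt;/p&gt;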
&lt;h3 id=&#34;control&#34;&gt;Control&lt;/h3&gt;
&lt;p&gt;Loki Canary exposes two endpoints that suspend and resume the
canary process at runtime. This is useful if you want to quickly disable or re-enable the
canary. To stop or start the canary, send an HTTP GET request to the &lt;code&gt;/suspend&lt;/code&gt; or
&lt;code&gt;/resume&lt;/code&gt; endpoint.&lt;/p&gt;
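&lt;p&gt;A minimal Python sketch of calling these endpoints follows. It assumes the endpoints are served on the same port as the metrics (&lt;code&gt;-port&lt;/code&gt;, default 3500). The helper names are hypothetical.&lt;/p&gt;

```python
import urllib.request

VALID_ACTIONS = ("suspend", "resume")


def control_url(host: str, action: str, port: int = 3500) -> str:
    """Build the control URL; assumes the endpoints share the metrics port."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"action must be one of {VALID_ACTIONS}, got {action!r}")
    return f"http://{host}:{port}/{action}"


def set_canary_state(host: str, action: str, port: int = 3500,
                     timeout: float = 5.0) -> int:
    """Send the GET request and return the HTTP status code."""
    with urllib.request.urlopen(control_url(host, action, port), timeout=timeout) as resp:
        return resp.status
```

&lt;p&gt;For example, call &lt;code&gt;set_canary_state&lt;/code&gt; with the &lt;code&gt;suspend&lt;/code&gt; action to pause a local canary, and with &lt;code&gt;resume&lt;/code&gt; to start it again.&lt;/p&gt;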
&lt;h2 id=&#34;installation&#34;&gt;Installation&lt;/h2&gt;
&lt;h3 id=&#34;binary&#34;&gt;Binary&lt;/h3&gt;
&lt;p&gt;Loki Canary is provided as a pre-compiled binary as part of the
&lt;a href=&#34;https://github.com/grafana/loki/releases&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Loki Releases&lt;/a&gt; on GitHub.&lt;/p&gt;
&lt;h3 id=&#34;docker&#34;&gt;Docker&lt;/h3&gt;
&lt;p&gt;Loki Canary is also provided as a Docker container image:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;# change tag to the most recent release
$ docker pull grafana/loki-canary:3.7.1&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;kubernetes&#34;&gt;Kubernetes&lt;/h3&gt;
&lt;p&gt;To run on Kubernetes, you can start a single Pod:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;kubectl run loki-canary --image=grafana/loki-canary:latest --restart=Never --image-pull-policy=IfNotPresent --labels=name=loki-canary -- -addr=loki:3100&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;For a more complete deployment, such as a DaemonSet, use the
Tanka setup in the &lt;code&gt;production&lt;/code&gt; folder. You can import it using
&lt;code&gt;jsonnet-bundler&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;shell&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-shell&#34;&gt;jb install github.com/grafana/loki/production/ksonnet/loki-canary&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then in your Tanka environment&amp;rsquo;s &lt;code&gt;main.jsonnet&lt;/code&gt; you&amp;rsquo;ll want something like
this:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;jsonnet&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-jsonnet&#34;&gt;local loki_canary = import &amp;#39;loki-canary/loki-canary.libsonnet&amp;#39;;

loki_canary {
  loki_canary_args&amp;#43;:: {
    addr: &amp;#34;loki:3100&amp;#34;,
    port: 80,
    labelname: &amp;#34;instance&amp;#34;,
    interval: &amp;#34;100ms&amp;#34;,
    size: 1024,
    wait: &amp;#34;3m&amp;#34;,
  },
  _config&amp;#43;:: {
    namespace: &amp;#34;default&amp;#34;,
  }
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h4 id=&#34;examples&#34;&gt;Examples&lt;/h4&gt;
&lt;p&gt;Standalone Pod Implementation of loki-canary&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: loki-canary
    name: loki-canary
  name: loki-canary
spec:
  containers:
  - args:
    - -addr=loki:3100
    image: grafana/loki-canary:latest
    imagePullPolicy: IfNotPresent
    name: loki-canary
    resources: {}
---
apiVersion: v1
kind: Service
metadata:
  name: loki-canary
  labels:
    app: loki-canary
spec:
  type: ClusterIP
  selector:
    app: loki-canary
  ports:
  - name: metrics
    protocol: TCP
    port: 3500
    targetPort: 3500&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;DaemonSet Implementation of loki-canary&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;---
kind: DaemonSet
apiVersion: apps/v1
metadata:
  labels:
    app: loki-canary
    name: loki-canary
  name: loki-canary
spec:
  # apps/v1 requires a selector that matches the Pod template labels.
  selector:
    matchLabels:
      app: loki-canary
  template:
    metadata:
      name: loki-canary
      labels:
        app: loki-canary
    spec:
      containers:
      - args:
        - -addr=loki:3100
        image: grafana/loki-canary:latest
        imagePullPolicy: IfNotPresent
        name: loki-canary
        resources: {}
---
apiVersion: v1
kind: Service
metadata:
  name: loki-canary
  labels:
    app: loki-canary
spec:
  type: ClusterIP
  selector:
    app: loki-canary
  ports:
  - name: metrics
    protocol: TCP
    port: 3500
    targetPort: 3500&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;from-source&#34;&gt;From Source&lt;/h3&gt;
&lt;p&gt;If the other options are not sufficient for your use case, you can compile
&lt;code&gt;loki-canary&lt;/code&gt; yourself:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Clone the source tree.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;$ git clone https://github.com/grafana/loki
$ cd loki&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Build the binary.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;$ make loki-canary&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optional: Build the container image.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;$ make loki-canary-image&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;configuration&#34;&gt;Configuration&lt;/h2&gt;
&lt;p&gt;You must pass the address of Loki with the &lt;code&gt;-addr&lt;/code&gt; flag or set the
environment variable &lt;code&gt;LOKI_ADDRESS&lt;/code&gt;. If your Loki server uses TLS, you must also
provide &lt;code&gt;-tls=true&lt;/code&gt;. With TLS enabled, the WebSocket connection
uses &lt;code&gt;wss://&lt;/code&gt; instead of &lt;code&gt;ws://&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;-labelname&lt;/code&gt; and &lt;code&gt;-labelvalue&lt;/code&gt; flags should also be provided, as these are
used by Loki Canary to filter the log stream to only process logs for the
current instance of the canary. Ensure that the values provided to the flags are
unique to each instance of Loki Canary. Grafana Labs&amp;rsquo; Tanka config
accomplishes this by passing in the Pod name as the label value.&lt;/p&gt;
&lt;p&gt;If Loki Canary reports a high number of &lt;code&gt;unexpected_entries&lt;/code&gt;, it may
not be waiting long enough. Increase the &lt;code&gt;-wait&lt;/code&gt; flag to a value larger
than its default of 60s.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Be aware&lt;/strong&gt; of the relationship between &lt;code&gt;-pruneinterval&lt;/code&gt; and &lt;code&gt;-interval&lt;/code&gt;.
For example, with an interval of 10ms (100 logs per second) and a prune interval
of 60s, the canary writes 6000 logs per prune interval. If those logs are not received
over the WebSocket, the canary queries Loki directly to see whether
they are completely lost. &lt;strong&gt;However&lt;/strong&gt;, the query returns at most 1000
results, so the canary cannot retrieve all the logs even if they did reach
Loki.&lt;/p&gt;
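&lt;p&gt;The arithmetic in this example can be expressed as a short Python sketch. The helper names are hypothetical. The sketch treats the 1000-result limit as fixed and assumes that, in the worst case, every line written during one prune interval must be recovered by a single direct query.&lt;/p&gt;

```python
QUERY_RESULT_LIMIT = 1000  # per-query result limit described above


def logs_per_prune_window(interval_ms: int, prune_interval_ms: int) -> int:
    """Log lines written by one canary during a single prune interval."""
    return prune_interval_ms // interval_ms


def all_missing_logs_queryable(interval_ms: int, prune_interval_ms: int,
                               limit: int = QUERY_RESULT_LIMIT) -> bool:
    """True if one direct query could return every line from a fully missed window."""
    return logs_per_prune_window(interval_ms, prune_interval_ms) <= limit


def smallest_safe_interval_ms(prune_interval_ms: int,
                              limit: int = QUERY_RESULT_LIMIT) -> int:
    """Smallest -interval (ms) that keeps a full prune window within the limit."""
    return -(-prune_interval_ms // limit)  # ceiling division
```

&lt;p&gt;Under these assumptions, a 60s prune interval needs an interval of at least 60ms for a fully missed window to fit in one query.&lt;/p&gt;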
&lt;p&gt;&lt;strong&gt;Likewise&lt;/strong&gt;, if you lower &lt;code&gt;-pruneinterval&lt;/code&gt;, you risk causing a denial of
service on Loki, because all your canaries query for missing logs at
every prune interval.&lt;/p&gt;
&lt;p&gt;All options:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;  -addr string
    	The Loki server URL:Port, e.g. loki:3100. Loki address can also be set using the environment variable LOKI_ADDRESS.
  -buckets int
    	Number of buckets in the response_latency histogram (default 10)
  -ca-file string
    	Client certificate authority for optional use with TLS connection to Loki
  -cert-file string
    	Client PEM encoded X.509 certificate for optional use with TLS connection to Loki
  -insecure
    	Allow insecure TLS connections
  -interval duration
    	Duration between log entries (default 1s)
  -key-file string
    	Client PEM encoded X.509 key for optional use with TLS connection to Loki
  -labels string
        Comma-separated string of labels for the query e.g. &amp;#39;service=loki,app=canary&amp;#39;. The parsing logic for this argument is simple, label values must not contain a comma or special characters and should not be quoted. Overwrites labelname and streamname
  -labelname string
    	The label name for this instance of loki-canary to use in the log selector (default &amp;#34;name&amp;#34;)
  -labelvalue string
    	The unique label value for this instance of loki-canary to use in the log selector (default &amp;#34;loki-canary&amp;#34;)
  -max-wait duration
    	Duration to keep querying Loki for missing websocket entries before reporting them missing (default 5m0s)
  -metric-test-interval duration
    	The interval the metric test query should be run (default 1h0m0s)
  -metric-test-range duration
    	The range value [24h] used in the metric test instant-query. Note: this value is truncated to the running time of the canary until this value is reached (default 24h0m0s)
  -out-of-order-max duration
    	Maximum amount of time to go back for out of order entries (in seconds). (default 1m0s)
  -out-of-order-min duration
    	Minimum amount of time to go back for out of order entries (in seconds). (default 30s)
  -out-of-order-percentage int
    	Percentage (0-100) of log entries that should be sent out of order.
  -pass string
    	Loki password. This credential should have both read and write permissions to Loki endpoints
  -port int
    	Port which loki-canary should expose metrics (default 3500)
  -pruneinterval duration
    	Frequency to check sent vs received logs, also the frequency which queries for missing logs will be dispatched to loki (default 1m0s)
  -push
    	Push the logs directly to given Loki address
  -query-append string
        LogQL filters to be appended to the Canary query e.g. &amp;#39;| json | line_format `{{.log}}`&amp;#39;
  -query-timeout duration
    	How long to wait for a query response from Loki (default 10s)
  -size int
    	Size in bytes of each log line (default 100)
  -spot-check-initial-wait duration
    	How long should the spot check query wait before starting to check for entries (default 10s)
  -spot-check-interval duration
    	Interval that a single result will be kept from sent entries and spot-checked against Loki, e.g. 15min default one entry every 15 min will be saved and then queried again every 15min until spot-check-max is reached (default 15m0s)
  -spot-check-max duration
    	How far back to check a spot check entry before dropping it (default 4h0m0s)
  -spot-check-query-rate duration
    	Interval that the canary will query Loki for the current list of all spot check entries (default 1m0s)
  -streamname string
    	The stream name for this instance of loki-canary to use in the log selector (default &amp;#34;stream&amp;#34;)
  -streamvalue string
    	The unique stream value for this instance of loki-canary to use in the log selector (default &amp;#34;stdout&amp;#34;)
  -tenant-id string
    	Tenant ID to be set in X-Scope-OrgID header.
  -tls
    	Does the loki connection use TLS?
  -user string
    	Loki username.
  -version
    	Print this builds version information
  -wait duration
    	Duration to wait for log entries on websocket before querying loki for them (default 1m0s)
  -write-max-backoff duration
    	Maximum backoff time between retries  (default 5m0s)
  -write-max-retries int
    	Maximum number of retries when push a log entry  (default 10)
  -write-min-backoff duration
    	Initial backoff time before first retry  (default 500ms)
  -write-timeout duration
    	How long to wait write response from Loki (default 10s)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;monolithic-mode-setup&#34;&gt;Monolithic mode setup&lt;/h2&gt;
&lt;p&gt;This section describes how to set up Loki Canary for Loki&amp;rsquo;s 
    &lt;a href=&#34;/docs/loki/v3.7.x/get-started/deployment-modes/#monolithic-mode&#34;&gt;monolithic mode&lt;/a&gt; using Systemd, Alloy, and Prometheus.&lt;/p&gt;
&lt;h3 id=&#34;systemd&#34;&gt;Systemd&lt;/h3&gt;
&lt;p&gt;Create a systemd service file that writes Loki Canary&amp;rsquo;s standard output to the file &lt;code&gt;/var/log/loki-canary.log&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;ini&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-ini&#34;&gt;[Unit]
Description=Loki Canary
Documentation=https://grafana.com/docs/loki/latest/operations/loki-canary/

[Service]
User=loki
ExecStart=/usr/bin/loki-canary -addr=localhost:3100 -labelname=job -labelvalue=loki_canary -streamname=job -streamvalue=loki_canary
Restart=on-failure
RestartSec=5
StandardOutput=append:/var/log/loki-canary.log
StandardError=journal

[Install]
WantedBy=multi-user.target&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;-labelname&lt;/code&gt; and &lt;code&gt;-labelvalue&lt;/code&gt; flags specify a label pair used to identify Loki Canary&amp;rsquo;s logs. The &lt;code&gt;-streamname&lt;/code&gt; and &lt;code&gt;-streamvalue&lt;/code&gt; flags specify an additional label pair and must also be provided. If no additional label exists, you can provide the same values to both label pairs, as in the example above. Labels can be added when Alloy scrapes the logs.&lt;/p&gt;
&lt;h3 id=&#34;scrape-logs&#34;&gt;Scrape logs&lt;/h3&gt;
&lt;p&gt;Scrape the &lt;code&gt;/var/log/loki-canary.log&lt;/code&gt; file with Alloy.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Alloy&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-alloy&#34;&gt;loki.source.file &amp;#34;canary&amp;#34; {
  forward_to = [loki.write.local.receiver]
  targets = [{
    __path__ = &amp;#34;/var/log/loki-canary.log&amp;#34;,
    job      = &amp;#34;loki_canary&amp;#34;,
  }]
}

loki.write &amp;#34;local&amp;#34; {
  endpoint {
    url  = &amp;#34;http://localhost:3100/loki/api/v1/push&amp;#34;
  }
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;scrape-metrics&#34;&gt;Scrape metrics&lt;/h3&gt;
&lt;p&gt;Scrape Loki Canary&amp;rsquo;s metrics with Alloy or Prometheus.&lt;/p&gt;
&lt;h4 id=&#34;scrape-metrics-with-alloy&#34;&gt;Scrape metrics with Alloy&lt;/h4&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Alloy&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-alloy&#34;&gt;prometheus.scrape &amp;#34;loki_canary&amp;#34; {
  // Loki Canary serves its metrics on -port (default 3500), not on the Loki port 3100.
  targets    = [{__address__ = &amp;#34;localhost:3500&amp;#34;}]
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write &amp;#34;default&amp;#34; {
  endpoint {
    url = &amp;#34;&amp;lt;PROMETHEUS_REMOTE_WRITE_URL&amp;gt;&amp;#34;
  }
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h4 id=&#34;scrape-metrics-with-prometheus&#34;&gt;Scrape metrics with Prometheus&lt;/h4&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;scrape_configs:
  - job_name: loki-canary
    static_configs:
      - targets: [&amp;#39;localhost:3500&amp;#39;]&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="audit-data-propagation-latency-and-correctness-using-loki-canary">Audit data propagation latency and correctness using Loki Canary&lt;/h1>
&lt;p>Loki Canary is a standalone app that audits the log-capturing performance of a Grafana Loki cluster.&lt;br />
This component emits and periodically queries for logs, making sure that Loki is ingesting logs without any data loss.
When something is wrong with Loki, the Canary often provides the first indication.&lt;/p></description></item><item><title>Block unwanted queries</title><link>https://grafana.com/docs/loki/v3.7.x/operations/blocking-queries/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/blocking-queries/</guid><content><![CDATA[&lt;h1 id=&#34;block-unwanted-queries&#34;&gt;Block unwanted queries&lt;/h1&gt;
&lt;p&gt;In certain situations, you may not be able to control the queries being sent to your Loki installation. These queries
may be intentionally or unintentionally expensive to run, and they may affect the overall stability or cost of running
your service.&lt;/p&gt;
&lt;p&gt;You can block queries using 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#runtime-configuration-file&#34;&gt;per-tenant overrides&lt;/a&gt;, like so:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
  &amp;#34;tenant-id&amp;#34;:
    blocked_queries:
      # block this query exactly
      - pattern: &amp;#39;sum(rate({env=&amp;#34;prod&amp;#34;}[1m]))&amp;#39;

      # block any query matching this regex pattern 
      - pattern: &amp;#39;.*prod.*&amp;#39;
        regex: true

      # block all metric queries
      - types: metric

      # block any filter or limited queries matching this regex pattern 
      - pattern: &amp;#39;.*prod.*&amp;#39;
        regex: true
        types: filter,limited

      # block any query that matches this query hash
      - hash: 2943214005          # hash of {stream=&amp;#34;stdout&amp;#34;,pod=&amp;#34;loki-canary-9w49x&amp;#34;}
        types: filter,limited

      # block queries originating from specific sources via X-Query-Tags
      # Keys and values are matched case-insensitively.
      - pattern: &amp;#39;.*&amp;#39;             # optional; if pattern and regex are omitted they default to &amp;#39;.*&amp;#39; and true
        regex: true
        query_tags:
          source: grafana
          feature: beta&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;Changes to these configurations &lt;strong&gt;do not require a restart&lt;/strong&gt;; they are defined in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#runtime-configuration-file&#34;&gt;runtime configuration file&lt;/a&gt;.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;p&gt;The available query types are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;metric&lt;/code&gt;: a query with an aggregation, e.g. &lt;code&gt;sum(rate({env=&amp;quot;prod&amp;quot;}[1m]))&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;filter&lt;/code&gt;: a query with a log filter, e.g. &lt;code&gt;{env=&amp;quot;prod&amp;quot;} |= &amp;quot;error&amp;quot;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;limited&lt;/code&gt;: a query without a filter or a metric aggregation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;hash&lt;/code&gt; option uses a &lt;a href=&#34;https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;32-bit FNV-1&lt;/a&gt; hash of the query string, represented as a 32-bit unsigned integer.
A hash is often easier to use than a query string that is long or requires a lot of escaping. A &lt;code&gt;query_hash&lt;/code&gt; field
is logged with every query request in the &lt;code&gt;query-frontend&lt;/code&gt; and &lt;code&gt;querier&lt;/code&gt; logs, for easy reference. Here&amp;rsquo;s an example log line:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;logfmt&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-logfmt&#34;&gt;level=info ts=2023-03-30T09:08:15.2614555Z caller=metrics.go:152 component=frontend org_id=29 latency=fast 
query=&amp;#34;{stream=\&amp;#34;stdout\&amp;#34;,pod=\&amp;#34;loki-canary-9w49x\&amp;#34;}&amp;#34; query_hash=2943214005 query_type=limited range_type=range ...&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
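&lt;p&gt;If you want to compute a hash before the query appears in the logs, the following Go sketch hashes a query string with the standard library&amp;rsquo;s &lt;code&gt;hash/fnv&lt;/code&gt; package, whose &lt;code&gt;New32&lt;/code&gt; implements 32-bit FNV-1 (not FNV-1a). It assumes that Loki hashes the raw query string byte for byte with no normalization; compare the result against a &lt;code&gt;query_hash&lt;/code&gt; value from your own logs before relying on it. The function name &lt;code&gt;queryHash&lt;/code&gt; is hypothetical.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// queryHash returns the 32-bit FNV-1 hash of a query string.
// Assumption: the string is hashed exactly as written, with no trimming
// or reformatting, so whitespace differences produce different hashes.
func queryHash(query string) uint32 {
	h := fnv.New32() // FNV-1; fnv.New32a would be FNV-1a
	h.Write([]byte(query)) // writes to a hash.Hash never return an error
	return h.Sum32()
}

func main() {
	q := `{stream="stdout",pod="loki-canary-9w49x"}`
	// Print the unsigned value to use as a blocked_queries "hash" entry.
	fmt.Printf("hash: %d\n", queryHash(q))
}
```

&lt;p&gt;Because the hash covers the exact query text, a query that differs only in spacing or label order produces a different value and is not blocked by the same entry.&lt;/p&gt;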


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;The order of patterns is preserved, so the first matching pattern will be used.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;
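&lt;p&gt;To illustrate the first-match rule, the following Go sketch evaluates a list of rules in configuration order and returns the index of the first rule that blocks a query. It is not Loki&amp;rsquo;s implementation: the &lt;code&gt;blockedQuery&lt;/code&gt; type and &lt;code&gt;firstMatch&lt;/code&gt; function are hypothetical, the query type (&lt;code&gt;metric&lt;/code&gt;, &lt;code&gt;filter&lt;/code&gt;, or &lt;code&gt;limited&lt;/code&gt;) is assumed to be already known, and regular expressions are assumed to match anywhere in the query, as Go&amp;rsquo;s &lt;code&gt;regexp.MatchString&lt;/code&gt; does.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// blockedQuery is a hypothetical mirror of one blocked_queries entry.
type blockedQuery struct {
	Pattern string
	Regex   bool
	Hash    uint32
	Types   []string // metric, filter, limited; empty matches any type
}

// typeMatches reports whether queryType is allowed by types.
func typeMatches(queryType string, types []string) bool {
	if len(types) == 0 {
		return true
	}
	for _, t := range types {
		if t == queryType {
			return true
		}
	}
	return false
}

// firstMatch walks the rules in order and returns the index of the first
// rule that blocks the query, or -1 if no rule applies.
func firstMatch(rules []blockedQuery, query, queryType string, hash uint32) int {
	for i, r := range rules {
		if !typeMatches(queryType, r.Types) {
			continue
		}
		if r.Hash != 0 { // hash rules compare only the query hash
			if r.Hash == hash {
				return i
			}
			continue
		}
		pattern, isRegex := r.Pattern, r.Regex
		if pattern == "" { // omitted pattern and regex default to '.*' and true
			pattern, isRegex = ".*", true
		}
		if isRegex {
			if re, err := regexp.Compile(pattern); err == nil && re.MatchString(query) {
				return i
			}
			continue
		}
		if strings.TrimSpace(pattern) == strings.TrimSpace(query) {
			return i
		}
	}
	return -1
}

func main() {
	rules := []blockedQuery{
		{Pattern: `sum(rate({env="prod"}[1m]))`}, // exact match
		{Pattern: `.*prod.*`, Regex: true},        // regex match
		{Types: []string{"metric"}},               // all metric queries
	}
	fmt.Println(firstMatch(rules, `sum(rate({env="prod"}[1m]))`, "metric", 0)) // 0
	fmt.Println(firstMatch(rules, `{env="prod"} |= "error"`, "filter", 0))     // 1
	fmt.Println(firstMatch(rules, `sum(rate({env="dev"}[5m]))`, "metric", 0))  // 2
	fmt.Println(firstMatch(rules, `{env="dev"} |= "error"`, "filter", 0))      // -1
}
```

&lt;p&gt;Because evaluation stops at the first match, place narrow rules before broad ones if you want to see which specific rule blocked a query.&lt;/p&gt;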

&lt;h2 id=&#34;observing-blocked-queries&#34;&gt;Observing blocked queries&lt;/h2&gt;
&lt;p&gt;Blocked queries are logged and counted per tenant in the &lt;code&gt;loki_blocked_queries&lt;/code&gt; metric.&lt;/p&gt;
&lt;p&gt;When a policy matches by pattern/hash/regex, Loki logs whether the query type and request tags matched that policy:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;logfmt&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-logfmt&#34;&gt;level=warn msg=&amp;#34;query blocker matched with regex policy&amp;#34; user=29 type=metric pattern=&amp;#34;.*rate\\(.*\\).*&amp;#34; query=&amp;#34;sum(rate({app=\&amp;#34;foo\&amp;#34;}[5m]))&amp;#34; typesMatched=true tagsMatched=false blocked=false&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If tag constraints fail to match, Loki emits a debug log showing the missing key and the raw header value that was received:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;logfmt&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-logfmt&#34;&gt;level=debug msg=&amp;#34;query blocker tags mismatch: missing or mismatched key&amp;#34; key=feature tagsRaw=&amp;#34;Source=grafana,Feature=alpha&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;scope&#34;&gt;Scope&lt;/h2&gt;
&lt;p&gt;Blocking applies both to queries received via the API and to queries executed as &lt;a href=&#34;../../alert/&#34;&gt;alerting/recording rules&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;tag-based-blocking&#34;&gt;Tag-based blocking&lt;/h2&gt;
&lt;p&gt;You can scope a blocked query rule to requests that include specific key=value pairs in the &lt;code&gt;X-Query-Tags&lt;/code&gt; header.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Header format: &lt;code&gt;key=value&lt;/code&gt; pairs separated by commas, for example: &lt;code&gt;Source=grafana,Feature=beta&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Allowed characters are alphanumeric plus space, comma, equals, &amp;lsquo;@&amp;rsquo;, &amp;lsquo;.&amp;rsquo;, and &amp;lsquo;-&amp;rsquo;. Any other characters are replaced with &lt;code&gt;_&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Parsing keeps only canonical &lt;code&gt;key=value&lt;/code&gt; tokens; malformed tokens are ignored.&lt;/li&gt;
&lt;li&gt;Matching rules:
&lt;ul&gt;
&lt;li&gt;Keys are matched case-insensitively (the server lowercases keys).&lt;/li&gt;
&lt;li&gt;Values are matched case-insensitively.&lt;/li&gt;
&lt;li&gt;All &lt;code&gt;query_tags&lt;/code&gt; pairs specified in the rule must be present in the request for the block to apply.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
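&lt;p&gt;The parsing and matching rules listed above can be sketched in Go as follows. This is an illustration under assumptions, not Loki&amp;rsquo;s code: &lt;code&gt;sanitize&lt;/code&gt;, &lt;code&gt;parseTags&lt;/code&gt;, and &lt;code&gt;tagsMatch&lt;/code&gt; are hypothetical names, &amp;ldquo;alphanumeric&amp;rdquo; is taken to mean ASCII letters and digits, and surrounding spaces in keys and values are assumed to be trimmed.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize replaces every character outside the documented allow-list
// (ASCII letters and digits, space, ',', '=', '@', '.', '-') with '_'.
func sanitize(raw string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
			return r
		case strings.ContainsRune(" ,=@.-", r):
			return r
		}
		return '_'
	}, raw)
}

// parseTags turns a header such as "Source=grafana,Feature=beta" into a map
// with lowercased keys. Tokens that are not a single non-empty key=value
// pair are skipped as malformed.
func parseTags(header string) map[string]string {
	tags := make(map[string]string)
	for _, tok := range strings.Split(sanitize(header), ",") {
		k, v, ok := strings.Cut(tok, "=")
		k, v = strings.TrimSpace(k), strings.TrimSpace(v)
		if !ok || k == "" || v == "" || strings.Contains(v, "=") {
			continue
		}
		tags[strings.ToLower(k)] = v
	}
	return tags
}

// tagsMatch reports whether every pair required by a rule's query_tags is
// present in the request, comparing keys and values case-insensitively.
func tagsMatch(required, got map[string]string) bool {
	for k, want := range required {
		v, ok := got[strings.ToLower(k)]
		if !ok || !strings.EqualFold(v, want) {
			return false
		}
	}
	return true
}

func main() {
	rule := map[string]string{"source": "grafana", "feature": "beta"}
	fmt.Println(tagsMatch(rule, parseTags("Source=grafana,Feature=beta")))      // true
	fmt.Println(tagsMatch(rule, parseTags("Source=grafana,Feature=alpha")))     // false
	fmt.Println(tagsMatch(rule, parseTags("source=Grafana,junk,Feature=BETA"))) // true
}
```

&lt;p&gt;The second call corresponds to the debug log shown earlier: the &lt;code&gt;feature&lt;/code&gt; key is present, but its value does not match, so the rule does not apply.&lt;/p&gt;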
&lt;p&gt;Examples:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
  tenant-a:
    blocked_queries:
      # Block only metric queries from a beta feature flag
      - types: metric
        query_tags:
          feature: beta

      # Combine with regex to narrow scope further
      - pattern: &amp;#39;.*rate\\(.*\\).*&amp;#39;
        regex: true
        query_tags:
          source: grafana&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="block-unwanted-queries">Block unwanted queries&lt;/h1>
&lt;p>In certain situations, you may not be able to control the queries being sent to your Loki installation. These queries
may be intentionally or unintentionally expensive to run, and they may affect the overall stability or cost of running
your service.&lt;/p></description></item><item><title>Configure caches to speed up queries</title><link>https://grafana.com/docs/loki/v3.7.x/operations/caching/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/caching/</guid><content><![CDATA[&lt;h1 id=&#34;configure-caches-to-speed-up-queries&#34;&gt;Configure caches to speed up queries&lt;/h1&gt;
&lt;p&gt;Loki supports two types of caching for query results and chunks to speed up query performance and reduce calls to the storage layer. Memcached is included in the Loki Helm chart and enabled by default for the &lt;code&gt;chunksCache&lt;/code&gt; and &lt;code&gt;resultsCache&lt;/code&gt;.
This section describes the recommended Memcached configuration to enable caching for chunks and query results.&lt;/p&gt;
&lt;h4 id=&#34;results-cache&#34;&gt;Results cache&lt;/h4&gt;
&lt;p&gt;The results cache stores the results of index-stat, instant-metric, label, and volume queries, and it supports negative caching for log queries. In some configurations it is called the frontend cache. For details of each supported request type, refer to the 
    &lt;a href=&#34;/docs/loki/v3.7.x/get-started/components/&#34;&gt;Components section&lt;/a&gt;.
Query frontends consult the results cache so that cached results can be reused by subsequent queries. If the cached results are incomplete, the query frontend calculates the required sub-queries, sends them to the queriers for execution, and then caches those results as well.
To coordinate this, the results cache uses a query hash, computed and stored in the headers, as the cache key.&lt;/p&gt;
&lt;p&gt;The index lookup cache supports only the legacy BoltDB index storage and is configured to be in-memory by default.
With TSDB indexes, the attached disks or persistent volumes are used as the cache, so the in-memory index lookup cache is obsolete.&lt;/p&gt;
&lt;h4 id=&#34;chunks-cache&#34;&gt;Chunks cache&lt;/h4&gt;
&lt;p&gt;Chunks are cached using the &lt;code&gt;chunkRef&lt;/code&gt; as the cache key. The &lt;code&gt;chunkRef&lt;/code&gt; is the unique reference assigned to a chunk when it&amp;rsquo;s cut in the Loki ingesters.
Each time queriers calculate the set of &lt;code&gt;chunkRef&lt;/code&gt;s needed to serve a query, they consult the chunk cache before going to the storage layer.&lt;/p&gt;
&lt;p&gt;Query results are significantly smaller than chunks. As the ingested volume of a Loki cluster grows, the results cache can continue to perform well, whereas the chunks cache needs more memory in proportion to demand.
To support the growing needs of a cluster, in 2023 we introduced support for memcached-extstore. Extstore is a Memcached feature that supports attaching SSD disks to Memcached pods to maximize their capacity.&lt;/p&gt;
&lt;p&gt;Refer to this &lt;a href=&#34;/blog/2023/08/23/how-we-scaled-grafana-cloud-logs-memcached-cluster-to-50tb-and-improved-reliability/&#34;&gt;blog post&lt;/a&gt; on Loki&amp;rsquo;s experience with memcached-extstore in our SaaS offering, Grafana Cloud.
For more information on how to tune memcached-extstore please consult the open source &lt;a href=&#34;https://docs.memcached.org/advisories/grafanaloki/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;memcached documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;before-you-begin&#34;&gt;Before you begin&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Deploy each Memcached cache type as a separate component (&lt;code&gt;memcached_frontend&lt;/code&gt; and &lt;code&gt;memcached_chunks&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;As of 2025-02-06, the &lt;code&gt;memcached:1.6.32-alpine&lt;/code&gt; image is recommended.&lt;/li&gt;
&lt;li&gt;Consult the Loki ksonnet &lt;a href=&#34;https://github.com/grafana/loki/blob/main/production/ksonnet/loki/memcached.libsonnet&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;memcached&lt;/a&gt; deployment and the ksonnet &lt;a href=&#34;https://github.com/grafana/jsonnet-libs/tree/master/memcached&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;memcached library&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Index caching is not required for the 
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/storage/tsdb/#index-caching-not-required&#34;&gt;TSDB&lt;/a&gt; index format.&lt;/li&gt;
&lt;li&gt;For recommendations on scaling the cache, refer to the 
    &lt;a href=&#34;/docs/loki/v3.7.x/setup/size/&#34;&gt;Size the cluster&lt;/a&gt; page.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;steps&#34;&gt;Steps&lt;/h2&gt;
&lt;p&gt;To enable and configure Memcached:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Deploy each Memcached service with at least three replicas and configure
each as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Chunk cache

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;--memory-limit=4096 --max-item-size=2m --conn-limit=1024&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Query result cache

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;--memory-limit=1024 --max-item-size=5m --conn-limit=1024&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Configure Loki to use the cache.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;If you use the Helm chart:&lt;/p&gt;
&lt;p&gt;Set &lt;code&gt;memcached.chunk_cache.host&lt;/code&gt; to the Memcached address for the chunk cache, &lt;code&gt;memcached.results_cache.host&lt;/code&gt; to the Memcached address for the query result cache, &lt;code&gt;memcached.chunk_cache.enabled=true&lt;/code&gt; and &lt;code&gt;memcached.results_cache.enabled=true&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Ensure that the connection limit of Memcached is at least &lt;code&gt;number_of_clients * max_idle_conns&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The options &lt;code&gt;host&lt;/code&gt; and &lt;code&gt;service&lt;/code&gt; depend on the type of installation. For example, using the &lt;code&gt;bitnami/memcached&lt;/code&gt; Helm Charts with the following commands, the &lt;code&gt;service&lt;/code&gt; values are always &lt;code&gt;memcached&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;helm upgrade --install chunk-cache -n loki bitnami/memcached -f memcached-overrides-chunk.yaml
helm upgrade --install results-cache -n loki bitnami/memcached -f memcached-overrides-results.yaml&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The current Helm Chart only supports the chunk and results cache.&lt;/p&gt;
&lt;p&gt;In this case, the Loki configuration would be:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;loki:
  memcached:
    chunk_cache:
      enabled: true
      host: chunk-cache-memcached.loki.svc
      service: memcached-client
      batch_size: 256
      parallelism: 10
    results_cache:
      enabled: true
      host: results-cache-memcached.loki.svc
      service: memcached-client
      default_validity: 12h&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you configure Loki directly, modify the following two sections in
the Loki configuration file.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configure the chunk cache

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;chunk_store_config:
  chunk_cache_config:
    memcached:
      batch_size: 256
      parallelism: 10
    memcached_client:
      host: &amp;lt;chunk cache memcached host&amp;gt;
      service: &amp;lt;port name of memcached service&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;Configure the query result cache

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;query_range:
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        consistent_hash: true
        host: &amp;lt;memcached host&amp;gt;
        service: &amp;lt;port name of memcached service&amp;gt;
        max_idle_conns: 16
        timeout: 200ms
        update_interval: 1m&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
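&lt;p&gt;The connection-limit rule from the steps above (&lt;code&gt;number_of_clients * max_idle_conns&lt;/code&gt;) is simple arithmetic, sketched here in Go. The figure of 40 clients is a hypothetical deployment, and counting each Loki pod that talks to the cache as one client is an assumption; &lt;code&gt;max_idle_conns: 16&lt;/code&gt; and &lt;code&gt;--conn-limit=1024&lt;/code&gt; come from the examples above.&lt;/p&gt;

```go
package main

import "fmt"

// minConnLimit returns the smallest Memcached --conn-limit that satisfies
// connLimit >= numberOfClients * maxIdleConns.
func minConnLimit(numberOfClients, maxIdleConns int) int {
	return numberOfClients * maxIdleConns
}

// maxClients is the inverse: how many clients a given --conn-limit can serve
// when each client may hold maxIdleConns idle connections.
func maxClients(connLimit, maxIdleConns int) int {
	return connLimit / maxIdleConns
}

func main() {
	// Hypothetical: 40 Loki pods use the results cache with max_idle_conns: 16.
	need := minConnLimit(40, 16)
	connLimit := 1024 // --conn-limit used in the Memcached flags above
	fmt.Printf("need >= %d, have %d, ok=%v\n", need, connLimit, connLimit >= need)
	fmt.Printf("max clients at this limit: %d\n", maxClients(connLimit, 16))
}
```

&lt;p&gt;With these numbers the program prints that 640 connections are needed and that a limit of 1024 supports up to 64 such clients; scaling past that point calls for a higher &lt;code&gt;--conn-limit&lt;/code&gt; or fewer idle connections per client.&lt;/p&gt;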
]]></content><description>&lt;h1 id="configure-caches-to-speed-up-queries">Configure caches to speed up queries&lt;/h1>
&lt;p>Loki supports two types of caching for query results and chunks to speed up query performance and reduce calls to the storage layer. Memcached is included in the Loki Helm chart and enabled by default for the &lt;code>chunksCache&lt;/code> and &lt;code>resultsCache&lt;/code>.
This section describes the recommended Memcached configuration to enable caching for chunks and query results.&lt;/p></description></item><item><title>Enforce rate limits and push request validation</title><link>https://grafana.com/docs/loki/v3.7.x/operations/request-validation-rate-limits/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/request-validation-rate-limits/</guid><content><![CDATA[&lt;h1 id=&#34;enforce-rate-limits-and-push-request-validation&#34;&gt;Enforce rate limits and push request validation&lt;/h1&gt;
&lt;p&gt;Loki will reject requests if they exceed a usage threshold (rate limit error) or if they are invalid (validation error).&lt;/p&gt;
&lt;p&gt;All occurrences of these errors can be observed using the &lt;code&gt;loki_discarded_samples_total&lt;/code&gt; and &lt;code&gt;loki_discarded_bytes_total&lt;/code&gt; metrics. The sections below describe the various possible reasons specified in the &lt;code&gt;reason&lt;/code&gt; label of these metrics.&lt;/p&gt;
&lt;p&gt;It is recommended that Loki operators set up alerts or dashboards with these metrics to detect when rate limits or validation errors occur.&lt;/p&gt;
&lt;h3 id=&#34;terminology&#34;&gt;Terminology&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;sample&lt;/strong&gt;: a log line with &lt;a href=&#34;../../get-started/labels/structured-metadata/&#34;&gt;structured metadata&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;stream&lt;/strong&gt;: samples with a unique combination of labels&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;active stream&lt;/strong&gt;: a stream that is present in the ingesters because it has received log lines within the &lt;code&gt;chunk_idle_period&lt;/code&gt; (default: 30m)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;rate-limit-errors&#34;&gt;Rate-Limit Errors&lt;/h2&gt;
&lt;p&gt;Rate-limits are enforced when Loki cannot handle more requests from a tenant.&lt;/p&gt;
&lt;h3 id=&#34;rate_limited&#34;&gt;&lt;code&gt;rate_limited&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This rate limit is enforced when a tenant has exceeded their configured log ingestion rate limit.&lt;/p&gt;
&lt;p&gt;One solution if you&amp;rsquo;re seeing samples dropped due to &lt;code&gt;rate_limited&lt;/code&gt; is simply to increase the rate limits on your Loki cluster. These limits can be modified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; block, or on a per-tenant basis in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;runtime overrides&lt;/a&gt; file. The config options to use are &lt;code&gt;ingestion_rate_mb&lt;/code&gt; and &lt;code&gt;ingestion_burst_size_mb&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Note that you&amp;rsquo;ll want to make sure your Loki cluster has sufficient resources provisioned to be able to accommodate these higher limits. Otherwise your cluster may experience performance degradation as it tries to handle this higher volume of log lines to ingest.&lt;/p&gt;
&lt;p&gt;Another option to address samples being dropped due to &lt;code&gt;rate_limited&lt;/code&gt; is simply to decrease the rate of log lines being sent to your Loki cluster. Consider collecting logs from fewer targets or setting up &lt;a href=&#34;/docs/alloy/latest/reference/components/loki/loki.process/#stagedrop-block&#34;&gt;drop stages&lt;/a&gt; in Alloy to filter out certain log lines. You can also use Alloy&amp;rsquo;s &lt;a href=&#34;/docs/alloy/latest/reference/components/loki/loki.process/#stagelimit-block&#34;&gt;rate limiting&lt;/a&gt; to control the volume of logs sent to your Loki cluster.&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;button-div&#34;&gt;
      &lt;button class=&#34;expand-table-btn&#34;&gt;Expand table&lt;/button&gt;
    &lt;/div&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;distributor&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Outcome&lt;/td&gt;
              &lt;td&gt;Request rejected&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;HTTP status code&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;429 Too Many Requests&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;per_stream_rate_limit&#34;&gt;&lt;code&gt;per_stream_rate_limit&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This limit is enforced when a single stream reaches its rate limit.&lt;/p&gt;
&lt;p&gt;Each stream has a rate limit applied to it to prevent individual streams from overwhelming the set of ingesters it is distributed to (the size of that set is equal to the &lt;code&gt;replication_factor&lt;/code&gt; value).&lt;/p&gt;
&lt;p&gt;This value can be modified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; block, or on a per-tenant basis in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;runtime overrides&lt;/a&gt; file. The config options to adjust are &lt;code&gt;per_stream_rate_limit&lt;/code&gt; and &lt;code&gt;per_stream_rate_limit_burst&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Another way to reduce the number of samples rejected due to &lt;code&gt;per_stream_rate_limit&lt;/code&gt; is to split the rate-limited stream into several smaller streams. A third option is to use the Alloy &lt;a href=&#34;/docs/alloy/latest/reference/components/loki/loki.process/#stagelimit-block&#34;&gt;&lt;code&gt;stage.limit&lt;/code&gt; block&lt;/a&gt; to limit the rate of samples sent to the stream that is hitting the &lt;code&gt;per_stream_rate_limit&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;We typically recommend setting &lt;code&gt;per_stream_rate_limit&lt;/code&gt; no higher than 5MB per second, and &lt;code&gt;per_stream_rate_limit_burst&lt;/code&gt; no higher than 20MB.&lt;/p&gt;
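&lt;p&gt;As an illustration, the following sketch sets these upper bounds globally in &lt;code&gt;limits_config&lt;/code&gt; and then overrides them for a single tenant in the runtime overrides file. The tenant ID &lt;code&gt;tenant-a&lt;/code&gt; and the per-tenant values are hypothetical; adjust them to your workload.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Loki configuration file: limits applied to all tenants.
limits_config:
  per_stream_rate_limit: 5MB         # sustained rate, in bytes per second, per stream
  per_stream_rate_limit_burst: 20MB  # maximum burst size, in bytes, per stream
---
# Runtime overrides file: values for one tenant only.
# tenant-a is a hypothetical tenant ID; the values are illustrative.
overrides:
  tenant-a:
    per_stream_rate_limit: 4MB
    per_stream_rate_limit_burst: 16MB
&lt;/code&gt;&lt;/pre&gt;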
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;ingester&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Outcome&lt;/td&gt;
              &lt;td&gt;Request rejected&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;HTTP status code&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;429 Too Many Requests&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;stream_limit&#34;&gt;&lt;code&gt;stream_limit&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This limit is enforced when a tenant reaches their maximum number of active streams.&lt;/p&gt;
&lt;p&gt;Active streams are held in memory buffers in the ingesters, so if this value is set too high, the ingesters can run out of memory.&lt;/p&gt;
&lt;p&gt;This value can be modified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; block, or on a per-tenant basis in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;runtime overrides&lt;/a&gt; file. To increase the number of allowed active streams, adjust &lt;code&gt;max_global_streams_per_user&lt;/code&gt;. Alternatively, you can reduce the number of active streams by removing extraneous labels or labels with an excessive number of unique values.&lt;/p&gt;
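&lt;p&gt;For example, a runtime overrides file along the following lines raises the active stream limit for one tenant only. The tenant ID and the value &lt;code&gt;10000&lt;/code&gt; are hypothetical; because every active stream consumes ingester memory, check the memory headroom of your ingesters before raising this limit.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Runtime overrides file (tenant-a is a hypothetical tenant ID).
overrides:
  tenant-a:
    # Maximum number of active streams for this tenant across the cluster.
    # Illustrative value only.
    max_global_streams_per_user: 10000
&lt;/code&gt;&lt;/pre&gt;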
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;ingester&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Outcome&lt;/td&gt;
              &lt;td&gt;Request rejected&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;HTTP status code&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;429 Too Many Requests&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h2 id=&#34;validation-errors&#34;&gt;Validation errors&lt;/h2&gt;
&lt;p&gt;Validation errors occur when a request violates a validation rule defined by Loki.&lt;/p&gt;
&lt;h3 id=&#34;line_too_long&#34;&gt;&lt;code&gt;line_too_long&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This error occurs when a log line exceeds the maximum allowable length in bytes. The HTTP response will include the stream to which the offending log line belongs as well as its size in bytes.&lt;/p&gt;
&lt;p&gt;This value can be modified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; block, or on a per-tenant basis in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;runtime overrides&lt;/a&gt; file. To increase the maximum line size, adjust &lt;code&gt;max_line_size&lt;/code&gt;. For performance reasons, we recommend that you do not increase this value above 256KB. Alternatively, you can configure Loki to ingest truncated versions of log lines that exceed the limit by using the &lt;code&gt;max_line_size_truncate&lt;/code&gt; option.&lt;/p&gt;
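&lt;p&gt;A minimal sketch of the relevant &lt;code&gt;limits_config&lt;/code&gt; settings follows. The values are illustrative. With truncation enabled, over-long lines are truncated and ingested instead of being rejected.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  # Lines larger than this are rejected with reason=line_too_long.
  # We recommend not going above 256KB.
  max_line_size: 256KB
  # Optional: truncate over-long lines instead of rejecting them.
  max_line_size_truncate: true
&lt;/code&gt;&lt;/pre&gt;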
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;distributor&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;invalid_labels&#34;&gt;&lt;code&gt;invalid_labels&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This error occurs when one or more labels in the submitted streams fail validation.&lt;/p&gt;
&lt;p&gt;Loki uses the &lt;a href=&#34;https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;same validation rules as Prometheus&lt;/a&gt; for validating labels.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Label names may contain ASCII letters, numbers, as well as underscores. They must match the regex &lt;code&gt;[a-zA-Z_][a-zA-Z0-9_]*&lt;/code&gt;. Label names beginning with __ are reserved for internal use.&lt;/p&gt;&lt;/blockquote&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;distributor&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;missing_labels&#34;&gt;&lt;code&gt;missing_labels&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;This validation error is returned when a stream is submitted without any labels.&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;distributor&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;too_far_behind-and-out_of_order&#34;&gt;&lt;code&gt;too_far_behind&lt;/code&gt; and &lt;code&gt;out_of_order&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;too_far_behind&lt;/code&gt; and &lt;code&gt;out_of_order&lt;/code&gt; reasons are identical. Loki clusters with &lt;code&gt;unordered_writes=true&lt;/code&gt; (the default value as of Loki v2.4) use &lt;code&gt;reason=too_far_behind&lt;/code&gt;. Loki clusters with &lt;code&gt;unordered_writes=false&lt;/code&gt; use &lt;code&gt;reason=out_of_order&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This validation error is returned when log entries for a stream are submitted out of order. For more details, refer to the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#accept-out-of-order-writes&#34;&gt;Loki ordering constraints&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;unordered_writes&lt;/code&gt; config value can be modified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; block, or on a per-tenant basis in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;runtime overrides&lt;/a&gt; file, whereas &lt;code&gt;max_chunk_age&lt;/code&gt; is a global configuration.&lt;/p&gt;
&lt;p&gt;This problem can be solved by ensuring that log delivery is configured correctly, or by increasing the &lt;code&gt;max_chunk_age&lt;/code&gt; value.&lt;/p&gt;
&lt;p&gt;We recommend that you avoid changing the default value of &lt;code&gt;max_chunk_age&lt;/code&gt;, because this has other implications, and that you instead track down the cause of the delayed log delivery. Note also that this is a per-stream error, so splitting streams (by adding more labels) can circumvent the problem, especially if multiple hosts are sending samples to a single stream.&lt;/p&gt;
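&lt;p&gt;The following fragment only shows where the two options are set; it is not a tuning recommendation. &lt;code&gt;unordered_writes&lt;/code&gt; belongs to &lt;code&gt;limits_config&lt;/code&gt; and can be overridden per tenant, whereas &lt;code&gt;max_chunk_age&lt;/code&gt; is set in the &lt;code&gt;ingester&lt;/code&gt; block and applies globally. The &lt;code&gt;2h&lt;/code&gt; value is shown for illustration; as recommended above, prefer keeping the default.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  # Per-tenant setting; true is the default as of Loki v2.4.
  # true reports reason=too_far_behind, false reports reason=out_of_order.
  unordered_writes: true

ingester:
  # Global setting; leave it at the default unless you understand the trade-offs.
  max_chunk_age: 2h
&lt;/code&gt;&lt;/pre&gt;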
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;ingester&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;greater_than_max_sample_age&#34;&gt;&lt;code&gt;greater_than_max_sample_age&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;If the &lt;code&gt;reject_old_samples&lt;/code&gt; config option is set to &lt;code&gt;true&lt;/code&gt; (it is by default), then samples will be rejected with &lt;code&gt;reason=greater_than_max_sample_age&lt;/code&gt; if they are older than the &lt;code&gt;reject_old_samples_max_age&lt;/code&gt; value. You should not see samples rejected for &lt;code&gt;reason=greater_than_max_sample_age&lt;/code&gt; if &lt;code&gt;reject_old_samples=false&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;This value can be modified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; block, or on a per-tenant basis in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;runtime overrides&lt;/a&gt; file. This error can be solved by increasing the &lt;code&gt;reject_old_samples_max_age&lt;/code&gt; value, or investigating why log delivery is delayed for this particular stream. The stream in question will be returned in the body of the HTTP response.&lt;/p&gt;
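&lt;p&gt;For example, the following &lt;code&gt;limits_config&lt;/code&gt; fragment keeps old-sample rejection enabled but widens the accepted age. The &lt;code&gt;2w&lt;/code&gt; value is hypothetical; choose a window that covers your expected delivery delays.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  # Reject samples older than reject_old_samples_max_age (enabled by default).
  reject_old_samples: true
  # Illustrative value: accept samples up to two weeks old.
  reject_old_samples_max_age: 2w
&lt;/code&gt;&lt;/pre&gt;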
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;distributor&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Outcome&lt;/td&gt;
              &lt;td&gt;Request rejected&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;HTTP status code&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;400 Bad Request&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;too_far_in_future&#34;&gt;&lt;code&gt;too_far_in_future&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Loki accepts samples whose timestamps are later than the current time only within a certain grace period. If a sample&amp;rsquo;s timestamp is further in the future than the grace period allows, the sample is rejected with this error.&lt;/p&gt;
&lt;p&gt;This value can be modified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; block, or on a per-tenant basis in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;runtime overrides&lt;/a&gt; file. This error can be solved by increasing the &lt;code&gt;creation_grace_period&lt;/code&gt; value, or by investigating why this particular stream has timestamps too far in the future. The stream in question is returned in the body of the HTTP response.&lt;/p&gt;
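&lt;p&gt;A minimal sketch of the setting in &lt;code&gt;limits_config&lt;/code&gt; follows. The &lt;code&gt;10m&lt;/code&gt; value is illustrative. If clients regularly send future timestamps, clock skew on the sending hosts is worth checking before you widen the grace period.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  # Illustrative value: accept samples timestamped up to 10 minutes in the future.
  creation_grace_period: 10m
&lt;/code&gt;&lt;/pre&gt;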
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;distributor&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Outcome&lt;/td&gt;
              &lt;td&gt;Request rejected&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;HTTP status code&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;400 Bad Request&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;max_label_names_per_series&#34;&gt;&lt;code&gt;max_label_names_per_series&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;If a sample is submitted with more labels than Loki has been configured to allow, it will be rejected with the &lt;code&gt;max_label_names_per_series&lt;/code&gt; reason. Note that &amp;lsquo;series&amp;rsquo; is the same thing as a &amp;lsquo;stream&amp;rsquo; in Loki - the &amp;lsquo;series&amp;rsquo; term is a legacy name.&lt;/p&gt;
&lt;p&gt;This value can be modified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; block, or on a per-tenant basis in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;runtime overrides&lt;/a&gt; file. This error can be solved by increasing the &lt;code&gt;max_label_names_per_series&lt;/code&gt; value. The stream to which the offending sample (i.e. the one with too many label names) belongs will be returned in the body of the HTTP response.&lt;/p&gt;
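&lt;p&gt;For example, a per-tenant override in the runtime overrides file could look like the following sketch. The tenant ID &lt;code&gt;tenant-a&lt;/code&gt; and the value &lt;code&gt;20&lt;/code&gt; are hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Runtime overrides file (tenant-a is a hypothetical tenant ID).
overrides:
  tenant-a:
    # Illustrative value: allow up to 20 label names per stream for this tenant.
    max_label_names_per_series: 20
&lt;/code&gt;&lt;/pre&gt;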
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;distributor&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Outcome&lt;/td&gt;
              &lt;td&gt;Request rejected&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;HTTP status code&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;400 Bad Request&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;label_name_too_long&#34;&gt;&lt;code&gt;label_name_too_long&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;If a sample is sent with a label name that has a length in bytes greater than Loki has been configured to allow, it will be rejected with the &lt;code&gt;label_name_too_long&lt;/code&gt; reason.&lt;/p&gt;
&lt;p&gt;This value can be modified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; block, or on a per-tenant basis in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;runtime overrides&lt;/a&gt; file. This error can be solved by increasing the &lt;code&gt;max_label_name_length&lt;/code&gt; value, though we do not recommend raising it significantly above the default value of &lt;code&gt;1024&lt;/code&gt; for performance reasons. The offending stream will be returned in the body of the HTTP response.&lt;/p&gt;
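&lt;p&gt;The following fragment only shows where this limit is set in &lt;code&gt;limits_config&lt;/code&gt;. It uses the default of &lt;code&gt;1024&lt;/code&gt; mentioned above; if you must raise it, a modestly larger value is in line with the recommendation.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  # Maximum label name length in bytes.
  # 1024 is the default; avoid raising it significantly.
  max_label_name_length: 1024
&lt;/code&gt;&lt;/pre&gt;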
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;distributor&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Outcome&lt;/td&gt;
              &lt;td&gt;Request rejected&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;HTTP status code&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;400 Bad Request&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;label_value_too_long&#34;&gt;&lt;code&gt;label_value_too_long&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;If a sample has a label value with a length in bytes greater than Loki has been configured to allow, it will be rejected for the &lt;code&gt;label_value_too_long&lt;/code&gt; reason.&lt;/p&gt;
&lt;p&gt;This value can be modified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; block, or on a per-tenant basis in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;runtime overrides&lt;/a&gt; file. This error can be solved by increasing the &lt;code&gt;max_label_value_length&lt;/code&gt; value. The offending stream will be returned in the body of the HTTP response.&lt;/p&gt;
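&lt;p&gt;A minimal sketch of the setting in &lt;code&gt;limits_config&lt;/code&gt; follows. The &lt;code&gt;4096&lt;/code&gt; value is hypothetical; size it to the longest legitimate label value you expect.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  # Maximum label value length in bytes. Illustrative value.
  max_label_value_length: 4096
&lt;/code&gt;&lt;/pre&gt;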
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;distributor&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Outcome&lt;/td&gt;
              &lt;td&gt;Request rejected&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;HTTP status code&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;400 Bad Request&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;duplicate_label_names&#34;&gt;&lt;code&gt;duplicate_label_names&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;If a sample is sent with two or more labels that have the same name, it is rejected for the &lt;code&gt;duplicate_label_names&lt;/code&gt; reason.&lt;/p&gt;
&lt;p&gt;The offending stream will be returned in the body of the HTTP response.&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Property&lt;/th&gt;
              &lt;th&gt;Value&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Enforced by&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;distributor&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Outcome&lt;/td&gt;
              &lt;td&gt;Request rejected&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Retryable&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Sample discarded&lt;/td&gt;
              &lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Configurable per tenant&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;HTTP status code&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;400 Bad Request&lt;/code&gt;&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;]]></content><description>&lt;h1 id="enforce-rate-limits-and-push-request-validation">Enforce rate limits and push request validation&lt;/h1>
&lt;p>Loki will reject requests if they exceed a usage threshold (rate limit error) or if they are invalid (validation error).&lt;/p></description></item><item><title>Ensure query fairness within tenants using actors</title><link>https://grafana.com/docs/loki/v3.7.x/operations/query-fairness/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/query-fairness/</guid><content><![CDATA[&lt;h1 id=&#34;ensure-query-fairness-within-tenants-using-actors&#34;&gt;Ensure query fairness within tenants using actors&lt;/h1&gt;
&lt;p&gt;Loki uses &lt;a href=&#34;../shuffle-sharding/&#34;&gt;shuffle sharding&lt;/a&gt;
to minimize impact across tenants in case of querier failures or misbehaving
neighboring tenants.&lt;/p&gt;
&lt;p&gt;When many different actors use the same tenant to query logs, such as users
accessing Loki from Grafana or LogCLI, or other applications using the HTTP API,
their queries can contend with each other, because they all share the same
resources of the tenant.&lt;/p&gt;
&lt;p&gt;In that case, as an operator, you may also want to ensure query fairness
across these actors within a tenant. An actor could be a Grafana user, a CLI
user, or an application accessing the API. To achieve this, Loki 2.9 introduced
hierarchical scheduler queues, based on
&lt;a href=&#34;../../community/lids/0003-queryfairnessinscheduler/&#34;&gt;LID 0003: Query fairness across users within tenants&lt;/a&gt;.
They are enabled by default.&lt;/p&gt;
&lt;h2 id=&#34;what-are-hierarchical-queues-and-how-do-they-work&#34;&gt;What are hierarchical queues and how do they work&lt;/h2&gt;
&lt;p&gt;To understand hierarchical queues, first note that in the scheduler
component, each tenant has its own first-in, first-out (FIFO) queue where
sub-queries are enqueued. Sub-queries are the queries that result from splitting
and sharding a query that a client sends over HTTP.&lt;/p&gt;
&lt;p&gt;Tenant queues are the first level of the queue hierarchy. When a tenant
executes a query without any further controls, all of its sub-queries are
enqueued to the first level queue.&lt;/p&gt;
&lt;p&gt;At the second level of the queue hierarchy, each tenant queue can have sub-queues.&lt;/p&gt;
&lt;p&gt;Similar to how shuffle sharding assigns queries at the tenant level, each time
the scheduler makes a round-robin pick at the second level of the queue
hierarchy, it selects a query from either the local queue of the tenant or one of its sub-queues.&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;./hierarchical-queues.png&#34;
  alt=&#34;Hierarchical queues&#34;/&gt;&lt;/p&gt;
&lt;p&gt;The figure above shows that a tenant queue has a local queue, which is a leaf
node in the queue tree, and a set of sub-queues. Like the tenant queue, each
sub-queue consists of a local queue and, possibly, further sub-queues, resulting in
a recursive tree structure.&lt;/p&gt;
&lt;p&gt;So, how can we make use of these tree-like queue structures to achieve query fairness?&lt;/p&gt;
&lt;h2 id=&#34;how-to-control-query-fairness&#34;&gt;How to control query fairness&lt;/h2&gt;
&lt;p&gt;As already mentioned, by default, sub-queries are only enqueued at the first
(tenant) level of the queue tree. The tenant is provided by the &lt;code&gt;X-Scope-OrgID&lt;/code&gt;
header that is required when running Loki in multi-tenant mode.&lt;/p&gt;
&lt;p&gt;Use the HTTP header &lt;code&gt;X-Loki-Actor-Path&lt;/code&gt; to control which sub-queue a
query (or, more precisely, its sub-queries) is enqueued to.&lt;/p&gt;
&lt;p&gt;The following example shows a &lt;code&gt;curl&lt;/code&gt; command that invokes the HTTP endpoint for range queries
and passes both the &lt;code&gt;X-Scope-OrgID&lt;/code&gt; and the &lt;code&gt;X-Loki-Actor-Path&lt;/code&gt; headers.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;curl -s &amp;#39;http://localhost:3100/loki/api/v1/query_range?xxx&amp;#39; \
    -H &amp;#39;X-Scope-OrgID: grafana&amp;#39; \
    -H &amp;#39;X-Loki-Actor-Path: joe&amp;#39;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The query that this request invokes ends up in the sub-queue &lt;code&gt;joe&lt;/code&gt; of the
tenant queue &lt;code&gt;grafana&lt;/code&gt;. Another user can use their own name in the actor path
header to enqueue their queries to their own sub-queue.&lt;/p&gt;
&lt;p&gt;Because the scheduler picks the next task for a tenant in round-robin order,
each of the two actors (in this case, human users) gets a 50% share of the
sub-queries that the scheduler dequeues and sends to the queriers.&lt;/p&gt;
&lt;p&gt;With N actors, each actor gets 1/N of the tenant&amp;rsquo;s share. In our example with two
users, if the tenant&amp;rsquo;s local queue also holds sub-queries (for example, from requests sent
without the header), the local queue and each of the two sub-queues get 1/3 of the tenant&amp;rsquo;s share.&lt;/p&gt;
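&lt;p&gt;The share arithmetic can be illustrated with a small simulation. The following sketch is a flat, single-level simplification, not the Loki scheduler itself: it visits the tenant local queue and each actor sub-queue in a fixed cycle, skips empty queues, and counts how many tasks each queue contributes. The function name, queue names, and task counts are hypothetical.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Simplified, single-level model of round-robin dequeuing inside one tenant.
# Not the Loki implementation; names and task counts are hypothetical.

# simulate_round_robin ROUNDS NAME=PENDING...
# Dequeues up to ROUNDS tasks, visiting queues in a fixed cycle and skipping
# empty ones, then prints how many tasks each queue contributed.
simulate_round_robin() {
  local rounds=$1
  shift
  local -a names=() pending=() served=()
  local spec
  for spec in "$@"; do
    names+=("${spec%%=*}")     # text before "=" is the queue name
    pending+=("${spec#*=}")    # text after "=" is its number of waiting tasks
    served+=(0)
  done

  local n=${#names[@]} i=0 dequeued=0 idle=0
  # Stop after ROUNDS dequeues, or once a full cycle finds every queue empty.
  while (( dequeued < rounds && idle < n )); do
    if (( pending[i] > 0 )); then
      pending[i]=$(( pending[i] - 1 ))
      served[i]=$(( served[i] + 1 ))
      dequeued=$(( dequeued + 1 ))
      idle=0
    else
      idle=$(( idle + 1 ))
    fi
    i=$(( (i + 1) % n ))
  done

  for i in "${!names[@]}"; do
    printf '%s=%d\n' "${names[i]}" "${served[i]}"
  done
}

# Tenant local queue plus sub-queues for two actors, all with a backlog.
simulate_round_robin 9 local=10 joe=10 jane=10
# prints local=3, joe=3 and jane=3 on separate lines
```

&lt;p&gt;With all three queues backlogged, nine dequeues give each queue three tasks, that is, 1/3 each. When one sub-queue runs empty, the cycle skips it, so the remaining queues share its turns.&lt;/p&gt;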
&lt;p&gt;As the implementation described above and the header name suggest, queries can also be
enqueued several levels deep. To do so, construct a path to the sub-queue using the
&lt;code&gt;|&lt;/code&gt; delimiter in the header value, as shown in the following examples.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;curl -s http://localhost:3100/loki/api/v1/query_range?xxx \
    -H &amp;#39;X-Scope-OrgID: grafana&amp;#39; \
    -H &amp;#39;X-Loki-Actor-Path: users|joe&amp;#39;

curl -s http://localhost:3100/loki/api/v1/query_range?xxx \
    -H &amp;#39;X-Scope-OrgID: grafana&amp;#39; \
    -H &amp;#39;X-Loki-Actor-Path: apps|logcli&amp;#39;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
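&lt;p&gt;When the path segments come from variables, for example a team and a user name, a small shell helper can build the header value. This is a sketch; &lt;code&gt;join_actor_path&lt;/code&gt; is a hypothetical function, not part of Loki. Quote the header when you pass it to &lt;code&gt;curl&lt;/code&gt;, because an unquoted &lt;code&gt;|&lt;/code&gt; is interpreted by the shell as a pipe.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Hypothetical helper: join path segments into an X-Loki-Actor-Path value,
# for example "users" "joe" -> "users|joe".
join_actor_path() {
  # "$*" joins all arguments using the first character of IFS as separator.
  local IFS='|'
  printf '%s\n' "$*"
}

join_actor_path users joe      # prints users|joe
join_actor_path apps logcli    # prints apps|logcli

# Usage with the query_range example above (query parameters still elided):
# curl -s 'http://localhost:3100/loki/api/v1/query_range?xxx' \
#     -H 'X-Scope-OrgID: grafana' \
#     -H "X-Loki-Actor-Path: $(join_actor_path users joe)"
```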
&lt;p&gt;There is a limit to how deep a path, and therefore the queue tree, can be. The limit is
controlled by the Loki &lt;code&gt;-query-scheduler.max-queue-hierarchy-levels&lt;/code&gt; CLI argument
or the equivalent YAML configuration option:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;query_scheduler:
  max_queue_hierarchy_levels: 2 # defaults to 3&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
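&lt;p&gt;To stay within the configured limit, you can count the levels in a path before using it. The following sketch assumes that each &lt;code&gt;|&lt;/code&gt;-separated segment of the header value counts as one level toward &lt;code&gt;max_queue_hierarchy_levels&lt;/code&gt;. The function names are hypothetical, and the sketch does not model how Loki handles a path that exceeds the limit.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Count the |-separated segments of an X-Loki-Actor-Path value.
# Assumption: one segment corresponds to one queue hierarchy level.
actor_path_depth() {
  local -a segments
  IFS='|' read -r -a segments <<< "$1"
  printf '%d\n' "${#segments[@]}"
}

# Succeeds (exit status 0) if the path fits within MAX_LEVELS levels.
actor_path_fits() {
  local path=$1 max_levels=${2:-3}   # 3 mirrors the documented default
  (( $(actor_path_depth "$path") <= max_levels ))
}

actor_path_depth 'users|joe'     # prints 2
actor_path_fits 'apps|team|logcli|cron' 3 || echo "too deep for 3 levels"
```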
&lt;p&gt;Keep the number of levels small (ideally 1 to 3), both for performance reasons and
so that it remains easy to understand how query fairness is ensured across all sub-queues.&lt;/p&gt;
&lt;h2 id=&#34;enforcing-headers&#34;&gt;Enforcing headers&lt;/h2&gt;
&lt;p&gt;In the examples above, the client that sent the query directly to Loki also provided the
HTTP header that controls where in the queue tree the sub-queries are enqueued. As an operator,
however, you usually want to avoid this and control where the header is set yourself.&lt;/p&gt;
&lt;p&gt;When using Grafana as the Loki user interface, you can, for example, create multiple data sources
that use the same tenant but each set a different value for the HTTP header
&lt;code&gt;X-Loki-Actor-Path&lt;/code&gt;, and restrict which Grafana users can use which data source.&lt;/p&gt;
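&lt;p&gt;As a sketch of the Grafana approach, the following shell function prints a data source definition for the Grafana HTTP API (&lt;code&gt;POST /api/datasources&lt;/code&gt;) that sets both headers through the custom HTTP header fields &lt;code&gt;jsonData.httpHeaderNameN&lt;/code&gt; and &lt;code&gt;secureJsonData.httpHeaderValueN&lt;/code&gt;. The function name, data source name, Loki URL, and Grafana credentials are assumptions; check the Grafana data source documentation for your version. The function does no JSON escaping, so it assumes values without quotes or backslashes.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Print a Grafana data source definition for one actor (hypothetical helper).
# Assumptions: Grafana reaches Loki at http://loki:3100 and the tenant is
# "grafana"; adjust both. Values are inserted without JSON escaping.
loki_datasource_json() {
  local actor=$1 tenant=${2:-grafana} loki_url=${3:-http://loki:3100}
  cat <<EOF
{
  "name": "Loki (${actor})",
  "type": "loki",
  "access": "proxy",
  "url": "${loki_url}",
  "jsonData": {
    "httpHeaderName1": "X-Scope-OrgID",
    "httpHeaderName2": "X-Loki-Actor-Path"
  },
  "secureJsonData": {
    "httpHeaderValue1": "${tenant}",
    "httpHeaderValue2": "${actor}"
  }
}
EOF
}

loki_datasource_json joe

# To create the data source (Grafana URL and credentials are placeholders):
# curl -s -X POST http://localhost:3000/api/datasources \
#     -H 'Content-Type: application/json' \
#     -u admin:admin \
#     --data "$(loki_datasource_json joe)"
```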
&lt;p&gt;Alternatively, if you run an authentication proxy in front of Loki, the proxy can
pass the (hashed) authenticated user to Loki as a downstream header.&lt;/p&gt;
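&lt;p&gt;One way to derive such a header value is to hash the authenticated user name, so that Loki sees a stable identifier instead of the name itself. A minimal sketch, assuming GNU coreutils &lt;code&gt;sha256sum&lt;/code&gt; (on macOS, &lt;code&gt;shasum -a 256&lt;/code&gt; is the usual substitute). Truncating to 16 hex characters is an arbitrary choice, and &lt;code&gt;actor_path_for_user&lt;/code&gt; is a hypothetical name. A hash of a guessable user name is pseudonymous, not secret.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Derive a stable, pseudonymous actor identifier from a user name.
# Assumption: sha256sum (GNU coreutils) is available.
actor_path_for_user() {
  # printf avoids the trailing newline that echo would add to the hash input.
  printf '%s' "$1" | sha256sum | cut -c1-16
}

user="joe@example.com"   # for example, the user your proxy authenticated
printf 'X-Loki-Actor-Path: %s\n' "$(actor_path_for_user "$user")"
```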
]]></content><description>&lt;h1 id="ensure-query-fairness-within-tenants-using-actors">Ensure query fairness within tenants using actors&lt;/h1>
&lt;p>Loki uses &lt;a href="../shuffle-sharding/">shuffle sharding&lt;/a>
to minimize impact across tenants in case of querier failures or misbehaving
neighboring tenants.&lt;/p>
&lt;p>When there are potentially a lot of different actors using the same tenant to
query logs, such as users accessing Loki from Grafana or via LogCLI or other
applications using the HTTP API, it can lead to contention between queries of
different users, because they all share the same resources for a tenant.&lt;/p></description></item><item><title>Isolate tenant workflows using shuffle sharding</title><link>https://grafana.com/docs/loki/v3.7.x/operations/shuffle-sharding/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/shuffle-sharding/</guid><content><![CDATA[&lt;h1 id=&#34;isolate-tenant-workflows-using-shuffle-sharding&#34;&gt;Isolate tenant workflows using shuffle sharding&lt;/h1&gt;
&lt;p&gt;Shuffle sharding is a resource-management technique used to isolate tenant workloads from other tenant workloads, to give each tenant more of a single-tenant experience when running in a shared cluster.
This technique is explained by AWS in their article &lt;a href=&#34;https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Workload isolation using shuffle-sharding&lt;/a&gt;.
A reference implementation has been shown in the &lt;a href=&#34;https://github.com/awslabs/route53-infima/blob/master/src/main/java/com/amazonaws/services/route53/infima/SimpleSignatureShuffleSharder.java&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Route53 Infima library&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;the-issues-that-shuffle-sharding-mitigates&#34;&gt;The issues that shuffle sharding mitigates&lt;/h2&gt;
&lt;p&gt;Shuffle sharding can be configured for the query path.&lt;/p&gt;
&lt;p&gt;By default, the query path is sharded without shuffle sharding:
each tenant’s queries are sharded across all queriers, so every tenant’s workload uses all querier instances.&lt;/p&gt;
&lt;p&gt;In a multi-tenant cluster, sharding across all instances of a component may exhibit these issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Any outage of a component instance affects all tenants&lt;/li&gt;
&lt;li&gt;A misbehaving tenant affects all other tenants&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A single query may create issues for all tenants.
A tenant, or a group of tenants, may issue an expensive query
that causes a querier to hit an out-of-memory error or to crash.
Once the error occurs,
the tenant or tenants issuing the error-causing query are reassigned
to other running queriers (remember that all tenants can use all available queriers),
and the same query may then affect those queriers as well.&lt;/p&gt;
&lt;h2 id=&#34;how-shuffle-sharding-works&#34;&gt;How shuffle sharding works&lt;/h2&gt;
&lt;p&gt;The idea of shuffle sharding is to assign each tenant to a shard composed of a subset of the Loki queriers, aiming to minimize the overlap between the shards of distinct tenants.&lt;/p&gt;
&lt;p&gt;A misbehaving tenant will affect only its shard&amp;rsquo;s queriers. Due to the low overlap of queriers among tenants, only a small subset of tenants will be affected by the misbehaving tenant.
Shuffle sharding requires no more resources than the default sharding strategy.&lt;/p&gt;
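&lt;p&gt;The following sketch illustrates the idea only; it is not how Loki selects queriers. It shuffles a list of hypothetical querier names using a random source derived from a hash of the tenant ID, so the same tenant always receives the same subset of queriers, while different tenants typically receive different subsets. It assumes GNU coreutils (&lt;code&gt;seq&lt;/code&gt;, &lt;code&gt;sha256sum&lt;/code&gt;, and &lt;code&gt;shuf&lt;/code&gt; with &lt;code&gt;--random-source&lt;/code&gt;), and &lt;code&gt;tenant_shard&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Illustrative shuffle-sharding shard selection (not the Loki algorithm).
# tenant_shard TENANT NUM_QUERIERS SHARD_SIZE
tenant_shard() {
  local tenant=$1 queriers=$2 shard_size=$3 seed
  # Hash the tenant ID so that tenants with similar names get unrelated seeds.
  seed=$(printf '%s' "$tenant" | sha256sum | cut -d' ' -f1)
  # List hypothetical queriers querier-0 .. querier-(N-1), then pick SHARD_SIZE
  # of them. Repeating the seed as the random source makes the pick repeatable
  # per tenant; it is a deterministic seed, not a source of real randomness.
  seq -f 'querier-%g' 0 $(( queriers - 1 )) \
    | shuf -n "$shard_size" --random-source=<(yes "$seed")
}

tenant_shard tenant-a 50 4   # the same 4 queriers every time for tenant-a
```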
&lt;p&gt;Shuffle sharding does not fix all issues.
If a tenant repeatedly sends a problematic query, the crashed querier
will be disconnected from the query-frontend, and a new querier
will be immediately assigned to the tenant’s shard.
This invalidates the positive effects of shuffle sharding.
In this case,
configuring a delay between when a querier disconnects because of a crash
and when the crashed querier is actually removed from the tenant’s shard
(and replaced by a healthy querier) improves the situation.
A delay of 1 minute may be a reasonable value. Set it in
the query-frontend with the configuration parameter
&lt;code&gt;-query-frontend.querier-forget-delay=1m&lt;/code&gt;, and in the query-scheduler with the configuration parameter
&lt;code&gt;-query-scheduler.querier-forget-delay=1m&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;low-probability-of-overlapping-instances&#34;&gt;Low probability of overlapping instances&lt;/h3&gt;
&lt;p&gt;For example, if a Loki cluster runs 50 queriers and assigns each tenant 4 of them, shuffling instances between tenants, there are 230,300 (about 230K) possible combinations.&lt;/p&gt;
&lt;p&gt;If you randomly pick two distinct tenants, there is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a 71% chance that they will not share any instance&lt;/li&gt;
&lt;li&gt;a 26% chance that they will share only 1 instance&lt;/li&gt;
&lt;li&gt;a 2.7% chance that they will share 2 instances&lt;/li&gt;
&lt;li&gt;a 0.08% chance that they will share 3 instances&lt;/li&gt;
&lt;li&gt;only a 0.0004% chance that their instances will fully overlap&lt;/li&gt;
&lt;/ul&gt;
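&lt;p&gt;These percentages follow from the hypergeometric distribution: with n = 50 queriers and k = 4 queriers per tenant, the probability that two randomly chosen shards share exactly i queriers is C(k, i) × C(n - k, k - i) / C(n, k), where C(50, 4) = 230,300 is the number of possible shards. A short shell function using awk reproduces the figures; the function name is hypothetical.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Overlap probabilities for two random shards of K queriers out of N.
# overlap_probabilities N K
overlap_probabilities() {
  awk -v n="$1" -v k="$2" '
    # Binomial coefficient C(a, b), computed iteratively in floating point.
    function choose(a, b,    r, j) {
      if (b < 0 || b > a) return 0
      r = 1
      for (j = 1; j <= b; j++) r = r * (a - b + j) / j
      return r
    }
    BEGIN {
      total = choose(n, k)
      printf "combinations: %d\n", total
      for (i = 0; i <= k; i++)
        printf "share %d: %.4f%%\n", i, 100 * choose(k, i) * choose(n - k, k - i) / total
    }'
}

overlap_probabilities 50 4
```

&lt;p&gt;For 50 queriers and 4 per tenant, the output rounds to the percentages listed above: about 70.86% for no shared instance, 26.37% for one, 2.70% for two, 0.08% for three, and 0.0004% for a full overlap.&lt;/p&gt;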
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;./shuffle-sharding-probability.png&#34;
  alt=&#34;overlapping instances probability&#34;/&gt;&lt;/p&gt;
&lt;h2 id=&#34;configuration&#34;&gt;Configuration&lt;/h2&gt;
&lt;p&gt;Enable shuffle sharding by setting &lt;code&gt;-frontend.max-queriers-per-tenant&lt;/code&gt; to a value higher than 0 and lower than the number of available queriers.
The value of the per-tenant configuration option
&lt;code&gt;max_queriers_per_tenant&lt;/code&gt; sets the number of queriers allocated to each tenant.
This option is only available when using the query-frontend, with or without a scheduler.&lt;/p&gt;
&lt;p&gt;The per-tenant configuration parameter
&lt;code&gt;max_query_parallelism&lt;/code&gt; describes how many sub-queries, after query splitting and query sharding, can be scheduled to run at the same time for each request of any tenant.&lt;/p&gt;
&lt;p&gt;The configuration parameter
&lt;code&gt;querier.concurrency&lt;/code&gt; controls the number of worker threads (goroutines) per querier.&lt;/p&gt;
&lt;p&gt;You can override the maximum number of queriers on a per-tenant basis by setting &lt;code&gt;max_queriers_per_tenant&lt;/code&gt; in the limits overrides configuration.&lt;/p&gt;
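&lt;p&gt;The enabling condition described above (a value greater than 0 and lower than the number of available queriers) can be expressed as a one-line check, for example in a deployment script. This is a hypothetical helper that does not read the Loki configuration; you pass it the per-tenant value and the number of queriers you run.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Succeeds if MAX_QUERIERS_PER_TENANT enables shuffle sharding for
# AVAILABLE_QUERIERS queriers, that is, 0 < value < available.
shuffle_sharding_enabled() {
  local max_queriers_per_tenant=$1 available_queriers=$2
  (( max_queriers_per_tenant > 0 && max_queriers_per_tenant < available_queriers ))
}

if shuffle_sharding_enabled 4 50; then
  echo "4 of 50 queriers per tenant: shuffle sharding is active"
fi
```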
&lt;h2 id=&#34;shuffle-sharding-metrics&#34;&gt;Shuffle sharding metrics&lt;/h2&gt;
&lt;p&gt;These metrics reveal information relevant to shuffle sharding:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;the overall query-scheduler queue duration, &lt;code&gt;loki_query_scheduler_queue_duration_seconds_*&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the query-scheduler queue length per tenant, &lt;code&gt;loki_query_scheduler_queue_length&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the query-scheduler queue duration per tenant can be found with this query:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;max_over_time({cluster=&amp;#34;$cluster&amp;#34;,container=&amp;#34;query-frontend&amp;#34;, namespace=&amp;#34;$namespace&amp;#34;} |= &amp;#34;metrics.go&amp;#34; |logfmt | unwrap duration(queue_time) | __error__=&amp;#34;&amp;#34; [5m]) by (org_id)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Frequent spikes in any of these metrics may indicate one of the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A particular tenant is trying to use more query resources than they were allocated.&lt;/li&gt;
&lt;li&gt;That tenant may need an increase in the value of &lt;code&gt;max_queriers_per_tenant&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Loki instances may be under-provisioned.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A useful query checks how many queriers are being used by each tenant:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;count by (org_id) (sum by (org_id, pod) (count_over_time({job=&amp;#34;$namespace/querier&amp;#34;, cluster=&amp;#34;$cluster&amp;#34;} |= &amp;#34;metrics.go&amp;#34; | logfmt [$__interval])))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="isolate-tenant-workflows-using-shuffle-sharding">Isolate tenant workflows using shuffle sharding&lt;/h1>
&lt;p>Shuffle sharding is a resource-management technique used to isolate tenant workloads from other tenant workloads, to give each tenant more of a single-tenant experience when running in a shared cluster.
This technique is explained by AWS in their article &lt;a href="https://aws.amazon.com/builders-library/workload-isolation-using-shuffle-sharding/" target="_blank" rel="noopener noreferrer">Workload isolation using shuffle-sharding&lt;/a>.
A reference implementation has been shown in the &lt;a href="https://github.com/awslabs/route53-infima/blob/master/src/main/java/com/amazonaws/services/route53/infima/SimpleSignatureShuffleSharder.java" target="_blank" rel="noopener noreferrer">Route53 Infima library&lt;/a>.&lt;/p></description></item><item><title>Loki meta-monitoring</title><link>https://grafana.com/docs/loki/v3.7.x/operations/meta-monitoring/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/meta-monitoring/</guid><content><![CDATA[&lt;h1 id=&#34;loki-meta-monitoring&#34;&gt;Loki meta-monitoring&lt;/h1&gt;
&lt;p&gt;As part of your Loki implementation, you will also want to monitor your Loki cluster.&lt;/p&gt;
&lt;p&gt;As a best practice, you should collect data about Loki in a separate instance of Loki, Prometheus, and Grafana. For example, send your Loki cluster data to a &lt;a href=&#34;/products/cloud/&#34;&gt;Grafana Cloud account&lt;/a&gt;. This will let you troubleshoot a broken Loki cluster from a working one.&lt;/p&gt;
&lt;p&gt;Loki exposes the following observability data about itself:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt;: Loki provides a &lt;code&gt;/metrics&lt;/code&gt; endpoint that exposes information about Loki in Prometheus format. These aggregated metrics describe the health of your Loki cluster, allowing you to observe query response times and more. Each Loki component exposes its own metrics, allowing for fine-grained monitoring of the health of your Loki cluster. For more information about the metrics Loki exposes, refer to &lt;a href=&#34;#loki-metrics&#34;&gt;metrics&lt;/a&gt;. It is important to keep &lt;a href=&#34;#metrics-cardinality&#34;&gt;metrics cardinality&lt;/a&gt; in mind when running a large distributed Loki cluster.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Logs&lt;/strong&gt;: Loki emits a detailed log line &lt;code&gt;metrics.go&lt;/code&gt; for every query, which shows query duration, number of lines returned, query throughput, the specific LogQL that was executed, chunks searched, and much more. You can use these log lines to improve and optimize your query performance. You can also collect pod logs from your Loki components to monitor and drill down into specific issues.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;monitoring-loki&#34;&gt;Monitoring Loki&lt;/h2&gt;
&lt;p&gt;There are three primary components to monitoring Loki:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/grafana/k8s-monitoring-helm/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Kubernetes Monitoring Helm&lt;/a&gt;: The Kubernetes Monitoring Helm chart provides a comprehensive monitoring solution for Kubernetes clusters. It also provides direct integrations for monitoring the full LGTM (Loki, Grafana, Tempo and Mimir) stack. To learn how to deploy the Kubernetes Monitoring Helm chart, refer to 
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/meta-monitoring/deploy/&#34;&gt;deploy meta-monitoring&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;/products/cloud/&#34;&gt;Grafana Cloud account&lt;/a&gt; or a separate LGTM stack: The data collected from the Loki cluster can be sent to a Grafana Cloud account or a separate LGTM stack. We recommend using Grafana Cloud, because Grafana Labs is responsible for maintaining the availability and performance of the Grafana Cloud services.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://github.com/grafana/loki/tree/main/production/loki-mixin-compiled&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;The Loki mixin&lt;/a&gt;: An opinionated set of dashboards, alerts, and recording rules to monitor your Loki cluster. The mixin provides a comprehensive package for monitoring Loki in production. You can install the mixin into a Grafana instance. To install the Loki mixin, follow 
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/meta-monitoring/mixins/&#34;&gt;these directions&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You should also plan separately for infrastructure-level monitoring, for example to monitor the capacity or throughput of your storage provider or your networking layer.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://min.io/docs/minio/linux/operations/monitoring/collect-minio-metrics-using-prometheus.html&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;MinIO&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;/docs/grafana-cloud/monitor-infrastructure/kubernetes-monitoring/&#34;&gt;Kubernetes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Kubernetes Monitoring Helm chart Grafana Labs uses to monitor Loki also provides these features out of the box with Kubernetes monitoring enabled by default. You can choose which of these features to enable or disable based on how much data you want to collect and your meta-monitoring budget.&lt;/p&gt;
&lt;h2 id=&#34;loki-metrics&#34;&gt;Loki metrics&lt;/h2&gt;
&lt;p&gt;As Loki is a 
    &lt;a href=&#34;/docs/loki/v3.7.x/get-started/components/&#34;&gt;distributed system&lt;/a&gt;, each component exports its own metrics. The &lt;code&gt;/metrics&lt;/code&gt; endpoint exposes hundreds of different metrics. You can find a sampling of the metrics exposed by Loki, with their descriptions, in the sections below.&lt;/p&gt;
&lt;p&gt;You can find a complete list of the exposed metrics by checking the &lt;code&gt;/metrics&lt;/code&gt; endpoint.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;http://&amp;lt;host&amp;gt;:&amp;lt;http_listen_port&amp;gt;/metrics&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;p&gt;&lt;a href=&#34;http://localhost:3100/metrics&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;http://localhost:3100/metrics&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Both Grafana Loki and Alloy expose a &lt;code&gt;/metrics&lt;/code&gt; endpoint that serves Prometheus metrics (the default port is &lt;code&gt;3100&lt;/code&gt; for Loki and &lt;code&gt;12345&lt;/code&gt; for Alloy). To store these metrics, you can use Prometheus or Mimir.&lt;/p&gt;
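&lt;p&gt;The &lt;code&gt;/metrics&lt;/code&gt; output is plain text in the Prometheus exposition format, so standard tools are enough for a quick look. This sketch prints the sample lines for one metric name and skips the &lt;code&gt;# HELP&lt;/code&gt; and &lt;code&gt;# TYPE&lt;/code&gt; comments. The function name is hypothetical; for histograms, pass the full series name, such as &lt;code&gt;loki_request_duration_seconds_count&lt;/code&gt;.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Print the sample lines for one metric name from exposition-format text on stdin.
metric_samples() {
  # A sample line starts with the exact name followed by "{" (labels) or a
  # space (no labels), so longer names with the same prefix are not matched.
  awk -v name="$1" 'index($0, name "{") == 1 || index($0, name " ") == 1'
}

# Usage against a running Loki on the default port:
# curl -s http://localhost:3100/metrics | metric_samples loki_internal_log_messages_total
```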
&lt;p&gt;All components of Loki expose the following metrics:&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;button-div&#34;&gt;
      &lt;button class=&#34;expand-table-btn&#34;&gt;Expand table&lt;/button&gt;
    &lt;/div&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Metric Name&lt;/th&gt;
              &lt;th&gt;Metric Type&lt;/th&gt;
              &lt;th&gt;Description&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_internal_log_messages_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Total number of log messages created by Loki itself.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_request_duration_seconds&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;Time (in seconds) spent serving HTTP requests.&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;p&gt;For a deeper look at which metrics are most important for detecting negative trends and abnormal behavior, refer to 
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/meta-monitoring/metrics/&#34;&gt;Key metrics for monitoring Loki&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Most of these metrics are counters and should increase continuously during normal operations. For example, as a log line moves through the pipeline:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Your app emits a log line to a file that is tracked by Alloy.&lt;/li&gt;
&lt;li&gt;Alloy reads the new line and increases its counters.&lt;/li&gt;
&lt;li&gt;Alloy forwards the log line to a Loki distributor, where the received
counters should increase.&lt;/li&gt;
&lt;li&gt;The Loki distributor forwards the log line to a Loki ingester, where the
request duration counter should increase.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If Alloy uses any pipelines with metrics stages, those metrics will also be
exposed by Alloy at its &lt;code&gt;/metrics&lt;/code&gt; endpoint.&lt;/p&gt;
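&lt;p&gt;To confirm that a counter is increasing, you can compare two scrapes taken some time apart. A minimal sketch, assuming you saved two &lt;code&gt;/metrics&lt;/code&gt; snapshots to files. The function and file names are hypothetical, and summing across all label sets is a simplification chosen for brevity.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Sum one metric over all of its label sets in a saved /metrics snapshot.
# Assumes samples without timestamps, so the value is the last field;
# awk also parses values written in exponent form, such as 1.5e+02.
metric_total() {
  local name=$1 file=$2
  awk -v name="$name" '
    index($0, name "{") == 1 || index($0, name " ") == 1 { sum += $NF }
    END { printf "%.0f\n", sum }
  ' "$file"
}

# Print how much a counter grew between two snapshots, for example taken with
#   curl -s http://localhost:3100/metrics > before.txt
# and again a minute later into after.txt. A negative result usually means
# the component restarted and its counters were reset.
counter_increase() {
  local name=$1 before after
  before=$(metric_total "$name" "$2")
  after=$(metric_total "$name" "$3")
  echo $(( after - before ))
}
```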
&lt;h3 id=&#34;metrics-cardinality&#34;&gt;Metrics cardinality&lt;/h3&gt;
&lt;p&gt;Some metrics carry labels that increase cardinality in large environments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Client-side:&lt;/strong&gt; Alloy and Promtail emit per-file metrics using a &lt;code&gt;filename&lt;/code&gt; label. In environments with many tracked files, this can produce a large number of unique time series.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Server-side:&lt;/strong&gt; Loki metrics such as &lt;code&gt;loki_discarded_samples_total&lt;/code&gt; and &lt;code&gt;loki_ingester_chunks_stored_total&lt;/code&gt; include a &lt;code&gt;tenant&lt;/code&gt; label. Multi-tenant deployments with many tenants see proportional cardinality growth.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Kubernetes Monitoring Helm chart includes metric relabeling rules to manage cardinality. If you auto-scale Loki components, be aware that each new pod adds its own set of per-instance time series.&lt;/p&gt;
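&lt;p&gt;To see which metric names contribute the most series on one component, you can count the sample lines per metric name in its &lt;code&gt;/metrics&lt;/code&gt; output. A rough sketch: it counts what a single instance exposes at scrape time, not what Prometheus or Mimir stores after relabeling, and the function name is hypothetical.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Count exposed series per metric name from exposition-format text on stdin,
# largest first. Histogram _bucket, _sum and _count series are counted separately.
series_per_metric() {
  grep -v -e '^#' -e '^$' \
    | sed -e 's/[{ ].*//' \
    | sort \
    | uniq -c \
    | sort -rn
}

# Usage: curl -s http://localhost:3100/metrics | series_per_metric | head -n 20
```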
&lt;h2 id=&#34;example-loki-log-line-metricsgo&#34;&gt;Example Loki log line: metrics.go&lt;/h2&gt;
&lt;p&gt;Loki emits a &lt;code&gt;metrics.go&lt;/code&gt; log line from the Querier, Query frontend and Ruler components, which lets you inspect query and recording rule performance. This is an example of a detailed log line &lt;code&gt;metrics.go&lt;/code&gt; for a query.&lt;/p&gt;
&lt;p&gt;Example log&lt;/p&gt;
&lt;p&gt;&lt;code&gt;level=info ts=2024-03-11T13:44:10.322919331Z caller=metrics.go:143 component=frontend org_id=mycompany latency=fast query=&amp;quot;sum(count_over_time({kind=\&amp;quot;auditing\&amp;quot;} | json | user_userId =`` [1m]))&amp;quot; query_type=metric range_type=range length=10m0s start_delta=10m10.322900424s end_delta=10.322900663s step=1s duration=47.61044ms status=200 limit=100 returned_lines=0 throughput=9.8MB total_bytes=467kB total_entries=1 queue_time=0s subqueries=2 cache_chunk_req=1 cache_chunk_hit=1 cache_chunk_bytes_stored=0 cache_chunk_bytes_fetched=14394 cache_index_req=19 cache_index_hit=19 cache_result_req=1 cache_result_hit=1&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;You can use the query-frontend &lt;code&gt;metrics.go&lt;/code&gt; lines to understand a query’s overall performance. The &lt;code&gt;metrics.go&lt;/code&gt; line output by the queriers contains the same information as the query-frontend line, but is often more helpful for understanding and troubleshooting query performance, largely because it shows how the querier spent its time executing the subquery. Here are the most useful stats:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;total_bytes&lt;/strong&gt;: how many total bytes the query processed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;duration&lt;/strong&gt;: how long the query took to execute&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;throughput&lt;/strong&gt;: total_bytes/duration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;total_lines&lt;/strong&gt;: how many total lines the query processed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;length&lt;/strong&gt;: how much time the query was executed over&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;post_filter_lines&lt;/strong&gt;: how many lines matched the filters in the query&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;cache_chunk_req&lt;/strong&gt;: total number of chunks fetched for the query (the cache will be asked for every chunk so this is equivalent to the total chunks requested)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;splits&lt;/strong&gt;: how many pieces the query was split into based on time and split_queries_by_interval&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;shards&lt;/strong&gt;: how many shards the query was split into&lt;/li&gt;
&lt;/ul&gt;
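&lt;p&gt;Because &lt;code&gt;metrics.go&lt;/code&gt; lines use logfmt, you can pull individual stats out of a saved line with standard tools. A minimal sketch that only handles unquoted values (it does not extract the quoted &lt;code&gt;query&lt;/code&gt; field), followed by a rough check that throughput is total_bytes divided by duration for the example line above, assuming 1 kB means 1000 bytes. The function name is hypothetical.&lt;/p&gt;

```bash
#!/usr/bin/env bash
# Print the value of one unquoted logfmt field (key=value) from text on stdin.
# Limitation: tokens are split on spaces, so quoted values such as query="..."
# are not handled.
logfmt_field() {
  tr ' ' '\n' | awk -v key="$1" '
    !found && index($0, key "=") == 1 { print substr($0, length(key) + 2); found = 1 }
  '
}

# A shortened copy of the example metrics.go line above.
line='level=info caller=metrics.go:143 component=frontend duration=47.61044ms throughput=9.8MB total_bytes=467kB queue_time=0s subqueries=2'

for key in duration total_bytes throughput queue_time; do
  printf '%s: %s\n' "$key" "$(printf '%s\n' "$line" | logfmt_field "$key")"
done

# throughput = total_bytes / duration: 467 kB over 47.61044 ms is about 9.8 MB/s.
awk 'BEGIN { printf "%.1f MB/s\n", 467000 / 0.04761044 / 1e6 }'
```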
&lt;p&gt;For more information, refer to the blog post &lt;a href=&#34;/blog/2023/12/28/the-concise-guide-to-loki-how-to-get-the-most-out-of-your-query-performance/&#34;&gt;The concise guide to Loki: How to get the most out of your query performance&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;configure-logging-levels&#34;&gt;Configure logging levels&lt;/h3&gt;
&lt;p&gt;To change the Loki logging level, update the &lt;code&gt;log_level&lt;/code&gt; configuration parameter in the &lt;code&gt;server&lt;/code&gt; block of your &lt;code&gt;config.yaml&lt;/code&gt; file.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Only log messages with the given severity or above. Valid levels: [debug,
# info, warn, error]
# CLI flag: -log.level
[log_level: &amp;lt;string&amp;gt; | default = &amp;#34;info&amp;#34;]&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="loki-meta-monitoring">Loki meta-monitoring&lt;/h1>
&lt;p>As part of your Loki implementation, you will also want to monitor your Loki cluster.&lt;/p>
&lt;p>As a best practice, you should collect data about Loki in a separate instance of Loki, Prometheus, and Grafana. For example, send your Loki cluster data to a &lt;a href="/products/cloud/">Grafana Cloud account&lt;/a>. This will let you troubleshoot a broken Loki cluster from a working one.&lt;/p></description></item><item><title>Manage and debug errors</title><link>https://grafana.com/docs/loki/v3.7.x/operations/troubleshooting/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/troubleshooting/</guid><content><![CDATA[&lt;h1 id=&#34;manage-and-debug-errors&#34;&gt;Manage and debug errors&lt;/h1&gt;
&lt;p&gt;This section provides information to help you troubleshoot issues with Grafana Loki.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/troubleshooting/troubleshoot-operations/&#34;&gt;Troubleshoot operations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/troubleshooting/troubleshoot-ingest/&#34;&gt;Troubleshoot ingestion (write)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/troubleshooting/troubleshoot-drilldown/&#34;&gt;Troubleshoot Logs Drilldown&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/troubleshooting/troubleshoot-query/&#34;&gt;Troubleshoot querying (read)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="manage-and-debug-errors">Manage and debug errors&lt;/h1>
&lt;p>This section provides information to help you troubleshoot issues with Grafana Loki.&lt;/p>
&lt;ul>
&lt;li>
&lt;a href="/docs/loki/v3.7.x/operations/troubleshooting/troubleshoot-operations/">Troubleshoot operations&lt;/a>&lt;/li>
&lt;li>
&lt;a href="/docs/loki/v3.7.x/operations/troubleshooting/troubleshoot-ingest/">Troubleshoot ingestion (write)&lt;/a>&lt;/li>
&lt;li>
&lt;a href="/docs/loki/v3.7.x/operations/troubleshooting/troubleshoot-drilldown/">Troubleshoot Logs Drilldown&lt;/a>&lt;/li>
&lt;li>
&lt;a href="/docs/loki/v3.7.x/operations/troubleshooting/troubleshoot-query/">Troubleshoot querying (read)&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Manage authentication</title><link>https://grafana.com/docs/loki/v3.7.x/operations/authentication/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/authentication/</guid><content><![CDATA[&lt;h1 id=&#34;manage-authentication&#34;&gt;Manage authentication&lt;/h1&gt;
&lt;p&gt;Grafana Loki does not include an authentication layer. You must run an authenticating reverse proxy in front of your services.&lt;/p&gt;
&lt;p&gt;The simple scalable and microservices 
    &lt;a href=&#34;/docs/loki/v3.7.x/get-started/deployment-modes/&#34;&gt;deployment modes&lt;/a&gt; require a reverse proxy to be deployed in front of Loki, to direct client API requests to the various components.&lt;/p&gt;
&lt;p&gt;The Loki Helm chart includes a default reverse proxy configuration that uses an nginx container to route traffic and handle authorization.&lt;/p&gt;
&lt;p&gt;A list of open-source reverse proxies you can use:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.haproxy.org/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;HAProxy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://docs.nginx.com/nginx/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;nginx&lt;/a&gt; using their &lt;a href=&#34;https://docs.nginx.com/nginx/admin-guide/security-controls/configuring-http-basic-authentication/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;guide on restricting access with HTTP basic authentication&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://oauth2-proxy.github.io/oauth2-proxy/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;OAuth2 proxy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.pomerium.com/docs&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Pomerium&lt;/a&gt;, which has a &lt;a href=&#34;https://www.pomerium.com/docs/guides/grafana&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;guide for securing Grafana&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;When Loki runs in multi-tenant mode, it requires the HTTP header
&lt;code&gt;X-Scope-OrgID&lt;/code&gt; to be set to a string identifying the tenant. The authenticating reverse proxy should be responsible for populating this value.
For more information, read the 
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/multi-tenancy/&#34;&gt;multi-tenancy&lt;/a&gt; documentation.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;p&gt;For information on configuring authentication for your log shipping agent, see the &lt;a href=&#34;/docs/alloy/latest/&#34;&gt;Grafana Alloy documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;enable-basic-authentication-for-loki-using-nginx&#34;&gt;Enable basic authentication for Loki using nginx&lt;/h2&gt;
&lt;p&gt;This section describes the process of enabling basic authentication for Loki using &lt;a href=&#34;https://docs.nginx.com/nginx/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;nginx&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;A running Loki instance&lt;/li&gt;
&lt;li&gt;A running nginx instance&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;configure-nginx&#34;&gt;Configure nginx&lt;/h3&gt;
&lt;p&gt;You must create a new nginx configuration file for the Loki instance.&lt;/p&gt;
&lt;p&gt;This example assumes the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;nginx is installed under &lt;code&gt;/opt/homebrew&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Loki is running on port 3100 on the local machine&lt;/li&gt;
&lt;li&gt;Your Loki tenant ID is &lt;code&gt;fake&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The configuration file is named &lt;code&gt;/opt/homebrew/etc/nginx/loki.conf&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you used different configuration parameters for Loki, adjust the examples to match your configuration.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;loki.conf&lt;/code&gt; configuration:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;conf&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-conf&#34;&gt;# Loki backend; cache up to 15 idle upstream connections per worker process.
upstream loki {
  server 127.0.0.1:3100;
  keepalive 15;
}

server {
  listen 80;
  server_name loki.localhost;

  # Require HTTP basic authentication for every location by default.
  auth_basic &amp;#34;loki auth&amp;#34;;
  auth_basic_user_file /opt/homebrew/etc/nginx/passwords;

  # Proxy all Loki API requests to the upstream.
  location / {
    proxy_read_timeout 1800s;
    proxy_connect_timeout 1600s;
    proxy_pass http://loki;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection &amp;#34;Keep-Alive&amp;#34;;
    proxy_set_header Proxy-Connection &amp;#34;Keep-Alive&amp;#34;;
    proxy_redirect off;
  }

  # Leave the readiness endpoint unauthenticated so health checks can reach it.
  location /ready {
    proxy_pass http://loki;
    proxy_http_version 1.1;
    proxy_set_header Connection &amp;#34;Keep-Alive&amp;#34;;
    proxy_set_header Proxy-Connection &amp;#34;Keep-Alive&amp;#34;;
    proxy_redirect off;
    auth_basic &amp;#34;off&amp;#34;;
  }
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Include this file from your main nginx configuration, for example by adding the following line to the &lt;code&gt;http&lt;/code&gt; block of &lt;code&gt;nginx.conf&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;include /opt/homebrew/etc/nginx/loki.conf;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Restart or reload nginx (for example, with &lt;code&gt;nginx -s reload&lt;/code&gt;) so that the configuration changes take effect.&lt;/p&gt;
&lt;h3 id=&#34;validate-your-nginx-configuration&#34;&gt;Validate your nginx configuration&lt;/h3&gt;
&lt;p&gt;To validate the nginx configuration for Loki, you can send a &lt;code&gt;curl&lt;/code&gt; request to two endpoints:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;/ready&lt;/code&gt; endpoint, which is not protected by basic authentication and should return &lt;code&gt;200 OK&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;curl&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-curl&#34;&gt;curl -i http://loki.localhost/ready

HTTP/1.1 200 OK
Server: nginx/1.29.2
Date: Thu, 16 Oct 2025 14:28:31 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 6
Connection: keep-alive
X-Content-Type-Options: nosniff

ready&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;/&lt;/code&gt; endpoint, which is protected by basic authentication and should return &lt;code&gt;401 Unauthorized&lt;/code&gt; when no credentials are sent:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;curl&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-curl&#34;&gt;curl -i http://loki.localhost/

HTTP/1.1 401 Unauthorized
Server: nginx/1.29.2
Date: Thu, 16 Oct 2025 14:32:43 GMT
Content-Type: text/html
Content-Length: 179
Connection: keep-alive
WWW-Authenticate: Basic realm=&amp;#34;loki auth&amp;#34;

&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;&amp;lt;title&amp;gt;401 Authorization Required&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;
&amp;lt;body&amp;gt;
&amp;lt;center&amp;gt;&amp;lt;h1&amp;gt;401 Authorization Required&amp;lt;/h1&amp;gt;&amp;lt;/center&amp;gt;
&amp;lt;hr&amp;gt;&amp;lt;center&amp;gt;nginx/1.29.2&amp;lt;/center&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;update-passwords&#34;&gt;Update passwords&lt;/h3&gt;
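&lt;p&gt;You can create the password file with whatever tool you already use for other web services that rely on HTTP basic authentication.&lt;/p&gt;
&lt;p&gt;This example uses &lt;code&gt;htpasswd&lt;/code&gt;. The &lt;code&gt;-c&lt;/code&gt; flag creates the file and overwrites any existing one, so omit it when adding more users:&lt;/p&gt;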

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;htpasswd -c /opt/homebrew/etc/nginx/passwords loki123

New password:
Re-type new password:
Adding password for user loki123&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Restart or reload nginx so that the changes take effect.&lt;/p&gt;
&lt;h3 id=&#34;validate-passwords&#34;&gt;Validate passwords&lt;/h3&gt;
&lt;p&gt;Write your password to a temporary file, for example:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;vi lokipw&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then, load it into a shell variable:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;pass=$(cat lokipw)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To confirm that basic authentication works, send a &lt;code&gt;curl&lt;/code&gt; request with your credentials to a protected endpoint:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;curl&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-curl&#34;&gt;curl -i -u &amp;#34;loki123:$pass&amp;#34; -H &amp;#34;X-Scope-OrgID:fake&amp;#34; &amp;#34;http://loki.localhost/loki/api/v1/labels&amp;#34;

HTTP/1.1 200 OK
Server: nginx/1.29.2
Date: Thu, 16 Oct 2025 14:46:09 GMT
Content-Type: application/json; charset=UTF-8
Content-Length: 21
Connection: keep-alive

{&amp;#34;status&amp;#34;:&amp;#34;success&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
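&lt;p&gt;The same authenticated request can be sent from a program instead of &lt;code&gt;curl&lt;/code&gt;. This is a minimal Python sketch that uses only the standard library; the function names &lt;code&gt;build_labels_request&lt;/code&gt; and &lt;code&gt;fetch_labels&lt;/code&gt; are hypothetical, and the URL, user name, and tenant ID are the example values from this guide.&lt;/p&gt;

```python
import base64
import json
import urllib.request


def build_labels_request(base_url: str, user: str, password: str,
                         tenant: str) -> urllib.request.Request:
    """Build a GET request equivalent to:
    curl -u USER:PASS -H "X-Scope-OrgID:TENANT" BASE_URL/loki/api/v1/labels
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/loki/api/v1/labels",
        headers={
            "Authorization": f"Basic {token}",  # HTTP basic auth checked by nginx
            "X-Scope-OrgID": tenant,            # tenant ID forwarded to Loki
        },
        method="GET",
    )


def fetch_labels(req: urllib.request.Request) -> dict:
    # urlopen raises urllib.error.HTTPError for non-2xx replies, such as the
    # 401 that nginx returns when credentials are missing or wrong.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage, with the nginx proxy from this guide running and the
# password read from wherever you stored it:
#   req = build_labels_request("http://loki.localhost", "loki123", password, "fake")
#   print(fetch_labels(req))
```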
]]></content><description>&lt;h1 id="manage-authentication">Manage authentication&lt;/h1>
&lt;p>Grafana Loki does not come with any included authentication layer. You must run an authenticating reverse proxy in front of your services.&lt;/p>
&lt;p>The simple scalable and microservices
&lt;a href="/docs/loki/v3.7.x/get-started/deployment-modes/">deployment modes&lt;/a> require a reverse proxy to be deployed in front of Loki, to direct client API requests to the various components.&lt;/p></description></item><item><title>Manage bloom filter building and querying (Experimental)</title><link>https://grafana.com/docs/loki/v3.7.x/operations/bloom-filters/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/bloom-filters/</guid><content><![CDATA[&lt;h1 id=&#34;manage-bloom-filter-building-and-querying-experimental&#34;&gt;Manage bloom filter building and querying (Experimental)&lt;/h1&gt;


&lt;div class=&#34;admonition admonition-warning&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Warning&lt;/p&gt;&lt;p&gt;In Loki and Grafana Enterprise Logs (GEL), Query acceleration using blooms is an &lt;a href=&#34;/docs/release-life-cycle/&#34;&gt;experimental feature&lt;/a&gt;. Engineering and on-call support is not available. No SLA is provided. Note that this feature is intended for users who are ingesting more than 75TB of logs a month, as it is designed to accelerate queries against large volumes of logs.&lt;/p&gt;
&lt;p&gt;In Grafana Cloud, Query acceleration using bloom filters is enabled as a &lt;a href=&#34;/docs/release-life-cycle/&#34;&gt;public preview&lt;/a&gt; for select large-scale customers that are ingesting more than 75TB of logs a month. Limited support and no SLA are provided.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;p&gt;Loki leverages &lt;a href=&#34;https://en.wikipedia.org/wiki/Bloom_filter&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;bloom filters&lt;/a&gt; to speed up queries by reducing the amount of data Loki needs to load from the store and iterate through.
Loki is often used to run &amp;ldquo;needle in a haystack&amp;rdquo; queries; these are queries where a large number of log lines are searched, but only a few log lines match the query.
Some common use cases are searching all logs tied to a specific trace ID or customer ID.&lt;/p&gt;
&lt;p&gt;For example, the following query looks for a trace ID across a whole cluster over the past 24 hours:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;logql&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-logql&#34;&gt;{cluster=&amp;#34;prod&amp;#34;} | traceID=&amp;#34;3c0e3dcd33e7&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Without accelerated filtering, Loki downloads all the chunks for all the streams matching &lt;code&gt;{cluster=&amp;quot;prod&amp;quot;}&lt;/code&gt; for the last 24 hours and iterates through each log line in the chunks, checking if the 
    &lt;a href=&#34;/docs/loki/v3.7.x/get-started/labels/structured-metadata/&#34;&gt;structured metadata&lt;/a&gt; key &lt;code&gt;traceID&lt;/code&gt; with value &lt;code&gt;3c0e3dcd33e7&lt;/code&gt; is present.&lt;/p&gt;
&lt;p&gt;With accelerated filtering, Loki can skip most of the chunks and only process the ones that the bloom filters indicate might contain the structured metadata pair. Bloom filters can return false positives but never false negatives, so a chunk that contains the pair is never skipped.&lt;/p&gt;
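&lt;p&gt;To illustrate why a bloom filter can only say that a value might be present, here is a toy bloom filter in Python. It is a sketch of the general data structure, not Loki&amp;rsquo;s implementation: the number of slots, the number of hash functions, and the SHA-256-based double hashing are arbitrary choices made for this example.&lt;/p&gt;

```python
import hashlib


class BloomFilter:
    """Toy bloom filter with m bit slots and k hash functions.

    One byte is used per bit slot to keep the code easy to read.
    """

    def __init__(self, m_bits: int = 1024, k_hashes: int = 4) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.slots = bytearray(m_bits)

    def _positions(self, item: str):
        # Derive k positions from one SHA-256 digest with double hashing
        # (h1 + i * h2). Forcing h2 to be odd keeps the k positions distinct
        # when m is a power of two.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.slots[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False: the item was definitely never added.
        # True: the item was probably added (false positives are possible).
        return all(self.slots[pos] for pos in self._positions(item))
```

&lt;p&gt;After &lt;code&gt;add(&amp;quot;traceID=3c0e3dcd33e7&amp;quot;)&lt;/code&gt;, &lt;code&gt;might_contain&lt;/code&gt; always returns &lt;code&gt;True&lt;/code&gt; for that pair. A &lt;code&gt;False&lt;/code&gt; result means the data can be skipped safely; a &lt;code&gt;True&lt;/code&gt; result only means it has to be read and checked, because unrelated values can set the same slots.&lt;/p&gt;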
&lt;p&gt;To learn how to write queries to use bloom filters, refer to 
    &lt;a href=&#34;/docs/loki/v3.7.x/query/query_acceleration/&#34;&gt;Query acceleration&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;enable-bloom-filters&#34;&gt;Enable bloom filters&lt;/h2&gt;


&lt;div class=&#34;admonition admonition-warning&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Warning&lt;/p&gt;&lt;p&gt;By design, building and querying bloom filters is not supported in single binary deployments.
Blooms can be used with the Simple Scalable deployment (SSD), but we recommend running the bloom components only in fully distributed microservices mode.
Building and querying bloom filters comes at a relatively high cost that only pays off in large-scale deployments.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;p&gt;To start building and using blooms, you need to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deploy the &lt;a href=&#34;#bloom-planner-and-builder&#34;&gt;Bloom Planner and Builder&lt;/a&gt; components (as 
    &lt;a href=&#34;/docs/loki/v3.7.x/get-started/deployment-modes/#microservices-mode&#34;&gt;microservices&lt;/a&gt; or via the 
    &lt;a href=&#34;/docs/loki/v3.7.x/get-started/deployment-modes/#simple-scalable&#34;&gt;SSD&lt;/a&gt; &lt;code&gt;backend&lt;/code&gt; target) and enable the components in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#bloom_build&#34;&gt;Bloom Build config&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Deploy the &lt;a href=&#34;#bloom-gateway&#34;&gt;Bloom Gateway&lt;/a&gt; component (as a 
    &lt;a href=&#34;/docs/loki/v3.7.x/get-started/deployment-modes/#microservices-mode&#34;&gt;microservice&lt;/a&gt; or via the 
    &lt;a href=&#34;/docs/loki/v3.7.x/get-started/deployment-modes/#simple-scalable&#34;&gt;SSD&lt;/a&gt; &lt;code&gt;backend&lt;/code&gt; target) and enable the component in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#bloom_gateway&#34;&gt;Bloom Gateway config&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Enable bloom building and filtering for each tenant individually, or for all tenants by default.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Configuration block for the bloom creation.
bloom_build:
  enabled: true
  planner:
    planning_interval: 6h
  builder:
    planner_address: bloom-planner.&amp;lt;namespace&amp;gt;.svc.cluster.local:9095

# Configuration block for bloom filtering.
bloom_gateway:
  enabled: true
  client:
    addresses: dnssrvnoa&amp;#43;_bloom-gateway-grpc._tcp.bloom-gateway-headless.&amp;lt;namespace&amp;gt;.svc.cluster.local

# Enable bloom creation and filtering for all tenants by default
# or do it on a per-tenant basis.
limits_config:
  bloom_creation_enabled: true
  bloom_split_series_keyspace_by: 1024
  bloom_gateway_enable_filtering: true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For more configuration options, refer to the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#bloom_gateway&#34;&gt;Bloom Gateway&lt;/a&gt;, 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#bloom_build&#34;&gt;Bloom Build&lt;/a&gt;, and 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#limits_config&#34;&gt;per-tenant limits&lt;/a&gt; configuration docs.
We strongly recommend reading the whole documentation for this experimental feature before using it.&lt;/p&gt;
&lt;h2 id=&#34;bloom-planner-and-builder&#34;&gt;Bloom Planner and Builder&lt;/h2&gt;
&lt;p&gt;Two components build bloom filters from the chunks in object storage: the Bloom Planner and the Bloom
Builder. The planner creates bloom-building tasks and sends them to the builders, which process the tasks and upload the resulting blocks.
Bloom filters are grouped in bloom blocks spanning multiple streams (also known as series) and chunks from a given day.
To learn more about how blocks and metadata files are organized, refer to the &lt;a href=&#34;#building-blooms&#34;&gt;Building blooms&lt;/a&gt; section below.&lt;/p&gt;
&lt;p&gt;The Bloom Planner runs as a single instance. For each tenant and time period, it calculates the gaps in the fingerprint ranges for which bloom filters still need to be built.
It dispatches the resulting tasks to the available builders. The planner also applies the &lt;a href=&#34;#retention&#34;&gt;bloom retention&lt;/a&gt;.&lt;/p&gt;
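&lt;p&gt;Finding gaps in fingerprint ranges amounts to subtracting the ranges already covered by existing bloom blocks from the range that needs blooms. The Python sketch below assumes inclusive &lt;code&gt;(start, end)&lt;/code&gt; fingerprint pairs; the function name and the data representation are illustrative and are not taken from Loki&amp;rsquo;s code.&lt;/p&gt;

```python
def find_gaps(want: tuple[int, int],
              covered: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the parts of the inclusive fingerprint range `want` that are not
    covered by any of the inclusive ranges in `covered`."""
    lo, hi = want
    gaps = []
    cursor = lo  # smallest fingerprint not yet known to be covered
    for start, end in sorted(covered):
        if cursor > hi:
            break  # everything requested is already accounted for
        if cursor > end:
            continue  # this block ends before the cursor; nothing new covered
        if start > cursor:
            # Uncovered stretch between the cursor and the start of this block.
            gaps.append((cursor, min(start - 1, hi)))
        cursor = max(cursor, end + 1)
    if hi >= cursor:
        gaps.append((cursor, hi))
    return gaps
```

&lt;p&gt;For example, &lt;code&gt;find_gaps((0, 100), [(10, 20), (30, 40)])&lt;/code&gt; returns &lt;code&gt;[(0, 9), (21, 29), (41, 100)]&lt;/code&gt;. In this model, each returned range is work that still has to be handed to a builder.&lt;/p&gt;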


&lt;div class=&#34;admonition admonition-warning&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Warning&lt;/p&gt;&lt;p&gt;Do not run more than one instance of the Bloom Planner.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;p&gt;The Bloom Builder is a stateless horizontally scalable component and can be scaled independently of the planner to fulfill the processing demand of the created tasks.&lt;/p&gt;
&lt;p&gt;You can find all the configuration options for these components in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#bloom_build&#34;&gt;Configure section for the Bloom Builder&lt;/a&gt;.
Refer to the &lt;a href=&#34;#enable-bloom-filters&#34;&gt;Enable bloom filters&lt;/a&gt; section above for a configuration snippet enabling this feature.&lt;/p&gt;
&lt;h3 id=&#34;retention&#34;&gt;Retention&lt;/h3&gt;
&lt;p&gt;The Bloom Planner applies bloom block retention on object storage. Retention is disabled by default.
When enabled, retention is applied to all tenants. The retention for each tenant is the longest of its 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#limits_config&#34;&gt;configured&lt;/a&gt; general retention (&lt;code&gt;retention_period&lt;/code&gt;) and the streams retention (&lt;code&gt;retention_stream&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;In the following example, tenant A has a bloom retention of 30 days, and tenant B has a bloom retention of 40 days, the longer of its general retention and its &lt;code&gt;{namespace=&amp;quot;prod&amp;quot;}&lt;/code&gt; stream retention.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
    &amp;#34;A&amp;#34;:
        retention_period: 30d
    &amp;#34;B&amp;#34;:
        retention_period: 30d
        retention_stream:
            - selector: &amp;#39;{namespace=&amp;#34;prod&amp;#34;}&amp;#39;
              priority: 1
              period: 40d&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
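&lt;p&gt;The retention rule can be expressed as a small function. This Python sketch models each tenant&amp;rsquo;s overrides as a dictionary that mirrors the YAML above. The helper names are hypothetical, and &lt;code&gt;parse_days&lt;/code&gt; only understands the &lt;code&gt;d&lt;/code&gt; and &lt;code&gt;h&lt;/code&gt; suffixes, whereas Loki accepts more duration formats.&lt;/p&gt;

```python
# Only whole days ("30d") and hours ("720h") are handled in this sketch.
UNIT_DIVISORS = {"d": 1, "h": 24}


def parse_days(duration: str) -> float:
    """Convert a duration such as 30d or 720h into days."""
    value, unit = duration[:-1], duration[-1]
    return float(value) / UNIT_DIVISORS[unit]


def bloom_retention_days(limits: dict) -> float:
    """Bloom retention for one tenant: the longest of retention_period and
    every retention_stream period."""
    # A missing retention_period is treated as 0 days here (simplification).
    periods = [parse_days(limits.get("retention_period", "0d"))]
    periods += [parse_days(rule["period"]) for rule in limits.get("retention_stream", [])]
    return max(periods)


# Mirrors the overrides from the YAML example above.
overrides = {
    "A": {"retention_period": "30d"},
    "B": {
        "retention_period": "30d",
        "retention_stream": [
            {"selector": '{namespace="prod"}', "priority": 1, "period": "40d"},
        ],
    },
}
```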
&lt;h3 id=&#34;planner-and-builder-sizing-and-configuration&#34;&gt;Planner and Builder sizing and configuration&lt;/h3&gt;
&lt;p&gt;The single planner instance runs the planning phase for bloom blocks for each tenant in the given interval and puts the created tasks into an internal task queue.
Builders process tasks sequentially by pulling them from the queue. The number of builder replicas required to complete all pending tasks before the next planning iteration depends on the per-tenant &lt;code&gt;bloom_split_series_keyspace_by&lt;/code&gt; setting, the number of tenants, and the log volume of the streams.&lt;/p&gt;
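&lt;p&gt;The per-tenant &lt;code&gt;bloom_split_series_keyspace_by&lt;/code&gt; setting determines how many parts the 64-bit stream fingerprint keyspace is split into. The Python sketch below shows one straightforward way to split the keyspace into contiguous ranges and to locate the range that holds a given fingerprint. It assumes equal-width ranges, with the last range absorbing any remainder; Loki&amp;rsquo;s exact boundaries may differ, and the function names are hypothetical.&lt;/p&gt;

```python
MAX_FP = 2**64 - 1  # stream fingerprints are unsigned 64-bit integers


def split_keyspace(n: int) -> list[tuple[int, int]]:
    """Split the fingerprint keyspace into n contiguous, inclusive ranges of
    (nearly) equal width; the last range absorbs any remainder."""
    width = (MAX_FP + 1) // n
    ranges = []
    for i in range(n):
        start = i * width
        end = MAX_FP if i == n - 1 else (i + 1) * width - 1
        ranges.append((start, end))
    return ranges


def range_index(fp: int, n: int) -> int:
    """Index of the range from split_keyspace(n) that contains fingerprint fp."""
    width = (MAX_FP + 1) // n
    # Fingerprints in the remainder belong to the last range.
    return min(fp // width, n - 1)
```

&lt;p&gt;Under this model, a larger split count produces more, smaller ranges, which gives the planner finer-grained units of work to spread across builders.&lt;/p&gt;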
&lt;p&gt;The maximum block size is configured per tenant via &lt;code&gt;-bloom-build.max-block-size&lt;/code&gt;.
The actual block size can exceed this limit, because stream blooms are appended to a block until the block is larger than the configured maximum size.
Blocks are created in memory and freed as soon as they are written to the object store. Chunks and TSDB files are downloaded from the object store to the local file system.
We estimate that builders can process about 4MB of data per second per core.&lt;/p&gt;
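&lt;p&gt;The throughput estimate allows a rough back-of-the-envelope sizing. The Python sketch below assumes that all chunk data produced in one planning interval must be processed before the next interval starts, interprets 4MB as &lt;code&gt;4 * 2**20&lt;/code&gt; bytes, and ignores task granularity, download time, and uneven stream sizes, so treat the result as a lower bound. The function name is hypothetical.&lt;/p&gt;

```python
import math

# The estimate from this page, interpreting 4MB as 4 * 2**20 bytes (assumption).
BYTES_PER_SEC_PER_CORE = 4 * 2**20


def builder_replicas(bytes_per_interval: float, interval_hours: float,
                     cores_per_builder: int) -> int:
    """Lower bound on Bloom Builder replicas needed to process one planning
    interval's worth of chunk data before the next planning run starts."""
    seconds = interval_hours * 3600
    capacity_per_builder = BYTES_PER_SEC_PER_CORE * cores_per_builder * seconds
    return max(1, math.ceil(bytes_per_interval / capacity_per_builder))
```

&lt;p&gt;For example, &lt;code&gt;builder_replicas(2**40, 6, 4)&lt;/code&gt; (1TiB of chunk data per 6-hour interval, 4-core builders) returns 4.&lt;/p&gt;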
&lt;h2 id=&#34;bloom-gateway&#34;&gt;Bloom Gateway&lt;/h2&gt;
&lt;p&gt;Bloom Gateways handle chunk filtering requests from the 
    &lt;a href=&#34;/docs/loki/v3.7.x/get-started/components/#index-gateway&#34;&gt;index gateway&lt;/a&gt;.
The service takes a list of chunks and a filtering expression, matches them against the blooms, and filters out the chunks that do not match the expression.&lt;/p&gt;
&lt;p&gt;This component is horizontally scalable and every instance only owns a subset of the stream fingerprint range for which it performs the filtering.
The sharding of the data is performed on the client side using DNS discovery of the server instances and the &lt;a href=&#34;https://arxiv.org/abs/1406.2294&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;jumphash&lt;/a&gt; algorithm for consistent hashing and even distribution of the stream fingerprints across Bloom Gateway instances.&lt;/p&gt;
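&lt;p&gt;For reference, this is the jump consistent hash algorithm from the linked paper, transcribed to Python. How Loki derives the hash key from a stream fingerprint and maps buckets to gateway addresses is not shown here; the sketch simply treats the 64-bit fingerprint as the key, which is an assumption.&lt;/p&gt;

```python
def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash (Lamping and Veach, 2014).

    Maps a 64-bit key to a bucket in [0, num_buckets). When num_buckets grows
    by one, keys either stay in their bucket or move to the new bucket.
    """
    if 1 > num_buckets:
        raise ValueError("num_buckets must be positive")
    key %= 2**64  # treat the key as an unsigned 64-bit integer
    b, j = -1, 0
    while num_buckets > j:
        b = j
        # 64-bit linear congruential step, as in the paper's reference code.
        key = (key * 2862933555777941757 + 1) % 2**64
        j = int((b + 1) * (2.0**31 / ((key // 2**33) + 1)))
    return b
```

&lt;p&gt;Because growing from n to n + 1 buckets only moves keys into the new bucket, adding a Bloom Gateway instance reassigns only about 1/(n + 1) of the fingerprints.&lt;/p&gt;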
&lt;p&gt;You can find all the configuration options for this component in the Configure section for the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#bloom_gateway&#34;&gt;Bloom Gateways&lt;/a&gt;.
Refer to the &lt;a href=&#34;#enable-bloom-filters&#34;&gt;Enable bloom filters&lt;/a&gt; section above for a configuration snippet enabling this feature.&lt;/p&gt;
&lt;h3 id=&#34;gateway-sizing-and-configuration&#34;&gt;Gateway sizing and configuration&lt;/h3&gt;
&lt;p&gt;Bloom Gateways use their local file system as a Least Recently Used (LRU) cache for blooms that are downloaded from object storage.
The size of the blooms depends on the ingest volume and the number of unique structured metadata key-value pairs, as well as on the bloom build settings, namely the false-positive rate.
With default settings, bloom filters make up &amp;lt;1% of the raw structured metadata size.&lt;/p&gt;
&lt;p&gt;Because reading blooms depends heavily on disk IOPS, Bloom Gateways should use multiple locally attached SSDs (NVMe) to increase I/O throughput.
You can specify multiple directories on different disk mounts by setting the &lt;code&gt;-bloom.shipper.working-directory&lt;/code&gt; 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#storage_config&#34;&gt;option&lt;/a&gt; to a comma-separated list of mount points, for example:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;-bloom.shipper.working-directory=&amp;#34;/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Bloom Gateways deal with relatively large files: the bloom filter blocks.
Even though the binary format of the bloom blocks allows reading them into memory in smaller pages, the memory consumption depends on the number of pages that are concurrently loaded into memory for processing.
The product of three settings controls the maximum amount of bloom data in memory at any given time: &lt;code&gt;-bloom-gateway.worker-concurrency&lt;/code&gt;, &lt;code&gt;-bloom-gateway.block-query-concurrency&lt;/code&gt;, and &lt;code&gt;-bloom.max-query-page-size&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For example, assuming 4 CPU cores:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;-bloom-gateway.worker-concurrency=4      # 1x NUM_CORES
-bloom-gateway.block-query-concurrency=8 # 2x NUM_CORES
-bloom.max-query-page-size=64MiB

# 4 x 8 x 64MiB = 2048MiB&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Here, the memory requirement for block processing is 2GiB.
To get the minimum memory requirement for the Bloom Gateways, double this value, which gives 4GiB in this example.&lt;/p&gt;
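&lt;p&gt;The same arithmetic as a small Python helper, which can be handy when trying other core counts. The function names are hypothetical; the 1x and 2x NUM_CORES ratios and the doubling for the minimum requirement come from the example above.&lt;/p&gt;

```python
MIB = 2**20


def gateway_block_memory(worker_concurrency: int, block_query_concurrency: int,
                         max_query_page_size: int) -> int:
    """Maximum bytes of bloom pages in memory at once: the product of the
    three settings."""
    return worker_concurrency * block_query_concurrency * max_query_page_size


def gateway_min_memory(cores: int, max_query_page_size: int = 64 * MIB) -> int:
    """Minimum memory for a Bloom Gateway using the example's ratios
    (worker concurrency = 1x cores, block query concurrency = 2x cores),
    doubled as recommended above."""
    return 2 * gateway_block_memory(cores, 2 * cores, max_query_page_size)
```

&lt;p&gt;With 8 cores and the default page size used here, &lt;code&gt;gateway_min_memory(8)&lt;/code&gt; comes to 16GiB.&lt;/p&gt;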
&lt;h2 id=&#34;building-blooms&#34;&gt;Building blooms&lt;/h2&gt;
&lt;p&gt;Bloom filters are built per stream and aggregated together into block files.
Streams are assigned to blocks by their fingerprint, following the same ordering scheme as Loki’s TSDB and sharding calculation.
This gives a data locality benefit when querying, because streams in the same shard are likely to be in the same block.&lt;/p&gt;
&lt;p&gt;In addition to blocks, builders maintain a list of metadata files containing references to bloom blocks and the
TSDB index files they were built from. Gateways and the planner use these metadata files to discover existing blocks.&lt;/p&gt;
&lt;p&gt;Every &lt;code&gt;-bloom-build.planner.interval&lt;/code&gt;, the planner loads the latest TSDB files for all tenants that have bloom building enabled and compares them with the latest bloom metadata files.
If there are new TSDB files, or if any of them have changed, the planner creates a task for the streams and chunks referenced by the TSDB file.&lt;/p&gt;
&lt;p&gt;The builder pulls a task from the planner&amp;rsquo;s queue and processes the streams and chunks it contains.
For a given stream, the builder iterates through all the log lines inside its new chunks and builds a bloom for the stream.
If a previously processed TSDB file has changed, builders try to reuse blooms from existing blocks instead of building new ones from scratch.
For the structured metadata of each log line in each chunk of a stream, the builder appends to the bloom the hash of each key and of each key-value pair, followed by the hashes of the same keys and key-value pairs prefixed with the chunk identifier.
The first set of hashes allows gateways to skip whole streams, while the second set allows them to skip individual chunks.&lt;/p&gt;
&lt;p&gt;For example, given structured metadata &lt;code&gt;foo=bar&lt;/code&gt; in the chunk &lt;code&gt;c6dj8g&lt;/code&gt;, we append to the stream bloom the following hashes: &lt;code&gt;hash(&amp;quot;foo&amp;quot;)&lt;/code&gt;, &lt;code&gt;hash(&amp;quot;foo=bar&amp;quot;)&lt;/code&gt;, &lt;code&gt;hash(&amp;quot;c6dj8g&amp;quot; &#43; &amp;quot;foo&amp;quot;)&lt;/code&gt; and &lt;code&gt;hash(&amp;quot;c6dj8g&amp;quot; &#43; &amp;quot;foo=bar&amp;quot;)&lt;/code&gt;.&lt;/p&gt;
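&lt;p&gt;The token order in this example can be reproduced with a short Python sketch. It returns the strings whose hashes would be appended and leaves out the hashing itself; the function name is hypothetical, and Loki&amp;rsquo;s real tokenizer works on bytes and may encode chunk references differently.&lt;/p&gt;

```python
def bloom_tokens(chunk_ref: str, metadata: dict[str, str]) -> list[str]:
    """Strings whose hashes are appended to a stream's bloom for one chunk's
    structured metadata, in the order described above. Hashing is omitted."""
    stream_level = []  # lets gateways skip whole streams
    chunk_level = []   # lets gateways skip individual chunks
    for key, value in metadata.items():
        for token in (key, f"{key}={value}"):
            stream_level.append(token)
            chunk_level.append(chunk_ref + token)
    return stream_level + chunk_level
```

&lt;p&gt;Calling &lt;code&gt;bloom_tokens(&amp;quot;c6dj8g&amp;quot;, {&amp;quot;foo&amp;quot;: &amp;quot;bar&amp;quot;})&lt;/code&gt; yields &lt;code&gt;foo&lt;/code&gt;, &lt;code&gt;foo=bar&lt;/code&gt;, &lt;code&gt;c6dj8gfoo&lt;/code&gt;, and &lt;code&gt;c6dj8gfoo=bar&lt;/code&gt;, matching the four hashes listed above.&lt;/p&gt;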
&lt;h2 id=&#34;query-sharding&#34;&gt;Query sharding&lt;/h2&gt;
&lt;p&gt;Query acceleration does not only happen while processing chunks; it also starts in the query planning phase, where the query frontend applies &lt;a href=&#34;https://lokidex.com/posts/tsdb/#sharding&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;query sharding&lt;/a&gt;.
Loki 3.0 introduces a new 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#limits_config&#34;&gt;per-tenant configuration&lt;/a&gt; flag, &lt;code&gt;tsdb_sharding_strategy&lt;/code&gt;. By default, it computes shards the same way as previous versions of Loki: it uses the index stats to pick the closest power of two that would optimistically divide the data into shards of roughly equal size.
Unfortunately, streams often differ widely in how much data they contain, so some shards end up processing more data than others.&lt;/p&gt;
&lt;p&gt;Query acceleration introduces a new sharding strategy, &lt;code&gt;bounded&lt;/code&gt;. It uses blooms to reduce the number of chunks to process during the planning phase in the query frontend, and it distributes chunks evenly so that each sharded query processes a similar number of them.&lt;/p&gt;
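&lt;p&gt;The following minimal sketch switches the default strategy for all tenants. It assumes &lt;code&gt;power_of_two&lt;/code&gt; is the name of the default value and &lt;code&gt;bounded&lt;/code&gt; is the opt-in one; confirm both against the &lt;code&gt;limits_config&lt;/code&gt; reference for your version. Because the setting is per-tenant, it can presumably also be set for individual tenants through runtime overrides.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  # Assumed values: power_of_two (default) or bounded.
  tsdb_sharding_strategy: bounded&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;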
]]></content><description>&lt;h1 id="manage-bloom-filter-building-and-querying-experimental">Manage bloom filter building and querying (Experimental)&lt;/h1>
&lt;div class="admonition admonition-warning">&lt;blockquote>&lt;p class="title text-uppercase">Warning&lt;/p>&lt;p>In Loki and Grafana Enterprise Logs (GEL), Query acceleration using blooms is an &lt;a href="/docs/release-life-cycle/">experimental feature&lt;/a>. Engineering and on-call support is not available. No SLA is provided. Note that this feature is intended for users who are ingesting more than 75TB of logs a month, as it is designed to accelerate queries against large volumes of logs.&lt;/p></description></item><item><title>Manage large volume log streams with automatic stream sharding</title><link>https://grafana.com/docs/loki/v3.7.x/operations/automatic-stream-sharding/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/automatic-stream-sharding/</guid><content><![CDATA[&lt;h1 id=&#34;manage-large-volume-log-streams-with-automatic-stream-sharding&#34;&gt;Manage large volume log streams with automatic stream sharding&lt;/h1&gt;
&lt;p&gt;Automatic stream sharding can keep streams under a &lt;code&gt;desired_rate&lt;/code&gt; by adding new labels and values to
existing streams. When properly tuned, this can eliminate issues where log producers are rate limited due to the
per-stream rate limit.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To enable automatic stream sharding:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Edit the global 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; of the Loki configuration file:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  shard_streams:
    enabled: true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optionally lower the &lt;code&gt;desired_rate&lt;/code&gt; in bytes if you find that the system is still hitting the &lt;code&gt;per_stream_rate_limit&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  shard_streams:
    enabled: true
    desired_rate: 2097152 #2MiB&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Optionally enable &lt;code&gt;logging_enabled&lt;/code&gt; for debugging stream sharding.


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;This may affect the ingestion performance of Loki.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;
&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  shard_streams:
    enabled: true
    logging_enabled: true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;when-to-use-automatic-stream-sharding&#34;&gt;When to use automatic stream sharding&lt;/h2&gt;
&lt;p&gt;Large log streams present several problems for Loki, namely increased and uneven resource usage on Ingesters and
Distributors. The general recommendation is to explore existing log streams for additional label values that are both
useful for querying and sufficiently low in cardinality. In many cases, however, no more labels can
be extracted, or a label&amp;rsquo;s cardinality is dangerously high. To protect itself from operational failure caused by such volume, Loki enforces per-stream rate limits,
but the result is that some data is lost. Changing the per-stream limit also requires human intervention, which is not ideal when log volumes rise and fall.&lt;/p&gt;
&lt;p&gt;Automatic stream sharding avoids rate limiting and oversized streams by keeping every log stream close
to a configured &lt;code&gt;desired_rate&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;how-automatic-stream-sharding-works&#34;&gt;How automatic stream sharding works&lt;/h2&gt;
&lt;p&gt;Automatic stream sharding works by adding a new label, &lt;code&gt;__stream_shard__&lt;/code&gt;, to streams and incrementing its value to try
to keep all streams below a configured &lt;code&gt;desired_rate&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The feature adds a new API to Ingesters that reports the rate of all existing log streams. Once per second, Distributors
query the API to get a picture of all stream rates in the system. Distributors use the existing stream-rate data and a
configured &lt;code&gt;desired_rate&lt;/code&gt; to determine how many shards a given stream should have. The desired number of new log streams
is created with the label &lt;code&gt;__stream_shard__&lt;/code&gt;, and logs are divided evenly among the streams.&lt;/p&gt;
&lt;p&gt;Because automatic stream sharding is reactive and relies on successive calls to Ingesters, the view of current rates is
always somewhat behind. As a result, the actual size of sharded streams will always be higher than the &lt;code&gt;desired_rate&lt;/code&gt;.
In practice, this is still sufficient to keep log producers from being rate limited by per-stream rate limits.&lt;/p&gt;
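&lt;p&gt;Actual shard rates tend to sit above &lt;code&gt;desired_rate&lt;/code&gt;, so one reasonable approach is to leave clear headroom between &lt;code&gt;desired_rate&lt;/code&gt; and &lt;code&gt;per_stream_rate_limit&lt;/code&gt;. The sketch below assumes both settings accept a plain byte count, as &lt;code&gt;desired_rate&lt;/code&gt; does in the examples above. The specific values are illustrative, not recommendations.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;limits_config:
  # Per-stream ingestion rate limit (illustrative value: 3MiB).
  per_stream_rate_limit: 3145728
  shard_streams:
    enabled: true
    # Target well below the limit, since sharded streams tend to exceed this rate.
    desired_rate: 1048576 # 1MiB&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;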
&lt;h2 id=&#34;automatic-stream-sharding-metrics&#34;&gt;Automatic stream sharding metrics&lt;/h2&gt;
&lt;p&gt;Use these metrics to help tune Loki so that it is sharding streams aggressively enough to avoid the per-stream rate
limit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;loki_rate_store_refresh_failures_total&lt;/code&gt;: The total number of failed attempts to refresh the distributor&amp;rsquo;s view of
stream rates.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_rate_store_streams&lt;/code&gt;: The number of unique streams reported by all Ingesters. Sharded streams are reported as if
they were unsharded.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_rate_store_max_stream_shards&lt;/code&gt;: The maximum number of shards for any tenant of the system.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_rate_store_stream_shards&lt;/code&gt;: A histogram of the distribution of shard counts across all streams.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_rate_store_max_stream_rate_bytes&lt;/code&gt;: The maximum stream rate in bytes/second for any tenant of the system. Sharded
streams are reported as if they were unsharded.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_rate_store_max_unique_stream_rate_bytes&lt;/code&gt;: The maximum rate in bytes/second of any stream across all tenants. Stream shards are
reported individually.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_rate_store_stream_rate_bytes&lt;/code&gt;: A histogram of the distribution of stream rates across all tenants in
bytes/second.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_stream_sharding_count&lt;/code&gt;: The total number of times that streams have been sharded. Useful for calculating the
sharding rate.&lt;/li&gt;
&lt;/ul&gt;
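&lt;p&gt;As a sketch, these metrics could feed Prometheus alerting and recording rules like the following. The group, alert, and record names are hypothetical. The 3MiB threshold is an illustrative value meant to mirror a &lt;code&gt;per_stream_rate_limit&lt;/code&gt; of the same size. &lt;code&gt;loki_stream_sharding_count&lt;/code&gt; is assumed to be a counter, as its description suggests.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;groups:
  - name: loki-stream-sharding # hypothetical group name
    rules:
      # Distributors are failing to refresh their view of stream rates.
      - alert: LokiRateStoreRefreshFailing
        expr: increase(loki_rate_store_refresh_failures_total[5m]) &amp;gt; 0
        for: 10m
      # Some individual stream shard is still above 3MiB/s (illustrative threshold).
      - alert: LokiStreamShardAboveThreshold
        expr: max(loki_rate_store_max_unique_stream_rate_bytes) &amp;gt; 3145728
        for: 15m
      # Cluster-wide sharding events per second over the last 5 minutes.
      - record: loki:stream_sharding:rate5m
        expr: sum(rate(loki_stream_sharding_count[5m]))&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;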
]]></content><description>&lt;h1 id="manage-large-volume-log-streams-with-automatic-stream-sharding">Manage large volume log streams with automatic stream sharding&lt;/h1>
&lt;p>Automatic stream sharding can keep streams under a &lt;code>desired_rate&lt;/code> by adding new labels and values to
existing streams. When properly tuned, this can eliminate issues where log producers are rate limited due to the
per-stream rate limit.&lt;/p></description></item><item><title>Manage larger production deployments</title><link>https://grafana.com/docs/loki/v3.7.x/operations/scalability/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/scalability/</guid><content><![CDATA[&lt;h1 id=&#34;manage-larger-production-deployments&#34;&gt;Manage larger production deployments&lt;/h1&gt;
&lt;p&gt;When Loki needs to scale to handle increased log volume, operators should consider running several Loki processes
partitioned by role (ingester, distributor, querier, and so on) rather than a single Loki
process. Grafana Labs&amp;rsquo; &lt;a href=&#34;https://github.com/grafana/loki/blob/main/production/ksonnet/loki&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;production setup&lt;/a&gt;
contains &lt;code&gt;.libsonnet&lt;/code&gt; files that demonstrate configuring separate components
and scaling for resource usage.&lt;/p&gt;
&lt;h2 id=&#34;separate-query-scheduler&#34;&gt;Separate Query Scheduler&lt;/h2&gt;
&lt;p&gt;The Query frontend has an in-memory queue that can be moved out into a separate process similar to the
&lt;a href=&#34;/docs/mimir/latest/operators-guide/architecture/components/query-scheduler/&#34;&gt;Grafana Mimir query-scheduler&lt;/a&gt;. This allows running multiple query frontends.&lt;/p&gt;
&lt;p&gt;To run with the Query Scheduler, pass the scheduler&amp;rsquo;s address to the frontend via &lt;code&gt;-frontend.scheduler-address&lt;/code&gt;, and start the querier processes with &lt;code&gt;-querier.scheduler-address&lt;/code&gt; set to the same address. Both options can also be defined in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/&#34;&gt;configuration file&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It is not valid to start the querier with both a configured frontend and a scheduler address.&lt;/p&gt;
&lt;p&gt;The query scheduler process itself can be started via the &lt;code&gt;-target=query-scheduler&lt;/code&gt; option of the Loki Docker image. For instance, &lt;code&gt;docker run grafana/loki:latest -config.file=/etc/loki/config.yaml -target=query-scheduler -server.http-listen-port=8009 -server.grpc-listen-port=9009&lt;/code&gt; starts the query scheduler listening on ports &lt;code&gt;8009&lt;/code&gt; and &lt;code&gt;9009&lt;/code&gt;.&lt;/p&gt;
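&lt;p&gt;The following sketch shows the equivalent configuration-file settings. It assumes the flags map to &lt;code&gt;frontend.scheduler_address&lt;/code&gt; and &lt;code&gt;frontend_worker.scheduler_address&lt;/code&gt;. The &lt;code&gt;query-scheduler.loki.svc:9009&lt;/code&gt; address is hypothetical; its port matches the gRPC port in the Docker example. In line with the rule above, the querier should not also be given a frontend address.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;frontend:
  # Same address as -frontend.scheduler-address
  scheduler_address: query-scheduler.loki.svc:9009
frontend_worker:
  # Same address as -querier.scheduler-address; do not also set a frontend address
  scheduler_address: query-scheduler.loki.svc:9009&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;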
&lt;h2 id=&#34;memory-ballast&#34;&gt;Memory ballast&lt;/h2&gt;
&lt;p&gt;In compute-constrained environments, garbage collection can become a significant performance factor. Frequent garbage collection interferes with the application by consuming CPU resources. Memory ballast can mitigate the issue: it allocates extra but unused virtual memory to inflate the size of the live heap. Garbage collection is triggered by growth in heap usage relative to the live heap, so the inflated live heap reduces the perceived growth and garbage collection occurs less frequently.&lt;/p&gt;
&lt;p&gt;Configure memory ballast using the &lt;code&gt;ballast_bytes&lt;/code&gt; configuration option.&lt;/p&gt;
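&lt;p&gt;For example, the following sketch reserves 1GiB of ballast. It assumes &lt;code&gt;ballast_bytes&lt;/code&gt; is a top-level key of the Loki configuration file that takes a byte count. The size is illustrative and should be weighed against the memory limit of the process.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Reserve 1GiB of memory ballast (illustrative size).
ballast_bytes: 1073741824&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;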
&lt;h2 id=&#34;remote-rule-evaluation&#34;&gt;Remote rule evaluation&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;This feature was first proposed in &lt;a href=&#34;https://github.com/grafana/loki/pull/8129&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;&lt;code&gt;LID-0002&lt;/code&gt;&lt;/a&gt;; it contains the design decisions
which informed the implementation.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;By default, the &lt;code&gt;ruler&lt;/code&gt; component embeds a query engine to evaluate rules. This generally works fine, except when rules
are complex or have to process a large amount of data regularly. Poor performance of the &lt;code&gt;ruler&lt;/code&gt; manifests as gaps in recording rule metrics
or as missed alerts. This situation can be detected by alerting on the &lt;code&gt;loki_prometheus_rule_group_iterations_missed_total&lt;/code&gt; metric
when it has a non-zero value.&lt;/p&gt;
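&lt;p&gt;A minimal Prometheus alerting rule for this condition might look like the following. The group and alert names and the time windows are assumptions. The metric is a counter, which stays non-zero after the first missed iteration until the process restarts, so the sketch alerts on its recent increase rather than on its absolute value.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;groups:
  - name: loki-ruler # hypothetical group name
    rules:
      - alert: LokiRulerMissedIterations
        # Alert on recent increase: the counter stays non-zero after the first miss.
        expr: increase(loki_prometheus_rule_group_iterations_missed_total[10m]) &amp;gt; 0
        for: 5m
        labels:
          severity: warning&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;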
&lt;p&gt;A solution to this problem is to externalize rule evaluation from the &lt;code&gt;ruler&lt;/code&gt; process. The &lt;code&gt;ruler&lt;/code&gt; embedded query engine
is single-threaded, meaning that rules are not split, sharded, or otherwise accelerated like regular Loki queries. The &lt;code&gt;query-frontend&lt;/code&gt;
component exists explicitly for this purpose and, when combined with a number of &lt;code&gt;querier&lt;/code&gt; instances, can massively
improve rule evaluation performance and lead to fewer missed iterations.&lt;/p&gt;
&lt;p&gt;We generally recommend creating a &lt;code&gt;query-frontend&lt;/code&gt; deployment and &lt;code&gt;querier&lt;/code&gt; pool that are separate from the ones that handle ad hoc
queries from Grafana, &lt;code&gt;logcli&lt;/code&gt;, or the API. Rules should be given priority over ad hoc queries because they are used to produce
metrics or alerts that may be crucial to the reliable operation of your service. If you use the same &lt;code&gt;query-frontend&lt;/code&gt; and &lt;code&gt;querier&lt;/code&gt; pool
for both, your rules will be executed with the same priority as ad hoc queries, which could lead to unpredictable performance.&lt;/p&gt;
&lt;p&gt;To enable remote rule evaluation, set the following configuration options:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;ruler:
  evaluation:
    mode: remote
    query_frontend:
      address: dns:///&amp;lt;query-frontend-service&amp;gt;:&amp;lt;grpc-port&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;See the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#ruler&#34;&gt;&lt;code&gt;ruler&lt;/code&gt; configuration reference&lt;/a&gt; for further configuration options.&lt;/p&gt;
&lt;p&gt;When you enable remote rule evaluation, the &lt;code&gt;ruler&lt;/code&gt; component becomes a gRPC client to the &lt;code&gt;query-frontend&lt;/code&gt; service.
This results in far lower &lt;code&gt;ruler&lt;/code&gt; resource usage because most of the work is externalized.
The LogQL queries coming from the &lt;code&gt;ruler&lt;/code&gt; are executed against the given &lt;code&gt;query-frontend&lt;/code&gt; service.
Requests are load-balanced across all &lt;code&gt;query-frontend&lt;/code&gt; IPs if the &lt;code&gt;dns:///&lt;/code&gt; prefix is used.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;Queries that fail to execute are &lt;em&gt;not&lt;/em&gt; retried.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h3 id=&#34;limits-and-observability&#34;&gt;Limits and Observability&lt;/h3&gt;
&lt;p&gt;Remote rule evaluation can be tuned with the following options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ruler_remote_evaluation_timeout&lt;/code&gt;: maximum allowable execution time for rule evaluations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ruler_remote_evaluation_max_response_size&lt;/code&gt;: maximum allowable response size over gRPC connection from &lt;code&gt;query-frontend&lt;/code&gt; to &lt;code&gt;ruler&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both of these can be specified globally in the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#limits_config&#34;&gt;&lt;code&gt;limits_config&lt;/code&gt;&lt;/a&gt; section
or on a 
    &lt;a href=&#34;/docs/loki/v3.7.x/configuration/#runtime-configuration-file&#34;&gt;per-tenant basis&lt;/a&gt;.&lt;/p&gt;
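&lt;p&gt;As a sketch, the global defaults and a per-tenant override could look like this. The values and the tenant ID &lt;code&gt;tenant-a&lt;/code&gt; are hypothetical, and the runtime file is assumed to use the usual &lt;code&gt;overrides&lt;/code&gt; map keyed by tenant ID. The two documents live in different files: the first in the main configuration file, the second in the runtime configuration file.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Main configuration file: global defaults (illustrative values).
limits_config:
  ruler_remote_evaluation_timeout: 1m
  ruler_remote_evaluation_max_response_size: 104857600 # 100MiB
---
# Runtime configuration file: override for one hypothetical tenant.
overrides:
  tenant-a:
    ruler_remote_evaluation_timeout: 2m&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;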
&lt;p&gt;Remote rule evaluation exposes a number of metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_remote_eval_request_duration_seconds&lt;/code&gt;: time taken for rule evaluation (histogram)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_remote_eval_response_bytes&lt;/code&gt;: number of bytes in rule evaluation response (histogram)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_remote_eval_response_samples&lt;/code&gt;: number of samples in rule evaluation response (histogram)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_remote_eval_success_total&lt;/code&gt;: successful rule evaluations (counter)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_remote_eval_failure_total&lt;/code&gt;: unsuccessful rule evaluations with reasons (counter)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each of these metrics is per-tenant, so take cardinality into consideration.&lt;/p&gt;
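&lt;p&gt;Because the metrics are per-tenant, an alert that aggregates away the tenant dimension keeps rule cardinality low. The following sketch fires when more than 5% of remote evaluations fail. The threshold, windows, and names are illustrative assumptions.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;groups:
  - name: loki-ruler-remote-eval # hypothetical group name
    rules:
      # Fraction of remote rule evaluations that failed over the last 5 minutes,
      # summed across tenants so the alert itself adds no per-tenant series.
      - alert: LokiRulerRemoteEvalFailing
        expr: |
          sum(rate(loki_ruler_remote_eval_failure_total[5m]))
            /
          (
              sum(rate(loki_ruler_remote_eval_success_total[5m]))
            + sum(rate(loki_ruler_remote_eval_failure_total[5m]))
          )
          &amp;gt; 0.05
        for: 10m&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;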
]]></content><description>&lt;h1 id="manage-larger-production-deployments">Manage larger production deployments&lt;/h1>
&lt;p>When needing to scale Loki due to increased log volume, operators should consider running several Loki processes
partitioned by role (ingester, distributor, querier, and so on) rather than a single Loki
process. Grafana Labs&amp;rsquo; &lt;a href="https://github.com/grafana/loki/blob/main/production/ksonnet/loki" target="_blank" rel="noopener noreferrer">production setup&lt;/a>
contains &lt;code>.libsonnet&lt;/code> files that demonstrate configuring separate components
and scaling for resource usage.&lt;/p></description></item><item><title>Manage recording rules</title><link>https://grafana.com/docs/loki/v3.7.x/operations/recording-rules/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/recording-rules/</guid><content><![CDATA[&lt;h1 id=&#34;manage-recording-rules&#34;&gt;Manage recording rules&lt;/h1&gt;
&lt;p&gt;Recording rules are queries that run at an interval and produce metrics from logs that can be pushed to a Prometheus-compatible backend.&lt;/p&gt;
&lt;p&gt;Recording rules are evaluated by the &lt;code&gt;ruler&lt;/code&gt; component. Each &lt;code&gt;ruler&lt;/code&gt; acts as its own &lt;code&gt;querier&lt;/code&gt;, in the sense that it
executes queries against the store without using the &lt;code&gt;query-frontend&lt;/code&gt; or &lt;code&gt;querier&lt;/code&gt; components. It will respect all query

    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#limits_config&#34;&gt;limits&lt;/a&gt; put in place for the &lt;code&gt;querier&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The Loki implementation of recording rules largely reuses Prometheus&amp;rsquo; code.&lt;/p&gt;
&lt;p&gt;Samples generated by recording rules are sent to Prometheus using Prometheus&amp;rsquo; &lt;strong&gt;remote-write&lt;/strong&gt; feature.&lt;/p&gt;
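&lt;p&gt;For illustration, a Loki rule file containing a single recording rule might look like the following. The group name, the record name, and the &lt;code&gt;{job=&#34;nginx&#34;}&lt;/code&gt; stream selector are hypothetical. The file uses the same rule-group format as Prometheus, with a LogQL expression in place of PromQL.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;groups:
  - name: nginx-requests # hypothetical group name
    interval: 1m
    rules:
      # Per-second rate of log lines from a hypothetical nginx stream; the
      # resulting series is remote-written to a Prometheus-compatible backend.
      - record: nginx:log_lines:rate1m
        expr: sum(rate({job=&#34;nginx&#34;}[1m]))&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;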
&lt;h2 id=&#34;write-ahead-log-wal&#34;&gt;Write-Ahead Log (WAL)&lt;/h2&gt;
&lt;p&gt;All samples generated by recording rules are written to a WAL. The WAL&amp;rsquo;s main benefit is that it persists the samples
generated by recording rules to disk, which means that if your &lt;code&gt;ruler&lt;/code&gt; crashes, you won&amp;rsquo;t lose any data.
We are trading off extra memory usage and slower start-up times for this functionality.&lt;/p&gt;
&lt;p&gt;A WAL is created per tenant to prevent cross-tenant interactions. If all samples were written
to a single WAL, one tenant would be more likely to cause data loss for others. For example, Prometheus
will reject a remote-write request with 100 samples if just 1 of those samples is invalid in some way.&lt;/p&gt;
&lt;h3 id=&#34;start-up&#34;&gt;Start-up&lt;/h3&gt;
&lt;p&gt;When the &lt;code&gt;ruler&lt;/code&gt; starts up, it will load the WALs for the tenants who have recording rules. These WAL files are stored
on disk and are loaded into memory.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;WALs are loaded one at a time upon start-up. This is a current limitation of the Loki ruler.
For this reason, keep the number of rule groups serviced by each ruler to a reasonable size, since
&lt;em&gt;no rule evaluation occurs while WAL replay is in progress (this includes alerting rules)&lt;/em&gt;.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h3 id=&#34;truncation&#34;&gt;Truncation&lt;/h3&gt;
&lt;p&gt;WAL files are regularly truncated to reduce their size on disk.
&lt;a href=&#34;https://ganeshvernekar.com/blog/prometheus-tsdb-wal-and-checkpoint/#wal-truncation-and-checkpointing&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;This guide&lt;/a&gt;
from one of the Prometheus maintainers (Ganesh Vernekar) gives an excellent overview of the truncation, checkpointing,
and replaying of the WAL.&lt;/p&gt;
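&lt;p&gt;The following sketch shows the related ruler settings. The key names and values are assumptions based on the &lt;code&gt;ruler.wal&lt;/code&gt; block of the configuration reference and should be checked for your version. &lt;code&gt;max_age&lt;/code&gt; is assumed to correspond to the &lt;code&gt;ruler.wal.max-age&lt;/code&gt; setting, which bounds how long unsent samples can stay in the WAL.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;ruler:
  wal:
    # Each tenant gets its own subdirectory below this path.
    dir: /var/loki/ruler-wal
    # How often truncation runs.
    truncate_frequency: 1h
    # Minimum and maximum time series stay in the WAL.
    min_age: 5m
    max_age: 4h&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;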
&lt;h3 id=&#34;cleaner&#34;&gt;Cleaner&lt;/h3&gt;
&lt;p&gt;&lt;span style=&#34;background-color:#f3f973;&#34;&gt;WAL Cleaner is an experimental feature.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The WAL Cleaner watches for abandoned WALs (belonging to tenants that no longer have any associated recording rules) and deletes them.
Enable this feature only if you are running into storage concerns with WALs that are too large. Because WALs are truncated regularly,
they should not grow excessively large.&lt;/p&gt;
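&lt;p&gt;If you do enable the cleaner, a configuration along these lines might apply. Several details are assumptions to verify against the configuration reference: the &lt;code&gt;wal_cleaner&lt;/code&gt; key names, the idea that a &lt;code&gt;period&lt;/code&gt; of &lt;code&gt;0s&lt;/code&gt; leaves the cleaner disabled, and the values shown.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;ruler:
  wal_cleaner:
    # Only WALs at least this old are considered for cleaning.
    min_age: 12h
    # How often the cleaner runs; 0s is assumed to leave it disabled.
    period: 1h&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;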
&lt;h2 id=&#34;scaling&#34;&gt;Scaling&lt;/h2&gt;
&lt;p&gt;See Mimir&amp;rsquo;s guide for &lt;a href=&#34;/docs/mimir/latest/configure/configure-hash-rings/&#34;&gt;configuring Grafana Mimir hash rings&lt;/a&gt; for scaling the ruler using a ring.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;By default, the &lt;code&gt;ruler&lt;/code&gt; shards by rule &lt;em&gt;group&lt;/em&gt;, not by individual rules. This is an artifact of Prometheus, where
recording rules in a group run in order because one recording rule can reuse the output of another; this is not possible in Loki.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h2 id=&#34;deployment&#34;&gt;Deployment&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;ruler&lt;/code&gt; needs to persist its WAL files to disk, and it incurs some start-up cost by reading these WALs into memory.
We therefore recommend minimizing churn of individual &lt;code&gt;ruler&lt;/code&gt; instances, because rule evaluation is blocked
while the WALs are being read from disk.&lt;/p&gt;
&lt;h3 id=&#34;kubernetes&#34;&gt;Kubernetes&lt;/h3&gt;
&lt;p&gt;We recommend running &lt;code&gt;ruler&lt;/code&gt; instances as &lt;code&gt;StatefulSets&lt;/code&gt;. The &lt;code&gt;ruler&lt;/code&gt; writes its WAL files to persistent storage,
so use a &lt;code&gt;PersistentVolume&lt;/code&gt;.&lt;/p&gt;
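&lt;p&gt;The following is a minimal sketch of such a StatefulSet. The names, replica count, mount path, and storage size are hypothetical. Mounting the Loki configuration file (for example, from a ConfigMap) is omitted for brevity. The &lt;code&gt;volumeClaimTemplates&lt;/code&gt; section gives each replica its own PersistentVolumeClaim, so a restarted pod reattaches to its existing WAL.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loki-ruler # hypothetical name
spec:
  serviceName: loki-ruler
  replicas: 2
  selector:
    matchLabels:
      app: loki-ruler
  template:
    metadata:
      labels:
        app: loki-ruler
    spec:
      containers:
        - name: ruler
          image: grafana/loki:latest
          args:
            - -config.file=/etc/loki/config.yaml
            - -target=ruler
          # The config file mount is omitted; only the WAL volume is shown.
          volumeMounts:
            # Should match ruler.wal.dir in the Loki configuration.
            - name: ruler-wal
              mountPath: /var/loki/ruler-wal
  volumeClaimTemplates:
    # One PersistentVolumeClaim per replica, kept across pod restarts.
    - metadata:
        name: ruler-wal
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi # illustrative size&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;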
&lt;h2 id=&#34;remote-write&#34;&gt;Remote-Write&lt;/h2&gt;
&lt;h3 id=&#34;client-configuration&#34;&gt;Client configuration&lt;/h3&gt;
&lt;p&gt;Remote-write client configuration is fully compatible with the &lt;a href=&#34;https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Prometheus configuration format&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
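    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;ruler:
  remote_write:
    # Remote-write is disabled unless explicitly enabled.
    enabled: true
    clients:
      mimir:
        url: http://mimir/api/v1/push
        write_relabel_configs:
          # Set the job label on every sample sent by this client.
          - action: replace
            target_label: job
            replacement: loki-recording-rules&lt;/code&gt;&lt;/pre&gt;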
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;per-tenant-limits&#34;&gt;Per-Tenant Limits&lt;/h3&gt;
&lt;p&gt;Remote-write can be configured at a global level in the base configuration, and certain parameters tuned specifically on
a per-tenant basis. Most of the configuration options 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#ruler&#34;&gt;defined here&lt;/a&gt;
have 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#limits_config&#34;&gt;override options&lt;/a&gt; (which can also be applied at runtime).&lt;/p&gt;
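&lt;p&gt;For example, a runtime override could turn off recording rule remote-write for a single tenant. The override name &lt;code&gt;ruler_remote_write_disabled&lt;/code&gt; is an assumption to check against the &lt;code&gt;limits_config&lt;/code&gt; reference, and &lt;code&gt;tenant-a&lt;/code&gt; is a hypothetical tenant ID.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Runtime configuration file
overrides:
  tenant-a: # hypothetical tenant ID
    # Stop remote-writing recording rule samples for this tenant only.
    ruler_remote_write_disabled: true&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;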
&lt;h3 id=&#34;tuning&#34;&gt;Tuning&lt;/h3&gt;
&lt;p&gt;Remote-write can be tuned if the default configuration is insufficient (see &lt;a href=&#34;#failure-modes&#34;&gt;Failure Modes&lt;/a&gt; below).&lt;/p&gt;
&lt;p&gt;The Prometheus website has a &lt;a href=&#34;https://prometheus.io/docs/practices/remote_write/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;guide&lt;/a&gt; on tuning remote-write, all of which also applies to Loki.&lt;/p&gt;
&lt;p&gt;Rules can be evenly distributed across available rulers by using &lt;code&gt;-ruler.enable-sharding=true&lt;/code&gt; and &lt;code&gt;-ruler.sharding-strategy=&amp;quot;by-rule&amp;quot;&lt;/code&gt;.
Rules within a rule group execute in order. This behavior is inherited from Prometheus&amp;rsquo; rule engine (which Loki uses), but Loki has no
need for this constraint because rules cannot depend on each other. The default sharding strategy shards by rule group,
but this may be undesirable: some rule groups could contain more expensive rules, which can cause subsequent rules in the group to miss evaluations.
The &lt;code&gt;by-rule&lt;/code&gt; sharding strategy creates one rule group for each rule the ruler instance &amp;ldquo;owns&amp;rdquo; (based on its hash ring), and these rule groups
are all executed concurrently.&lt;/p&gt;
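&lt;p&gt;In the configuration file, the sharding flags and some remote-write queue tuning might look like the following sketch. The YAML keys &lt;code&gt;enable_sharding&lt;/code&gt; and &lt;code&gt;sharding_strategy&lt;/code&gt; are assumed to correspond to the flags above. The ring configuration that sharding also needs is omitted. The &lt;code&gt;queue_config&lt;/code&gt; values are illustrative starting points rather than recommendations.&lt;/p&gt;
&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;ruler:
  # Assumed equivalents of -ruler.enable-sharding and -ruler.sharding-strategy.
  enable_sharding: true
  sharding_strategy: by-rule
  remote_write:
    enabled: true
    clients:
      mimir:
        url: http://mimir/api/v1/push
        # Prometheus remote-write queue settings (illustrative values).
        queue_config:
          capacity: 10000
          max_shards: 50
          max_samples_per_send: 2000&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;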
&lt;h2 id=&#34;observability&#34;&gt;Observability&lt;/h2&gt;
&lt;p&gt;Since Loki reuses the Prometheus code for recording rules and WALs, it also gains all of Prometheus&amp;rsquo; observability.&lt;/p&gt;
&lt;p&gt;Prometheus exposes a number of metrics for its WAL implementation, and these have all been prefixed with &lt;code&gt;loki_ruler_wal_&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For example: &lt;code&gt;prometheus_remote_storage_bytes_total&lt;/code&gt; → &lt;code&gt;loki_ruler_wal_prometheus_remote_storage_bytes_total&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Additional metrics are exposed, also with the prefix &lt;code&gt;loki_ruler_wal_&lt;/code&gt;. All per-tenant metrics contain a &lt;code&gt;tenant&lt;/code&gt;
label, so be aware that cardinality can become a concern if the number of tenants grows sufficiently large.&lt;/p&gt;
&lt;p&gt;Some key metrics to note are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_appender_ready&lt;/code&gt;: whether a WAL appender is ready to accept samples (1) or not (0)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples_total&lt;/code&gt;: number of samples sent per tenant to remote storage&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples...&lt;/code&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples_pending_total&lt;/code&gt;: samples buffered in memory, waiting to be sent to remote storage&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples_failed_total&lt;/code&gt;: samples that failed when sent to remote storage&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples_dropped_total&lt;/code&gt;: samples dropped by relabel configurations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples_retried_total&lt;/code&gt;: samples resent to remote storage&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds&lt;/code&gt;: highest timestamp of sample appended to WAL&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds&lt;/code&gt;: highest timestamp of sample sent to remote storage.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We&amp;rsquo;ve created a basic &lt;a href=&#34;https://github.com/grafana/loki/tree/main/production/loki-mixin/dashboards/recording-rules.libsonnet&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;dashboard in our loki-mixin&lt;/a&gt;
which you can use to administer recording rules.&lt;/p&gt;
&lt;h2 id=&#34;failure-modes&#34;&gt;Failure Modes&lt;/h2&gt;
&lt;h3 id=&#34;remote-write-lagging&#34;&gt;Remote-Write Lagging&lt;/h3&gt;
&lt;p&gt;Remote-write can lag behind for many reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Remote-write storage (Prometheus) is temporarily unavailable&lt;/li&gt;
&lt;li&gt;A tenant is producing samples too quickly from a recording rule&lt;/li&gt;
&lt;li&gt;Remote-write is tuned too low, creating backpressure&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You can measure the lag by subtracting
&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds&lt;/code&gt; from
&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds&lt;/code&gt;.&lt;/p&gt;
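&lt;p&gt;As a minimal sketch, the subtraction above can be turned into a Prometheus alerting rule. The group name, alert name, 5-minute smoothing window, 120-second threshold, and 15-minute &lt;code&gt;for&lt;/code&gt; duration are illustrative assumptions. The &lt;code&gt;ignoring(remote_name, url) group_right&lt;/code&gt; matching assumes that only the sent-timestamp metric carries per-queue &lt;code&gt;remote_name&lt;/code&gt; and &lt;code&gt;url&lt;/code&gt; labels, as the equivalent Prometheus remote-storage metrics do. Adjust the matcher to the labels your ruler actually exposes.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;groups:
  - name: loki-ruler-remote-write          # hypothetical group name
    rules:
      - alert: LokiRulerRemoteWriteBehind  # hypothetical alert name
        # Seconds between the newest sample appended to the WAL
        # and the newest sample sent to remote storage.
        expr: |
          (
            max_over_time(loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds[5m])
          - ignoring(remote_name, url) group_right
            max_over_time(loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m])
          ) &amp;gt; 120
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Loki ruler remote-write is more than two minutes behind the WAL.&lt;/code&gt;&lt;/pre&gt;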
&lt;p&gt;In case 1, the &lt;code&gt;ruler&lt;/code&gt; continues to retry sending these samples until the remote storage becomes available again. Be
aware that if the remote storage is unavailable for longer than &lt;code&gt;ruler.wal.max-age&lt;/code&gt;, the WAL may be truncated before those samples are sent, which causes data loss.&lt;/p&gt;
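&lt;p&gt;The &lt;code&gt;ruler.wal.max-age&lt;/code&gt; setting corresponds to the &lt;code&gt;max_age&lt;/code&gt; field of the ruler&amp;rsquo;s &lt;code&gt;wal&lt;/code&gt; block. The following fragment is a sketch. The directory and durations are example values, not recommendations, so confirm the field names and defaults against the configuration reference for your Loki version. A longer &lt;code&gt;max_age&lt;/code&gt; lets unsent samples survive a longer remote-storage outage, at the cost of more disk space.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;ruler:
  wal:
    dir: /loki/ruler-wal      # example path for the per-tenant WALs
    truncate_frequency: 60m   # how often WAL truncation runs
    min_age: 5m               # samples younger than this are never truncated
    max_age: 4h               # samples older than this are truncated, sent or not&lt;/code&gt;&lt;/pre&gt;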
&lt;p&gt;In cases 2 and 3, consider &lt;a href=&#34;#tuning&#34;&gt;tuning&lt;/a&gt; remote-write.&lt;/p&gt;
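&lt;p&gt;The ruler&amp;rsquo;s remote-write clients are expected to accept Prometheus-style &lt;code&gt;queue_config&lt;/code&gt; settings, which are the usual levers for these two cases. The following is a hedged sketch that assumes a Loki version where remote-write targets are listed under &lt;code&gt;ruler.remote_write.clients&lt;/code&gt;. The client name, URL, and all numbers are hypothetical starting points, not tuned values.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;ruler:
  remote_write:
    enabled: true
    clients:
      prometheus:                                  # hypothetical client name
        url: http://prometheus:9090/api/v1/write   # hypothetical endpoint
        queue_config:
          capacity: 10000             # samples buffered per shard before backpressure
          max_shards: 50              # upper bound on parallel senders
          max_samples_per_send: 2000  # samples per remote-write request
          batch_send_deadline: 5s     # send a partial batch after this long&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Raising &lt;code&gt;max_shards&lt;/code&gt; and &lt;code&gt;max_samples_per_send&lt;/code&gt; increases throughput to the remote storage. This helps when a tenant produces samples faster than the queue can drain them.&lt;/p&gt;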
&lt;p&gt;Further reading: see &lt;a href=&#34;/blog/2021/04/12/how-to-troubleshoot-remote-write-issues-in-prometheus/&#34;&gt;this blog post&lt;/a&gt;
by Prometheus maintainer Callum Styan.&lt;/p&gt;
&lt;h3 id=&#34;appender-not-ready&#34;&gt;Appender Not Ready&lt;/h3&gt;
&lt;p&gt;Each tenant&amp;rsquo;s WAL has an internal &amp;ldquo;appender&amp;rdquo; that &lt;em&gt;appends&lt;/em&gt; samples to the WAL. On startup, the appender is marked
as &lt;em&gt;not ready&lt;/em&gt; until the WAL replay is complete. If the WAL is corrupted or takes a long time to replay, the appender stays not ready,
and you can detect this by alerting on &lt;code&gt;loki_ruler_wal_appender_ready &amp;lt; 1&lt;/code&gt;.&lt;/p&gt;
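&lt;p&gt;A minimal alerting rule for this condition might look like the following. The names are hypothetical. The 15-minute &lt;code&gt;for&lt;/code&gt; duration is an assumption meant to tolerate a normal replay after startup, so set it to comfortably exceed how long replay usually takes in your environment.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;groups:
  - name: loki-ruler-wal                     # hypothetical group name
    rules:
      - alert: LokiRulerWALAppenderNotReady  # hypothetical alert name
        # The appender reports not ready until WAL replay completes.
        expr: loki_ruler_wal_appender_ready &amp;lt; 1
        for: 15m
        labels:
          severity: warning&lt;/code&gt;&lt;/pre&gt;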
&lt;h3 id=&#34;corrupt-wal&#34;&gt;Corrupt WAL&lt;/h3&gt;
&lt;p&gt;If a disk fails or the &lt;code&gt;ruler&lt;/code&gt; does not terminate cleanly, one or more tenant WALs can become corrupted.
Loki automatically attempts to repair a corrupted WAL, but the repair cannot handle every scenario. When a repair fails,
the &lt;code&gt;loki_ruler_wal_corruptions_repair_failed_total&lt;/code&gt; metric is incremented.&lt;/p&gt;
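&lt;p&gt;Because &lt;code&gt;loki_ruler_wal_corruptions_repair_failed_total&lt;/code&gt; is a counter, alert on its increase rather than on its absolute value. The following rule is a sketch. The names, the one-hour window, and the severity label are assumptions.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;groups:
  - name: loki-ruler-wal-corruption     # hypothetical group name
    rules:
      - alert: LokiRulerWALRepairFailed  # hypothetical alert name
        # Fires if at least one WAL repair failed during the last hour.
        expr: increase(loki_ruler_wal_corruptions_repair_failed_total[1h]) &amp;gt; 0
        labels:
          severity: critical&lt;/code&gt;&lt;/pre&gt;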
&lt;h3 id=&#34;found-another-failure-mode&#34;&gt;Found another failure mode?&lt;/h3&gt;
&lt;p&gt;Open an &lt;a href=&#34;https://github.com/grafana/loki/issues&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;issue&lt;/a&gt; and tell us about it!&lt;/p&gt;
]]></content><description>&lt;h1 id="manage-recording-rules">Manage recording rules&lt;/h1>
&lt;p>Recording rules are queries that run in an interval and produce metrics from logs that can be pushed to a Prometheus compatible backend.&lt;/p></description></item><item><title>Manage storage</title><link>https://grafana.com/docs/loki/v3.7.x/operations/storage/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/storage/</guid><content><![CDATA[&lt;h1 id=&#34;manage-storage&#34;&gt;Manage storage&lt;/h1&gt;
&lt;p&gt;For a high-level overview of Loki storage, refer to the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/storage/&#34;&gt;storage documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Grafana Loki needs to store two different types of data: &lt;strong&gt;chunks&lt;/strong&gt; and &lt;strong&gt;indexes&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;When you use Accelerated Search (experimental), Loki also stores a third type of data: &lt;strong&gt;bloom blocks&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Loki receives logs in separate streams, where each stream is uniquely identified
by its tenant ID and its set of labels. As log entries from a stream arrive,
they are compressed as &lt;strong&gt;chunks&lt;/strong&gt; and saved in the chunks store. See &lt;a href=&#34;#chunk-format&#34;&gt;chunk
format&lt;/a&gt; for how chunks are stored internally.&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;index&lt;/strong&gt; stores each stream&amp;rsquo;s label set and links it to the stream&amp;rsquo;s individual
chunks. Refer to the Loki 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/&#34;&gt;configuration&lt;/a&gt; for
details on how to configure the storage and the index.&lt;/p&gt;
&lt;p&gt;For more information:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/storage/table-manager/&#34;&gt;Table Manager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/storage/retention/&#34;&gt;Retention&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/storage/logs-deletion/&#34;&gt;Logs Deletion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;store-types&#34;&gt;Store Types&lt;/h2&gt;
&lt;h3 id=&#34;-supported-index-stores&#34;&gt;✅ Supported index stores&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/storage/tsdb/&#34;&gt;Single Store TSDB&lt;/a&gt; index store which stores TSDB index files in the object store. This is the recommended index store for Loki 2.8 and newer.&lt;/li&gt;
&lt;li&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/operations/storage/boltdb-shipper/&#34;&gt;Single Store BoltDB (boltdb-shipper)&lt;/a&gt; index store which stores boltdb index files in the object store. Recommended store for Loki 2.0 through 2.7.x.&lt;/li&gt;
&lt;/ul&gt;
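&lt;p&gt;The following configuration fragment sketches how the recommended TSDB index store is selected. It ships both the TSDB index and the chunks to Amazon S3. The start date, local directories, region, and bucket name are placeholders, and &lt;code&gt;schema: v13&lt;/code&gt; assumes a recent Loki version. Check the configuration reference before reusing these values.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;schema_config:
  configs:
    - from: &amp;#34;2024-04-01&amp;#34;   # placeholder: date this schema takes effect
      store: tsdb              # index store
      object_store: s3         # object store for index files and chunks
      schema: v13
      index:
        prefix: index_
        period: 24h            # the TSDB index store requires a 24h period
storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index  # placeholder path
    cache_location: /loki/tsdb-cache          # placeholder path
  aws:
    region: us-east-1                         # placeholder region
    bucketnames: my-loki-bucket               # placeholder bucket&lt;/code&gt;&lt;/pre&gt;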
&lt;h3 id=&#34;-deprecated-index-stores&#34;&gt;❌ Deprecated index stores&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://aws.amazon.com/dynamodb&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Amazon DynamoDB&lt;/a&gt;. Support for this is deprecated and will be removed in a future release.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cloud.google.com/bigtable&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Google Bigtable&lt;/a&gt;. Support for this is deprecated and will be removed in a future release.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cassandra.apache.org&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Apache Cassandra&lt;/a&gt;. Support for this is deprecated and will be removed in a future release.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/boltdb/bolt&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;BoltDB&lt;/a&gt; (doesn&amp;rsquo;t work when clustering Loki)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;-supported-and-recommended-chunks-stores&#34;&gt;✅ Supported and recommended chunks stores&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://aws.amazon.com/s3&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Amazon Simple Storage Service (S3)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cloud.google.com/storage/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Google Cloud Storage (GCS)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://azure.microsoft.com/en-us/products/storage/blobs&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Microsoft Azure Blob Storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.ibm.com/cloud/object-storage&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;IBM Cloud Object Storage (COS)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://intl.cloud.baidu.com/product/bos.html&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Baidu Object Storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://www.alibabacloud.com/product/object-storage-service&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Alibaba Object Storage Service (OSS)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;-supported-chunks-stores-not-typically-recommended-for-production-use&#34;&gt;⚠️ Supported chunks stores, not typically recommended for production use&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;filesystem/&#34;&gt;Filesystem&lt;/a&gt; (read more about the filesystem store to understand its pros and cons before using it with production data)&lt;/li&gt;
&lt;li&gt;S3 API compatible storage, such as &lt;a href=&#34;https://min.io/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;MinIO&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;-deprecated-chunks-stores&#34;&gt;❌ Deprecated chunks stores&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://aws.amazon.com/dynamodb&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Amazon DynamoDB&lt;/a&gt;. Support for this is deprecated and will be removed in a future release.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cloud.google.com/bigtable&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Google Bigtable&lt;/a&gt;. Support for this is deprecated and will be removed in a future release.&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cassandra.apache.org&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Apache Cassandra&lt;/a&gt;. Support for this is deprecated and will be removed in a future release.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;cloud-storage-permissions&#34;&gt;Cloud Storage Permissions&lt;/h2&gt;
&lt;h3 id=&#34;s3&#34;&gt;S3&lt;/h3&gt;
&lt;p&gt;When using S3 as object storage, the following permissions are needed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;s3:ListBucket&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;s3:PutObject&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;s3:GetObject&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;s3:DeleteObject&lt;/code&gt; (if running the compactor)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Resources: &lt;code&gt;arn:aws:s3:::&amp;lt;bucket_name&amp;gt;&lt;/code&gt;, &lt;code&gt;arn:aws:s3:::&amp;lt;bucket_name&amp;gt;/*&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;See the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/storage/#aws-deployment-s3-single-store&#34;&gt;AWS deployment section&lt;/a&gt; on the storage page for a detailed setup guide.&lt;/p&gt;
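&lt;p&gt;For illustration, the S3 permissions above can be expressed as an AWS CloudFormation managed policy. The logical ID &lt;code&gt;LokiStoragePolicy&lt;/code&gt; and the bucket name &lt;code&gt;my-loki-bucket&lt;/code&gt; are hypothetical. Attach the policy to the role or user that Loki runs as. If you do not run the compactor, drop &lt;code&gt;s3:DeleteObject&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;Resources:
  LokiStoragePolicy:                       # hypothetical logical ID
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: &amp;#34;2012-10-17&amp;#34;
        Statement:
          - Effect: Allow
            Action:
              - s3:ListBucket
              - s3:PutObject
              - s3:GetObject
              - s3:DeleteObject            # only needed when running the compactor
            Resource:
              - arn:aws:s3:::my-loki-bucket      # bucket-level action (ListBucket)
              - arn:aws:s3:::my-loki-bucket/*    # object-level actions&lt;/code&gt;&lt;/pre&gt;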
&lt;h3 id=&#34;dynamodb&#34;&gt;DynamoDB&lt;/h3&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;DynamoDB support is deprecated and will be removed in a future release.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;p&gt;When using DynamoDB for the index, the following permissions are needed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dynamodb:BatchGetItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:BatchWriteItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:DeleteItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:DescribeTable&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:GetItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:ListTagsOfResource&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:PutItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:Query&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:TagResource&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:UntagResource&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:UpdateItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:UpdateTable&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:CreateTable&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:DeleteTable&lt;/code&gt; (if &lt;code&gt;table_manager.retention_period&lt;/code&gt; is more than 0s)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Resources: &lt;code&gt;arn:aws:dynamodb:&amp;lt;aws_region&amp;gt;:&amp;lt;aws_account_id&amp;gt;:table/&amp;lt;prefix&amp;gt;*&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dynamodb:ListTables&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Resources: &lt;code&gt;*&lt;/code&gt;&lt;/p&gt;
&lt;h4 id=&#34;autoscaling&#34;&gt;AutoScaling&lt;/h4&gt;
&lt;p&gt;If you enable autoscaling in the Table Manager, the following permissions are needed:&lt;/p&gt;
&lt;h5 id=&#34;application-autoscaling&#34;&gt;Application Autoscaling&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:DescribeScalableTargets&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:DescribeScalingPolicies&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:RegisterScalableTarget&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:DeregisterScalableTarget&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:PutScalingPolicy&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:DeleteScalingPolicy&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Resources: &lt;code&gt;*&lt;/code&gt;&lt;/p&gt;
&lt;h5 id=&#34;iam&#34;&gt;IAM&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;iam:GetRole&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iam:PassRole&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Resources: &lt;code&gt;arn:aws:iam::&amp;lt;aws_account_id&amp;gt;:role/&amp;lt;role_name&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;h3 id=&#34;ibm-cloud-object-storage&#34;&gt;IBM Cloud Object Storage&lt;/h3&gt;
&lt;p&gt;When using IBM Cloud Object Storage (COS) as object storage, the IAM &lt;code&gt;Writer&lt;/code&gt; role is needed.&lt;/p&gt;
&lt;p&gt;See the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/storage/#ibm-deployment-cos-single-store&#34;&gt;IBM Cloud Object Storage section&lt;/a&gt; on the storage page for a detailed setup guide.&lt;/p&gt;
&lt;h2 id=&#34;chunk-format&#34;&gt;Chunk Format&lt;/h2&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;// Header
&amp;#43;-----------------------------------&amp;#43;
| Magic Number (uint32, 4 bytes)    |
&amp;#43;-----------------------------------&amp;#43;
| Version (1 byte)                  |
&amp;#43;-----------------------------------&amp;#43;
| Encoding (1 byte)                 |
&amp;#43;-----------------------------------&amp;#43;

// Blocks
&amp;#43;--------------------&amp;#43;----------------------------&amp;#43;
| block 1 (n bytes)  | checksum (uint32, 4 bytes) |
&amp;#43;--------------------&amp;#43;----------------------------&amp;#43;
| block 2 (n bytes)  | checksum (uint32, 4 bytes) |
&amp;#43;--------------------&amp;#43;----------------------------&amp;#43;
| ...                                             |
&amp;#43;--------------------&amp;#43;----------------------------&amp;#43;
| block N (n bytes)  | checksum (uint32, 4 bytes) |
&amp;#43;--------------------&amp;#43;----------------------------&amp;#43;

// Metas
&amp;#43;------------------------------------------------------------------------------------------------------------------------&amp;#43;
| #blocks (uvarint)                                                                                                      |
&amp;#43;--------------------&amp;#43;-----------------&amp;#43;-----------------&amp;#43;------------------&amp;#43;---------------&amp;#43;----------------------------&amp;#43;
| #entries (uvarint) | minTs (uvarint) | maxTs (uvarint) | offset (uvarint) | len (uvarint) | uncompressedSize (uvarint) |
&amp;#43;--------------------&amp;#43;-----------------&amp;#43;-----------------&amp;#43;------------------&amp;#43;---------------&amp;#43;----------------------------&amp;#43;
| #entries (uvarint) | minTs (uvarint) | maxTs (uvarint) | offset (uvarint) | len (uvarint) | uncompressedSize (uvarint) |
&amp;#43;--------------------&amp;#43;-----------------&amp;#43;-----------------&amp;#43;------------------&amp;#43;---------------&amp;#43;----------------------------&amp;#43;
| ...                                                                                                                    |
&amp;#43;--------------------&amp;#43;-----------------&amp;#43;-----------------&amp;#43;------------------&amp;#43;---------------&amp;#43;----------------------------&amp;#43;
| #entries (uvarint) | minTs (uvarint) | maxTs (uvarint) | offset (uvarint) | len (uvarint) | uncompressedSize (uvarint) |
&amp;#43;--------------------&amp;#43;-----------------&amp;#43;-----------------&amp;#43;------------------&amp;#43;---------------&amp;#43;----------------------------&amp;#43;
| checksum (uint32, 4 bytes)                                                                                             | 
&amp;#43;------------------------------------------------------------------------------------------------------------------------&amp;#43;

// Structured Metadata
&amp;#43;---------------------------------&amp;#43;
| #labels (uvarint)               |
&amp;#43;---------------&amp;#43;-----------------&amp;#43;
| len (uvarint) | value (n bytes) |
&amp;#43;---------------&amp;#43;-----------------&amp;#43;
| ...                             |
&amp;#43;---------------&amp;#43;-----------------&amp;#43;
| checksum (uint32, 4 bytes)      |
&amp;#43;---------------------------------&amp;#43;

// Footer
&amp;#43;-----------------------&amp;#43;--------------------------&amp;#43;
| len (uint64, 8 bytes) | offset (uint64, 8 bytes) |   // offset to Structured Metadata
&amp;#43;-----------------------&amp;#43;--------------------------&amp;#43;
| len (uint64, 8 bytes) | offset (uint64, 8 bytes) |   // offset to Metas
&amp;#43;-----------------------&amp;#43;--------------------------&amp;#43;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="manage-storage">Manage storage&lt;/h1>
&lt;p>You can read a high level overview of Loki storage
&lt;a href="/docs/loki/v3.7.x/configure/storage/">here&lt;/a>&lt;/p>
&lt;p>Grafana Loki needs to store two different types of data: &lt;strong>chunks&lt;/strong> and &lt;strong>indexes&lt;/strong>.&lt;/p></description></item><item><title>Manage tenant isolation</title><link>https://grafana.com/docs/loki/v3.7.x/operations/multi-tenancy/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/multi-tenancy/</guid><content><![CDATA[&lt;h1 id=&#34;manage-tenant-isolation&#34;&gt;Manage tenant isolation&lt;/h1&gt;
&lt;p&gt;Grafana Loki is a multi-tenant system; requests and data for tenant A are isolated from
tenant B. Requests to the Loki API should include an HTTP header
(&lt;code&gt;X-Scope-OrgID&lt;/code&gt;) that identifies the tenant for the request.&lt;/p&gt;
&lt;p&gt;Tenant IDs can be any string that satisfies the &lt;a href=&#34;#restrictions&#34;&gt;restrictions&lt;/a&gt; described later on this page.
We recommend keeping tenant IDs short; 20 bytes is usually enough to uniquely identify a tenant.&lt;/p&gt;
&lt;p&gt;Loki defaults to running in multi-tenant mode.
Multi-tenant mode is set in the configuration with &lt;code&gt;auth_enabled: true&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;When configured with &lt;code&gt;auth_enabled: false&lt;/code&gt;, Loki uses a single tenant.
The &lt;code&gt;X-Scope-OrgID&lt;/code&gt; header is not required in Loki API requests.
The single tenant ID will be the string &lt;code&gt;fake&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;multi-tenant-queries&#34;&gt;Multi-tenant Queries&lt;/h2&gt;
&lt;p&gt;In multi-tenant mode, queries may gather results from multiple tenants.
Set the querier configuration option &lt;code&gt;multi_tenant_queries_enabled: true&lt;/code&gt; to enable queries across tenants.
The query API request defines the tenants.
Specify multiple tenants
in the query request HTTP header &lt;code&gt;X-Scope-OrgID&lt;/code&gt; by separating the tenant IDs with the pipe character (&lt;code&gt;|&lt;/code&gt;).
For example, a query for tenants &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; requires the header &lt;code&gt;X-Scope-OrgID: A|B&lt;/code&gt;.&lt;/p&gt;
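&lt;p&gt;The following minimal Loki configuration fragment keeps multi-tenant mode on and enables cross-tenant queries. Only the two keys shown are involved, and the rest of your configuration stays unchanged.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;auth_enabled: true                      # multi-tenant mode (the default)
querier:
  multi_tenant_queries_enabled: true    # accept X-Scope-OrgID: A|B on query endpoints&lt;/code&gt;&lt;/pre&gt;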
&lt;p&gt;Only query endpoints support multi-tenant calls.
Calls to &lt;code&gt;GET /loki/api/v1/tail&lt;/code&gt; and &lt;code&gt;POST /loki/api/v1/push&lt;/code&gt; will return an HTTP 400 error if more than one tenant is defined in the HTTP header.&lt;/p&gt;
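&lt;p&gt;On the client side, a Grafana data source can attach the header to every query. The following provisioning file is a sketch. The data source name and Loki URL are hypothetical, and tenants &lt;code&gt;A&lt;/code&gt; and &lt;code&gt;B&lt;/code&gt; come from the example above. Because this data source always sends two tenants, live tailing through it is expected to fail with the HTTP 400 error described above, so use it for queries only.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;apiVersion: 1
datasources:
  - name: Loki (tenants A and B)      # hypothetical data source name
    type: loki
    access: proxy
    url: http://loki-gateway:3100     # hypothetical Loki endpoint
    jsonData:
      httpHeaderName1: X-Scope-OrgID
    secureJsonData:
      httpHeaderValue1: A|B           # pipe-separated tenant IDs&lt;/code&gt;&lt;/pre&gt;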
&lt;p&gt;Instant and range queries support label filtering using tenant IDs.
For example, the query&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;{app=&amp;#34;foo&amp;#34;, __tenant_id__=~&amp;#34;a.&amp;#43;&amp;#34;} | logfmt&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;will return results for all tenants
that have a tenant ID that begins with the character &lt;code&gt;a&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If a log stream already has a &lt;code&gt;__tenant_id__&lt;/code&gt; label, that label is renamed by prefixing it with the string &lt;code&gt;original_&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Filtering on the tenant ID in pipeline stages is not supported; filter on &lt;code&gt;__tenant_id__&lt;/code&gt; in the stream selector instead.
An example of a query that does &lt;em&gt;not&lt;/em&gt; work:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;{app=&amp;#34;foo&amp;#34;} | __tenant_id__=&amp;#34;1&amp;#34; | logfmt&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;restrictions&#34;&gt;Restrictions&lt;/h2&gt;
&lt;p&gt;Tenant IDs must not be longer than 150 bytes and can only include the following characters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Alphanumeric characters
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;0-9&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;a-z&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;A-Z&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Special characters
&lt;ul&gt;
&lt;li&gt;Exclamation point (&lt;code&gt;!&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Hyphen (&lt;code&gt;-&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Underscore (&lt;code&gt;_&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Single period (&lt;code&gt;.&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Asterisk (&lt;code&gt;*&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Single quote (&lt;code&gt;&#39;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Open parenthesis (&lt;code&gt;(&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Close parenthesis (&lt;code&gt;)&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;For security reasons, &lt;code&gt;.&lt;/code&gt; and &lt;code&gt;..&lt;/code&gt; aren&amp;rsquo;t valid tenant IDs.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

]]></content><description>&lt;h1 id="manage-tenant-isolation">Manage tenant isolation&lt;/h1>
&lt;p>Grafana Loki is a multi-tenant system; requests and data for tenant A are isolated from
tenant B. Requests to the Loki API should include an HTTP header
(&lt;code>X-Scope-OrgID&lt;/code>) that identifies the tenant for the request.&lt;/p></description></item><item><title>Manage varying workloads at scale with autoscaling queriers</title><link>https://grafana.com/docs/loki/v3.7.x/operations/autoscaling_queriers/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/autoscaling_queriers/</guid><content><![CDATA[&lt;h1 id=&#34;manage-varying-workloads-at-scale-with-autoscaling-queriers&#34;&gt;Manage varying workloads at scale with autoscaling queriers&lt;/h1&gt;
&lt;p&gt;A microservices deployment of a Loki cluster that runs on Kubernetes typically handles a
workload that varies throughout the day.
To make Loki easier to operate and optimize the cost of running Loki at scale,
we have designed a set of resources to help you autoscale your Loki queriers.&lt;/p&gt;
&lt;h2 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;You need to run Loki in Kubernetes as a set of microservices, and you need to use the query-scheduler.&lt;/p&gt;
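&lt;p&gt;If the query-scheduler is not yet in use, point both the query frontend and the querier workers at it through their &lt;code&gt;scheduler_address&lt;/code&gt; settings. The following fragment is a sketch. The Kubernetes service name and the gRPC port &lt;code&gt;9095&lt;/code&gt; are assumptions about your deployment.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;frontend:
  # Query frontends enqueue queries in the query-scheduler.
  scheduler_address: query-scheduler.loki-cluster.svc.cluster.local:9095  # assumed address
frontend_worker:
  # Querier workers pull queries from the same query-scheduler.
  scheduler_address: query-scheduler.loki-cluster.svc.cluster.local:9095  # assumed address&lt;/code&gt;&lt;/pre&gt;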
&lt;p&gt;We recommend using &lt;a href=&#34;https://keda.sh/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Kubernetes Event-Driven Autoscaling (KEDA)&lt;/a&gt; to configure autoscaling
based on Prometheus metrics. Refer to &lt;a href=&#34;https://keda.sh/docs/latest/deploy&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Deploying KEDA&lt;/a&gt; to learn more
about setting up KEDA in your Kubernetes cluster.&lt;/p&gt;
&lt;h2 id=&#34;scaling-metric&#34;&gt;Scaling metric&lt;/h2&gt;
&lt;p&gt;Because queriers pull queries from the query-scheduler queue and process them on the querier workers, you should scale the queriers based on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The scheduler queue size.&lt;/li&gt;
&lt;li&gt;The queries running in the queriers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The query-scheduler exposes the &lt;code&gt;loki_query_scheduler_inflight_requests&lt;/code&gt; metric.
It tracks the sum of queued queries plus the number of queries currently running in the querier workers.
The following query is useful to scale queriers based on the inflight requests.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;sum(
  max_over_time(
    loki_query_scheduler_inflight_requests{namespace=&amp;#34;loki-cluster&amp;#34;, quantile=&amp;#34;&amp;lt;Q&amp;gt;&amp;#34;}[&amp;lt;R&amp;gt;]
  )
)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Use the quantile (Q) and the range (R) parameters to fine-tune the metric.
The higher Q is, the more sensitive the metric is to short-lasting spikes.
A larger R smooths the metric over time, which helps prevent the autoscaler from changing the number of replicas too frequently.&lt;/p&gt;
&lt;p&gt;In our experience, a Q of 0.75 and an R of 2 minutes work well.
You can adjust these values according to your workload.&lt;/p&gt;
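&lt;p&gt;Optionally, you can precompute the scaling metric with a Prometheus recording rule that uses the Q of 0.75 and R of 2 minutes mentioned above, so that the autoscaler reads a single precomputed series. This is a sketch, and the group and record names are hypothetical. If you adopt it, the autoscaler query becomes a selector on the recorded series, filtered to the &lt;code&gt;loki-cluster&lt;/code&gt; namespace.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-yaml&#34;&gt;groups:
  - name: loki-querier-autoscaling   # hypothetical group name
    rules:
      # Scaling metric with Q = 0.75 and R = 2m, kept per namespace.
      - record: namespace:loki_query_scheduler_inflight_requests:sum_max_over_time_2m
        expr: |
          sum by (namespace) (
            max_over_time(loki_query_scheduler_inflight_requests{quantile=&amp;#34;0.75&amp;#34;}[2m])
          )&lt;/code&gt;&lt;/pre&gt;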
&lt;h2 id=&#34;cluster-capacity-planning&#34;&gt;Cluster capacity planning&lt;/h2&gt;
&lt;p&gt;To scale the Loki queriers, configure the following settings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The threshold for scaling up and down&lt;/li&gt;
&lt;li&gt;The scale down stabilization period&lt;/li&gt;
&lt;li&gt;The minimum and the maximum number of queriers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Querier workers process queries from the queue. You can configure each Loki querier to run several workers.
To reserve headroom for workload spikes, we recommend using no more than 75% of the workers.
For example, if you configure the Loki queriers to run 6 workers, set a threshold of &lt;code&gt;floor(0.75 * 6) = 4&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To determine the minimum number of queriers that you should run, run at least one querier and determine the average
number of inflight requests the system processes 75% of the time over seven days. The target utilization of the queriers is 75%.
For example, with 6 workers per querier, use the following query:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;clamp_min(ceil(
    avg(
        avg_over_time(loki_query_scheduler_inflight_requests{namespace=&amp;#34;loki-cluster&amp;#34;, quantile=&amp;#34;0.75&amp;#34;}[7d])
    ) / scalar(floor(vector(6 * 0.75)))
), 1)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The maximum number of queriers to run is equal to the number of queriers required to process all inflight
requests 50% of the time during a seven-day timespan.
As in the previous example, if each querier runs 6 workers, divide the inflight requests by 6.
The resulting query becomes:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;ceil(
    max(
        max_over_time(loki_query_scheduler_inflight_requests{namespace=&amp;#34;loki-cluster&amp;#34;, quantile=&amp;#34;0.5&amp;#34;}[7d])
    ) / 6
)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To minimize the scenario where Loki scales up shortly after scaling down, set
a stabilization window for scaling down.&lt;/p&gt;
&lt;h3 id=&#34;keda-configuration&#34;&gt;KEDA configuration&lt;/h3&gt;
&lt;p&gt;This &lt;a href=&#34;https://keda.sh/docs/latest/concepts/scaling-deployments/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;KEDA ScaledObject&lt;/a&gt; example configures autoscaling
for the querier deployment in the &lt;code&gt;loki-cluster&lt;/code&gt; namespace.
The example shows the minimum number of replicas set to 10 and the maximum number of replicas set to 50.
Each querier runs 6 workers and the target is to use 75% of them, so the threshold is set to 4.
The metric is served at &lt;code&gt;http://prometheus.default:9090/prometheus&lt;/code&gt;. We configure a stabilization window of 30 minutes.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: querier
  namespace: loki-cluster
spec:
  maxReplicaCount: 50
  minReplicaCount: 10
  scaleTargetRef:
    kind: Deployment
    name: querier
  triggers:
  - metadata:
      metricName: querier_autoscaling_metric
      query: sum(max_over_time(loki_query_scheduler_inflight_requests{namespace=&amp;#34;loki-cluster&amp;#34;, quantile=&amp;#34;0.75&amp;#34;}[2m]))
      serverAddress: http://prometheus.default:9090/prometheus
      threshold: &amp;#34;4&amp;#34;
    type: prometheus
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 1800&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
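&lt;p&gt;The following shell sketch works through the numbers in this example. It assumes the HPA behavior for an average-value target, where the desired replica count is roughly the metric total divided by the threshold, rounded up, and clamped to &lt;code&gt;minReplicaCount&lt;/code&gt; and &lt;code&gt;maxReplicaCount&lt;/code&gt;. It ignores the HPA tolerance and the stabilization window, and the in-flight totals are hypothetical.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of the scaling arithmetic for the querier ScaledObject above.
# Values mirror the example: 6 workers per querier, ~75% target, 10..50 replicas.
workers_per_querier=6
target_utilization_pct=75
min_replicas=10
max_replicas=50

# 6 * 75 / 100 = 4.5; shell integer division truncates it to 4, the example threshold.
threshold=$(( workers_per_querier * target_utilization_pct / 100 ))
echo "threshold=${threshold}"

# Approximate desired replicas for a total in-flight request count:
# ceil(total / threshold), clamped to [min_replicas, max_replicas].
# Assumption: ignores the HPA tolerance and the stabilization window.
desired_replicas() {
  desired=$(( ($1 + threshold - 1) / threshold ))
  if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
  if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi
  echo "$desired"
}

# Hypothetical in-flight totals, for illustration only.
for inflight in 12 130 300; do
  echo "inflight=${inflight} replicas=$(desired_replicas "$inflight")"
done
```

&lt;p&gt;With these inputs the script prints &lt;code&gt;threshold=4&lt;/code&gt;, then 10, 33, and 50 replicas for totals of 12, 130, and 300: the first result is raised to the minimum and the last is capped at the maximum.&lt;/p&gt;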
&lt;h2 id=&#34;prometheus-alerting-when-at-capacity&#34;&gt;Prometheus alerting when at capacity&lt;/h2&gt;
&lt;p&gt;Because the configured maximum might not be sufficient, a Prometheus alert can identify
when the number of queriers has stayed at its configured maximum for an extended time. The following example uses three hours (&lt;code&gt;3h&lt;/code&gt;) as the extended time:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;alert: LokiAutoscalerMaxedOut
expr: kube_horizontalpodautoscaler_status_current_replicas{namespace=~&amp;#34;loki-cluster&amp;#34;} == kube_horizontalpodautoscaler_spec_max_replicas{namespace=~&amp;#34;loki-cluster&amp;#34;}
for: 3h
labels:
  severity: warning
annotations:
  description: HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} has been running at max replicas for longer than 3h; this can indicate underprovisioning.
  summary: HPA has been running at max replicas for an extended time&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
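&lt;p&gt;The example above is a single alerting rule; Prometheus loads rules from files that list them under &lt;code&gt;groups&lt;/code&gt;. This shell sketch writes the rule into a rule file and, if &lt;code&gt;promtool&lt;/code&gt; is installed, validates it with &lt;code&gt;promtool check rules&lt;/code&gt;. The file name and the group name are hypothetical.&lt;/p&gt;

```shell
#!/bin/sh
# Write the alert above into a standalone Prometheus rule file.
# The file name and the group name are hypothetical; adjust them to your setup.
rules_file=loki-autoscaling-alerts.yaml

# Quoted 'EOF' keeps the shell from expanding {{ $labels... }}.
cat > "$rules_file" <<'EOF'
groups:
  - name: loki-autoscaling
    rules:
      - alert: LokiAutoscalerMaxedOut
        expr: kube_horizontalpodautoscaler_status_current_replicas{namespace=~"loki-cluster"} == kube_horizontalpodautoscaler_spec_max_replicas{namespace=~"loki-cluster"}
        for: 3h
        labels:
          severity: warning
        annotations:
          description: HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} has been running at max replicas for longer than 3h; this can indicate underprovisioning.
          summary: HPA has been running at max replicas for an extended time
EOF

# Validate the file only if promtool (shipped with Prometheus) is available.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$rules_file"
else
  echo "promtool not found; skipping validation of $rules_file"
fi
```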
]]></content><description>&lt;h1 id="manage-varying-workloads-at-scale-with-autoscaling-queriers">Manage varying workloads at scale with autoscaling queriers&lt;/h1>
&lt;p>A microservices deployment of a Loki cluster that runs on Kubernetes typically handles a
workload that varies throughout the day.
To make Loki easier to operate and optimize the cost of running Loki at scale,
we have designed a set of resources to help you autoscale your Loki queriers.&lt;/p></description></item><item><title>Manage version upgrades</title><link>https://grafana.com/docs/loki/v3.7.x/operations/upgrade/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/upgrade/</guid><content><![CDATA[&lt;h1 id=&#34;manage-version-upgrades&#34;&gt;Manage version upgrades&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/setup/upgrade/&#34;&gt;Upgrade&lt;/a&gt; from one Loki version to a newer version.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;
    &lt;a href=&#34;/docs/loki/v3.7.x/setup/upgrade/&#34;&gt;Upgrade Helm&lt;/a&gt; from Helm v2.x to Helm v3.x.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="manage-version-upgrades">Manage version upgrades&lt;/h1>
&lt;ul>
&lt;li>
&lt;p>
&lt;a href="/docs/loki/v3.7.x/setup/upgrade/">Upgrade&lt;/a> from one Loki version to a newer version.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>
&lt;a href="/docs/loki/v3.7.x/setup/upgrade/">Upgrade Helm&lt;/a> from Helm v2.x to Helm v3.x.&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Monitor tenant limits using the Overrides Exporter</title><link>https://grafana.com/docs/loki/v3.7.x/operations/overrides-exporter/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/overrides-exporter/</guid><content><![CDATA[&lt;h1 id=&#34;monitor-tenant-limits-using-the-overrides-exporter&#34;&gt;Monitor tenant limits using the Overrides Exporter&lt;/h1&gt;
&lt;p&gt;Loki is a multi-tenant system that supports applying limits to each tenant as a mechanism for resource management. The &lt;code&gt;overrides-exporter&lt;/code&gt; module exposes these limits as Prometheus metrics in order to help operators better understand tenant behavior.&lt;/p&gt;
&lt;h2 id=&#34;context&#34;&gt;Context&lt;/h2&gt;
&lt;p&gt;You can apply configuration updates to tenant limits without restarting Loki by using the 
    &lt;a href=&#34;/docs/loki/v3.7.x/configure/#runtime_config&#34;&gt;&lt;code&gt;runtime_config&lt;/code&gt;&lt;/a&gt; feature.&lt;/p&gt;
&lt;h2 id=&#34;example&#34;&gt;Example&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;overrides-exporter&lt;/code&gt; module is disabled by default. We recommend running a single instance per cluster to avoid issues with metric cardinality. For every scalar field in the limits configuration, the &lt;code&gt;overrides-exporter&lt;/code&gt; exposes a &lt;code&gt;loki_overrides_defaults&lt;/code&gt; series whose value is that field&amp;rsquo;s default after the Loki configuration loads. It also exposes a &lt;code&gt;loki_overrides&lt;/code&gt; series for &lt;em&gt;every&lt;/em&gt; field that differs from the default, for &lt;em&gt;every&lt;/em&gt; tenant.&lt;/p&gt;
&lt;p&gt;Using an example &lt;code&gt;runtime.yaml&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
  &amp;#34;tenant_1&amp;#34;:
    ingestion_rate_mb: 10
    max_streams_per_user: 100000
    max_chunks_per_query: 100000&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Launch an instance of the &lt;code&gt;overrides-exporter&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;shell&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-shell&#34;&gt;loki -target=overrides-exporter -runtime-config.file=runtime.yaml -config.file=basic_schema_config.yaml -server.http-listen-port=8080&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To inspect the tenant limit overrides:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;shell&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-shell&#34;&gt;$ curl -sq localhost:8080/metrics | grep override
# HELP loki_overrides Resource limit overrides applied to tenants
# TYPE loki_overrides gauge
loki_overrides{limit_name=&amp;#34;ingestion_rate_mb&amp;#34;,user=&amp;#34;tenant_1&amp;#34;} 10
loki_overrides{limit_name=&amp;#34;max_chunks_per_query&amp;#34;,user=&amp;#34;tenant_1&amp;#34;} 100000
loki_overrides{limit_name=&amp;#34;max_streams_per_user&amp;#34;,user=&amp;#34;tenant_1&amp;#34;} 100000
# HELP loki_overrides_defaults Default values for resource limit overrides applied to tenants
# TYPE loki_overrides_defaults gauge
loki_overrides_defaults{limit_name=&amp;#34;cardinality_limit&amp;#34;} 100000
loki_overrides_defaults{limit_name=&amp;#34;creation_grace_period&amp;#34;} 6e&amp;#43;11
loki_overrides_defaults{limit_name=&amp;#34;ingestion_burst_size_mb&amp;#34;} 6
loki_overrides_defaults{limit_name=&amp;#34;ingestion_rate_mb&amp;#34;} 4
loki_overrides_defaults{limit_name=&amp;#34;max_cache_freshness_per_query&amp;#34;} 6e&amp;#43;10
loki_overrides_defaults{limit_name=&amp;#34;max_chunks_per_query&amp;#34;} 2e&amp;#43;06
loki_overrides_defaults{limit_name=&amp;#34;max_concurrent_tail_requests&amp;#34;} 10
loki_overrides_defaults{limit_name=&amp;#34;max_entries_limit_per_query&amp;#34;} 5000
loki_overrides_defaults{limit_name=&amp;#34;max_global_streams_per_user&amp;#34;} 5000
loki_overrides_defaults{limit_name=&amp;#34;max_label_name_length&amp;#34;} 1024
loki_overrides_defaults{limit_name=&amp;#34;max_label_names_per_series&amp;#34;} 30
loki_overrides_defaults{limit_name=&amp;#34;max_label_value_length&amp;#34;} 2048
loki_overrides_defaults{limit_name=&amp;#34;max_line_size&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;max_queriers_per_tenant&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;max_query_length&amp;#34;} 2.5956e&amp;#43;15
loki_overrides_defaults{limit_name=&amp;#34;max_query_lookback&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;max_query_parallelism&amp;#34;} 32
loki_overrides_defaults{limit_name=&amp;#34;max_query_series&amp;#34;} 500
loki_overrides_defaults{limit_name=&amp;#34;max_streams_matchers_per_query&amp;#34;} 1000
loki_overrides_defaults{limit_name=&amp;#34;max_streams_per_user&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;min_sharding_lookback&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;per_stream_rate_limit&amp;#34;} 3.145728e&amp;#43;06
loki_overrides_defaults{limit_name=&amp;#34;per_stream_rate_limit_burst&amp;#34;} 1.572864e&amp;#43;07
loki_overrides_defaults{limit_name=&amp;#34;per_tenant_override_period&amp;#34;} 1e&amp;#43;10
loki_overrides_defaults{limit_name=&amp;#34;reject_old_samples_max_age&amp;#34;} 1.2096e&amp;#43;15
loki_overrides_defaults{limit_name=&amp;#34;retention_period&amp;#34;} 2.6784e&amp;#43;15
loki_overrides_defaults{limit_name=&amp;#34;ruler_evaluation_delay_duration&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_max_rule_groups_per_tenant&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_max_rules_per_rule_group&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_batch_send_deadline&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_capacity&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_max_backoff&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_max_samples_per_send&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_max_shards&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_min_backoff&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_min_shards&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_timeout&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;split_queries_by_interval&amp;#34;} 0&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
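&lt;p&gt;To compare a tenant&amp;rsquo;s overrides with the defaults, you can filter the exporter output. This shell sketch defines a hypothetical &lt;code&gt;compare_overrides&lt;/code&gt; function that reads the Prometheus text format from standard input and prints, for one tenant, each overridden limit next to its default value.&lt;/p&gt;

```shell
#!/bin/sh
# Hypothetical helper: print each overridden limit for one tenant together with
# its default, from overrides-exporter metrics read on stdin.
compare_overrides() {
  awk -v tenant="$1" '
    # Return the value of the limit_name label on the current line.
    function limit_name() {
      if (!match($0, /limit_name="[^"]*"/)) return ""
      # Skip the 12 characters of limit_name=" and drop the closing quote.
      return substr($0, RSTART + 12, RLENGTH - 13)
    }
    # Defaults: loki_overrides_defaults{limit_name="X"} VALUE
    index($0, "loki_overrides_defaults{") == 1 { defaults[limit_name()] = $NF; next }
    # Overrides for the requested tenant: loki_overrides{limit_name="X",user="T"} VALUE
    index($0, "loki_overrides{") == 1 && index($0, "user=\"" tenant "\"") > 0 {
      overrides[limit_name()] = $NF
    }
    END {
      for (name in overrides) {
        printf "%s override=%s default=%s\n", name, overrides[name], ((name in defaults) ? defaults[name] : "n/a")
      }
    }
  ' | sort
}
```

&lt;p&gt;For example, piping &lt;code&gt;curl -s localhost:8080/metrics&lt;/code&gt; into &lt;code&gt;compare_overrides tenant_1&lt;/code&gt; against the exporter started above should list &lt;code&gt;ingestion_rate_mb&lt;/code&gt;, &lt;code&gt;max_chunks_per_query&lt;/code&gt;, and &lt;code&gt;max_streams_per_user&lt;/code&gt;, the three limits overridden in &lt;code&gt;runtime.yaml&lt;/code&gt;.&lt;/p&gt;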
&lt;p&gt;You can create alerts based on these metrics to notify operators when tenants are close to their limits, so that the limits can be increased before tenants exceed them.&lt;/p&gt;
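&lt;p&gt;As a starting point, the following shell sketch writes an alerting rule that fires when a tenant&amp;rsquo;s active streams exceed 80% of its &lt;code&gt;max_global_streams_per_user&lt;/code&gt; override. The expression follows the pattern of the stream-limit queries used when migrating to zone-aware ingesters. The 80% threshold, the &lt;code&gt;for&lt;/code&gt; duration, and the rule, group, and file names are assumptions. Only tenants with an explicit override have a &lt;code&gt;loki_overrides&lt;/code&gt; series, so tenants on the default limit are not covered, and overrides of 0 (which disable the limit) are filtered out to avoid dividing by zero.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: alert when a tenant uses more than 80% of its stream limit override.
# Rule, group, and file names, the threshold, and the duration are hypothetical.
rules_file=loki-tenant-limit-alerts.yaml

# Quoted 'EOF' keeps the shell from expanding $1 and {{ $labels... }}.
cat > "$rules_file" <<'EOF'
groups:
  - name: loki-tenant-limits
    rules:
      - alert: LokiTenantNearStreamLimit
        # Active streams per tenant (divided by the replication factor),
        # divided by the tenant's non-zero max_global_streams_per_user override.
        expr: |
          (
            sum by (cluster, namespace, tenant) (loki_ingester_memory_streams)
              / on (namespace) group_left
            max by (namespace) (loki_distributor_replication_factor)
          )
            / on (tenant) group_left
          (
            max by (tenant) (
              label_replace(loki_overrides{limit_name="max_global_streams_per_user"}, "tenant", "$1", "user", "(.+)")
            ) > 0
          )
          > 0.8
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Tenant {{ $labels.tenant }} is using more than 80% of its max_global_streams_per_user limit
EOF

# Validate the file only if promtool (shipped with Prometheus) is available.
if command -v promtool >/dev/null 2>&1; then
  promtool check rules "$rules_file"
else
  echo "promtool not found; skipping validation of $rules_file"
fi
```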
]]></content><description>&lt;h1 id="monitor-tenant-limits-using-the-overrides-exporter">Monitor tenant limits using the Overrides Exporter&lt;/h1>
&lt;p>Loki is a multi-tenant system that supports applying limits to each tenant as a mechanism for resource management. The &lt;code>overrides-exporter&lt;/code> module exposes these limits as Prometheus metrics in order to help operators better understand tenant behavior.&lt;/p></description></item><item><title>Speed up ingester rollout using zone awareness</title><link>https://grafana.com/docs/loki/v3.7.x/operations/zone-ingesters/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/operations/zone-ingesters/</guid><content><![CDATA[&lt;h1 id=&#34;speed-up-ingester-rollout-using-zone-awareness&#34;&gt;Speed up ingester rollout using zone awareness&lt;/h1&gt;
&lt;p&gt;Grafana Labs uses zone-aware Loki ingesters to make rollouts of large Loki deployments easier. You can think of them as three logical zones; with some extra Kubernetes configuration, you can also deploy them in separate availability zones.&lt;/p&gt;
&lt;p&gt;By default, the logs of an incoming log stream are replicated to 3 ingesters, chosen effectively at random by hashing the stream onto the ring. Unless ingesters are being scaled up or down, a given stream is always replicated to the same 3 ingesters. This means that if one of those ingesters restarts, no data is lost. However, two or more of those ingesters restarting can result in data loss, and also impacts the system&amp;rsquo;s ability to ingest logs because of an unhealthy ring status.&lt;/p&gt;
&lt;p&gt;With zone awareness enabled, an incoming log line is replicated to one ingester in each zone. This means we are now only concerned about ingesters in multiple zones restarting at the same time: we can roll out, or lose, an entire zone at once without impacting writes. This allows deployments with a large number of ingesters to roll out much more quickly.&lt;/p&gt;
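&lt;p&gt;The write impact comes down to quorum. The Loki distributor treats a write as successful once a quorum of &lt;code&gt;floor(replication_factor / 2) + 1&lt;/code&gt; ingesters accept it, so with a replication factor of 3, two replicas must succeed. This minimal shell sketch tabulates the outcome for 0, 1, and 2 unavailable replicas of one stream. With zone awareness, losing an entire zone makes at most one replica of each stream unavailable.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch of the write-quorum arithmetic for one stream.
# Assumes the distributor needs floor(replication_factor / 2) + 1 successful writes.
replication_factor=3
quorum=$(( replication_factor / 2 + 1 ))

# Print "ok" if a write still reaches quorum with $1 replicas unavailable.
write_outcome() {
  healthy=$(( replication_factor - $1 ))
  if [ "$healthy" -ge "$quorum" ]; then echo ok; else echo fail; fi
}

for down in 0 1 2; do
  echo "unavailable_replicas=${down} quorum=${quorum} write=$(write_outcome "$down")"
done
```

&lt;p&gt;The script reports that writes succeed with one unavailable replica and fail with two.&lt;/p&gt;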
&lt;p&gt;At Grafana Labs, we also use the &lt;a href=&#34;https://github.com/grafana/rollout-operator&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;rollout-operator&lt;/a&gt; to manage rollouts to the 3 StatefulSets gracefully. The rollout-operator looks for labels on StatefulSets to know which ones belong to a given rollout group, and coordinates rollouts so that it only rolls Pods from a single StatefulSet in the group at a time. See the README in the rollout-operator repository for a more in-depth explanation.&lt;/p&gt;
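&lt;p&gt;When you are not using jsonnet, the migration below asks you to give each zone-aware ingester StatefulSet a &lt;code&gt;rollout-group: ingester&lt;/code&gt; label, a &lt;code&gt;rollout-max-unavailable&lt;/code&gt; annotation, and the &lt;code&gt;OnDelete&lt;/code&gt; update strategy. This is a minimal shell sketch of applying that metadata with &lt;code&gt;kubectl&lt;/code&gt;, assuming a namespace named &lt;code&gt;loki&lt;/code&gt; and StatefulSets named &lt;code&gt;ingester-zone-a&lt;/code&gt;, &lt;code&gt;ingester-zone-b&lt;/code&gt;, and &lt;code&gt;ingester-zone-c&lt;/code&gt;. It only prints the commands unless &lt;code&gt;DRY_RUN=0&lt;/code&gt; is set.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: add rollout-operator metadata to the zone-aware ingester StatefulSets.
# Assumptions: namespace "loki" and StatefulSets named ingester-zone-a/b/c.
NAMESPACE=${NAMESPACE:-loki}
MAX_UNAVAILABLE=${MAX_UNAVAILABLE:-1}  # placeholder; revisit after scaling up
DRY_RUN=${DRY_RUN:-1}                  # 1 prints commands, 0 runs them

# Print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

label_zone_statefulsets() {
  for zone in a b c; do
    sts="ingester-zone-$zone"
    # Label that tells the rollout-operator which rollout group the StatefulSet belongs to.
    run kubectl -n "$NAMESPACE" label statefulset "$sts" rollout-group=ingester --overwrite
    # Annotation that limits how many Pods of this StatefulSet can be unavailable during a rollout.
    run kubectl -n "$NAMESPACE" annotate statefulset "$sts" rollout-max-unavailable="$MAX_UNAVAILABLE" --overwrite
    # OnDelete stops the StatefulSet controller from replacing Pods on its own.
    run kubectl -n "$NAMESPACE" patch statefulset "$sts" --type merge -p '{"spec":{"updateStrategy":{"type":"OnDelete"}}}'
  done
}

label_zone_statefulsets
```

&lt;p&gt;If you manage the StatefulSets from manifests, set the same fields there instead, so that your deployment tooling does not revert them.&lt;/p&gt;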
&lt;h2 id=&#34;migration&#34;&gt;Migration&lt;/h2&gt;
&lt;p&gt;This section describes migrating from a single ingester StatefulSet to 3 zone-aware ingester StatefulSets. Regardless of deployment method, the migration follows a few general steps.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configure your existing ingesters to be part of a zone, for example &lt;code&gt;zone-default&lt;/code&gt;. This lets you later exclude them from the write path while still allowing for graceful shutdowns.&lt;/li&gt;
&lt;li&gt;Prepare for the increase in active streams (caused by the way streams are split between ingesters) by increasing the number of active streams allowed for your tenants.&lt;/li&gt;
&lt;li&gt;Add and scale up your new zone-aware ingester StatefulSets so that each has one third of the total number of replicas you want to run.&lt;/li&gt;
&lt;li&gt;Enable zone awareness on the write path by setting &lt;code&gt;distributor.zone-awareness-enabled&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt; for distributors and rulers.&lt;/li&gt;
&lt;li&gt;Wait some time to ensure that the new zone-aware ingesters have data for the time period they are queried for (&lt;code&gt;query_ingesters_within&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Enable zone awareness on the read path by setting &lt;code&gt;distributor.zone-awareness-enabled&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt; for queriers.&lt;/li&gt;
&lt;li&gt;Configure distributors and rulers with &lt;code&gt;distributor.excluded-zones&lt;/code&gt; to exclude ingesters in &lt;code&gt;zone-default&lt;/code&gt;, so those ingesters no longer receive write traffic.&lt;/li&gt;
&lt;li&gt;Use the shutdown endpoint to flush data from the default ingesters, then scale down and remove the associated StatefulSet.&lt;/li&gt;
&lt;li&gt;Clean up any config remaining from the migration.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&#34;detailed-migration-steps&#34;&gt;Detailed migration steps&lt;/h3&gt;
&lt;p&gt;The following steps live migrate (with no downtime) an existing Loki deployment from a single ingester StatefulSet to 3 zone-aware ingester StatefulSets.&lt;/p&gt;
&lt;p&gt;These instructions assume you are using the zone-aware ingester jsonnet deployment code from the Loki repository: &lt;a href=&#34;https://github.com/grafana/loki/blob/main/production/ksonnet/loki/multi-zone.libsonnet&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;multi-zone.libsonnet&lt;/a&gt;. &lt;strong&gt;If you are not using jsonnet, see the notes in the relevant steps that describe how to perform each step manually.&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Configure the zone for the existing ingester StatefulSet as &lt;code&gt;zone-default&lt;/code&gt; by setting &lt;code&gt;multi_zone_default_ingester_zone: true&lt;/code&gt;. This lets you filter that zone out of the write path later.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Configure the &lt;code&gt;ingester-pdb&lt;/code&gt; PodDisruptionBudget with &lt;code&gt;maxUnavailable&lt;/code&gt; set to 0, and deploy three zone-aware StatefulSets with 0 replicas by setting:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;jsonnet&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-jsonnet&#34;&gt;_config&amp;#43;:: {
    multi_zone_ingester_enabled: true,
    multi_zone_ingester_migration_enabled: true,
    multi_zone_ingester_replicas: 0,
    // These last two lines are necessary now that we enable zone aware ingester by default
    // so that newly created cells will not be migrated later on. If you miss them you will
    // break writes in the cell.
    multi_zone_ingester_replication_write_path_enabled: false,
    multi_zone_ingester_replication_read_path_enabled: false,
},&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you&amp;rsquo;re not using jsonnet, give the new ingester StatefulSets the label &lt;code&gt;rollout-group: ingester&lt;/code&gt; and the annotation &lt;code&gt;rollout-max-unavailable: x&lt;/code&gt;, and set their update strategy to &lt;code&gt;OnDelete&lt;/code&gt;. Use a placeholder value for &lt;code&gt;x&lt;/code&gt; for now; later, set it to some portion of each StatefulSet&amp;rsquo;s replicas. For example, the jsonnet templates run 1/3 of the total replicas in each StatefulSet and set the max unavailable value to 1/3 of each StatefulSet&amp;rsquo;s replicas.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Diff the &lt;code&gt;ingester&lt;/code&gt; and &lt;code&gt;ingester-zone-a&lt;/code&gt; StatefulSets and make sure all configuration matches:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl get statefulset -n loki-dev-008 ingester -o yaml &amp;gt; ingester.yaml
kubectl get statefulset -n loki-dev-008 ingester-zone-a -o yaml &amp;gt; ingester-zone-a.yaml
diff ingester.yaml ingester-zone-a.yaml&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Expected differences include values such as the creation time and revision number, the zone, fields used by the rollout-operator, the number of replicas, anything related to kustomize or Flux, and the PVC for the WAL, because the containers don&amp;rsquo;t exist yet.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
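&lt;p&gt;Temporarily double the max streams limit for tenants that are using a large share of their current limit. The following queries flag tenants above 40% of their limit (&lt;code&gt;/ 2.5&lt;/code&gt; and &lt;code&gt;&amp;gt; 0.4&lt;/code&gt;), a more conservative cutoff than 50%. The first query covers tenants with a per-tenant override, and the second covers tenants on the default limit. Add label selectors as appropriate:&lt;/p&gt;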

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum by (tenant)(sum (loki_ingester_memory_streams) by (cluster, namespace, tenant) / on (namespace) group_left max by(namespace) (loki_distributor_replication_factor))
&amp;gt;
on (tenant) (
max by (tenant) (label_replace(loki_overrides{limit_name=&amp;#34;max_global_streams_per_user&amp;#34;} / 2.5, &amp;#34;tenant&amp;#34;, &amp;#34;$1&amp;#34;, &amp;#34;user&amp;#34;, &amp;#34;(.&amp;#43;)&amp;#34;))
)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;((sum (loki_ingester_memory_streams) by (cluster, namespace, tenant) / on (namespace) group_left max by(namespace) (loki_distributor_replication_factor)
) / ignoring(tenant) group_left max by (cluster, namespace)(loki_overrides_defaults{limit_name=&amp;#34;max_global_streams_per_user&amp;#34;}) &amp;gt; 0.4)
unless on (tenant) (
(label_replace(loki_overrides{limit_name=&amp;#34;max_global_streams_per_user&amp;#34;},&amp;#34;tenant&amp;#34;, &amp;#34;$1&amp;#34;, &amp;#34;user&amp;#34;, &amp;#34;(.&amp;#43;)&amp;#34;)))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scale up the zone-aware StatefulSets until each has one third of the replicas. In smaller cells you can do this all at once; in larger cells it is safer to do it in chunks. The config value to change is &lt;code&gt;multi_zone_ingester_replicas&lt;/code&gt;, which is split across the three StatefulSets. For example, &lt;code&gt;multi_zone_ingester_replicas: 6&lt;/code&gt; runs 2 replicas in each StatefulSet.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re not using jsonnet, this is the step where you would also set the annotation &lt;code&gt;rollout-max-unavailable&lt;/code&gt; to some value that is less than or equal to the number of replicas each StatefulSet is running.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enable zone awareness on the write path by setting &lt;code&gt;multi_zone_ingester_replication_write_path_enabled: true&lt;/code&gt;. This causes distributors and rulers to reshuffle series so that each series is written to one ingester in each zone. Check that all the distributors and rulers have restarted properly.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re not using jsonnet, enable zone awareness on the write path by setting &lt;code&gt;distributor.zone-awareness-enabled&lt;/code&gt; to true for distributors and rulers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Wait for the duration configured in &lt;code&gt;query_ingesters_within&lt;/code&gt; (the default is &lt;code&gt;3h&lt;/code&gt;). This ensures that no data is missing when a query reaches a new ingester. However, because chunks are cut at least every 30m due to &lt;code&gt;chunk_idle_period&lt;/code&gt;, you can likely shorten this wait.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check that rule evaluations are still correct during the migration. Look for increases in the rate of metrics whose names end with the following suffixes:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;rule_evaluations_total
rule_evaluation_failures_total
rule_group_iterations_missed_total&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enable zone-aware replication on the read path by setting &lt;code&gt;multi_zone_ingester_replication_read_path_enabled: true&lt;/code&gt;. If you&amp;rsquo;re not using jsonnet, set &lt;code&gt;distributor.zone-awareness-enabled&lt;/code&gt; to &lt;code&gt;true&lt;/code&gt; for queriers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check that queries still execute correctly. For example, look at &lt;code&gt;loki_logql_querystats_latency_seconds_count&lt;/code&gt; to confirm there is no large increase in latency or error count for a specific query type.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Configure distributors and rulers to exclude ingesters in &lt;code&gt;zone-default&lt;/code&gt;, so those ingesters no longer receive write traffic, by setting &lt;code&gt;multi_zone_ingester_exclude_default: true&lt;/code&gt;. If you&amp;rsquo;re not using jsonnet, set &lt;code&gt;distributor.excluded-zones&lt;/code&gt; on distributors and rulers.&lt;/p&gt;
&lt;p&gt;At this point, check rule evaluations again, and confirm that the zone-aware ingester StatefulSets now receive all the write traffic. For example, compare &lt;code&gt;sum(loki_ingester_memory_streams{cluster=&amp;quot;&amp;lt;cluster&amp;gt;&amp;quot;,job=~&amp;quot;(&amp;lt;namespace&amp;gt;)/ingester&amp;quot;})&lt;/code&gt; to &lt;code&gt;sum(loki_ingester_memory_streams{cluster=&amp;quot;&amp;lt;cluster&amp;gt;&amp;quot;,job=~&amp;quot;(&amp;lt;namespace&amp;gt;)/ingester-zone.*&amp;quot;})&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you&amp;rsquo;re using an automated reconciliation or deployment system like Flux, disable it now for just the default ingester StatefulSet, if possible (for example, using a Flux ignore).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Shut down and flush the default ingesters, unregistering them from the ring. You can do this by port-forwarding to each ingester Pod and calling the endpoint &lt;code&gt;&amp;quot;http://url:PORT/ingester/shutdown?flush=true&amp;amp;delete_ring_tokens=true&amp;amp;terminate=false&amp;quot;&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Manually scale down the default ingester StatefulSet to 0 replicas. We do this with &lt;code&gt;tk apply&lt;/code&gt;, but you can also do it by modifying the YAML.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Merge a PR to your central config repository to keep the StatefulSet at 0 replicas, and then remove the Flux ignore.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Clean up any remaining temporary config from the migration. For example, &lt;code&gt;multi_zone_ingester_migration_enabled: true&lt;/code&gt; is no longer needed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ensure that all PVCs and PVs of the old default ingesters are removed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
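&lt;p&gt;The shutdown-and-flush step can be scripted. The following shell sketch port-forwards to each default ingester Pod in turn and calls the shutdown endpoint with the same query parameters as in the steps above. The namespace &lt;code&gt;loki&lt;/code&gt;, the Pod names &lt;code&gt;ingester-0&lt;/code&gt; through &lt;code&gt;ingester-2&lt;/code&gt;, the HTTP port 3100, the local port, and the fixed wait for the port-forward are assumptions; adjust them for your deployment. It only prints the commands unless &lt;code&gt;DRY_RUN=0&lt;/code&gt; is set.&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: shut down and flush each default (non-zone) ingester, one at a time.
# Assumptions: namespace "loki", Pods ingester-0..ingester-(REPLICAS-1), HTTP port 3100.
NAMESPACE=${NAMESPACE:-loki}
REPLICAS=${REPLICAS:-3}
HTTP_PORT=${HTTP_PORT:-3100}
LOCAL_PORT=${LOCAL_PORT:-13100}
DRY_RUN=${DRY_RUN:-1}  # 1 prints commands, 0 runs them

SHUTDOWN_URL="http://localhost:${LOCAL_PORT}/ingester/shutdown?flush=true&delete_ring_tokens=true&terminate=false"

# Port-forward to one Pod, call the shutdown endpoint, then stop the port-forward.
flush_ingester() {
  pod=$1
  if [ "$DRY_RUN" = "1" ]; then
    echo "kubectl -n $NAMESPACE port-forward pod/$pod $LOCAL_PORT:$HTTP_PORT"
    echo "curl -fsS -X POST '$SHUTDOWN_URL'"
    return 0
  fi
  kubectl -n "$NAMESPACE" port-forward "pod/$pod" "$LOCAL_PORT:$HTTP_PORT" >/dev/null &
  pf_pid=$!
  sleep 3  # crude wait for the port-forward to become ready
  status=0
  curl -fsS -X POST "$SHUTDOWN_URL" || status=1
  kill "$pf_pid" 2>/dev/null
  wait "$pf_pid" 2>/dev/null
  return "$status"
}

# Flush every default ingester in order; stop at the first failure.
flush_all() {
  i=0
  while [ "$i" -lt "$REPLICAS" ]; do
    flush_ingester "ingester-$i" || { echo "flush failed for ingester-$i" >&2; return 1; }
    i=$(( i + 1 ))
  done
}

flush_all
```

&lt;p&gt;Run it against the default ingester StatefulSet only, and confirm that each ingester has flushed and left the ring before you scale the StatefulSet down.&lt;/p&gt;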
]]></content><description>&lt;h1 id="speed-up-ingester-rollout-using-zone-awareness">Speed up ingester rollout using zone awareness&lt;/h1>
&lt;p>The Loki zone aware ingesters are used by Grafana Labs in order to allow for easier rollouts of large Loki deployments. You can think of them as three logical zones, however with some extra Kubernetes configuration you could deploy them in separate zones.&lt;/p></description></item></channel></rss>