<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Operations on Grafana Labs</title><link>https://grafana.com/docs/loki/v2.4.x/operations/</link><description>Recent content in Operations on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/loki/v2.4.x/operations/index.xml" rel="self" type="application/rss+xml"/><item><title>Authentication</title><link>https://grafana.com/docs/loki/v2.4.x/operations/authentication/</link><pubDate>Sat, 11 Apr 2026 09:32:03 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.4.x/operations/authentication/</guid><content><![CDATA[&lt;h1 id=&#34;authentication-with-grafana-loki&#34;&gt;Authentication with Grafana Loki&lt;/h1&gt;
&lt;p&gt;Grafana Loki does not come with any included authentication layer. Operators are
expected to run an authenticating reverse proxy in front of their services, such
as NGINX using basic auth or an OAuth2 proxy.&lt;/p&gt;
&lt;p&gt;When running Loki in multi-tenant mode, Loki requires the HTTP header
&lt;code&gt;X-Scope-OrgID&lt;/code&gt; to be set to a string identifying the tenant; the
authenticating reverse proxy is responsible for populating this value.
Read the &lt;a href=&#34;../multi-tenancy/&#34;&gt;multi-tenancy&lt;/a&gt; documentation for more information.&lt;/p&gt;
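&lt;p&gt;As a rough illustration of that responsibility, the sketch below shows the header rewriting an authenticating proxy might perform before forwarding a request to Loki. The user-to-tenant mapping and the function name are hypothetical and not part of Loki; only the &lt;code&gt;X-Scope-OrgID&lt;/code&gt; header name comes from the text above.&lt;/p&gt;

```python
# Hypothetical mapping from already-authenticated users to Loki tenants.
USER_TO_TENANT = {"alice": "tenant_1", "bob": "tenant_2"}


def upstream_headers(authenticated_user, incoming_headers):
    """Return the headers to forward to Loki for an authenticated user."""
    # Drop any client-supplied tenant header so a caller cannot pick a tenant.
    headers = {
        name: value
        for name, value in incoming_headers.items()
        if name.lower() != "x-scope-orgid"
    }
    # Set the tenant that Loki will use in multi-tenant mode.
    headers["X-Scope-OrgID"] = USER_TO_TENANT[authenticated_user]
    return headers


print(upstream_headers("alice", {"Accept": "application/json"}))
```

&lt;p&gt;Removing the incoming header before setting it is a design choice in this sketch: it keeps the tenant decision entirely on the proxy side.&lt;/p&gt;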
&lt;p&gt;For information on authenticating Promtail, please see the docs for &lt;a href=&#34;../../clients/promtail/configuration/&#34;&gt;how to
configure Promtail&lt;/a&gt;.&lt;/p&gt;
]]></content><description>&lt;h1 id="authentication-with-grafana-loki">Authentication with Grafana Loki&lt;/h1>
&lt;p>Grafana Loki does not come with any included authentication layer. Operators are
expected to run an authenticating reverse proxy in front of their services, such
as NGINX using basic auth or an OAuth2 proxy.&lt;/p></description></item><item><title>Observability</title><link>https://grafana.com/docs/loki/v2.4.x/operations/observability/</link><pubDate>Sat, 11 Apr 2026 09:32:03 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.4.x/operations/observability/</guid><content><![CDATA[&lt;h1 id=&#34;observing-grafana-loki&#34;&gt;Observing Grafana Loki&lt;/h1&gt;
&lt;p&gt;Both Grafana Loki and Promtail expose a &lt;code&gt;/metrics&lt;/code&gt; endpoint that exposes Prometheus
metrics. You will need a local Prometheus instance with Loki and Promtail added as targets.
See &lt;a href=&#34;https://prometheus.io/docs/prometheus/latest/configuration/configuration&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;configuring
Prometheus&lt;/a&gt;
for more information.&lt;/p&gt;
&lt;p&gt;All components of Loki expose the following metrics:&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Metric Name&lt;/th&gt;
              &lt;th&gt;Metric Type&lt;/th&gt;
              &lt;th&gt;Description&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;log_messages_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Total number of messages logged by Loki.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_request_duration_seconds&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
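              &lt;td&gt;Duration of received HTTP requests, in seconds; its &lt;code&gt;_count&lt;/code&gt; series gives the number of received HTTP requests.&lt;/td&gt;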
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;p&gt;The Loki Distributors expose the following metrics:&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Metric Name&lt;/th&gt;
              &lt;th&gt;Metric Type&lt;/th&gt;
              &lt;th&gt;Description&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_distributor_ingester_appends_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;The total number of batch appends sent to ingesters.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_distributor_ingester_append_failures_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;The total number of failed batch appends sent to ingesters.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_distributor_bytes_received_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;The total number of uncompressed bytes received, per tenant and retention hours.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_distributor_lines_received_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;The total number of log &lt;em&gt;entries&lt;/em&gt; received per tenant (not necessarily of &lt;em&gt;lines&lt;/em&gt;, as an entry can have more than one line of text).&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;p&gt;The Loki Ingesters expose the following metrics:&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Metric Name&lt;/th&gt;
              &lt;th&gt;Metric Type&lt;/th&gt;
              &lt;th&gt;Description&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;cortex_ingester_flush_queue_length&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Gauge&lt;/td&gt;
              &lt;td&gt;The total number of series pending in the flush queue.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_chunk_store_index_entries_per_chunk&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;Number of index entries written to storage per chunk.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_memory_chunks&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Gauge&lt;/td&gt;
              &lt;td&gt;The total number of chunks in memory.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_memory_streams&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Gauge&lt;/td&gt;
              &lt;td&gt;The total number of streams in memory.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_chunk_age_seconds&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;Distribution of chunk ages when flushed.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_chunk_encode_time_seconds&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;Distribution of chunk encode times.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_chunk_entries&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;Distribution of lines per-chunk when flushed.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_chunk_size_bytes&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;Distribution of chunk sizes when flushed.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_chunk_utilization&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;Distribution of chunk utilization (filled uncompressed bytes vs maximum uncompressed bytes) when flushed.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_chunk_compression_ratio&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;Distribution of chunk compression ratio when flushed.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_chunk_stored_bytes_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Total bytes stored in chunks per tenant.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_chunks_created_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;The total number of chunks created in the ingester.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_chunks_stored_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Total stored chunks per tenant.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_received_chunks&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;The total number of chunks received by this ingester whilst joining during the handoff process.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_samples_per_chunk&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;The number of samples in a chunk.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_sent_chunks&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;The total number of chunks sent by this ingester whilst leaving during the handoff process.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_streams_created_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;The total number of streams created per tenant.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;loki_ingester_streams_removed_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;The total number of streams removed per tenant.&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;p&gt;Promtail exposes these metrics:&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Metric Name&lt;/th&gt;
              &lt;th&gt;Metric Type&lt;/th&gt;
              &lt;th&gt;Description&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_read_bytes_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Gauge&lt;/td&gt;
              &lt;td&gt;Number of bytes read.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_read_lines_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Number of lines read.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_dropped_bytes_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Number of bytes dropped because they failed to be sent to the ingester after all retries.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_dropped_entries_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Number of log entries dropped because they failed to be sent to the ingester after all retries.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_encoded_bytes_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Number of bytes encoded and ready to send.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_file_bytes_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Gauge&lt;/td&gt;
              &lt;td&gt;Number of bytes read from files.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_files_active_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Gauge&lt;/td&gt;
              &lt;td&gt;Number of active files.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_log_entries_bytes&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;The total count of bytes read.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_request_duration_seconds_count&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Histogram&lt;/td&gt;
              &lt;td&gt;Number of send requests.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_sent_bytes_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Number of bytes sent.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_sent_entries_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Number of log entries sent to the ingester.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_targets_active_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Gauge&lt;/td&gt;
              &lt;td&gt;Number of total active targets.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;promtail_targets_failed_total&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Counter&lt;/td&gt;
              &lt;td&gt;Number of total failed targets.&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;p&gt;Most of these metrics are counters and should continuously increase during normal operations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Your app emits a log line to a file that is tracked by Promtail.&lt;/li&gt;
&lt;li&gt;Promtail reads the new line and increases its counters.&lt;/li&gt;
&lt;li&gt;Promtail forwards the log line to a Loki distributor, where the received
counters should increase.&lt;/li&gt;
&lt;li&gt;The Loki distributor forwards the log line to a Loki ingester, where the
request duration counter should increase.&lt;/li&gt;
&lt;/ol&gt;
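&lt;p&gt;One way to check the flow above is to scrape &lt;code&gt;/metrics&lt;/code&gt; twice and compare counter values. The following is a minimal sketch rather than a full parser for the Prometheus text format: it assumes no timestamps and no spaces inside label values, and the sample values and label names are made up for illustration.&lt;/p&gt;

```python
def sum_samples(metrics_text):
    """Sum sample values per metric name from Prometheus text-format output."""
    totals = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # Simplification: the value is whatever follows the last space.
        series, value = line.rsplit(" ", 1)
        name = series.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def delta(before_text, after_text, name):
    """Return how much a metric grew between two scrapes."""
    before = sum_samples(before_text)
    after = sum_samples(after_text)
    return after.get(name, 0.0) - before.get(name, 0.0)


# Made-up scrape results; real output comes from the /metrics endpoints.
BEFORE = """
# TYPE promtail_read_lines_total counter
promtail_read_lines_total{path="/var/log/app.log"} 10
# TYPE loki_distributor_lines_received_total counter
loki_distributor_lines_received_total{tenant="tenant_1"} 10
"""
AFTER = """
# TYPE promtail_read_lines_total counter
promtail_read_lines_total{path="/var/log/app.log"} 15
# TYPE loki_distributor_lines_received_total counter
loki_distributor_lines_received_total{tenant="tenant_1"} 15
"""

for metric in ("promtail_read_lines_total", "loki_distributor_lines_received_total"):
    print(metric, delta(BEFORE, AFTER, metric))
```

&lt;p&gt;A counter whose delta stays at zero while the application keeps logging points at the step in the list where the flow is stuck.&lt;/p&gt;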
&lt;p&gt;If Promtail uses any pipelines with metrics stages, those metrics will also be
exposed by Promtail at its &lt;code&gt;/metrics&lt;/code&gt; endpoint. See Promtail&amp;rsquo;s documentation on
&lt;a href=&#34;../../clients/promtail/pipelines/&#34;&gt;Pipelines&lt;/a&gt; for more information.&lt;/p&gt;
&lt;p&gt;An example Grafana dashboard was built by the community and is available as
dashboard &lt;a href=&#34;/dashboards/10004&#34;&gt;10004&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;mixins&#34;&gt;Mixins&lt;/h2&gt;
&lt;p&gt;The Loki repository has a &lt;a href=&#34;https://github.com/grafana/loki/blob/master/production/loki-mixin&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;mixin&lt;/a&gt; that includes a
set of dashboards, recording rules, and alerts. Together, these give you a
comprehensive package for monitoring Loki in production.&lt;/p&gt;
&lt;p&gt;For more information about mixins, take a look at the docs for the
&lt;a href=&#34;https://github.com/monitoring-mixins/docs&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;monitoring-mixins project&lt;/a&gt;.&lt;/p&gt;
]]></content><description>&lt;h1 id="observing-grafana-loki">Observing Grafana Loki&lt;/h1>
&lt;p>Both Grafana Loki and Promtail expose a &lt;code>/metrics&lt;/code> endpoint that exposes Prometheus
metrics. You will need a local Prometheus instance with Loki and Promtail added as targets.
See &lt;a href="https://prometheus.io/docs/prometheus/latest/configuration/configuration" target="_blank" rel="noopener noreferrer">configuring
Prometheus&lt;/a>
for more information.&lt;/p></description></item><item><title>Overrides Exporter</title><link>https://grafana.com/docs/loki/v2.4.x/operations/overrides-exporter/</link><pubDate>Sat, 11 Apr 2026 09:32:03 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.4.x/operations/overrides-exporter/</guid><content><![CDATA[&lt;p&gt;Loki is a multi-tenant system that supports applying limits to each tenant as a mechanism for resource management. The &lt;code&gt;overrides-exporter&lt;/code&gt; module exposes these limits as Prometheus metrics in order to help operators better understand tenant behavior.&lt;/p&gt;
&lt;h2 id=&#34;context&#34;&gt;Context&lt;/h2&gt;
&lt;p&gt;Configuration updates to tenant limits can be applied to Loki without restart via the &lt;a href=&#34;../configuration/#runtime-configuration-file&#34;&gt;&lt;code&gt;runtime_config&lt;/code&gt;&lt;/a&gt; feature.&lt;/p&gt;
&lt;h2 id=&#34;example&#34;&gt;Example&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;overrides-exporter&lt;/code&gt; module is disabled by default. We recommend running a single instance per cluster to avoid issues with metric cardinality. For every scalar field in the limits configuration, the &lt;code&gt;overrides-exporter&lt;/code&gt; exposes a &lt;code&gt;loki_overrides_defaults&lt;/code&gt; series holding the default value for that field after loading the Loki configuration. It also exposes a &lt;code&gt;loki_overrides&lt;/code&gt; series for &lt;em&gt;every&lt;/em&gt; differing field for &lt;em&gt;every&lt;/em&gt; tenant.&lt;/p&gt;
&lt;p&gt;Using an example &lt;code&gt;runtime.yaml&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
  &amp;#34;tenant_1&amp;#34;:
    ingestion_rate_mb: 10
    max_streams_per_user: 100000
    max_chunks_per_query: 100000&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Launch an instance of the &lt;code&gt;overrides-exporter&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-shell&#34;&gt;loki -target=overrides-exporter -runtime-config.file=runtime.yaml -config.file=basic_schema_config.yaml -server.http-listen-port=8080&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To inspect the tenant limit overrides:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-shell&#34;&gt;$ curl -sq localhost:8080/metrics | grep override
# HELP loki_overrides Resource limit overrides applied to tenants
# TYPE loki_overrides gauge
loki_overrides{limit_name=&amp;#34;ingestion_rate_mb&amp;#34;,user=&amp;#34;tenant_1&amp;#34;} 10
loki_overrides{limit_name=&amp;#34;max_chunks_per_query&amp;#34;,user=&amp;#34;tenant_1&amp;#34;} 100000
loki_overrides{limit_name=&amp;#34;max_streams_per_user&amp;#34;,user=&amp;#34;tenant_1&amp;#34;} 100000
# HELP loki_overrides_defaults Default values for resource limit overrides applied to tenants
# TYPE loki_overrides_defaults gauge
loki_overrides_defaults{limit_name=&amp;#34;cardinality_limit&amp;#34;} 100000
loki_overrides_defaults{limit_name=&amp;#34;creation_grace_period&amp;#34;} 6e&amp;#43;11
loki_overrides_defaults{limit_name=&amp;#34;ingestion_burst_size_mb&amp;#34;} 6
loki_overrides_defaults{limit_name=&amp;#34;ingestion_rate_mb&amp;#34;} 4
loki_overrides_defaults{limit_name=&amp;#34;max_cache_freshness_per_query&amp;#34;} 6e&amp;#43;10
loki_overrides_defaults{limit_name=&amp;#34;max_chunks_per_query&amp;#34;} 2e&amp;#43;06
loki_overrides_defaults{limit_name=&amp;#34;max_concurrent_tail_requests&amp;#34;} 10
loki_overrides_defaults{limit_name=&amp;#34;max_entries_limit_per_query&amp;#34;} 5000
loki_overrides_defaults{limit_name=&amp;#34;max_global_streams_per_user&amp;#34;} 5000
loki_overrides_defaults{limit_name=&amp;#34;max_label_name_length&amp;#34;} 1024
loki_overrides_defaults{limit_name=&amp;#34;max_label_names_per_series&amp;#34;} 30
loki_overrides_defaults{limit_name=&amp;#34;max_label_value_length&amp;#34;} 2048
loki_overrides_defaults{limit_name=&amp;#34;max_line_size&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;max_queriers_per_tenant&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;max_query_length&amp;#34;} 2.5956e&amp;#43;15
loki_overrides_defaults{limit_name=&amp;#34;max_query_lookback&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;max_query_parallelism&amp;#34;} 32
loki_overrides_defaults{limit_name=&amp;#34;max_query_series&amp;#34;} 500
loki_overrides_defaults{limit_name=&amp;#34;max_streams_matchers_per_query&amp;#34;} 1000
loki_overrides_defaults{limit_name=&amp;#34;max_streams_per_user&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;min_sharding_lookback&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;per_stream_rate_limit&amp;#34;} 3.145728e&amp;#43;06
loki_overrides_defaults{limit_name=&amp;#34;per_stream_rate_limit_burst&amp;#34;} 1.572864e&amp;#43;07
loki_overrides_defaults{limit_name=&amp;#34;per_tenant_override_period&amp;#34;} 1e&amp;#43;10
loki_overrides_defaults{limit_name=&amp;#34;reject_old_samples_max_age&amp;#34;} 1.2096e&amp;#43;15
loki_overrides_defaults{limit_name=&amp;#34;retention_period&amp;#34;} 2.6784e&amp;#43;15
loki_overrides_defaults{limit_name=&amp;#34;ruler_evaluation_delay_duration&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_max_rule_groups_per_tenant&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_max_rules_per_rule_group&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_batch_send_deadline&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_capacity&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_max_backoff&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_max_samples_per_send&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_max_shards&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_min_backoff&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_queue_min_shards&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;ruler_remote_write_timeout&amp;#34;} 0
loki_overrides_defaults{limit_name=&amp;#34;split_queries_by_interval&amp;#34;} 0&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Alerts can be created based on these metrics to inform operators when tenants are close to hitting their limits, allowing for increases to be applied before the tenant limits are exceeded.&lt;/p&gt;
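&lt;p&gt;To compare a tenant&amp;rsquo;s usage against its limits, the per-tenant overrides have to be combined with the defaults. The sketch below is one possible way to do that from the exporter&amp;rsquo;s text output, assuming the &lt;code&gt;limit_name&lt;/code&gt; and &lt;code&gt;user&lt;/code&gt; labels shown in the example above; the function name and regular expressions are illustrative and not part of Loki.&lt;/p&gt;

```python
import re

# Matches sample lines such as: loki_overrides{limit_name="x",user="y"} 10
SAMPLE = re.compile(r'^(loki_overrides(?:_defaults)?)\{(.*)\} (\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def effective_limits(metrics_text, tenant):
    """Merge default limits with the overrides exported for one tenant."""
    defaults, overrides = {}, {}
    for line in metrics_text.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # HELP/TYPE comments and unrelated metrics
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels))
        if name == "loki_overrides_defaults":
            defaults[labels["limit_name"]] = float(value)
        elif labels.get("user") == tenant:
            overrides[labels["limit_name"]] = float(value)
    # An override, when present, takes precedence over the default.
    return {**defaults, **overrides}


# Excerpt of the exporter output shown above.
EXPORTER_OUTPUT = """
# TYPE loki_overrides gauge
loki_overrides{limit_name="ingestion_rate_mb",user="tenant_1"} 10
# TYPE loki_overrides_defaults gauge
loki_overrides_defaults{limit_name="ingestion_burst_size_mb"} 6
loki_overrides_defaults{limit_name="ingestion_rate_mb"} 4
"""

print(effective_limits(EXPORTER_OUTPUT, "tenant_1"))
```

&lt;p&gt;A tenant without any overrides simply receives the default values, which matches the exporter only emitting override series for differing fields.&lt;/p&gt;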
]]></content><description>&lt;p>Loki is a multi-tenant system that supports applying limits to each tenant as a mechanism for resource management. The &lt;code>overrides-exporter&lt;/code> module exposes these limits as Prometheus metrics in order to help operators better understand tenant behavior.&lt;/p></description></item><item><title>Scalability</title><link>https://grafana.com/docs/loki/v2.4.x/operations/scalability/</link><pubDate>Sat, 11 Apr 2026 09:32:03 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.4.x/operations/scalability/</guid><content><![CDATA[&lt;h1 id=&#34;scaling-with-grafana-loki&#34;&gt;Scaling with Grafana Loki&lt;/h1&gt;
&lt;p&gt;See &lt;a href=&#34;/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/&#34;&gt;Loki: Prometheus-inspired, open source logging for cloud natives&lt;/a&gt;
for a discussion about Grafana Loki&amp;rsquo;s scalability.&lt;/p&gt;
&lt;p&gt;When scaling Loki, operators should consider running several Loki processes
partitioned by role (ingester, distributor, querier) rather than a single Loki
process. Grafana Labs&amp;rsquo; &lt;a href=&#34;https://github.com/grafana/loki/blob/master/production/ksonnet/loki&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;production setup&lt;/a&gt;
contains &lt;code&gt;.libsonnet&lt;/code&gt; files that demonstrate configuring separate components
and scaling for resource usage.&lt;/p&gt;
&lt;h2 id=&#34;separate-query-scheduler&#34;&gt;Separate Query Scheduler&lt;/h2&gt;
&lt;p&gt;The Query frontend has an in-memory queue that can be moved out into a separate process similar to the &lt;a href=&#34;https://cortexmetrics.io/docs/operations/scaling-query-frontend/#query-scheduler&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Cortex Query Scheduler&lt;/a&gt;. This allows running multiple query frontends.&lt;/p&gt;
&lt;p&gt;In order to run with the Query Scheduler, the frontend needs to be passed the scheduler&amp;rsquo;s address via &lt;code&gt;-frontend.scheduler-address&lt;/code&gt; and the querier processes need to be started with &lt;code&gt;-querier.scheduler-address&lt;/code&gt; set to the same address. Both options can also be defined via the &lt;a href=&#34;../configuration&#34;&gt;configuration file&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It is not valid to start the querier with both a configured frontend and a scheduler address.&lt;/p&gt;
&lt;p&gt;The query scheduler process itself can be started via the &lt;code&gt;-target=query-scheduler&lt;/code&gt; option of the Loki Docker image. For instance, &lt;code&gt;docker run grafana/loki:latest -config.file=/cortex/config/cortex.yaml -target=query-scheduler -server.http-listen-port=8009 -server.grpc-listen-port=9009&lt;/code&gt; starts the query scheduler listening on ports &lt;code&gt;8009&lt;/code&gt; and &lt;code&gt;9009&lt;/code&gt;.&lt;/p&gt;
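&lt;p&gt;The flag wiring described in this section can be sketched as a small helper that assembles command-line arguments and rejects the invalid combination. This is only an illustration: the function names and the &lt;code&gt;query-scheduler:9009&lt;/code&gt; address are hypothetical, and &lt;code&gt;-querier.frontend-address&lt;/code&gt; is assumed here to be the flag that points a querier at a query frontend.&lt;/p&gt;

```python
def frontend_args(scheduler_address):
    """Arguments for a query frontend that hands its queue to the scheduler."""
    return [
        "-target=query-frontend",
        "-frontend.scheduler-address=" + scheduler_address,
    ]


def querier_args(scheduler_address=None, frontend_address=None):
    """Arguments for a querier; a frontend and a scheduler address are exclusive."""
    if scheduler_address and frontend_address:
        raise ValueError("set either a frontend or a scheduler address, not both")
    args = ["-target=querier"]
    if scheduler_address:
        args.append("-querier.scheduler-address=" + scheduler_address)
    if frontend_address:
        # Assumed flag name for the frontend address; check the configuration docs.
        args.append("-querier.frontend-address=" + frontend_address)
    return args


# Hypothetical scheduler address, using the gRPC port from the example above.
SCHEDULER = "query-scheduler:9009"
print(" ".join(frontend_args(SCHEDULER)))
print(" ".join(querier_args(scheduler_address=SCHEDULER)))
```

&lt;p&gt;Both processes receive the same scheduler address, which is the requirement stated above.&lt;/p&gt;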
]]></content><description>&lt;h1 id="scaling-with-grafana-loki">Scaling with Grafana Loki&lt;/h1>
&lt;p>See &lt;a href="/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/">Loki: Prometheus-inspired, open source logging for cloud natives&lt;/a>
for a discussion about Grafana Loki&amp;rsquo;s scalability.&lt;/p>
&lt;p>When scaling Loki, operators should consider running several Loki processes
partitioned by role (ingester, distributor, querier) rather than a single Loki
process. Grafana Labs&amp;rsquo; &lt;a href="https://github.com/grafana/loki/blob/master/production/ksonnet/loki" target="_blank" rel="noopener noreferrer">production setup&lt;/a>
contains &lt;code>.libsonnet&lt;/code> files that demonstrate configuring separate components
and scaling for resource usage.&lt;/p></description></item><item><title>Storage</title><link>https://grafana.com/docs/loki/v2.4.x/operations/storage/</link><pubDate>Sat, 11 Apr 2026 09:32:03 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.4.x/operations/storage/</guid><content><![CDATA[&lt;h1 id=&#34;grafana-loki-storage&#34;&gt;Grafana Loki Storage&lt;/h1&gt;
&lt;p&gt;&lt;a href=&#34;../../storage/&#34;&gt;High level storage overview here&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Grafana Loki needs to store two different types of data: &lt;strong&gt;chunks&lt;/strong&gt; and &lt;strong&gt;indexes&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Loki receives logs in separate streams, where each stream is uniquely identified
by its tenant ID and its set of labels. As log entries from a stream arrive,
they are compressed as &amp;ldquo;chunks&amp;rdquo; and saved in the chunks store. See &lt;a href=&#34;#chunk-format&#34;&gt;chunk
format&lt;/a&gt; for how chunks are stored internally.&lt;/p&gt;
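&lt;p&gt;The notion of stream identity can be illustrated with a short conceptual sketch. It is not Loki&amp;rsquo;s internal implementation; the function names and sample entries are hypothetical, and it only shows that the tenant ID together with the label set, regardless of label order, determines which stream an entry belongs to.&lt;/p&gt;

```python
def stream_key(tenant_id, labels):
    """Identify a stream by its tenant ID and its order-independent label set."""
    return (tenant_id, tuple(sorted(labels.items())))


def group_into_streams(entries):
    """Group (tenant_id, labels, line) entries by the stream they belong to."""
    streams = {}
    for tenant_id, labels, line in entries:
        streams.setdefault(stream_key(tenant_id, labels), []).append(line)
    return streams


# Made-up entries: the first two share a stream, the third has another tenant.
entries = [
    ("tenant_1", {"job": "app", "env": "prod"}, "line a"),
    ("tenant_1", {"env": "prod", "job": "app"}, "line b"),
    ("tenant_2", {"job": "app", "env": "prod"}, "line c"),
]

for key, lines in group_into_streams(entries).items():
    print(key, lines)
```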
&lt;p&gt;The &lt;strong&gt;index&lt;/strong&gt; stores each stream&amp;rsquo;s label set and links it to the individual
chunks.&lt;/p&gt;
&lt;p&gt;Refer to Loki&amp;rsquo;s &lt;a href=&#34;../../configuration/&#34;&gt;configuration&lt;/a&gt; for details on
how to configure the storage and the index.&lt;/p&gt;
&lt;p&gt;For more information:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&#34;table-manager/&#34;&gt;Table Manager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;retention/&#34;&gt;Retention&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;logs-deletion/&#34;&gt;Logs Deletion&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;supported-stores&#34;&gt;Supported Stores&lt;/h2&gt;
&lt;p&gt;The following are supported for the index:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;boltdb-shipper/&#34;&gt;Single Store (boltdb-shipper) - Recommended for 2.0 and newer&lt;/a&gt;, an index store that stores boltdb index files in the object store&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://aws.amazon.com/dynamodb&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Amazon DynamoDB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cloud.google.com/bigtable&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Google Bigtable&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cassandra.apache.org&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Apache Cassandra&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/boltdb/bolt&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;BoltDB&lt;/a&gt; (doesn&amp;rsquo;t work when clustering Loki)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following are supported for the chunks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://aws.amazon.com/dynamodb&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Amazon DynamoDB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cloud.google.com/bigtable&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Google Bigtable&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cassandra.apache.org&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Apache Cassandra&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://aws.amazon.com/s3&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Amazon S3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cloud.google.com/storage/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Google Cloud Storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;filesystem/&#34;&gt;Filesystem&lt;/a&gt; (please read more about the filesystem to understand the pros/cons before using with production data)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;cloud-storage-permissions&#34;&gt;Cloud Storage Permissions&lt;/h2&gt;
&lt;h3 id=&#34;s3&#34;&gt;S3&lt;/h3&gt;
&lt;p&gt;When using S3 as object storage, the following permissions are needed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;s3:ListBucket&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;s3:PutObject&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;s3:GetObject&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;s3:DeleteObject&lt;/code&gt; (if running the Single Store (boltdb-shipper) compactor)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Resources: &lt;code&gt;arn:aws:s3:::&amp;lt;bucket_name&amp;gt;&lt;/code&gt;, &lt;code&gt;arn:aws:s3:::&amp;lt;bucket_name&amp;gt;/*&lt;/code&gt;&lt;/p&gt;
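&lt;p&gt;As an illustration, the permissions and resources above could be assembled into an IAM policy document. The Python sketch below only builds and prints the JSON. The helper name &lt;code&gt;s3_policy&lt;/code&gt; and the bucket name &lt;code&gt;loki-chunks&lt;/code&gt; are placeholders chosen for this example, not values defined by Loki.&lt;/p&gt;

```python
import json


def s3_policy(bucket_name, with_compactor=True):
    """Build an IAM policy document granting the S3 permissions listed above."""
    actions = ["s3:ListBucket", "s3:PutObject", "s3:GetObject"]
    if with_compactor:
        # Only needed when the boltdb-shipper compactor is running.
        actions.append("s3:DeleteObject")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # "loki-chunks" is a placeholder bucket name.
    print(json.dumps(s3_policy("loki-chunks"), indent=2))
```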
&lt;h3 id=&#34;dynamodb&#34;&gt;DynamoDB&lt;/h3&gt;
&lt;p&gt;When using DynamoDB for the index, the following permissions are needed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dynamodb:BatchGetItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:BatchWriteItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:DeleteItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:DescribeTable&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:GetItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:ListTagsOfResource&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:PutItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:Query&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:TagResource&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:UntagResource&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:UpdateItem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:UpdateTable&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:CreateTable&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dynamodb:DeleteTable&lt;/code&gt; (if &lt;code&gt;table_manager.retention_period&lt;/code&gt; is more than 0s)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Resources: &lt;code&gt;arn:aws:dynamodb:&amp;lt;aws_region&amp;gt;:&amp;lt;aws_account_id&amp;gt;:table/&amp;lt;prefix&amp;gt;*&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;dynamodb:ListTables&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Resources: &lt;code&gt;*&lt;/code&gt;&lt;/p&gt;
&lt;h4 id=&#34;autoscaling&#34;&gt;AutoScaling&lt;/h4&gt;
&lt;p&gt;If you enable autoscaling in the table manager, the following permissions are needed:&lt;/p&gt;
&lt;h5 id=&#34;application-autoscaling&#34;&gt;Application Autoscaling&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:DescribeScalableTargets&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:DescribeScalingPolicies&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:RegisterScalableTarget&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:DeregisterScalableTarget&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:PutScalingPolicy&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;application-autoscaling:DeleteScalingPolicy&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Resources: &lt;code&gt;*&lt;/code&gt;&lt;/p&gt;
&lt;h5 id=&#34;iam&#34;&gt;IAM&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;iam:GetRole&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iam:PassRole&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Resources: &lt;code&gt;arn:aws:iam::&amp;lt;aws_account_id&amp;gt;:role/&amp;lt;role_name&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;h2 id=&#34;chunk-format&#34;&gt;Chunk Format&lt;/h2&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;  -------------------------------------------------------------------
  |                               |                                 |
  |        MagicNumber(4b)        |           version(1b)           |
  |                               |                                 |
  -------------------------------------------------------------------
  |         block-1 bytes         |          checksum (4b)          |
  -------------------------------------------------------------------
  |         block-2 bytes         |          checksum (4b)          |
  -------------------------------------------------------------------
  |         block-n bytes         |          checksum (4b)          |
  -------------------------------------------------------------------
  |                        #blocks (uvarint)                        |
  -------------------------------------------------------------------
  | #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
  -------------------------------------------------------------------
  | #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
  -------------------------------------------------------------------
  | #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
  -------------------------------------------------------------------
  | #entries(uvarint) | mint, maxt (varint) | offset, len (uvarint) |
  -------------------------------------------------------------------
  |                      checksum(from #blocks)                     |
  -------------------------------------------------------------------
  |           metasOffset - offset to the point with #blocks        |
  -------------------------------------------------------------------&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
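&lt;p&gt;To make the metadata rows in the diagram more concrete, the following Python sketch decodes one row (&lt;code&gt;#entries&lt;/code&gt;, &lt;code&gt;mint&lt;/code&gt;, &lt;code&gt;maxt&lt;/code&gt;, &lt;code&gt;offset&lt;/code&gt;, &lt;code&gt;len&lt;/code&gt;). It assumes that &lt;code&gt;uvarint&lt;/code&gt; and &lt;code&gt;varint&lt;/code&gt; refer to the variable-length integer encodings of Go&amp;rsquo;s &lt;code&gt;encoding/binary&lt;/code&gt; package (base-128 with a continuation bit, zigzag for signed values). The function names are illustrative and are not part of Loki.&lt;/p&gt;

```python
def read_uvarint(buf, pos):
    """Decode an unsigned base-128 varint at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            # High bit clear: this was the last byte of the number.
            return value, pos
        shift += 7


def read_varint(buf, pos):
    """Decode a zigzag-encoded signed varint; return (value, next_pos)."""
    raw, pos = read_uvarint(buf, pos)
    value = raw >> 1
    if raw & 1:
        value = ~value
    return value, pos


def read_block_meta(buf, pos):
    """Read one metadata row in the field order shown in the diagram."""
    entries, pos = read_uvarint(buf, pos)
    mint, pos = read_varint(buf, pos)
    maxt, pos = read_varint(buf, pos)
    offset, pos = read_uvarint(buf, pos)
    length, pos = read_uvarint(buf, pos)
    meta = {"entries": entries, "mint": mint, "maxt": maxt,
            "offset": offset, "len": length}
    return meta, pos
```

&lt;p&gt;The sketch does not verify checksums or locate the metadata section through &lt;code&gt;metasOffset&lt;/code&gt;; the diagram does not state the width of that field, so it is left out here.&lt;/p&gt;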
]]></content><description>&lt;h1 id="grafana-loki-storage">Grafana Loki Storage&lt;/h1>
&lt;p>&lt;a href="../../storage/">High level storage overview here&lt;/a>&lt;/p>
&lt;p>Grafana Loki needs to store two different types of data: &lt;strong>chunks&lt;/strong> and &lt;strong>indexes&lt;/strong>.&lt;/p>
&lt;p>Loki receives logs in separate streams, where each stream is uniquely identified
by its tenant ID and its set of labels. As log entries from a stream arrive,
they are compressed as &amp;ldquo;chunks&amp;rdquo; and saved in the chunks store. See &lt;a href="#chunk-format">chunk
format&lt;/a> for how chunks are stored internally.&lt;/p></description></item><item><title>Multi-tenancy</title><link>https://grafana.com/docs/loki/v2.4.x/operations/multi-tenancy/</link><pubDate>Sat, 11 Apr 2026 09:32:03 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.4.x/operations/multi-tenancy/</guid><content><![CDATA[&lt;h1 id=&#34;grafana-loki-multi-tenancy&#34;&gt;Grafana Loki Multi-Tenancy&lt;/h1&gt;
&lt;p&gt;Grafana Loki is a multi-tenant system; requests and data for tenant A are isolated from
tenant B. Requests to the Loki API should include an HTTP header
(&lt;code&gt;X-Scope-OrgID&lt;/code&gt;) that identifies the tenant for the request.&lt;/p&gt;
&lt;p&gt;Tenant IDs can be any alphanumeric string that fits within the Go HTTP header
limit (1MB). Operators are recommended to use a reasonable limit for uniquely
identifying tenants; 20 bytes is usually enough.&lt;/p&gt;
&lt;p&gt;To run in multi-tenant mode, Loki should be started with &lt;code&gt;auth_enabled: true&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Loki can be run in &amp;ldquo;single-tenant&amp;rdquo; mode where the &lt;code&gt;X-Scope-OrgID&lt;/code&gt; header is not
required. In single-tenant mode, the tenant ID defaults to &lt;code&gt;fake&lt;/code&gt;.&lt;/p&gt;
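&lt;p&gt;As a minimal sketch of what a client or reverse proxy has to do, the Python snippet below builds a request that carries the tenant header. The request is constructed but not sent. The base URL &lt;code&gt;http://loki:3100&lt;/code&gt;, the tenant ID &lt;code&gt;tenant-a&lt;/code&gt;, the helper name and the choice of the labels endpoint are assumptions made for this example.&lt;/p&gt;

```python
import urllib.request


def tenant_request(base_url, tenant_id, path="/loki/api/v1/labels"):
    """Build (but do not send) a request that carries the tenant ID header."""
    req = urllib.request.Request(base_url + path)
    # urllib normalizes the capitalization of header names; HTTP header
    # names are case-insensitive, so the server still sees X-Scope-OrgID.
    req.add_header("X-Scope-OrgID", tenant_id)
    return req


if __name__ == "__main__":
    # "http://loki:3100" and "tenant-a" are placeholder values.
    req = tenant_request("http://loki:3100", "tenant-a")
    print(req.full_url, dict(req.header_items()))
```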
]]></content><description>&lt;h1 id="grafana-loki-multi-tenancy">Grafana Loki Multi-Tenancy&lt;/h1>
&lt;p>Grafana Loki is a multi-tenant system; requests and data for tenant A are isolated from
tenant B. Requests to the Loki API should include an HTTP header
(&lt;code>X-Scope-OrgID&lt;/code>) that identifies the tenant for the request.&lt;/p></description></item><item><title>Loki Canary</title><link>https://grafana.com/docs/loki/v2.4.x/operations/loki-canary/</link><pubDate>Sat, 11 Apr 2026 09:32:03 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.4.x/operations/loki-canary/</guid><content><![CDATA[&lt;h1 id=&#34;loki-canary&#34;&gt;Loki Canary&lt;/h1&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../canary.png&#34;
  alt=&#34;canary&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Loki Canary is a standalone app that audits the log-capturing performance of
a Grafana Loki cluster.&lt;/p&gt;
&lt;p&gt;Loki Canary generates artificial log lines, which are sent to the Loki cluster.
Loki Canary then communicates with the cluster to capture metrics about those
log lines and uses them to assess the performance of the cluster.
The information is available as Prometheus time series metrics.&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../loki-canary-block.png&#34;
  alt=&#34;block_diagram&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Loki Canary writes a log to a file and stores the timestamp in an internal
array. The contents look something like this:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;nohighlight&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-nohighlight&#34;&gt;1557935669096040040 ppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The relevant part of the log entry is the timestamp; the &lt;code&gt;p&lt;/code&gt;s are just filler
bytes to make the size of the log configurable.&lt;/p&gt;
&lt;p&gt;An agent (like Promtail) should be configured to read the log file and ship it
to Loki.&lt;/p&gt;
&lt;p&gt;Meanwhile, Loki Canary will open a WebSocket connection to Loki and will tail
the logs it creates. When a log is received on the WebSocket, the timestamp
in the log message is compared to the internal array.&lt;/p&gt;
&lt;p&gt;If the received log is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The next in the array to be received, it is removed from the array and the
(current time - log timestamp) is recorded in the &lt;code&gt;response_latency&lt;/code&gt;
histogram. This is the expected behavior for well-behaved logs.&lt;/li&gt;
&lt;li&gt;Not the next in the array to be received, it is removed from the array, the
response time is recorded in the &lt;code&gt;response_latency&lt;/code&gt; histogram, and the
&lt;code&gt;out_of_order_entries&lt;/code&gt; counter is incremented.&lt;/li&gt;
&lt;li&gt;Not in the array at all, it is checked against a separate list of received
logs to either increment the &lt;code&gt;duplicate_entries&lt;/code&gt; counter or the
&lt;code&gt;unexpected_entries&lt;/code&gt; counter.&lt;/li&gt;
&lt;/ul&gt;
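&lt;p&gt;The three cases above can be sketched as a small bookkeeping class. This is a toy model written for illustration, not the canary&amp;rsquo;s actual implementation: the class and method names are invented, a plain list stands in for the &lt;code&gt;response_latency&lt;/code&gt; histogram, and a dictionary stands in for the counters.&lt;/p&gt;

```python
import time


class Comparator:
    """Toy model of the bookkeeping described above (not the canary's real code)."""

    def __init__(self):
        self.pending = []       # timestamps written but not yet seen on the WebSocket
        self.received = set()   # timestamps already seen, used to spot duplicates
        self.latencies = []     # stands in for the response_latency histogram
        self.counters = {
            "out_of_order_entries": 0,
            "duplicate_entries": 0,
            "unexpected_entries": 0,
        }

    def entry_sent(self, ts):
        self.pending.append(ts)

    def entry_received(self, ts, now=None):
        now = time.time_ns() if now is None else now
        if ts not in self.pending:
            # Not in the array at all: either seen before or never sent.
            key = "duplicate_entries" if ts in self.received else "unexpected_entries"
            self.counters[key] += 1
            return
        if self.pending[0] != ts:
            # Present, but not the next entry expected.
            self.counters["out_of_order_entries"] += 1
        self.pending.remove(ts)
        self.received.add(ts)
        self.latencies.append(now - ts)
```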
&lt;p&gt;In the background, Loki Canary also runs a timer which iterates through all of
the entries in the internal array. If any of the entries are older than the
duration specified by the &lt;code&gt;-wait&lt;/code&gt; flag (defaulting to 60s), they are removed
from the array and the &lt;code&gt;websocket_missing_entries&lt;/code&gt; counter is incremented. An
additional query is then made directly to Loki for any missing entries to
determine if they are truly missing or only missing from the WebSocket. If
missing entries are not found in the direct query, the &lt;code&gt;missing_entries&lt;/code&gt; counter
is incremented.&lt;/p&gt;
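&lt;p&gt;The pruning pass can be sketched in the same spirit. The function below is hypothetical: it takes the timestamps still awaiting confirmation and the set of timestamps that the follow-up direct query returned, and separates entries that were only missing from the WebSocket from entries that are missing entirely.&lt;/p&gt;

```python
def prune(pending, now, wait, found_by_query):
    """Classify entries older than `wait`.

    pending: sent timestamps still awaiting confirmation
    found_by_query: timestamps returned by the follow-up direct query
    Returns (still_pending, websocket_missing, missing).
    """
    still_pending = [ts for ts in pending if now - ts <= wait]
    # Each of these would increment websocket_missing_entries.
    websocket_missing = [ts for ts in pending if now - ts > wait]
    # Each of these would additionally increment missing_entries.
    missing = [ts for ts in websocket_missing if ts not in found_by_query]
    return still_pending, websocket_missing, missing
```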
&lt;h3 id=&#34;additional-queries&#34;&gt;Additional Queries&lt;/h3&gt;
&lt;h4 id=&#34;spot-check&#34;&gt;Spot Check&lt;/h4&gt;
&lt;p&gt;Starting with version 1.6.0, the canary spot checks certain results over time
to make sure they are present in Loki. This is helpful for testing the transition
of in-memory logs in the ingester to the store, to make sure nothing is lost.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;-spot-check-interval&lt;/code&gt; and &lt;code&gt;-spot-check-max&lt;/code&gt; are used to tune this feature.
At every &lt;code&gt;-spot-check-interval&lt;/code&gt;, the canary pulls a log entry from the stream
and saves it in a separate list; entries are kept for up to &lt;code&gt;-spot-check-max&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Every &lt;code&gt;-spot-check-query-rate&lt;/code&gt;, Loki is queried for each entry in this list and
&lt;code&gt;loki_canary_spot_check_entries_total&lt;/code&gt; is incremented. If a result
is missing, &lt;code&gt;loki_canary_spot_check_missing_entries_total&lt;/code&gt; is incremented.&lt;/p&gt;
&lt;p&gt;The defaults of &lt;code&gt;15m&lt;/code&gt; for &lt;code&gt;-spot-check-interval&lt;/code&gt; and &lt;code&gt;4h&lt;/code&gt; for &lt;code&gt;-spot-check-max&lt;/code&gt;
mean that after 4 hours of running, the canary will have a list of 16 entries
that it queries every minute (the default &lt;code&gt;-spot-check-query-rate&lt;/code&gt; is 1m).
Be aware of the query load this can put on Loki if you have a lot of canaries.&lt;/p&gt;
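&lt;p&gt;The arithmetic behind these numbers can be checked with a short calculation. The helper below is only an illustration (its name and the &lt;code&gt;canaries&lt;/code&gt; parameter are not canary flags); it returns the steady-state list size per canary and the resulting number of spot check queries per hour across all canaries.&lt;/p&gt;

```python
from datetime import timedelta


def spot_check_load(interval, max_age, query_rate, canaries=1):
    """Return (entries kept per canary, spot check queries per hour overall)."""
    entries = max_age // interval                 # timedelta // timedelta gives an int
    rounds_per_hour = timedelta(hours=1) // query_rate
    return entries, entries * rounds_per_hour * canaries


if __name__ == "__main__":
    defaults = (timedelta(minutes=15), timedelta(hours=4), timedelta(minutes=1))
    print(spot_check_load(*defaults))
    print(spot_check_load(*defaults, canaries=10))
```

&lt;p&gt;With the defaults, a single canary keeps 16 entries and issues 16 queries per minute, that is 960 per hour; ten canaries issue 9600 per hour.&lt;/p&gt;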
&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; if you are using &lt;code&gt;-out-of-order-percentage&lt;/code&gt; to test ingestion of out-of-order
log lines, be sure not to set the two out-of-order time range flags
(&lt;code&gt;-out-of-order-min&lt;/code&gt; and &lt;code&gt;-out-of-order-max&lt;/code&gt;) too far in the past.
The defaults are already enough to test this functionality properly, and setting them
too far in the past can cause issues with the spot check test.&lt;/p&gt;
&lt;p&gt;When using &lt;code&gt;out-of-order-percentage&lt;/code&gt; you also need to make use of pipeline stages
in your Promtail configuration in order to set the timestamps correctly as the logs are pushed
to Loki. The &lt;code&gt;client/promtail/pipelines&lt;/code&gt; docs have examples of how to do this.&lt;/p&gt;
&lt;h4 id=&#34;metric-test&#34;&gt;Metric Test&lt;/h4&gt;
&lt;p&gt;Loki Canary will run a metric query &lt;code&gt;count_over_time&lt;/code&gt; to
verify that the rate of logs being stored in Loki corresponds to the rate they are being
created by Loki Canary.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;-metric-test-interval&lt;/code&gt; and &lt;code&gt;-metric-test-range&lt;/code&gt; are used to tune this feature. By
default, every &lt;code&gt;1h&lt;/code&gt; the canary runs a &lt;code&gt;count_over_time&lt;/code&gt; instant query against Loki
for a range of &lt;code&gt;24h&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If the canary has not run for &lt;code&gt;-metric-test-range&lt;/code&gt; (&lt;code&gt;24h&lt;/code&gt;) the query range is adjusted
to the amount of time the canary has been running such that the rate can be calculated
since the canary was started.&lt;/p&gt;
&lt;p&gt;The canary calculates what the expected count of logs would be for the range
(also adjusting this based on canary runtime) and compares the expected result with
the actual result returned from Loki. The &lt;em&gt;difference&lt;/em&gt; is stored as the value of
the gauge &lt;code&gt;loki_canary_metric_test_deviation&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Some deviation is expected: deriving an expected count from the query rate and
comparing it with actual query data is imperfect
and will lead to a deviation of a few log entries.&lt;/p&gt;
&lt;p&gt;A deviation of more than 3-4 log entries is not expected.&lt;/p&gt;
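&lt;p&gt;A rough model of the metric test calculation is shown below. It is a sketch under stated assumptions: the function name is invented, the expected count is taken to be the effective range divided by the write interval, and the sign of the difference (expected minus actual) is an assumption, since the text above only says that the difference is stored.&lt;/p&gt;

```python
def metric_test_deviation(actual_count, runtime_s, interval_s, range_s=24 * 3600):
    """Compare the count returned by Loki with the count the canary expects."""
    # The query range shrinks to the runtime until the canary has been
    # running for the full metric test range.
    effective_range_s = min(runtime_s, range_s)
    expected = effective_range_s / interval_s
    # Assumed sign convention: positive means fewer entries than expected.
    return expected - actual_count
```

&lt;p&gt;For example, a canary that has run for one hour with a 1s interval expects 3600 entries; if Loki reports 3598, the deviation in this model is 2.&lt;/p&gt;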
&lt;h3 id=&#34;control&#34;&gt;Control&lt;/h3&gt;
&lt;p&gt;Loki Canary exposes two endpoints that allow the canary process to be suspended and
resumed dynamically. This can be useful if you&amp;rsquo;d like to quickly disable or re-enable the
canary. To stop or start the canary, issue an HTTP GET request against the &lt;code&gt;/suspend&lt;/code&gt; or
&lt;code&gt;/resume&lt;/code&gt; endpoint.&lt;/p&gt;
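&lt;p&gt;A minimal Python sketch of calling these endpoints follows. The helper names are invented for this example, and the address &lt;code&gt;loki-canary:3500&lt;/code&gt; assumes that the control endpoints are served on the same port as the metrics, which this page does not state explicitly.&lt;/p&gt;

```python
import urllib.request


def control_url(canary_addr, action):
    """Return the URL of the /suspend or /resume endpoint."""
    if action not in ("suspend", "resume"):
        raise ValueError("action must be 'suspend' or 'resume'")
    return f"http://{canary_addr}/{action}"


def send_control(canary_addr, action, timeout=5):
    """Issue the HTTP GET request and return the response status code."""
    url = control_url(canary_addr, action)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Only prints the URLs; call send_control() against a running canary.
    print(control_url("loki-canary:3500", "suspend"))
    print(control_url("loki-canary:3500", "resume"))
```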
&lt;h2 id=&#34;installation&#34;&gt;Installation&lt;/h2&gt;
&lt;h3 id=&#34;binary&#34;&gt;Binary&lt;/h3&gt;
&lt;p&gt;Loki Canary is provided as a pre-compiled binary as part of the
&lt;a href=&#34;https://github.com/grafana/loki/releases&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Loki Releases&lt;/a&gt; on GitHub.&lt;/p&gt;
&lt;h3 id=&#34;docker&#34;&gt;Docker&lt;/h3&gt;
&lt;p&gt;Loki Canary is also provided as a Docker container image:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;# change tag to the most recent release
$ docker pull grafana/loki-canary:2.0.0&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;kubernetes&#34;&gt;Kubernetes&lt;/h3&gt;
&lt;p&gt;To run on Kubernetes, you can do something simple like:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;kubectl run loki-canary --generator=run-pod/v1 --image=grafana/loki-canary:latest --restart=Never --image-pull-policy=IfNotPresent --labels=name=loki-canary -- -addr=loki:3100&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Or you can do something more complex, like deploying it as a DaemonSet. There is a
Tanka setup for this in the &lt;code&gt;production&lt;/code&gt; folder, which you can import using
&lt;code&gt;jsonnet-bundler&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;shell&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-shell&#34;&gt;jb install github.com/grafana/loki/production/ksonnet/loki-canary&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Then in your Tanka environment&amp;rsquo;s &lt;code&gt;main.jsonnet&lt;/code&gt; you&amp;rsquo;ll want something like
this:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;jsonnet&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-jsonnet&#34;&gt;local loki_canary = import &amp;#39;loki-canary/loki-canary.libsonnet&amp;#39;;

loki_canary {
  loki_canary_args&amp;#43;:: {
    addr: &amp;#34;loki:3100&amp;#34;,
    port: 80,
    labelname: &amp;#34;instance&amp;#34;,
    interval: &amp;#34;100ms&amp;#34;,
    size: 1024,
    wait: &amp;#34;3m&amp;#34;,
  },
  _config&amp;#43;:: {
    namespace: &amp;#34;default&amp;#34;,
  }
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h4 id=&#34;examples&#34;&gt;Examples&lt;/h4&gt;
&lt;p&gt;Standalone Pod Implementation of loki-canary&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;---
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: loki-canary
    name: loki-canary
  name: loki-canary
spec:
  containers:
  - args:
    - -addr=loki:3100
    image: grafana/loki-canary:latest
    imagePullPolicy: IfNotPresent
    name: loki-canary
    resources: {}
---
apiVersion: v1
kind: Service
metadata:
  name: loki-canary
  labels:
    app: loki-canary
spec:
  type: ClusterIP
  selector:
    app: loki-canary
  ports:
  - name: metrics
    protocol: TCP
    port: 3500
    targetPort: 3500&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;DaemonSet Implementation of loki-canary&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;---
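kind: DaemonSet
# extensions/v1beta1 no longer serves DaemonSet (removed in Kubernetes 1.16)
apiVersion: apps/v1
metadata:
  labels:
    app: loki-canary
    name: loki-canary
  name: loki-canary
spec:
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: loki-canary
  template: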
    metadata:
      name: loki-canary
      labels:
        app: loki-canary
    spec:
      containers:
      - args:
        - -addr=loki:3100
        image: grafana/loki-canary:latest
        imagePullPolicy: IfNotPresent
        name: loki-canary
        resources: {}
---
apiVersion: v1
kind: Service
metadata:
  name: loki-canary
  labels:
    app: loki-canary
spec:
  type: ClusterIP
  selector:
    app: loki-canary
  ports:
  - name: metrics
    protocol: TCP
    port: 3500
    targetPort: 3500&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;from-source&#34;&gt;From Source&lt;/h3&gt;
&lt;p&gt;If the other options are not sufficient for your use case, you can compile
&lt;code&gt;loki-canary&lt;/code&gt; yourself:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;# clone the source tree
$ git clone https://github.com/grafana/loki
$ cd loki

# build the binary
$ make loki-canary

# (optionally build the container image)
$ make loki-canary-image&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;configuration&#34;&gt;Configuration&lt;/h2&gt;
&lt;p&gt;The address of Loki must be passed in with the &lt;code&gt;-addr&lt;/code&gt; flag, and if your Loki
server uses TLS, &lt;code&gt;-tls=true&lt;/code&gt; must also be provided. Note that using TLS will
cause the WebSocket connection to use &lt;code&gt;wss://&lt;/code&gt; instead of &lt;code&gt;ws://&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;-labelname&lt;/code&gt; and &lt;code&gt;-labelvalue&lt;/code&gt; flags should also be provided, as these are
used by Loki Canary to filter the log stream to only process logs for the
current instance of the canary. Ensure that the values provided to the flags are
unique to each instance of Loki Canary. Grafana Labs&amp;rsquo; Tanka config
accomplishes this by passing in the pod name as the label value.&lt;/p&gt;
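&lt;p&gt;To illustrate how these flags narrow the stream, the sketch below builds a LogQL stream selector from the label and stream flags, using the defaults from the option list on this page. The function is hypothetical, and the order of the two matchers is an assumption; it only shows why the label value must be unique per canary instance.&lt;/p&gt;

```python
def log_selector(labelname="name", labelvalue="loki-canary",
                 streamname="stream", streamvalue="stdout"):
    """Build a LogQL stream selector from the canary's label flags."""
    return f'{{{streamname}="{streamvalue}",{labelname}="{labelvalue}"}}'


if __name__ == "__main__":
    # "loki-canary-abc12" is a placeholder pod name used as the label value.
    print(log_selector())
    print(log_selector(labelname="instance", labelvalue="loki-canary-abc12"))
```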
&lt;p&gt;If Loki Canary reports a high number of &lt;code&gt;unexpected_entries&lt;/code&gt;, Loki Canary may
not be waiting long enough and the value for the &lt;code&gt;-wait&lt;/code&gt; flag should be
increased to a larger value than 60s.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Be aware&lt;/strong&gt; of the relationship between &lt;code&gt;-pruneinterval&lt;/code&gt; and &lt;code&gt;-interval&lt;/code&gt;.
For example, with an interval of 10ms (100 logs per second) and a prune interval
of 60s, you will write 6000 logs per minute. If those logs were not received
over the WebSocket, the canary will attempt to query Loki directly to see if
they are completely lost. &lt;strong&gt;However&lt;/strong&gt;, the query result is limited to 1000
entries, so the query cannot return all of the logs even if they did make it
to Loki.&lt;/p&gt;
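&lt;p&gt;The numbers in this example can be reproduced with a small calculation. The helper below is illustrative only; its name and parameters are not canary flags, and the limit of 1000 is the query result limit mentioned above.&lt;/p&gt;

```python
def unverifiable_entries(interval_ms, prune_interval_s, query_limit=1000):
    """Return (entries written per prune interval, entries beyond one query's limit)."""
    written = prune_interval_s * 1000 // interval_ms
    return written, max(0, written - query_limit)


if __name__ == "__main__":
    # 10ms interval, 60s prune interval, as in the example above.
    print(unverifiable_entries(10, 60))
    # Default 1s interval with the default 1m prune interval.
    print(unverifiable_entries(1000, 60))
```

&lt;p&gt;With a 10ms interval and a 60s prune interval, 6000 entries are written per prune interval, 5000 more than a single query can return. With the default 1s interval, only 60 entries are written, which fits well within the limit.&lt;/p&gt;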
&lt;p&gt;&lt;strong&gt;Likewise&lt;/strong&gt;, if you lower the &lt;code&gt;-pruneinterval&lt;/code&gt;, you risk an effective denial of
service against Loki, because all of your canaries query for missing logs at
whatever frequency &lt;code&gt;-pruneinterval&lt;/code&gt; defines.&lt;/p&gt;
&lt;p&gt;All options:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;  -addr string
        The Loki server URL:Port, e.g. loki:3100
  -buckets int
        Number of buckets in the response_latency histogram (default 10)
  -interval duration
        Duration between log entries (default 1s)
  -labelname string
        The label name for this instance of Loki Canary to use in the log selector
        (default &amp;#34;name&amp;#34;)
  -labelvalue string
        The unique label value for this instance of Loki Canary to use in the log selector
        (default &amp;#34;loki-canary&amp;#34;)
  -metric-test-interval duration
        The interval the metric test query should be run (default 1h0m0s)
  -metric-test-range duration
        The range value [24h] used in the metric test instant-query. This value is truncated
        to the running time of the canary until this value is reached (default 24h0m0s)
  -out-of-order-max duration
        Maximum amount of time (in seconds) in the past an out of order entry may have as a
        timestamp. (default 60s)
  -out-of-order-min duration
        Minimum amount of time (in seconds) in the past an out of order entry may have as a
        timestamp. (default 30s)
  -out-of-order-percentage int
        Percentage (0-100) of log entries that should be sent out of order
  -pass string
        Loki password
  -port int
        Port which Loki Canary should expose metrics (default 3500)
  -pruneinterval duration
        Frequency to check sent versus received logs, and also the frequency at which queries
        for missing logs will be dispatched to Loki, and the frequency spot check queries are run
        (default 1m0s)
  -query-timeout duration
        How long to wait for a query response from Loki (default 10s)
  -size int
        Size in bytes of each log line (default 100)
  -spot-check-interval duration
        Interval that a single result will be kept from sent entries and spot-checked against
        Loki. For example, with the 15 minute default, one entry every 15 minutes will be saved,
        and then queried again every 15 minutes until the time defined by spot-check-max is
        reached (default 15m0s)
  -spot-check-max duration
        How far back to check a spot check an entry before dropping it (default 4h0m0s)
  -spot-check-query-rate duration
        Interval that Loki Canary will query Loki for the current list of all spot check entries
        (default 1m0s)
  -streamname string
        The stream name for this instance of Loki Canary to use in the log selector
        (default &amp;#34;stream&amp;#34;)
  -streamvalue string
        The unique stream value for this instance of Loki Canary to use in the log selector
        (default &amp;#34;stdout&amp;#34;)
  -tls
        Does the Loki connection use TLS?
  -user string
        Loki user name
  -version
        Print this build&amp;#39;s version information
  -wait duration
        Duration to wait for log entries before reporting them as lost (default 1m0s)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="loki-canary">Loki Canary&lt;/h1>
&lt;p>&lt;img
class="lazyload d-inline-block"
data-src="../canary.png"
alt="canary"/>&lt;/p>
&lt;p>Loki Canary is a standalone app that audits the log-capturing performance of
a Grafana Loki cluster.&lt;/p>
&lt;p>Loki Canary generates artificial log lines.
These log lines are sent to the Loki cluster.
Loki Canary communicates with the Loki cluster to capture metrics about the
artificial log lines,
such that Loki Canary forms information about the performance of the
Loki cluster.
The information is available as Prometheus time series metrics.&lt;/p></description></item><item><title>Recording Rules</title><link>https://grafana.com/docs/loki/v2.4.x/operations/recording-rules/</link><pubDate>Sat, 11 Apr 2026 09:32:03 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.4.x/operations/recording-rules/</guid><content><![CDATA[&lt;h1 id=&#34;recording-rules&#34;&gt;Recording Rules&lt;/h1&gt;
&lt;p&gt;Recording rules are evaluated by the &lt;code&gt;ruler&lt;/code&gt; component. Each &lt;code&gt;ruler&lt;/code&gt; acts as its own &lt;code&gt;querier&lt;/code&gt;, in the sense that it
executes queries against the store without using the &lt;code&gt;query-frontend&lt;/code&gt; or &lt;code&gt;querier&lt;/code&gt; components. It will respect all query
&lt;a href=&#34;/docs/loki/latest/configuration/#limits_config&#34;&gt;limits&lt;/a&gt; put in place for the &lt;code&gt;querier&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Loki&amp;rsquo;s implementation of recording rules largely reuses Prometheus&amp;rsquo; code.&lt;/p&gt;
&lt;p&gt;Samples generated by recording rules are sent to Prometheus using Prometheus&amp;rsquo; &lt;strong&gt;remote-write&lt;/strong&gt; feature.&lt;/p&gt;
&lt;h2 id=&#34;write-ahead-log-wal&#34;&gt;Write-Ahead Log (WAL)&lt;/h2&gt;
&lt;p&gt;All samples generated by recording rules are written to a WAL. The WAL&amp;rsquo;s main benefit is that it persists the samples
generated by recording rules to disk, which means that if your &lt;code&gt;ruler&lt;/code&gt; crashes, you won&amp;rsquo;t lose any data.
The trade-off for this durability is extra memory usage and slower start-up times.&lt;/p&gt;
&lt;p&gt;A WAL is created per tenant; this is done to prevent cross-tenant interactions. If all samples were to be written
to a single WAL, this would increase the chances that one tenant could cause data loss for others. For example,
Prometheus will reject a remote-write request containing 100 samples if just one of those samples is invalid in some way.&lt;/p&gt;
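&lt;p&gt;The isolation argument above can be illustrated with a minimal sketch. It assumes a hypothetical &lt;code&gt;send&lt;/code&gt; function that models an all-or-nothing remote-write endpoint; the function, the sample layout, and the tenant names are illustrative assumptions and not part of Loki or Prometheus.&lt;/p&gt;

```python
# Hypothetical model: an endpoint that rejects a whole request if any one
# sample in it is invalid. Names and data layout are illustrative only.
def send(batch):
    """Return the samples accepted by the endpoint (all or nothing)."""
    if any(not sample["valid"] for sample in batch):
        return []
    return list(batch)


samples = [
    {"tenant": "a", "valid": True},
    {"tenant": "a", "valid": True},
    {"tenant": "b", "valid": False},
    {"tenant": "b", "valid": True},
]

# Shared WAL: samples from every tenant travel in the same request.
shared_accepted = send(samples)

# Per-tenant WAL: samples from each tenant travel in their own request.
per_tenant_accepted = []
for tenant in sorted({s["tenant"] for s in samples}):
    per_tenant_accepted += send([s for s in samples if s["tenant"] == tenant])

print(len(shared_accepted))      # 0
print(len(per_tenant_accepted))  # 2
```

&lt;p&gt;With a shared batch, the invalid sample from tenant b causes the samples from tenant a to be rejected as well; with one batch per tenant, only tenant b is affected.&lt;/p&gt;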
&lt;h3 id=&#34;start-up&#34;&gt;Start-up&lt;/h3&gt;
&lt;p&gt;When the &lt;code&gt;ruler&lt;/code&gt; starts up, it will load the WALs for the tenants who have recording rules. These WAL files are stored
on disk and are loaded into memory.&lt;/p&gt;
&lt;p&gt;Note: WALs are loaded one at a time upon start-up. This is a current limitation of the Cortex Ruler which Loki inherits.
For this reason, it is advisable that the number of rule groups serviced by a ruler be kept to a reasonable size, since
&lt;em&gt;no rule evaluation occurs while WAL replay is in progress (this includes alerting rules)&lt;/em&gt;.&lt;/p&gt;
&lt;h3 id=&#34;truncation&#34;&gt;Truncation&lt;/h3&gt;
&lt;p&gt;WAL files are regularly truncated to reduce their size on disk.
&lt;a href=&#34;https://ganeshvernekar.com/blog/prometheus-tsdb-wal-and-checkpoint/#wal-truncation-and-checkpointing&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;This guide&lt;/a&gt;
from one of the Prometheus maintainers (Ganesh Vernekar) gives an excellent overview of the truncation, checkpointing,
and replaying of the WAL.&lt;/p&gt;
&lt;h3 id=&#34;cleaner&#34;&gt;Cleaner&lt;/h3&gt;
&lt;p&gt;&lt;span style=&#34;background-color:#f3f973;&#34;&gt;WAL Cleaner is an experimental feature.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;The WAL Cleaner watches for abandoned WALs (tenants who no longer have recording rules associated) and deletes them.
Enable this feature only if you are running into storage concerns with WALs that are too large. Because WALs are
regularly truncated, they should not normally grow excessively large.&lt;/p&gt;
&lt;h2 id=&#34;scaling&#34;&gt;Scaling&lt;/h2&gt;
&lt;p&gt;Loki&amp;rsquo;s &lt;code&gt;ruler&lt;/code&gt; component is based on Cortex&amp;rsquo;s &lt;code&gt;ruler&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;See Cortex&amp;rsquo;s guide for &lt;a href=&#34;https://cortexmetrics.io/docs/guides/ruler-sharding/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;horizontally scaling the &lt;code&gt;ruler&lt;/code&gt;&lt;/a&gt; using the ring.&lt;/p&gt;
&lt;p&gt;Note: the &lt;code&gt;ruler&lt;/code&gt; shards by rule &lt;em&gt;group&lt;/em&gt;, not by individual rules. This is a consequence of Prometheus
recording rules needing to run in order, since one recording rule can reuse another; this reuse is not possible in Loki.&lt;/p&gt;
&lt;h2 id=&#34;deployment&#34;&gt;Deployment&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;ruler&lt;/code&gt; needs to persist its WAL files to disk, and it incurs a bit of a start-up cost by reading these WALs into memory.
As such, it is recommended that you try to minimize churn of individual &lt;code&gt;ruler&lt;/code&gt; instances since rule evaluation is blocked
while the WALs are being read from disk.&lt;/p&gt;
&lt;h3 id=&#34;kubernetes&#34;&gt;Kubernetes&lt;/h3&gt;
&lt;p&gt;It is recommended that you run the &lt;code&gt;rulers&lt;/code&gt; using &lt;code&gt;StatefulSets&lt;/code&gt;. The &lt;code&gt;ruler&lt;/code&gt; will write its WAL files to persistent storage,
so a &lt;code&gt;Persistent Volume&lt;/code&gt; should be utilised.&lt;/p&gt;
&lt;h2 id=&#34;remote-write&#34;&gt;Remote-Write&lt;/h2&gt;
&lt;h3 id=&#34;per-tenant-limits&#34;&gt;Per-Tenant Limits&lt;/h3&gt;
&lt;p&gt;Remote-write can be configured at a global level in the base configuration, and certain parameters tuned specifically on
a per-tenant basis. Most of the configuration options &lt;a href=&#34;../../configuration/#ruler_config&#34;&gt;defined here&lt;/a&gt;
have &lt;a href=&#34;../../configuration/#limits_config&#34;&gt;override options&lt;/a&gt; (which can also be applied at runtime).&lt;/p&gt;
&lt;h3 id=&#34;tuning&#34;&gt;Tuning&lt;/h3&gt;
&lt;p&gt;Remote-write can be tuned if the default configuration is insufficient (see &lt;a href=&#34;#failure-modes&#34;&gt;Failure Modes&lt;/a&gt; below).&lt;/p&gt;
&lt;p&gt;There is a &lt;a href=&#34;https://prometheus.io/docs/practices/remote_write/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;guide&lt;/a&gt; on the Prometheus website, all of which applies to Loki, too.&lt;/p&gt;
&lt;h2 id=&#34;observability&#34;&gt;Observability&lt;/h2&gt;
&lt;p&gt;Since Loki reuses the Prometheus code for recording rules and WALs, it also gains all of Prometheus&amp;rsquo; observability.&lt;/p&gt;
&lt;p&gt;Prometheus exposes a number of metrics for its WAL implementation, and these have all been prefixed with &lt;code&gt;loki_ruler_wal_&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For example: &lt;code&gt;prometheus_remote_storage_bytes_total&lt;/code&gt; → &lt;code&gt;loki_ruler_wal_prometheus_remote_storage_bytes_total&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Additional metrics are exposed, also with the prefix &lt;code&gt;loki_ruler_wal_&lt;/code&gt;. All per-tenant metrics contain a &lt;code&gt;tenant&lt;/code&gt;
label, so be aware that cardinality could begin to be a concern if the number of tenants grows sufficiently large.&lt;/p&gt;
&lt;p&gt;Some key metrics to note are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_appender_ready&lt;/code&gt;: whether a WAL appender is ready to accept samples (1) or not (0)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples_total&lt;/code&gt;: number of samples sent per tenant to remote storage&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples...&lt;/code&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples_pending_total&lt;/code&gt;: samples buffered in memory, waiting to be sent to remote storage&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples_failed_total&lt;/code&gt;: samples that failed when sent to remote storage&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples_dropped_total&lt;/code&gt;: samples dropped by relabel configurations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_samples_retried_total&lt;/code&gt;: samples resent to remote storage&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds&lt;/code&gt;: highest timestamp of sample appended to WAL&lt;/li&gt;
&lt;li&gt;&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds&lt;/code&gt;: highest timestamp of sample sent to remote storage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We&amp;rsquo;ve created a basic &lt;a href=&#34;https://github.com/grafana/loki/tree/main/production/loki-mixin/dashboards/recording-rules.libsonnet&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;dashboard in our loki-mixin&lt;/a&gt;
which you can use to administer recording rules.&lt;/p&gt;
&lt;h2 id=&#34;failure-modes&#34;&gt;Failure Modes&lt;/h2&gt;
&lt;h3 id=&#34;remote-write-lagging&#34;&gt;Remote-Write Lagging&lt;/h3&gt;
&lt;p&gt;Remote-write can lag behind for many reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Remote-write storage (Prometheus) is temporarily unavailable&lt;/li&gt;
&lt;li&gt;A tenant is producing samples too quickly from a recording rule&lt;/li&gt;
&lt;li&gt;Remote-write is tuned too low, creating backpressure&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The lag can be determined by subtracting
&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds&lt;/code&gt; from
&lt;code&gt;loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In case 1, the &lt;code&gt;ruler&lt;/code&gt; will continue to retry sending these samples until the remote storage becomes available again. Be
aware that if the remote storage is down for longer than &lt;code&gt;ruler.wal.max-age&lt;/code&gt;, data loss may occur once the WAL is truncated.&lt;/p&gt;
&lt;p&gt;In cases 2 &amp;amp; 3, you should consider &lt;a href=&#34;#tuning&#34;&gt;tuning&lt;/a&gt; remote-write appropriately.&lt;/p&gt;
&lt;p&gt;Further reading: see &lt;a href=&#34;/blog/2021/04/12/how-to-troubleshoot-remote-write-issues-in-prometheus/&#34;&gt;this blog post&lt;/a&gt;
by Prometheus maintainer Callum Styan.&lt;/p&gt;
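&lt;p&gt;As a minimal sketch of the lag calculation described above, the following assumes the two timestamp metrics have already been scraped into plain dictionaries keyed by tenant. The function name, the dictionary layout, and the sample values are hypothetical.&lt;/p&gt;

```python
def remote_write_lag_seconds(highest_appended, highest_sent):
    """Per-tenant lag: newest timestamp appended to the WAL minus the
    newest timestamp sent to remote storage.

    highest_appended maps tenant to the value of
    loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds.
    highest_sent maps tenant to the value of
    loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds.
    Tenants missing from highest_sent are skipped.
    """
    return {
        tenant: appended - highest_sent[tenant]
        for tenant, appended in highest_appended.items()
        if tenant in highest_sent
    }


# Hypothetical scraped values, in seconds since the Unix epoch.
appended = {"tenant-a": 1700000120.0, "tenant-b": 1700000060.0}
sent = {"tenant-a": 1700000090.0, "tenant-b": 1700000060.0}

lag = remote_write_lag_seconds(appended, sent)
print(lag)  # {'tenant-a': 30.0, 'tenant-b': 0.0}
```

&lt;p&gt;A lag that keeps growing for a tenant suggests that remote-write is not keeping up with the samples produced for that tenant.&lt;/p&gt;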
&lt;h3 id=&#34;appender-not-ready&#34;&gt;Appender Not Ready&lt;/h3&gt;
&lt;p&gt;Each tenant&amp;rsquo;s WAL has an &amp;ldquo;appender&amp;rdquo; internally; this appender is used to &lt;em&gt;append&lt;/em&gt; samples to the WAL. The appender is marked
as &lt;em&gt;not ready&lt;/em&gt; until the WAL replay is complete upon startup. If the WAL is corrupted for some reason, or is taking a long
time to replay, you can detect this by alerting on &lt;code&gt;loki_ruler_wal_appender_ready &amp;lt; 1&lt;/code&gt;.&lt;/p&gt;
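&lt;p&gt;The readiness check can be sketched as follows, assuming the &lt;code&gt;loki_ruler_wal_appender_ready&lt;/code&gt; gauge has already been scraped into a dictionary keyed by tenant. The function name and the sample values are hypothetical.&lt;/p&gt;

```python
def tenants_not_ready(appender_ready):
    """Return the tenants whose appender gauge is not 1, sorted by name.

    appender_ready maps tenant to the value of
    loki_ruler_wal_appender_ready: 1 when ready, 0 when not.
    """
    return sorted(
        tenant for tenant, value in appender_ready.items() if value != 1
    )


# Hypothetical scraped gauge values.
gauges = {"tenant-a": 1, "tenant-b": 0, "tenant-c": 1}

not_ready = tenants_not_ready(gauges)
print(not_ready)  # ['tenant-b']
```

&lt;p&gt;A tenant that stays in this list well after start-up is a candidate for the corrupt or slow-replaying WAL cases described in this section.&lt;/p&gt;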
&lt;h3 id=&#34;corrupt-wal&#34;&gt;Corrupt WAL&lt;/h3&gt;
&lt;p&gt;If a disk fails or the &lt;code&gt;ruler&lt;/code&gt; does not terminate correctly, there&amp;rsquo;s a chance one or more tenant WALs can become corrupted.
A mechanism exists for automatically repairing the WAL, but this cannot handle every conceivable scenario. When a repair fails,
the &lt;code&gt;loki_ruler_wal_corruptions_repair_failed_total&lt;/code&gt; metric is incremented.&lt;/p&gt;
&lt;h3 id=&#34;found-another-failure-mode&#34;&gt;Found another failure mode?&lt;/h3&gt;
&lt;p&gt;Please open an &lt;a href=&#34;https://github.com/grafana/loki/issues&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;issue&lt;/a&gt; and tell us about it!&lt;/p&gt;
]]></content><description>&lt;h1 id="recording-rules">Recording Rules&lt;/h1>
&lt;p>Recording rules are evaluated by the &lt;code>ruler&lt;/code> component. Each &lt;code>ruler&lt;/code> acts as its own &lt;code>querier&lt;/code>, in the sense that it
executes queries against the store without using the &lt;code>query-frontend&lt;/code> or &lt;code>querier&lt;/code> components. It will respect all query
&lt;a href="/docs/loki/latest/configuration/#limits_config">limits&lt;/a> put in place for the &lt;code>querier&lt;/code>.&lt;/p></description></item></channel></rss>