<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Loki Improvement Documents (LIDs) on Grafana Labs</title><link>https://grafana.com/docs/loki/v3.7.x/community/lids/</link><description>Recent content in Loki Improvement Documents (LIDs) on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/loki/v3.7.x/community/lids/index.xml" rel="self" type="application/rss+xml"/><item><title>0001: Introducing LIDs</title><link>https://grafana.com/docs/loki/v3.7.x/community/lids/0001-introduction/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/community/lids/0001-introduction/</guid><content><![CDATA[&lt;h1 id=&#34;0001-introducing-lids&#34;&gt;0001: Introducing LIDs&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Danny Kopping (&lt;a href=&#34;mailto:danny.kopping@grafana.com&#34;&gt;danny.kopping@grafana.com&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 01/2023&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sponsor(s):&lt;/strong&gt; @dannykopping&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type:&lt;/strong&gt; Process&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Accepted&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related issues/PRs:&lt;/strong&gt; N/A&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thread from &lt;a href=&#34;https://groups.google.com/forum/#!forum/lokiproject&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;mailing list&lt;/a&gt;:&lt;/strong&gt; N/A&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;As the Grafana Loki project grows, we have seen more and more contributions from external (outside Grafana Labs) contributors.&lt;/p&gt;
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;Many of these external contributions are large and complex, and have taken these contributors significant time to implement. Large contributions that are made without prior discussion with maintainers are at risk of being rejected if they are misguided, implemented inefficiently, or simply undesired; this is obviously suboptimal both for the contributors and the maintainers.&lt;/p&gt;
&lt;p&gt;Aside from external contributions, changes being proposed by Grafana Loki maintainers may also require community engagement before being worked on.&lt;/p&gt;
&lt;h2 id=&#34;goals&#34;&gt;Goals&lt;/h2&gt;
&lt;p&gt;It would be preferable to engage with contributors &lt;em&gt;before&lt;/em&gt; they make large contributions to ensure that both their and the project&amp;rsquo;s interests are aligned. The community at large must also have a voice when feature or process changes are being proposed, to protect their own interests.&lt;/p&gt;
&lt;p&gt;We should implement a &lt;strong&gt;lightweight&lt;/strong&gt; process that guides the implementation of major changes to the project.&lt;/p&gt;
&lt;h2 id=&#34;proposals&#34;&gt;Proposals&lt;/h2&gt;
&lt;h3 id=&#34;proposal-0-do-nothing&#34;&gt;Proposal 0: Do nothing&lt;/h3&gt;
&lt;p&gt;We will continue to attract large, often complex, external contributions that have not been discussed with maintainers before the work was put in; this may lead to suboptimal outcomes for the relationship between the project and its community.&lt;/p&gt;
&lt;h3 id=&#34;proposal-1-loki-improvement-documents&#34;&gt;Proposal 1: Loki Improvement Documents&lt;/h3&gt;
&lt;p&gt;Inspired by Python&amp;rsquo;s &lt;a href=&#34;https://peps.python.org/pep-0001/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;PEP&lt;/a&gt; and Kafka&amp;rsquo;s &lt;a href=&#34;https://cwiki.apache.org/confluence/display/KAFKA/Kafka&amp;#43;Improvement&amp;#43;Proposals&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;KIP&lt;/a&gt; approaches, we should create a process for formally documenting improvements to Loki which are permanently viewable, and document our decisions.&lt;/p&gt;
&lt;h2 id=&#34;other-notes&#34;&gt;Other Notes&lt;/h2&gt;
&lt;p&gt;Google Docs were considered for this, but they are less useful because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;they would need to be owned by the Grafana Labs organisation, so that they remain viewable even if the author closes their account&lt;/li&gt;
&lt;li&gt;we already have previous &lt;a href=&#34;../../design-documents/&#34;&gt;design documents&lt;/a&gt; in our documentation and, in a recent (&lt;a href=&#34;https://docs.google.com/document/d/1MNjiHQxwFukm2J4NJRWyRgRIiK7VpokYyATzJ5ce-O8/edit#heading=h.78vexgrrtw5a&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;5th Jan 2023&lt;/a&gt;) community call, the community expressed a preference for this type of approach&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="0001-introducing-lids">0001: Introducing LIDs&lt;/h1>
&lt;p>&lt;strong>Author:&lt;/strong> Danny Kopping (&lt;a href="mailto:danny.kopping@grafana.com">danny.kopping@grafana.com&lt;/a>)&lt;/p>
&lt;p>&lt;strong>Date:&lt;/strong> 01/2023&lt;/p>
&lt;p>&lt;strong>Sponsor(s):&lt;/strong> @dannykopping&lt;/p>
&lt;p>&lt;strong>Type:&lt;/strong> Process&lt;/p>
&lt;p>&lt;strong>Status:&lt;/strong> Accepted&lt;/p>
&lt;p>&lt;strong>Related issues/PRs:&lt;/strong> N/A&lt;/p>
&lt;p>&lt;strong>Thread from &lt;a href="https://groups.google.com/forum/#!forum/lokiproject" target="_blank" rel="noopener noreferrer">mailing list&lt;/a>:&lt;/strong> N/A&lt;/p>
&lt;hr />
&lt;h2 id="background">Background&lt;/h2>
&lt;p>As the Grafana Loki project grows, we have seen more and more contributions from external (outside Grafana Labs) contributors.&lt;/p></description></item><item><title>0002: Remote Rule Evaluation</title><link>https://grafana.com/docs/loki/v3.7.x/community/lids/0002-remoteruleevaluation/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/community/lids/0002-remoteruleevaluation/</guid><content><![CDATA[&lt;h1 id=&#34;0002-remote-rule-evaluation&#34;&gt;0002: Remote Rule Evaluation&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Danny Kopping (&lt;a href=&#34;mailto:danny.kopping@grafana.com&#34;&gt;danny.kopping@grafana.com&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 01/2023&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sponsor(s):&lt;/strong&gt; @dannykopping&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type:&lt;/strong&gt; Feature&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Accepted&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related issues/PRs:&lt;/strong&gt; &lt;a href=&#34;https://github.com/grafana/mimir/pull/1536&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;https://github.com/grafana/mimir/pull/1536&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thread from &lt;a href=&#34;https://groups.google.com/forum/#!forum/lokiproject&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;mailing list&lt;/a&gt;:&lt;/strong&gt; N/A&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;ruler&lt;/code&gt; is a component that evaluates alerting and recording rules. Loki reuses Prometheus&amp;rsquo; rule evaluation engine. The &lt;code&gt;ruler&lt;/code&gt; currently operates by initialising a &lt;code&gt;querier&lt;/code&gt; internally and evaluating all rules &amp;ldquo;locally&amp;rdquo; (i.e. it does not rely on any other components). Each rule group executes concurrently, and rules within the rule group are evaluated sequentially (this is an implementation detail from Prometheus).&lt;/p&gt;
&lt;p&gt;Recording rules produce metric series which are sent to a Prometheus-compatible source. Alerting rules send notifications to Alertmanager when a condition is met. Both of these rule types can play a vital role in an organisation&amp;rsquo;s observability strategy, and so their reliable evaluation is essential.&lt;/p&gt;
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;Rule evaluations can contain expensive queries. The &lt;code&gt;ruler&lt;/code&gt; initialises a &lt;code&gt;querier&lt;/code&gt;, but the &lt;code&gt;querier&lt;/code&gt; does not have the capability to accelerate queries; the &lt;code&gt;query-frontend&lt;/code&gt; component is responsible for query acceleration through splitting, sharding, caching, and other techniques.&lt;/p&gt;
&lt;p&gt;An expensive rule query can cause an entire &lt;code&gt;ruler&lt;/code&gt; instance to use excessive resources and even crash. This is highly problematic for the following reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a slow rule evaluation can cause subsequent rules in a group to be delayed or missed, leading to missing alerts or gaps in recording rule metrics&lt;/li&gt;
&lt;li&gt;excessive resource usage can impede the evaluation of rules for other tenants (noisy neighbour)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;goals&#34;&gt;Goals&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;faster, more efficient rule evaluation&lt;/li&gt;
&lt;li&gt;greater isolation between tenants&lt;/li&gt;
&lt;li&gt;more reliable service&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;non-goals&#34;&gt;Non-Goals&lt;/h2&gt;
&lt;p&gt;This proposal does not aim to make this option the default mode of evaluation; it should be optional because it increases operational complexity.&lt;/p&gt;
&lt;h2 id=&#34;proposals&#34;&gt;Proposals&lt;/h2&gt;
&lt;h3 id=&#34;proposal-0-do-nothing&#34;&gt;Proposal 0: Do nothing&lt;/h3&gt;
&lt;p&gt;Loki&amp;rsquo;s current &lt;code&gt;ruler&lt;/code&gt; implementation is sufficient for small installations running relatively simple or inexpensive queries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Nothing to be done&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Loki&amp;rsquo;s &lt;code&gt;ruler&lt;/code&gt; will remain unreliable and inefficient when used in large multi-tenant environments with expensive queries.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;proposal-1-remote-execution&#34;&gt;Proposal 1: Remote Execution&lt;/h3&gt;
&lt;p&gt;Taking inspiration from &lt;a href=&#34;/docs/mimir/latest/operators-guide/architecture/components/ruler/#remote&#34;&gt;Grafana Mimir&amp;rsquo;s implementation&lt;/a&gt;, the &lt;code&gt;ruler&lt;/code&gt; would be configured to send its rule queries to the &lt;code&gt;query-frontend&lt;/code&gt; component over gRPC. The &lt;code&gt;querier&lt;/code&gt; instances receiving queries from the &lt;code&gt;query-frontend&lt;/code&gt; (or optionally via the &lt;code&gt;query-scheduler&lt;/code&gt;) will handle the requests and send their responses back to the &lt;code&gt;query-frontend&lt;/code&gt;, where they are combined. The &lt;code&gt;ruler&lt;/code&gt; will receive and process these responses as if the query had been executed locally.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Takes full advantage of Loki&amp;rsquo;s query acceleration techniques, leading to faster and more efficient rule evaluation&lt;/li&gt;
&lt;li&gt;Operationally simple as existing &lt;code&gt;query-frontend&lt;/code&gt;/&lt;code&gt;query-scheduler&lt;/code&gt;/&lt;code&gt;querier&lt;/code&gt; setup can be used&lt;/li&gt;
&lt;li&gt;Per-tenant isolation available in Loki&amp;rsquo;s query path (shuffle-sharding, per-tenant queues) can be used to reduce or eliminate the noisy neighbour problem&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Increased interdependence in components, increased cross-component networking&lt;/li&gt;
&lt;li&gt;Reusing the same &lt;code&gt;query-frontend&lt;/code&gt;/&lt;code&gt;query-scheduler&lt;/code&gt;/&lt;code&gt;querier&lt;/code&gt; setup can cause expensive queries to starve rule evaluations of query resources, and vice versa
&lt;ul&gt;
&lt;li&gt;Additional complexity introduced if this setup needs to be duplicated for rule evaluations (recommended: see &lt;strong&gt;Other Notes&lt;/strong&gt; section below)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;other-notes&#34;&gt;Other Notes&lt;/h2&gt;
&lt;p&gt;If this feature were to be used in conjunction with &lt;a href=&#34;https://github.com/grafana/loki/pull/8092&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;rule-based sharding&lt;/a&gt;, this can present some further optimisation but also some additional challenges to consider.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Aside: the &lt;code&gt;ruler&lt;/code&gt; shards by rule group by default, which means that rules can be unevenly balanced across &lt;code&gt;ruler&lt;/code&gt; instances if some rule groups have more expensive queries than others. Additionally, rules within a group execute sequentially, so an expensive query can cause subsequent rules in the group to be delayed or even missed; rule groups themselves are evaluated concurrently.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Rule-based sharding distributes rules evenly across all available &lt;code&gt;ruler&lt;/code&gt; instances, each in their own rule group. Consequently, each rule that belongs to a &lt;code&gt;ruler&lt;/code&gt; instance will be evaluated concurrently (as they&amp;rsquo;re each in their own rule group). For tenants with hundreds or thousands of rules, this can result in large batches of queries being sent to the &lt;code&gt;query-frontend&lt;/code&gt; in quick succession, should they all use the same interval or happen to overlap.&lt;/p&gt;
&lt;p&gt;Assuming the remote rule evaluation takes place on the same read path that is used to execute tenant queries, care must be taken by operators who run large multi-tenant setups to ensure that large volumes of queries can be received, queued, and processed in an acceptable timeframe. The &lt;code&gt;query-scheduler&lt;/code&gt; component is highly recommended in these situations, as it will enable the &lt;code&gt;query-frontend&lt;/code&gt; and &lt;code&gt;querier&lt;/code&gt; components to scale out to accommodate the load. Shuffle-sharding should also be implemented to ensure that tenants with particularly large workloads do not starve out the query resources of other tenants. Alerting should also be put in place to notify operators if rule evaluations are being routinely missed or a tenant&amp;rsquo;s query queues become full.&lt;/p&gt;
&lt;p&gt;If rule evaluations and tenant queries are slowing each other down, the read path setup would need to be duplicated so that tenant queries and rule evaluations would not share the same query execution resources.&lt;/p&gt;
&lt;p&gt;Rule-based sharding and remote evaluation can (and should) be implemented separately. Operators should first implement remote evaluation to improve &lt;code&gt;ruler&lt;/code&gt; reliability, and &lt;em&gt;then&lt;/em&gt; further investigate rule-based sharding if rule evaluations are still being missed due to the sequential execution of rule groups, or advise their tenants to split these rule groups up.&lt;/p&gt;
]]></content><description>&lt;h1 id="0002-remote-rule-evaluation">0002: Remote Rule Evaluation&lt;/h1>
&lt;p>&lt;strong>Author:&lt;/strong> Danny Kopping (&lt;a href="mailto:danny.kopping@grafana.com">danny.kopping@grafana.com&lt;/a>)&lt;/p>
&lt;p>&lt;strong>Date:&lt;/strong> 01/2023&lt;/p>
&lt;p>&lt;strong>Sponsor(s):&lt;/strong> @dannykopping&lt;/p>
&lt;p>&lt;strong>Type:&lt;/strong> Feature&lt;/p>
&lt;p>&lt;strong>Status:&lt;/strong> Accepted&lt;/p>
&lt;p>&lt;strong>Related issues/PRs:&lt;/strong> &lt;a href="https://github.com/grafana/mimir/pull/1536" target="_blank" rel="noopener noreferrer">https://github.com/grafana/mimir/pull/1536&lt;/a>&lt;/p>
&lt;p>&lt;strong>Thread from &lt;a href="https://groups.google.com/forum/#!forum/lokiproject" target="_blank" rel="noopener noreferrer">mailing list&lt;/a>:&lt;/strong> N/A&lt;/p>
&lt;hr />
&lt;h2 id="background">Background&lt;/h2>
&lt;p>The &lt;code>ruler&lt;/code> is a component that evaluates alerting and recording rules. Loki reuses Prometheus&amp;rsquo; rule evaluation engine. The &lt;code>ruler&lt;/code> currently operates by initialising a &lt;code>querier&lt;/code> internally and evaluating all rules &amp;ldquo;locally&amp;rdquo; (i.e. it does not rely on any other components). Each rule group executes concurrently, and rules within the rule group are evaluated sequentially (this is an implementation detail from Prometheus).&lt;/p></description></item><item><title>0003: Query fairness across users within tenants</title><link>https://grafana.com/docs/loki/v3.7.x/community/lids/0003-queryfairnessinscheduler/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/community/lids/0003-queryfairnessinscheduler/</guid><content><![CDATA[&lt;h1 id=&#34;0003-query-fairness-across-users-within-tenants&#34;&gt;0003: Query fairness across users within tenants&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Christian Haudum (&lt;a href=&#34;mailto:christian.haudum@grafana.com&#34;&gt;christian.haudum@grafana.com&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 02/2023&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sponsor(s):&lt;/strong&gt; @chaudum @owen-d&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type:&lt;/strong&gt; Feature&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Accepted&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related issues/PRs:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thread from &lt;a href=&#34;https://groups.google.com/forum/#!forum/lokiproject&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;mailing list&lt;/a&gt;:&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;The query scheduler (or scheduler, for short) is a component of Loki that distributes requests (sub-queries) from the query frontend (or frontend, for short) to the querier workers so that execution fairness between tenants can be guaranteed.&lt;/p&gt;
&lt;p&gt;By maintaining separate FIFO queues for each tenant and assigning the correct amount of querier workers to these queues, the scheduler ensures that a single tenant cannot compromise all other tenants&amp;rsquo; query capabilities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component diagram:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../scheduler-component-diagram.png&#34;
  alt=&#34;scheduler-component-diagram.plantuml&#34;/&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sequence diagram:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../scheduler-sequence-diagram.png&#34;
  alt=&#34;scheduler-sequence-diagram.plantuml&#34;/&gt;&lt;/p&gt;
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;Even though Loki is built as a multi-tenant system by default, there are use-cases where a Loki installation has only a single, very large tenant, e.g. dedicated Loki cells for customers in Grafana Cloud.&lt;/p&gt;
&lt;p&gt;However, there are potentially a lot of different users using the same tenant to query logs, such as users accessing Loki from Grafana or via CLI or HTTP API. This can lead to contention between queries of different users, because they all share the same tenant.&lt;/p&gt;
&lt;p&gt;While the current implementation of the scheduler queues allows for QoS guarantees between tenants, it does not account for QoS guarantees across individual users within a single tenant.&lt;/p&gt;
&lt;p&gt;Note, however, that Loki does not currently have a notion of individual users.&lt;/p&gt;
&lt;h2 id=&#34;goals&#34;&gt;Goals&lt;/h2&gt;
&lt;p&gt;The main goal of the following proposals is to lay out ideas on how to improve the scheduler component so that it assures QoS not only across tenants, but also across actors (users) within a tenant, without requiring any changes to the deployment model of frontend, scheduler and queriers.
This should also include changes to the queue structure so that it is easily extensible for future scheduling improvements.&lt;/p&gt;
&lt;h2 id=&#34;non-goals-optional&#34;&gt;Non-Goals (optional)&lt;/h2&gt;
&lt;p&gt;While changing and extending the scheduler also requires user-facing API changes, the public API is not part of the discussion of this document.&lt;/p&gt;
&lt;h2 id=&#34;proposals&#34;&gt;Proposals&lt;/h2&gt;
&lt;h3 id=&#34;proposal-0-do-nothing&#34;&gt;Proposal 0: Do nothing&lt;/h3&gt;
&lt;p&gt;An alternative to changing the scheduling mechanism is to handle QoS control via multiple tenants and multi-tenant querying.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keeps the scheduler as simple as it is now&lt;/li&gt;
&lt;li&gt;No development time&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;While that separation into tenants may work for some deployments, it might not be feasible to implement for others.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;proposal-1-add-fixed-second-level-to-scheduler&#34;&gt;Proposal 1: Add fixed second level to scheduler&lt;/h3&gt;
&lt;p&gt;The current scheduler is implemented in a way that it maintains a separate FIFO queue for each tenant. When a request (sub-query) is enqueued, the scheduler puts it into the existing queue for that tenant. If the queue does not exist yet, it creates it first and re-assigns the connected querier workers to the available tenant queues. Each querier worker pulls round-robin from the assigned queues in a loop.&lt;/p&gt;
&lt;p&gt;Now, instead of enqueuing and pulling directly from the per-tenant queue, requests get enqueued in per-user queues, and the per-tenant queue pulls round-robin from the user queues assigned to it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component diagram:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../scheduler-proposal-1-component-diagram.png&#34;
  alt=&#34;scheduler-proposal-1-component-diagram.plantuml&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Like the current implementation, the scheduler enqueues requests based on the &lt;code&gt;X-Scope-OrgID&lt;/code&gt; header (or equivalent key in the request context), but also takes a second key (such as &lt;code&gt;X-Scope-UserID&lt;/code&gt;) into account. This establishes a fixed two-level hierarchy in which the tenant-to-user relation is one-to-many.
However, this has the disadvantage that the concept of users (which does not exist yet in Loki) leaks into the scheduler domain.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Relatively simple to implement&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Not extensible&lt;/li&gt;
&lt;li&gt;Leaks domain knowledge&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;proposal-2-fully-hierarchical-scheduler&#34;&gt;Proposal 2: Fully hierarchical scheduler&lt;/h3&gt;
&lt;p&gt;This proposal is similar to &lt;em&gt;Proposal 1&lt;/em&gt;, but with the difference that there are no fixed levels and levels can be nested arbitrarily.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component diagram:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../scheduler-proposal-2-component-diagram.png&#34;
  alt=&#34;scheduler-proposal-2-component-diagram.plantuml&#34;/&gt;&lt;/p&gt;
&lt;p&gt;The implementation of the &lt;code&gt;RequestQueue&lt;/code&gt;, which controls what querier workers are connected to which root queues (aka tenant queues), can be kept as is. However, the concept of tenants and users is dropped and replaced by a concept of hierarchical actors, which can be represented as a slice of identifiers. Note, this does &lt;strong&gt;not&lt;/strong&gt; drop the concept of tenants throughout Loki (represented in the &lt;code&gt;X-Scope-OrgID&lt;/code&gt; header and/or request context).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example of identifiers:&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-go&#34;&gt;actorA := []string{&amp;#34;tenant_a&amp;#34;, &amp;#34;user_1&amp;#34;}
actorB := []string{&amp;#34;tenant_b&amp;#34;, &amp;#34;user_2&amp;#34;}
actorC := []string{&amp;#34;tenant_b&amp;#34;, &amp;#34;user_3&amp;#34;, &amp;#34;service_foo&amp;#34;}
actorD := []string{&amp;#34;tenant_b&amp;#34;, &amp;#34;user_3&amp;#34;, &amp;#34;service_bar&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;More generally:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-go&#34;&gt;actorN := []string{&amp;#34;L0 Queue&amp;#34;, &amp;#34;L1 Queue&amp;#34;, &amp;#34;L2 Queue&amp;#34;, ... &amp;#34;Ln Queue&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The L0 queue (root queue) needs to be able to handle worker connections and therefore needs additional functionality compared to its leaf queues.&lt;/p&gt;
&lt;p&gt;The following code snippet is meant to show the simplified recursive structure of the queues.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-go&#34;&gt;type Request interface{}

type Queue interface {
    Dequeue(actor []string) Request
    Enqueue(r Request, actor []string) error
}

// RequestQueue implements Queue
type RequestQueue struct {
    queriers   map[string]*querier
    rootQueues map[string]*RootQueue
}

// RootQueue implements Queue
type RootQueue struct {
    queriers map[string]*querier
    leafs    map[string]*LeafQueue
    ch       chan Request
}

// LeafQueue implements Queue
type LeafQueue struct {
    leafs map[string]*LeafQueue
    ch    chan Request
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Backwards compatible, because a tenant can be identified as &lt;code&gt;[]string{&amp;quot;tenantID&amp;quot;}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Queue hierarchy can be extended without changing the scheduler implementation&lt;/li&gt;
&lt;li&gt;Implementation does not require knowledge outside of its domain&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More complex to implement than a fixed number of levels&lt;/li&gt;
&lt;li&gt;Each queue comes with memory overhead&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;proposal-3-multiple-per-tenant-sub-queues&#34;&gt;Proposal 3: Multiple per-tenant sub-queues&lt;/h3&gt;
&lt;p&gt;Another option to keep the concept of users out of Loki and still provide some query fairness guarantees would be to simply shard requests across multiple sub-queues within a tenant&amp;rsquo;s queue. The shard size could be a per-tenant setting to account for different tenant sizes.&lt;/p&gt;
&lt;p&gt;This is similar to Proposal 1 in that it adds another fixed level of sub-queues.
The difference is that each query request is assigned a random identifier that is hashed. When the query is split, the sub-requests maintain the same hashed identifier. The hash modulo the number of sub-queues determines to which sub-queue of a tenant a request is enqueued.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User agnostic per-request QoS control&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Requests of individual users can still affect other users&lt;/li&gt;
&lt;li&gt;Not extensible&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Alternative:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Sharding on a per-request basis can still be achieved with Proposal 2, by adding the request hash as an additional level in the hierarchy.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-go&#34;&gt;actor := []string{&amp;#34;tenant&amp;#34;, &amp;#34;user&amp;#34;, &amp;#34;request_hash&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;consensus&#34;&gt;Consensus&lt;/h2&gt;
&lt;p&gt;Proposal 2 is going to be implemented.&lt;/p&gt;
]]></content><description>&lt;h1 id="0003-query-fairness-across-users-within-tenants">0003: Query fairness across users within tenants&lt;/h1>
&lt;p>&lt;strong>Author:&lt;/strong> Christian Haudum (&lt;a href="mailto:christian.haudum@grafana.com">christian.haudum@grafana.com&lt;/a>)&lt;/p>
&lt;p>&lt;strong>Date:&lt;/strong> 02/2023&lt;/p>
&lt;p>&lt;strong>Sponsor(s):&lt;/strong> @chaudum @owen-d&lt;/p>
&lt;p>&lt;strong>Type:&lt;/strong> Feature&lt;/p>
&lt;p>&lt;strong>Status:&lt;/strong> Accepted&lt;/p>
&lt;p>&lt;strong>Related issues/PRs:&lt;/strong>&lt;/p>
&lt;p>&lt;strong>Thread from &lt;a href="https://groups.google.com/forum/#!forum/lokiproject" target="_blank" rel="noopener noreferrer">mailing list&lt;/a>:&lt;/strong>&lt;/p></description></item><item><title>0004: Index Gateway Sharding</title><link>https://grafana.com/docs/loki/v3.7.x/community/lids/0004-indexgatewaysharding/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/community/lids/0004-indexgatewaysharding/</guid><content><![CDATA[&lt;h1 id=&#34;0004-index-gateway-sharding&#34;&gt;0004: Index Gateway Sharding&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Christian Haudum (&lt;a href=&#34;mailto:christian.haudum@grafana.com&#34;&gt;christian.haudum@grafana.com&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 02/2023&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sponsor(s):&lt;/strong&gt; @chaudum @owen-d&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type:&lt;/strong&gt; Feature&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Rejected / Not Implemented&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related issues/PRs:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thread from &lt;a href=&#34;https://groups.google.com/forum/#!forum/lokiproject&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;mailing list&lt;/a&gt;:&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;This document proposes a better way to shard data across the index gateways, so that the service can be scaled horizontally to meet the growing demand for metadata queries from large tenants.&lt;/p&gt;
&lt;p&gt;The index gateway service can be run in &amp;ldquo;simple mode&amp;rdquo;, where an index gateway instance is responsible for handling, storing, and serving the indices of all tenants, or in &amp;ldquo;ring mode&amp;rdquo;, where an instance is responsible for a subset of tenants instead of all tenants.&lt;/p&gt;
&lt;p&gt;On top of that, in order to achieve redundancy and to spread load, the index gateway ring uses a replication factor of 3 by default.&lt;/p&gt;
&lt;p&gt;This means that before an index gateway client makes a request to the index gateway server, it first hashes the tenant ID and then requests a replication set for that hash from the index gateway ring. Due to the fixed replication factor (RF), the replication set contains three server addresses. On every request, a random server from that list is picked to execute the request.&lt;/p&gt;
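&lt;p&gt;A minimal sketch of this lookup in Go. This is an illustration only, not Loki&amp;rsquo;s actual ring implementation: placement is reduced to modular arithmetic over a fixed instance list, and all names are made up.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// replicationSet returns the rf instances responsible for a tenant.
// The real ring uses consistent hashing with tokens; here placement is
// simplified to a modular walk over a fixed list of addresses.
func replicationSet(tenantID string, instances []string, rf int) []string {
	h := fnv.New32a()
	h.Write([]byte(tenantID))
	start := int(h.Sum32() % uint32(len(instances)))
	set := make([]string, 0, rf)
	for i := 0; i < rf && i < len(instances); i++ {
		set = append(set, instances[(start+i)%len(instances)])
	}
	return set
}

func main() {
	instances := []string{"gw-0", "gw-1", "gw-2", "gw-3", "gw-4"}
	set := replicationSet("tenant-a", instances, 3)
	// On every request, a random member of the replication set is picked.
	fmt.Println(set[rand.Intn(len(set))])
}
```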
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;The current strategy of sharding by tenant ID with a fixed replication factor fails in the long run, because even when running many index gateways, a single tenant can utilize at most &lt;code&gt;n&lt;/code&gt; instances, where &lt;code&gt;n&lt;/code&gt; is the value of the configured RF.&lt;/p&gt;
&lt;p&gt;Another problem is that the RF is fixed and the same for all tenants, independent of their actual size in terms of log volume or query rate.&lt;/p&gt;
&lt;h2 id=&#34;goals&#34;&gt;Goals&lt;/h2&gt;
&lt;p&gt;The goal of this document is to find a better sharding mechanism for the index gateway, so that there are no boundaries for scaling the service horizontally.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The sharding needs to account for the &amp;ldquo;size&amp;rdquo; of a tenant.&lt;/li&gt;
&lt;li&gt;A single tenant needs to be able to utilize more than three index gateways.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;proposals&#34;&gt;Proposals&lt;/h2&gt;
&lt;h3 id=&#34;proposal-0-do-nothing&#34;&gt;Proposal 0: Do nothing&lt;/h3&gt;
&lt;p&gt;If we do not improve the sharding mechanism for the index gateways and leave it as it is, it will become more and more difficult to serve metadata queries for large tenants in a reasonable amount of time, proportionally to the demand for these queries.&lt;/p&gt;
&lt;h3 id=&#34;proposal-1-dynamic-replication-factor&#34;&gt;Proposal 1: Dynamic replication factor&lt;/h3&gt;
&lt;p&gt;Instead of using a fixed replication factor of 3, the RF can be derived from the number of active members in the index gateway ring. That means the RF would be a percentage of the available gateway instances. For example, a ring with 12 instances and 30% replication utilization would result in an RF of 3 (&lt;code&gt;floor(12*0.3)&lt;/code&gt;). Scaling up to 18 instances would result in an RF of 5.&lt;/p&gt;
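&lt;p&gt;The arithmetic above can be sketched as follows. The &lt;code&gt;minRF&lt;/code&gt; lower bound addresses the availability concern for small rings and is an assumption of this sketch, not part of the proposal.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// dynamicRF derives the replication factor from the ring size: a fixed
// percentage of the active instances, floored, with a minimum to preserve
// redundancy in small rings (the minimum is this sketch's assumption).
func dynamicRF(instances int, utilization float64, minRF int) int {
	rf := int(math.Floor(float64(instances) * utilization))
	if rf < minRF {
		return minRF
	}
	return rf
}

func main() {
	fmt.Println(dynamicRF(12, 0.3, 3)) // 3
	fmt.Println(dynamicRF(18, 0.3, 3)) // 5
}
```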
&lt;p&gt;This approach would solve the problem of horizontal scaling. However, it does not solve the problem of different tenant sizes. It also fails to ensure replication for availability when running a small number of instances, unless there is a fixed lower value for the RF. It also tends to over-shard data in large index gateway deployments.&lt;/p&gt;
&lt;h3 id=&#34;proposal-2-fixed-per-tenant-replication-factor&#34;&gt;Proposal 2: Fixed per-tenant replication factor&lt;/h3&gt;
&lt;p&gt;Adding a random shard ID (e.g. &lt;code&gt;shard-0&lt;/code&gt;, &lt;code&gt;shard-1&lt;/code&gt;, &amp;hellip; &lt;code&gt;shard-n&lt;/code&gt;) to the tenant ID allows a tenant to utilize up to &lt;code&gt;n&lt;/code&gt; instances. The number of shards can be implemented as a per-tenant override setting, which allows a different number of instances to be used for each tenant. However, this approach results in non-deterministic hash keys.&lt;/p&gt;
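&lt;p&gt;A sketch of the per-tenant shard suffix. The &lt;code&gt;shards&lt;/code&gt; per-tenant override and the key format are hypothetical. Because the shard ID is chosen randomly per request, the resulting hash key is non-deterministic, as noted above.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// shardKey appends a random shard ID to the tenant ID so that a tenant's
// requests spread across up to `shards` ring entries. `shards` would come
// from a per-tenant override (a hypothetical setting in this sketch).
func shardKey(tenantID string, shards int) string {
	return fmt.Sprintf("%s/shard-%d", tenantID, rand.Intn(shards))
}

// hashKey is the ring hash of a key; FNV-1a stands in for the real hash.
func hashKey(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

func main() {
	// Two requests for the same tenant may hash to different ring tokens:
	// this is the non-deterministic hash key mentioned above.
	fmt.Println(hashKey(shardKey("tenant-a", 4)))
	fmt.Println(hashKey(shardKey("tenant-a", 4)))
}
```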
&lt;h3 id=&#34;proposal-3-shard-by-index-files&#34;&gt;Proposal 3: Shard by index files&lt;/h3&gt;
&lt;p&gt;In order to answer requests, the index gateway needs to download index files from the object storage, and since Loki builds a daily index file per tenant, these index files can be sharded evenly across all available index gateway instances. Each instance is then assigned a unique set of index files which it can answer metadata queries for.&lt;/p&gt;
&lt;p&gt;This means that the sharding key is the name of the file in object storage. While this name encodes both the tenant and the date, this is not strictly necessary. Such a sharding mechanism could shard any files from object storage across a set of instances of a ring.&lt;/p&gt;
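&lt;p&gt;A sketch of sharding by object-storage key. The file names are hypothetical; the point is that a single tenant&amp;rsquo;s daily index files can land on different instances, so one tenant can utilize the whole ring.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// ownerOf maps an index file (identified by its object-storage key) to the
// ring instance responsible for it. The modular placement stands in for the
// real consistent-hashing ring.
func ownerOf(indexFile string, instances []string) string {
	h := fnv.New32a()
	h.Write([]byte(indexFile))
	return instances[h.Sum32()%uint32(len(instances))]
}

func main() {
	instances := []string{"gw-0", "gw-1", "gw-2", "gw-3"}
	// Consecutive daily index files of the same tenant (made-up names)
	// are spread across the available instances.
	for _, f := range []string{
		"index_19843/tenant-a",
		"index_19844/tenant-a",
		"index_19845/tenant-a",
	} {
		fmt.Println(f, "->", ownerOf(f, instances))
	}
}
```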
&lt;p&gt;If the time range for the requested metadata is within a single day, then a single index gateway instance can answer the metadata request.
However, if a metadata request spans multiple days, multiple index gateway instances are involved as well. There are two ways to solve this:&lt;/p&gt;
&lt;h4 id=&#34;a-split-and-merge-on-client-side&#34;&gt;A) Split and merge on client side&lt;/h4&gt;
&lt;p&gt;The client resolves the necessary index files and their respective gateway instances. It splits the request into multiple sub-requests, executes them and merges them into a single result.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Only the minimum necessary amount of requests are performed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The client requires information about how to split and merge requests.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&#34;b-split-and-merge-on-index-gateway-handler-side&#34;&gt;B) Split and merge on index gateway handler side&lt;/h4&gt;
&lt;p&gt;The client can execute a request on any index gateway. This handler instance then identifies the index files that are involved, splits the query, and resolves the appropriate instances. Once it has received the sub-query results, it assembles the full response and sends it back to the client.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sharding is handled transparently to the client.&lt;/li&gt;
&lt;li&gt;Clients can communicate with any instance of the index gateway ring.&lt;/li&gt;
&lt;li&gt;Domain information about splitting and merging is kept within index gateway server implementation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Due to its architectural advantages, option B is proposed.&lt;/p&gt;
&lt;h2 id=&#34;other-notes&#34;&gt;Other Notes&lt;/h2&gt;
&lt;h3 id=&#34;architectural-diagram-of-proposal-3&#34;&gt;Architectural diagram of proposal 3&lt;/h3&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../index-gw-sharding-diagram.svg&#34;
  alt=&#34;index gateway sharding&#34;/&gt;&lt;/p&gt;
]]></content><description>&lt;h1 id="0004-index-gateway-sharding">0004: Index Gateway Sharding&lt;/h1>
&lt;p>&lt;strong>Author:&lt;/strong> Christian Haudum (&lt;a href="mailto:christian.haudum@grafana.com">christian.haudum@grafana.com&lt;/a>)&lt;/p>
&lt;p>&lt;strong>Date:&lt;/strong> 02/2023&lt;/p>
&lt;p>&lt;strong>Sponsor(s):&lt;/strong> @chaudum @owen-d&lt;/p>
&lt;p>&lt;strong>Type:&lt;/strong> Feature&lt;/p>
&lt;p>&lt;strong>Status:&lt;/strong> Rejected / Not Implemented&lt;/p>
&lt;p>&lt;strong>Related issues/PRs:&lt;/strong>&lt;/p>
&lt;p>&lt;strong>Thread from &lt;a href="https://groups.google.com/forum/#!forum/lokiproject" target="_blank" rel="noopener noreferrer">mailing list&lt;/a>:&lt;/strong>&lt;/p></description></item><item><title>0005: Loki mixin configuration improvements</title><link>https://grafana.com/docs/loki/v3.7.x/community/lids/0005-loki-mixin-configuration-improvements/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/community/lids/0005-loki-mixin-configuration-improvements/</guid><content><![CDATA[&lt;h1 id=&#34;0005-loki-mixin-configuration-improvements&#34;&gt;0005: Loki mixin configuration improvements&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Alexandre Chouinard (&lt;a href=&#34;mailto:Daazku@gmail.com&#34;&gt;Daazku@gmail.com&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 03/2025&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sponsor(s):&lt;/strong&gt; N/A&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type:&lt;/strong&gt; Feature&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Draft&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related issues/PRs:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/grafana/loki/issues/15881&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Issue #15881&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/grafana/loki/issues/13631&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Issue #13631&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/grafana/loki/issues/11820&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Issue #11820&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/grafana/loki/issues/11806&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Issue #11806&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://github.com/grafana/loki/issues/7730&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Issue #7730&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;and more &amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Thread from &lt;a href=&#34;https://groups.google.com/forum/#!forum/lokiproject&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;mailing list&lt;/a&gt;:&lt;/strong&gt; N/A&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;There is no easy way to set up dashboards and alerts for Loki on a pre-existing Prometheus stack that does not use the Prometheus Operator with a specific configuration.&lt;/p&gt;
&lt;p&gt;The metrics selectors are hardcoded, making the dashboards unusable without manual modifications in many cases.
It is assumed that &lt;code&gt;job&lt;/code&gt;, &lt;code&gt;cluster&lt;/code&gt;, &lt;code&gt;namespace&lt;/code&gt;, &lt;code&gt;container&lt;/code&gt; and/or a combination of other labels are present on metrics and have very specific values.&lt;/p&gt;
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;This renders the dashboards and alerts unusable for setups that do not conform to the current assumptions about which label(s) should be present in the metrics.&lt;/p&gt;
&lt;p&gt;A good example of that would be the &amp;ldquo;job&amp;rdquo; label used everywhere:
&lt;a href=&#34;https://github.com/grafana/loki/blob/475d25f459575312adb25ff90abf8f10d521ad4b/production/loki-mixin/dashboards/dashboard-bloom-build.json#L267C101-L267C134&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;&lt;code&gt;job=~\&amp;quot;$namespace/bloom-planner\&amp;quot;&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Usually, the &lt;code&gt;job&lt;/code&gt; label refers to the scrape job name used to scrape the targets, as per the &lt;a href=&#34;https://prometheus.io/docs/concepts/jobs_instances/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Prometheus documentation&lt;/a&gt;, and
in k8s, if you are not using &lt;code&gt;prometheus-operator&lt;/code&gt; with &lt;code&gt;ServiceMonitor&lt;/code&gt;, it&amp;rsquo;s pretty common to have something like this as a scraping config:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;        - job_name: &amp;#34;kubernetes-pods&amp;#34; # Can actually be anything you want.
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Cluster label is &amp;#34;required&amp;#34; by kubernetes-mixin dashboards
            - target_label: cluster
              replacement: &amp;#39;${cluster_label}&amp;#39;
            ...&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;which would scrape all pods and yield something like:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;up{job=&amp;#34;kubernetes-pods&amp;#34;, ...}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Right off the bat, that makes the dashboards unusable because it&amp;rsquo;s incompatible with what is &lt;strong&gt;hardcoded&lt;/strong&gt; in the dashboards and alerts.&lt;/p&gt;
&lt;h2 id=&#34;goals&#34;&gt;Goals&lt;/h2&gt;
&lt;p&gt;Ideally, selectors should default to the values required internally by Grafana but remain configurable so users can tailor them to their setup.&lt;/p&gt;
&lt;p&gt;A good example of this is how &lt;a href=&#34;https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/1fa3b6731c93eac6d5b8c3c3b087afab2baabb42/config.libsonnet#L20-L33&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;kubernetes-monitoring/kubernetes-mixin&lt;/a&gt; did it:
Every possible selector is configurable, which allows various setups to work properly.&lt;/p&gt;
&lt;p&gt;The structure is already there to support this. It just has not been leveraged properly.&lt;/p&gt;
&lt;h2 id=&#34;non-goals-optional&#34;&gt;Non-Goals (optional)&lt;/h2&gt;
&lt;p&gt;It would be desirable to create automated checks verifying that all metrics used in dashboards and alerts use the proper selector(s) from the configuration.
There are many issues in the repository about new dashboards or dashboard updates not using the proper labels on metrics.&lt;/p&gt;
&lt;h2 id=&#34;proposals&#34;&gt;Proposals&lt;/h2&gt;
&lt;h3 id=&#34;proposal-0-do-nothing&#34;&gt;Proposal 0: Do nothing&lt;/h3&gt;
&lt;p&gt;This forces the community to either manually edit the dashboards/alerts or conform to a specific metric collection approach for Loki.&lt;/p&gt;
&lt;h3 id=&#34;proposal-1-allow-metrics-label-selectors-to-be-configurable&#34;&gt;Proposal 1: Allow metrics label selectors to be configurable&lt;/h3&gt;
&lt;p&gt;This will require a good amount of refactoring.&lt;/p&gt;
&lt;p&gt;It allows easier adoption of the &amp;ldquo;official&amp;rdquo; dashboards and alerts by the community.&lt;/p&gt;
&lt;p&gt;Define once, reuse everywhere. (Currently, updating requires extensive search and replace.)&lt;/p&gt;
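&lt;p&gt;A sketch of what such configurable selectors could look like, modeled on the kubernetes-mixin approach linked above. The field names are illustrative, not the actual loki-mixin configuration:&lt;/p&gt;

```jsonnet
{
  _config+:: {
    // Hypothetical fields; the defaults mirror what the dashboards
    // currently hardcode, so existing setups keep working.
    per_cluster_label: 'cluster',
    per_namespace_label: 'namespace',
    bloom_planner_selector: 'job=~"($namespace)/bloom-planner"',
  },
}
```

Dashboards and alerts would then reference these fields instead of hardcoded label matchers, so a single override adapts every panel and rule to a non-standard scrape setup.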
&lt;h2 id=&#34;other-notes&#34;&gt;Other Notes&lt;/h2&gt;
&lt;p&gt;If this proposal is accepted, I am willing to do the necessary work to move it forward.&lt;/p&gt;
]]></content><description>&lt;h1 id="0005-loki-mixin-configuration-improvements">0005: Loki mixin configuration improvements&lt;/h1>
&lt;p>&lt;strong>Author:&lt;/strong> Alexandre Chouinard (&lt;a href="mailto:Daazku@gmail.com">Daazku@gmail.com&lt;/a>)&lt;/p>
&lt;p>&lt;strong>Date:&lt;/strong> 03/2025&lt;/p>
&lt;p>&lt;strong>Sponsor(s):&lt;/strong> N/A&lt;/p>
&lt;p>&lt;strong>Type:&lt;/strong> Feature&lt;/p>
&lt;p>&lt;strong>Status:&lt;/strong> Draft&lt;/p>
&lt;p>&lt;strong>Related issues/PRs:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://github.com/grafana/loki/issues/15881" target="_blank" rel="noopener noreferrer">Issue #15881&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/grafana/loki/issues/13631" target="_blank" rel="noopener noreferrer">Issue #13631&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/grafana/loki/issues/11820" target="_blank" rel="noopener noreferrer">Issue #11820&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/grafana/loki/issues/11806" target="_blank" rel="noopener noreferrer">Issue #11806&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://github.com/grafana/loki/issues/7730" target="_blank" rel="noopener noreferrer">Issue #7730&lt;/a>&lt;/li>
&lt;li>and more &amp;hellip;&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Thread from &lt;a href="https://groups.google.com/forum/#!forum/lokiproject" target="_blank" rel="noopener noreferrer">mailing list&lt;/a>:&lt;/strong> N/A&lt;/p></description></item><item><title>0006: Expose Split Logic in API</title><link>https://grafana.com/docs/loki/v3.7.x/community/lids/0006-api-expose-split/</link><pubDate>Thu, 09 Apr 2026 02:28:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v3.7.x/community/lids/0006-api-expose-split/</guid><content><![CDATA[&lt;h1 id=&#34;0006-expose-split-logic-in-api&#34;&gt;0006: Expose Split Logic in API&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Karsten Jeschkies (&lt;a href=&#34;mailto:karsten.jeschkies@grafana.com&#34;&gt;karsten.jeschkies@grafana.com&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 03/2025&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sponsor(s):&lt;/strong&gt; @trevorwhitney&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type:&lt;/strong&gt; API&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Review&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related issues/PRs:&lt;/strong&gt; N/A&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thread from &lt;a href=&#34;https://groups.google.com/forum/#!forum/lokiproject&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;mailing list&lt;/a&gt;:&lt;/strong&gt; N/A&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;Loki has internal logic to split and shard log and metric queries by time into multiple queries. However, this logic is not
accessible outside of the code base. This proposal intends to create an API for clients to split queries by exposing the
internal split logic.&lt;/p&gt;
&lt;p&gt;A split query is divided by time. The results of a split query can be concatenated in order to form the final
result.&lt;/p&gt;
&lt;p&gt;A sharded query is divided by label values. The results of a sharded query cannot always be concatenated but require
extra logic to form the final result. Some queries, such as &lt;code&gt;topk&lt;/code&gt;, cannot be sharded at all.&lt;/p&gt;
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;Loki clients such as the Grafana Loki datasource or the &lt;a href=&#34;https://github.com/trinodb/trino/tree/master/plugin/trino-loki&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Trino Loki
connector&lt;/a&gt; benefit from splitting LogQL queries into multiple sub-queries either to process
smaller chunks or to distribute work on query results.&lt;/p&gt;
&lt;p&gt;Splitting a query requires parsing the LogQL query first, but parsers exist only for Go and
JavaScript.&lt;/p&gt;
&lt;h2 id=&#34;goals&#34;&gt;Goals&lt;/h2&gt;
&lt;p&gt;The intended goal is to enable any client to split a query into multiple sub-queries that can be executed either
sequentially or in parallel. The joined result of the sub-queries must be the same as the result of executing the original query.&lt;/p&gt;
&lt;h2 id=&#34;non-goals&#34;&gt;Non-Goals&lt;/h2&gt;
&lt;p&gt;This proposal does not aim to provide pagination for query results.&lt;/p&gt;
&lt;h2 id=&#34;proposals&#34;&gt;Proposals&lt;/h2&gt;
&lt;h3 id=&#34;proposal-0-do-nothing&#34;&gt;Proposal 0: Do nothing&lt;/h3&gt;
&lt;p&gt;Without an API each client will have to use a LogQL parser.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Pros&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The split logic in Loki can be changed at will without breaking client behavior.&lt;/li&gt;
&lt;li&gt;There is no maintenance overhead for an API.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Cons&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Currently, the LogQL grammar is specific to Go. It is not easy to port the grammar and parser to other languages.&lt;/li&gt;
&lt;li&gt;Any changes to the splitting logic must be implemented for each client/platform.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;proposal-1-expose-splitting-in-an-api&#34;&gt;Proposal 1: Expose Splitting in an API&lt;/h3&gt;
&lt;p&gt;A new endpoint &lt;code&gt;GET /loki/api/v1/split_query&lt;/code&gt; is introduced that takes a &lt;code&gt;splits&lt;/code&gt; parameter and the same parameters as the &lt;a href=&#34;/docs/loki/latest/reference/loki-http-api/#query-logs-within-a-range-of-time&#34;&gt;/loki/api/v1/query_range&lt;/a&gt; endpoint. The new endpoint returns sub-queries split by time.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;splits&lt;/code&gt; parameter optionally defines the number of desired splits. The API is allowed to return fewer splits than requested.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;limit&lt;/code&gt; parameter has extended semantics. Setting it to &lt;code&gt;0&lt;/code&gt; for a log stream query indicates that all logs should be queried.&lt;/p&gt;
&lt;p&gt;The response body is JSON encoded:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;JSON&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-json&#34;&gt;{
  &amp;#34;resultType&amp;#34;: &amp;#34;matrix&amp;#34; | &amp;#34;streams&amp;#34; | &amp;#34;vector&amp;#34;,
  &amp;#34;subqueries&amp;#34;: [
    {
      &amp;#34;start&amp;#34;: &amp;lt;timestamp nanoseconds&amp;gt;,
      &amp;#34;end&amp;#34;: &amp;lt;timestamp nanoseconds&amp;gt;,
      &amp;#34;limit&amp;#34;: &amp;lt;number&amp;gt;,
      &amp;#34;query&amp;#34;: &amp;lt;query string&amp;gt;
    },
    {
      &amp;#34;start&amp;#34;: &amp;lt;timestamp nanoseconds&amp;gt;,
      &amp;#34;end&amp;#34;: &amp;lt;timestamp nanoseconds&amp;gt;,
      &amp;#34;limit&amp;#34;: &amp;lt;number&amp;gt;,
      &amp;#34;query&amp;#34;: &amp;lt;query string&amp;gt;
    }
  ]
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Pros&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Clients can split queries independent of the implementation language and platform.&lt;/li&gt;
&lt;li&gt;Split logic is controlled by Loki and not the client. This means it can be improved, for example, by introducing sharding
labels.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Cons&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A new API endpoint increases the compatibility surface area and thus the maintenance overhead for Loki maintainers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;proposal-2-support-apache-arrow-flight-rpc&#34;&gt;Proposal 2: Support Apache Arrow Flight RPC&lt;/h3&gt;
&lt;p&gt;Loki could support Apache &lt;a href=&#34;https://arrow.apache.org/docs/format/Flight.html&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Arrow Flight RPC&lt;/a&gt; which is designed to
exchange large data sets in shards between services.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Pros&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Supporting an open standard comes with support for other non-Loki clients.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Cons&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Loki would have to support Apache Arrow, which makes the implementation more complicated.&lt;/li&gt;
&lt;li&gt;Arrow Flight RPC assumes the data is being queried on the first request, which means all shards are available at the
same time. However, the intent of this document is that shards can be queried independently.&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="0006-expose-split-logic-in-api">0006: Expose Split Logic in API&lt;/h1>
&lt;p>&lt;strong>Author:&lt;/strong> Karsten Jeschkies (&lt;a href="mailto:karsten.jeschkies@grafana.com">karsten.jeschkies@grafana.com&lt;/a>)&lt;/p>
&lt;p>&lt;strong>Date:&lt;/strong> 03/2025&lt;/p>
&lt;p>&lt;strong>Sponsor(s):&lt;/strong> @trevorwhitney&lt;/p>
&lt;p>&lt;strong>Type:&lt;/strong> API&lt;/p>
&lt;p>&lt;strong>Status:&lt;/strong> Review&lt;/p>
&lt;p>&lt;strong>Related issues/PRs:&lt;/strong> N/A&lt;/p>
&lt;p>&lt;strong>Thread from &lt;a href="https://groups.google.com/forum/#!forum/lokiproject" target="_blank" rel="noopener noreferrer">mailing list&lt;/a>:&lt;/strong> N/A&lt;/p></description></item></channel></rss>