<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Loki Improvement Documents (LIDs) on Grafana Labs</title><link>https://grafana.com/docs/loki/v2.9.x/community/lids/</link><description>Recent content in Loki Improvement Documents (LIDs) on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/loki/v2.9.x/community/lids/index.xml" rel="self" type="application/rss+xml"/><item><title>0001: Introducing LIDs</title><link>https://grafana.com/docs/loki/v2.9.x/community/lids/0001-introduction/</link><pubDate>Thu, 10 Apr 2025 12:15:54 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.9.x/community/lids/0001-introduction/</guid><content><![CDATA[&lt;h1 id=&#34;0001-introducing-lids&#34;&gt;0001: Introducing LIDs&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Danny Kopping (&lt;a href=&#34;mailto:danny.kopping@grafana.com&#34;&gt;danny.kopping@grafana.com&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 01/2023&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sponsor(s):&lt;/strong&gt; @dannykopping&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type:&lt;/strong&gt; Process&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Accepted&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related issues/PRs:&lt;/strong&gt; N/A&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thread from &lt;a href=&#34;https://groups.google.com/forum/#!forum/lokiproject&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;mailing list&lt;/a&gt;:&lt;/strong&gt; N/A&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;As the Grafana Loki project grows, we have seen more and more contributions from external (outside Grafana Labs) contributors.&lt;/p&gt;
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;Many of these external contributions are large and complex, and have taken these contributors significant time to implement. Large contributions that are made without prior discussion with maintainers are at risk of being rejected if they are misguided, implemented inefficiently, or simply undesired; this is obviously suboptimal both for the contributors and the maintainers.&lt;/p&gt;
&lt;p&gt;Aside from external contributions, changes being proposed by Grafana Loki maintainers may also require community engagement before being worked on.&lt;/p&gt;
&lt;h2 id=&#34;goals&#34;&gt;Goals&lt;/h2&gt;
&lt;p&gt;It would be preferable to engage with contributors &lt;em&gt;before&lt;/em&gt; they make large contributions to ensure that both their and the project&amp;rsquo;s interests are aligned. The community at large must also have a voice when feature or process changes are being proposed, to protect their own interests.&lt;/p&gt;
&lt;p&gt;We should implement a &lt;strong&gt;lightweight&lt;/strong&gt; process that guides the implementation of major changes to the project.&lt;/p&gt;
&lt;h2 id=&#34;proposals&#34;&gt;Proposals&lt;/h2&gt;
&lt;h3 id=&#34;proposal-0-do-nothing&#34;&gt;Proposal 0: Do nothing&lt;/h3&gt;
&lt;p&gt;We will continue to attract large, often complex, external contributions that have not been discussed with maintainers before the work is put in; this may lead to suboptimal outcomes for the relationship between the project and its community.&lt;/p&gt;
&lt;h3 id=&#34;proposal-1-loki-improvement-documents&#34;&gt;Proposal 1: Loki Improvement Documents&lt;/h3&gt;
&lt;p&gt;Inspired by Python&amp;rsquo;s &lt;a href=&#34;https://peps.python.org/pep-0001/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;PEP&lt;/a&gt; and Kafka&amp;rsquo;s &lt;a href=&#34;https://cwiki.apache.org/confluence/display/KAFKA/Kafka&amp;#43;Improvement&amp;#43;Proposals&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;KIP&lt;/a&gt; approaches, we should create a process for formally documenting proposed improvements to Loki, so that both the proposals and our decisions on them remain permanently viewable.&lt;/p&gt;
&lt;h2 id=&#34;other-notes&#34;&gt;Other Notes&lt;/h2&gt;
&lt;p&gt;Google Docs were considered for this, but they are less useful because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;they would need to be owned by the Grafana Labs organisation, so that they remain viewable even if the author closes their account&lt;/li&gt;
&lt;li&gt;we already have previous &lt;a href=&#34;../../design-documents/&#34;&gt;design documents&lt;/a&gt; in our documentation and, in a recent (&lt;a href=&#34;https://docs.google.com/document/d/1MNjiHQxwFukm2J4NJRWyRgRIiK7VpokYyATzJ5ce-O8/edit#heading=h.78vexgrrtw5a&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;5th Jan 2023&lt;/a&gt;) community call, the community expressed a preference for this type of approach&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="0001-introducing-lids">0001: Introducing LIDs&lt;/h1>
&lt;p>&lt;strong>Author:&lt;/strong> Danny Kopping (&lt;a href="mailto:danny.kopping@grafana.com">danny.kopping@grafana.com&lt;/a>)&lt;/p>
&lt;p>&lt;strong>Date:&lt;/strong> 01/2023&lt;/p>
&lt;p>&lt;strong>Sponsor(s):&lt;/strong> @dannykopping&lt;/p>
&lt;p>&lt;strong>Type:&lt;/strong> Process&lt;/p>
&lt;p>&lt;strong>Status:&lt;/strong> Accepted&lt;/p>
&lt;p>&lt;strong>Related issues/PRs:&lt;/strong> N/A&lt;/p>
&lt;p>&lt;strong>Thread from &lt;a href="https://groups.google.com/forum/#!forum/lokiproject" target="_blank" rel="noopener noreferrer">mailing list&lt;/a>:&lt;/strong> N/A&lt;/p>
&lt;hr />
&lt;h2 id="background">Background&lt;/h2>
&lt;p>As the Grafana Loki project grows, we have seen more and more contributions from external (outside Grafana Labs) contributors.&lt;/p></description></item><item><title>0002: Remote Rule Evaluation</title><link>https://grafana.com/docs/loki/v2.9.x/community/lids/0002-remoteruleevaluation/</link><pubDate>Wed, 06 Sep 2023 12:47:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.9.x/community/lids/0002-remoteruleevaluation/</guid><content><![CDATA[&lt;h1 id=&#34;0002-remote-rule-evaluation&#34;&gt;0002: Remote Rule Evaluation&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Danny Kopping (&lt;a href=&#34;mailto:danny.kopping@grafana.com&#34;&gt;danny.kopping@grafana.com&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 01/2023&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sponsor(s):&lt;/strong&gt; @dannykopping&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type:&lt;/strong&gt; Feature&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Accepted&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related issues/PRs:&lt;/strong&gt; &lt;a href=&#34;https://github.com/grafana/mimir/pull/1536&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;https://github.com/grafana/mimir/pull/1536&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thread from &lt;a href=&#34;https://groups.google.com/forum/#!forum/lokiproject&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;mailing list&lt;/a&gt;:&lt;/strong&gt; N/A&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;ruler&lt;/code&gt; is a component that evaluates alerting and recording rules. Loki reuses Prometheus&amp;rsquo; rule evaluation engine. The &lt;code&gt;ruler&lt;/code&gt; currently operates by initialising a &lt;code&gt;querier&lt;/code&gt; internally and evaluating all rules &amp;ldquo;locally&amp;rdquo; (i.e. it does not rely on any other components). Each rule group executes concurrently, and rules within the rule group are evaluated sequentially (this is an implementation detail from Prometheus).&lt;/p&gt;
&lt;p&gt;Recording rules produce metric series which are sent to a Prometheus-compatible source. Alerting rules send notifications to Alertmanager when a condition is met. Both of these rule types can play a vital role in an organisation&amp;rsquo;s observability strategy, and so their reliable evaluation is essential.&lt;/p&gt;
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;Rule evaluations can contain expensive queries. The &lt;code&gt;ruler&lt;/code&gt; initialises a &lt;code&gt;querier&lt;/code&gt;, but the &lt;code&gt;querier&lt;/code&gt; does not have the capability to accelerate queries; the &lt;code&gt;query-frontend&lt;/code&gt; component is responsible for query acceleration through splitting, sharding, caching, and other techniques.&lt;/p&gt;
&lt;p&gt;An expensive rule query can cause an entire &lt;code&gt;ruler&lt;/code&gt; instance to use excessive resources and even crash. This is highly problematic for the following reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;slow rule evaluations can cause subsequent rules in a group to be delayed or missed, leading to missing alerts or gaps in recording rule metrics&lt;/li&gt;
&lt;li&gt;excessive resource usage can impede the evaluation of rules for other tenants (noisy neighbour)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;goals&#34;&gt;Goals&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;faster, more efficient rule evaluation&lt;/li&gt;
&lt;li&gt;greater isolation between tenants&lt;/li&gt;
&lt;li&gt;more reliable service&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;non-goals&#34;&gt;Non-Goals&lt;/h2&gt;
&lt;p&gt;This proposal does not aim to make this option the default mode of evaluation; it should be optional because it increases operational complexity.&lt;/p&gt;
&lt;h2 id=&#34;proposals&#34;&gt;Proposals&lt;/h2&gt;
&lt;h3 id=&#34;proposal-0-do-nothing&#34;&gt;Proposal 0: Do nothing&lt;/h3&gt;
&lt;p&gt;Loki&amp;rsquo;s current &lt;code&gt;ruler&lt;/code&gt; implementation is sufficient for small installations running relatively simple or inexpensive queries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Nothing to be done&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Loki&amp;rsquo;s &lt;code&gt;ruler&lt;/code&gt; will remain unreliable and inefficient when used in large multi-tenant environments with expensive queries.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;proposal-1-remote-execution&#34;&gt;Proposal 1: Remote Execution&lt;/h3&gt;
&lt;p&gt;Taking inspiration from &lt;a href=&#34;/docs/mimir/latest/operators-guide/architecture/components/ruler/#remote&#34;&gt;Grafana Mimir&amp;rsquo;s implementation&lt;/a&gt;, the &lt;code&gt;ruler&lt;/code&gt; would be configured to send its rule queries to the &lt;code&gt;query-frontend&lt;/code&gt; component over gRPC. The &lt;code&gt;querier&lt;/code&gt; instances receiving queries from the &lt;code&gt;query-frontend&lt;/code&gt; (or optionally via the &lt;code&gt;query-scheduler&lt;/code&gt;) will handle the requests and send their responses back to the &lt;code&gt;query-frontend&lt;/code&gt;, where they are combined. The &lt;code&gt;ruler&lt;/code&gt; will receive and process the combined result as if the query had been executed locally.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Takes full advantage of Loki&amp;rsquo;s query acceleration techniques, leading to faster and more efficient rule evaluation&lt;/li&gt;
&lt;li&gt;Operationally simple as existing &lt;code&gt;query-frontend&lt;/code&gt;/&lt;code&gt;query-scheduler&lt;/code&gt;/&lt;code&gt;querier&lt;/code&gt; setup can be used&lt;/li&gt;
&lt;li&gt;Per-tenant isolation available in Loki&amp;rsquo;s query path (shuffle-sharding, per-tenant queues) can be used to reduce or eliminate the noisy neighbour problem&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Increased interdependence in components, increased cross-component networking&lt;/li&gt;
&lt;li&gt;Reusing the same &lt;code&gt;query-frontend&lt;/code&gt;/&lt;code&gt;query-scheduler&lt;/code&gt;/&lt;code&gt;querier&lt;/code&gt; setup can cause expensive queries to starve rule evaluations of query resources, and vice versa
&lt;ul&gt;
&lt;li&gt;Additional complexity introduced if this setup needs to be duplicated for rule evaluations (recommended: see &lt;strong&gt;Other Notes&lt;/strong&gt; section below)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;other-notes&#34;&gt;Other Notes&lt;/h2&gt;
&lt;p&gt;If this feature were to be used in conjunction with &lt;a href=&#34;https://github.com/grafana/loki/pull/8092&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;rule-based sharding&lt;/a&gt;, this can present some further optimisation but also some additional challenges to consider.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Aside: the &lt;code&gt;ruler&lt;/code&gt; shards by rule group by default, which means that rules can be unevenly balanced across &lt;code&gt;ruler&lt;/code&gt; instances if some rule groups have more expensive queries than others. Another consequence of this is that rules within a rule group execute sequentially, so expensive queries can cause subsequent rules in the group to be delayed or even missed. Rule groups themselves are evaluated concurrently.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Rule-based sharding distributes rules evenly across all available &lt;code&gt;ruler&lt;/code&gt; instances, each in its own rule group. Consequently, each rule that belongs to a &lt;code&gt;ruler&lt;/code&gt; instance will be evaluated concurrently (as they&amp;rsquo;re each in their own rule group). For tenants with hundreds or thousands of rules, this can result in large batches of queries being sent to the &lt;code&gt;query-frontend&lt;/code&gt; in quick succession, should they all use the same interval or happen to overlap.&lt;/p&gt;
&lt;p&gt;Assuming the remote rule evaluation takes place on the same read path that is used to execute tenant queries, care must be taken by operators who run large multi-tenant setups to ensure that large volumes of queries can be received, queued, and processed in an acceptable timeframe. The &lt;code&gt;query-scheduler&lt;/code&gt; component is highly recommended in these situations, as it will enable the &lt;code&gt;query-frontend&lt;/code&gt; and &lt;code&gt;querier&lt;/code&gt; components to scale out to accommodate the load. Shuffle-sharding should also be implemented to ensure that tenants with particularly large workloads do not starve out the query resources of other tenants. Alerting should also be put in place to notify operators if rule evaluations are being routinely missed or a tenant&amp;rsquo;s query queues become full.&lt;/p&gt;
&lt;p&gt;If rule evaluations and tenant queries are slowing each other down, the read path setup would need to be duplicated so that tenant queries and rule evaluations would not share the same query execution resources.&lt;/p&gt;
&lt;p&gt;Rule-based sharding and remote evaluation can (and should) be implemented separately. Operators should first implement remote evaluation to improve &lt;code&gt;ruler&lt;/code&gt; reliability, and &lt;em&gt;then&lt;/em&gt; further investigate rule-based sharding if rule evaluations are still being missed due to the sequential execution of rule groups, or advise their tenants to split these rule groups up.&lt;/p&gt;
]]></content><description>&lt;h1 id="0002-remote-rule-evaluation">0002: Remote Rule Evaluation&lt;/h1>
&lt;p>&lt;strong>Author:&lt;/strong> Danny Kopping (&lt;a href="mailto:danny.kopping@grafana.com">danny.kopping@grafana.com&lt;/a>)&lt;/p>
&lt;p>&lt;strong>Date:&lt;/strong> 01/2023&lt;/p>
&lt;p>&lt;strong>Sponsor(s):&lt;/strong> @dannykopping&lt;/p>
&lt;p>&lt;strong>Type:&lt;/strong> Feature&lt;/p>
&lt;p>&lt;strong>Status:&lt;/strong> Accepted&lt;/p>
&lt;p>&lt;strong>Related issues/PRs:&lt;/strong> &lt;a href="https://github.com/grafana/mimir/pull/1536" target="_blank" rel="noopener noreferrer">https://github.com/grafana/mimir/pull/1536&lt;/a>&lt;/p>
&lt;p>&lt;strong>Thread from &lt;a href="https://groups.google.com/forum/#!forum/lokiproject" target="_blank" rel="noopener noreferrer">mailing list&lt;/a>:&lt;/strong> N/A&lt;/p>
&lt;hr />
&lt;h2 id="background">Background&lt;/h2>
&lt;p>The &lt;code>ruler&lt;/code> is a component that evaluates alerting and recording rules. Loki reuses Prometheus&amp;rsquo; rule evaluation engine. The &lt;code>ruler&lt;/code> currently operates by initialising a &lt;code>querier&lt;/code> internally and evaluating all rules &amp;ldquo;locally&amp;rdquo; (i.e. it does not rely on any other components). Each rule group executes concurrently, and rules within the rule group are evaluated sequentially (this is an implementation detail from Prometheus).&lt;/p></description></item><item><title>0003: Query fairness across users within tenants</title><link>https://grafana.com/docs/loki/v2.9.x/community/lids/0003-queryfairnessinscheduler/</link><pubDate>Wed, 06 Sep 2023 12:47:18 +0000</pubDate><guid>https://grafana.com/docs/loki/v2.9.x/community/lids/0003-queryfairnessinscheduler/</guid><content><![CDATA[&lt;h1 id=&#34;0003-query-fairness-across-users-within-tenants&#34;&gt;0003: Query fairness across users within tenants&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Author:&lt;/strong&gt; Christian Haudum (&lt;a href=&#34;mailto:christian.haudum@grafana.com&#34;&gt;christian.haudum@grafana.com&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Date:&lt;/strong&gt; 02/2023&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sponsor(s):&lt;/strong&gt; @chaudum @owen-d&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Type:&lt;/strong&gt; Feature&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; Accepted&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related issues/PRs:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thread from &lt;a href=&#34;https://groups.google.com/forum/#!forum/lokiproject&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;mailing list&lt;/a&gt;:&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&#34;background&#34;&gt;Background&lt;/h2&gt;
&lt;p&gt;The query scheduler (scheduler for short) is a component of Loki that distributes requests (sub-queries) from the query frontend (frontend for short) to the querier workers so that execution fairness between tenants can be guaranteed.&lt;/p&gt;
&lt;p&gt;By maintaining separate FIFO queues for each tenant and assigning the correct number of querier workers to these queues, the scheduler ensures that a single tenant cannot compromise all other tenants&amp;rsquo; query capabilities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component diagram:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../scheduler-component-diagram.png&#34;
  alt=&#34;scheduler-component-diagram.plantuml&#34;/&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sequence diagram:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../scheduler-sequence-diagram.png&#34;
  alt=&#34;scheduler-sequence-diagram.plantuml&#34;/&gt;&lt;/p&gt;
&lt;h2 id=&#34;problem-statement&#34;&gt;Problem Statement&lt;/h2&gt;
&lt;p&gt;Even though Loki is built as a multi-tenant system by default, there are use cases where a Loki installation has only a single, very large tenant, e.g. dedicated Loki cells for customers in Grafana Cloud.&lt;/p&gt;
&lt;p&gt;However, there are potentially a lot of different users using the same tenant to query logs, such as users accessing Loki from Grafana or via CLI or HTTP API. This can lead to contention between queries of different users, because they all share the same tenant.&lt;/p&gt;
&lt;p&gt;While the current implementation of the scheduler queues allows for QoS guarantees between tenants, it does not account for QoS guarantees across individual users within a single tenant.&lt;/p&gt;
&lt;p&gt;That said, Loki does not have the notion of individual users.&lt;/p&gt;
&lt;h2 id=&#34;goals&#34;&gt;Goals&lt;/h2&gt;
&lt;p&gt;The main goal of the following proposals is to lay out ideas on how to improve the scheduler component so that it assures QoS not only across tenants, but also across actors (users) within a tenant, without requiring any changes to the deployment model of frontend, scheduler and queriers.
This should also include changes that make the queue structure easily extensible for future scheduling improvements.&lt;/p&gt;
&lt;h2 id=&#34;non-goals-optional&#34;&gt;Non-Goals (optional)&lt;/h2&gt;
&lt;p&gt;While changing and extending the scheduler also requires user-facing API changes, the public API is not discussed in this document.&lt;/p&gt;
&lt;h2 id=&#34;proposals&#34;&gt;Proposals&lt;/h2&gt;
&lt;h3 id=&#34;proposal-0-do-nothing&#34;&gt;Proposal 0: Do nothing&lt;/h3&gt;
&lt;p&gt;An alternative to changing the scheduling mechanism is to handle QoS control via multiple tenants and multi-tenant querying.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keeps the scheduler as simple as it is now&lt;/li&gt;
&lt;li&gt;No development time&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;While that separation into tenants may work for some prospects, it might not be feasible to implement for others.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;proposal-1-add-fixed-second-level-to-scheduler&#34;&gt;Proposal 1: Add fixed second level to scheduler&lt;/h3&gt;
&lt;p&gt;The current scheduler maintains a separate FIFO queue for each tenant. When a request (sub-query) is enqueued, the scheduler puts it into the existing queue for that tenant. If the queue does not exist yet, it creates it first and reassigns the connected querier workers to the available tenant queues. Each querier worker pulls round-robin from the assigned queues in a loop.&lt;/p&gt;
&lt;p&gt;Now, instead of enqueuing and pulling directly from the per-tenant queue, requests get enqueued in per-user queues and the per-tenant queue pulls round-robin from the user queues that are assigned to the tenant queues.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component diagram:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../scheduler-proposal-1-component-diagram.png&#34;
  alt=&#34;scheduler-proposal-1-component-diagram.plantuml&#34;/&gt;&lt;/p&gt;
&lt;p&gt;Like the current implementation, the scheduler enqueues requests based on the &lt;code&gt;X-Scope-OrgID&lt;/code&gt; header (or equivalent key in the request context), but also takes a second key (such as &lt;code&gt;X-Scope-UserID&lt;/code&gt;) into account. This resembles a fixed hierarchy with two levels where the tenant-to-user relation is a one-to-many relation.
However, this has the disadvantage that the concept of users (that does not exist yet in Loki) leaks into the scheduler domain.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Relatively simple to implement&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Not extensible&lt;/li&gt;
&lt;li&gt;Leaks domain knowledge&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;proposal-2-fully-hierarchical-scheduler&#34;&gt;Proposal 2: Fully hierarchical scheduler&lt;/h3&gt;
&lt;p&gt;This proposal is similar to &lt;em&gt;Proposal 1&lt;/em&gt;, but with the difference that there are no fixed levels and levels can be nested arbitrarily.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Component diagram:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;../scheduler-proposal-2-component-diagram.png&#34;
  alt=&#34;scheduler-proposal-2-component-diagram.plantuml&#34;/&gt;&lt;/p&gt;
&lt;p&gt;The implementation of the &lt;code&gt;RequestQueue&lt;/code&gt;, which controls which querier workers are connected to which root queues (aka tenant queues), can be kept as is. However, the concept of tenants and users is dropped and replaced by a concept of hierarchical actors, which can be represented as a slice of identifiers. Note, this does &lt;strong&gt;not&lt;/strong&gt; drop the concept of tenants throughout Loki (represented in the &lt;code&gt;X-Scope-OrgID&lt;/code&gt; header and/or request context).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example of identifiers:&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-go&#34;&gt;actorA := []string{&amp;#34;tenant_a&amp;#34;, &amp;#34;user_1&amp;#34;}
actorB := []string{&amp;#34;tenant_b&amp;#34;, &amp;#34;user_2&amp;#34;}
actorC := []string{&amp;#34;tenant_b&amp;#34;, &amp;#34;user_3&amp;#34;, &amp;#34;service_foo&amp;#34;}
actorD := []string{&amp;#34;tenant_b&amp;#34;, &amp;#34;user_3&amp;#34;, &amp;#34;service_bar&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;More generally:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-go&#34;&gt;actorN := []string{&amp;#34;L0 Queue&amp;#34;, &amp;#34;L1 Queue&amp;#34;, &amp;#34;L2 Queue&amp;#34;, ... &amp;#34;Ln Queue&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The L0 queue (root queue) needs to be able to handle worker connections and therefore needs additional functionality compared to its leaf queues.&lt;/p&gt;
&lt;p&gt;The following code snippet is meant to show the simplified recursive structure of the queues.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-go&#34;&gt;type Request interface{}

type Queue interface {
    Dequeue(actor []string) Request
    Enqueue(r Request, actor []string) error
}

// RequestQueue implements Queue
type RequestQueue struct {
    queriers   map[string]*querier
    rootQueues map[string]*RootQueue
}

// RootQueue implements Queue
type RootQueue struct {
    queriers map[string]*querier
    leafs    map[string]*LeafQueue
    ch       chan Request
}

// LeafQueue implements Queue
type LeafQueue struct {
    leafs map[string]*LeafQueue
    ch    chan Request
}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Backwards compatible, because tenant can be identified as &lt;code&gt;[]string{&amp;quot;tenantID&amp;quot;}&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Queue hierarchy can be extended without changing the scheduler implementation&lt;/li&gt;
&lt;li&gt;Implementation does not require knowledge outside of its domain&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More complex to implement than a fixed number of levels&lt;/li&gt;
&lt;li&gt;Each queue comes with memory overhead&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;proposal-3-multiple-per-tenant-sub-queues&#34;&gt;Proposal 3: Multiple per-tenant sub-queues&lt;/h3&gt;
&lt;p&gt;Another option to keep the concept of users out of Loki and still provide some query fairness guarantees would be to simply shard requests across multiple sub-queues within a tenant&amp;rsquo;s queue. The shard size could be a per-tenant setting to account for different tenant sizes.&lt;/p&gt;
&lt;p&gt;This is similar to Proposal 1 in that it adds another fixed level of sub-queues.
The difference is that, in this case, each query request is assigned a random identifier that is hashed. When the query is split, the sub-requests keep the same hashed identifier. The hash modulo the shard size determines the sub-queue of the tenant into which the requests are enqueued.&lt;/p&gt;
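&lt;p&gt;The sub-queue selection can be sketched as follows. The proposal does not name a hash function, so the use of FNV-1a here is an arbitrary assumption, as are the function name &lt;code&gt;subQueueIndex&lt;/code&gt;, the request identifier, and the shard size of 4.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// subQueueIndex maps a request identifier to one of shardSize sub-queues.
// FNV-1a is an arbitrary choice for this sketch. shardSize must be non-zero.
func subQueueIndex(requestID string, shardSize uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(requestID)) // Write on a hash.Hash never returns an error
	return h.Sum32() % shardSize
}

func main() {
	const shardSize = 4 // hypothetical per-tenant setting
	queues := make([][]string, shardSize)
	// Sub-requests of a split query keep the identifier of their parent
	// query, so they all land in the same sub-queue.
	for _, sub := range []string{"part-0", "part-1", "part-2"} {
		idx := subQueueIndex("query-42", shardSize)
		queues[idx] = append(queues[idx], sub)
	}
	fmt.Println(queues)
}
```

&lt;p&gt;Because the index depends only on the request identifier, two heavy queries from the same user may end up in different sub-queues while queries from different users may collide, which is the limitation listed under the cons below.&lt;/p&gt;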
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User agnostic per-request QoS control&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Requests of individual users can still affect other users&lt;/li&gt;
&lt;li&gt;Not extensible&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Alternative:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Sharding on a per-request basis can still be achieved with Proposal 2, by adding the request hash as an additional level in the hierarchy.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-go&#34;&gt;actor := []string{&amp;#34;tenant&amp;#34;, &amp;#34;user&amp;#34;, &amp;#34;request_hash&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;consensus&#34;&gt;Consensus&lt;/h2&gt;
&lt;p&gt;Proposal 2 is going to be implemented.&lt;/p&gt;
]]></content><description>&lt;h1 id="0003-query-fairness-across-users-within-tenants">0003: Query fairness across users within tenants&lt;/h1>
&lt;p>&lt;strong>Author:&lt;/strong> Christian Haudum (&lt;a href="mailto:christian.haudum@grafana.com">christian.haudum@grafana.com&lt;/a>)&lt;/p>
&lt;p>&lt;strong>Date:&lt;/strong> 02/2023&lt;/p>
&lt;p>&lt;strong>Sponsor(s):&lt;/strong> @chaudum @owen-d&lt;/p>
&lt;p>&lt;strong>Type:&lt;/strong> Feature&lt;/p>
&lt;p>&lt;strong>Status:&lt;/strong> Accepted&lt;/p>
&lt;p>&lt;strong>Related issues/PRs:&lt;/strong>&lt;/p>
&lt;p>&lt;strong>Thread from &lt;a href="https://groups.google.com/forum/#!forum/lokiproject" target="_blank" rel="noopener noreferrer">mailing list&lt;/a>:&lt;/strong>&lt;/p></description></item></channel></rss>