<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Pyroscope components on Grafana Labs</title><link>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/</link><description>Recent content in Pyroscope components on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/index.xml" rel="self" type="application/rss+xml"/><item><title>Grafana Pyroscope compactor</title><link>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/compactor/</link><pubDate>Wed, 08 Apr 2026 14:38:28 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/compactor/</guid><content><![CDATA[&lt;h1 id=&#34;grafana-pyroscope-compactor&#34;&gt;Grafana Pyroscope compactor&lt;/h1&gt;
&lt;p&gt;The compactor increases query performance and reduces long-term storage usage by combining blocks.&lt;/p&gt;
&lt;p&gt;The compactor is the component responsible for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Compacting multiple blocks of a given tenant into a single, optimized larger block. This deduplicates chunks and reduces the size of the index, resulting in reduced storage costs. Querying fewer blocks is faster, so it also increases query speed.&lt;/li&gt;
&lt;li&gt;Keeping the per-tenant bucket index updated. The &lt;a href=&#34;../../bucket-index/&#34;&gt;bucket index&lt;/a&gt; is used by &lt;a href=&#34;../querier/&#34;&gt;queriers&lt;/a&gt; and &lt;a href=&#34;../store-gateway/&#34;&gt;store-gateways&lt;/a&gt; to discover both new blocks and deleted blocks in the storage.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The compactor is stateless.&lt;/p&gt;
&lt;h2 id=&#34;how-compaction-works&#34;&gt;How compaction works&lt;/h2&gt;
&lt;p&gt;Compaction occurs on a per-tenant basis.&lt;/p&gt;
&lt;p&gt;The compactor runs at regular, configurable intervals.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Vertical compaction&lt;/strong&gt; merges all the blocks of a tenant that ingesters uploaded for the same time range (a 1-hour range by default) into a single block. It also deduplicates samples that were originally written to N blocks as a result of replication. Vertical compaction reduces the number of blocks for a single time range from one per ingester down to one per tenant.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Horizontal compaction&lt;/strong&gt; triggers after a vertical compaction. It compacts several blocks with adjacent time ranges into a single larger block. The total size of the associated block chunks does not change, but horizontal compaction may significantly reduce the size of the index and of the index-header that store-gateways keep in memory.&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;compactor-horizontal-and-vertical-compaction.png&#34;
  alt=&#34;Compactor - horizontal and vertical compaction&#34;/&gt;&lt;/p&gt;
&lt;!-- Diagram source at https://docs.google.com/presentation/d/1bHp8_zcoWCYoNU2AhO2lSagQyuIrghkCncViSqn14cU/edit --&gt;
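&lt;p&gt;As a toy illustration of the vertical stage, the following Python sketch models a block as a dictionary keyed by (series, timestamp) and shows how merging the per-ingester blocks for one time range deduplicates replicated samples. The data model and values are illustrative assumptions, not Pyroscope&amp;rsquo;s block format.&lt;/p&gt;

```python
def vertical_compact(blocks):
    # Merge per-ingester blocks for one time range into a single block.
    # Replicated samples share the same (series, timestamp) identity and
    # value, so writing them into one dictionary keeps exactly one copy.
    merged = {}
    for block in blocks:
        for key, value in block.items():
            merged[key] = value
    return merged

# Three ingesters hold overlapping data because of replication (factor 3).
ingester_blocks = [
    {("app{pod=a}", 1000): 5, ("app{pod=b}", 1000): 7},
    {("app{pod=a}", 1000): 5},
    {("app{pod=b}", 1000): 7},
]
compacted = vertical_compact(ingester_blocks)  # one block, duplicates removed
```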
&lt;h2 id=&#34;scaling&#34;&gt;Scaling&lt;/h2&gt;
&lt;p&gt;Compaction can be tuned for clusters with large tenants. Because the compactor works on a per-tenant basis, you can scale it both vertically and horizontally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Vertical scaling&lt;/strong&gt;&lt;br /&gt;
The setting &lt;code&gt;-compactor.compaction-concurrency&lt;/code&gt; configures the maximum number of concurrent compactions running in a single compactor instance. Each compaction uses one CPU core.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Horizontal scaling&lt;/strong&gt;&lt;br /&gt;
By default, tenant blocks can be compacted by any Grafana Pyroscope compactor. When you enable compactor &lt;a href=&#34;../../../configure-server/configure-shuffle-sharding/&#34;&gt;shuffle sharding&lt;/a&gt; by setting &lt;code&gt;-compactor.compactor-tenant-shard-size&lt;/code&gt; (or its respective YAML configuration option) to a value higher than &lt;code&gt;0&lt;/code&gt; and lower than the number of available compactors, only the specified number of compactors are eligible to compact blocks for a given tenant.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;compaction-algorithm&#34;&gt;Compaction algorithm&lt;/h2&gt;
&lt;p&gt;Pyroscope uses a sophisticated compaction algorithm called split-and-merge.&lt;/p&gt;
&lt;p&gt;By design, the split-and-merge algorithm overcomes time series database (TSDB) index limitations, and it avoids situations in which compacted blocks grow indefinitely for a very large tenant at any compaction stage.&lt;/p&gt;
&lt;p&gt;This compaction strategy is a two-stage process: split and merge.
The default configuration disables the split stage.&lt;/p&gt;
&lt;p&gt;During the split stage, the first level of compaction (for example &lt;code&gt;1h&lt;/code&gt;), the compactor divides all source blocks into &lt;em&gt;N&lt;/em&gt; (&lt;code&gt;-compactor.split-groups&lt;/code&gt;) groups. For each group, the compactor compacts the blocks, but instead of producing a single result block, it outputs &lt;em&gt;M&lt;/em&gt; (&lt;code&gt;-compactor.split-and-merge-shards&lt;/code&gt;) blocks, known as &lt;em&gt;split blocks&lt;/em&gt;. Each split block contains only the subset of series that belongs to a given shard out of &lt;em&gt;M&lt;/em&gt; shards. At the end of the split stage, the compactor produces &lt;em&gt;N * M&lt;/em&gt; blocks, each with a reference to its shard in the block&amp;rsquo;s &lt;code&gt;meta.json&lt;/code&gt; file.&lt;/p&gt;
&lt;p&gt;The compactor merges the split blocks for each shard. This compacts all &lt;em&gt;N&lt;/em&gt; split blocks of a given shard. The merge reduces the number of blocks from &lt;em&gt;N * M&lt;/em&gt; to &lt;em&gt;M&lt;/em&gt;. For a given compaction time range, there will be a compacted block for each of the &lt;em&gt;M&lt;/em&gt; shards.&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;compactor-split-and-merge.png&#34;
  alt=&#34;Compactor - split-and-merge compaction strategy&#34;/&gt;&lt;/p&gt;
&lt;!-- Diagram source at https://docs.google.com/presentation/d/1bHp8_zcoWCYoNU2AhO2lSagQyuIrghkCncViSqn14cU/edit --&gt;
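&lt;p&gt;The split and merge stages can be sketched as follows. This is a simplified model, assuming blocks are plain sets of series names and using a CRC32 hash as a stand-in for the real sharding function; with &lt;em&gt;N&lt;/em&gt; = 2 groups and &lt;em&gt;M&lt;/em&gt; = 3 shards, the split stage yields 6 intermediate blocks and the merge stage reduces them to 3.&lt;/p&gt;

```python
import zlib

def shard_of(series, m):
    # Stable stand-in for the real per-series sharding hash.
    return zlib.crc32(series.encode()) % m

def split_stage(source_blocks, n, m):
    # Divide the source blocks into N groups, then compact each group into M
    # shard-scoped split blocks: N * M intermediate blocks in total.
    groups = [source_blocks[i::n] for i in range(n)]
    split_blocks = []
    for group in groups:
        shards = {s: set() for s in range(m)}
        for block in group:
            for series in block:
                shards[shard_of(series, m)].add(series)
        split_blocks += [(s, members) for s, members in shards.items()]
    return split_blocks

def merge_stage(split_blocks, m):
    # Merge all split blocks of each shard into one block: M final blocks.
    merged = {s: set() for s in range(m)}
    for s, members in split_blocks:
        merged[s] |= members
    return merged

source = [{"a", "b"}, {"c"}, {"d", "e"}, {"f"}]
split = split_stage(source, n=2, m=3)  # 2 * 3 = 6 split blocks
final = merge_stage(split, m=3)        # one compacted block per shard
```

&lt;p&gt;Every series lands in exactly one shard, so the per-shard blocks stay bounded no matter how many source blocks a tenant produces.&lt;/p&gt;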
&lt;p&gt;The merge then runs on the subsequent configured compaction time ranges, for example &lt;code&gt;4h&lt;/code&gt; and &lt;code&gt;8h&lt;/code&gt;, compacting blocks that belong to the same shard.&lt;/p&gt;
&lt;p&gt;This strategy is suitable for clusters with large tenants. The number of shards &lt;em&gt;M&lt;/em&gt; is configurable on a per-tenant basis using &lt;code&gt;-compactor.split-and-merge-shards&lt;/code&gt;, and it can be adjusted based on the number of series of each tenant. The more a tenant grows in terms of series, the more you can grow the configured number of shards. Doing so improves compaction parallelization and keeps each per-shard compacted block size under control.&lt;/p&gt;
&lt;p&gt;The number of split groups, &lt;em&gt;N&lt;/em&gt;, can also be adjusted per tenant using the &lt;code&gt;-compactor.split-groups&lt;/code&gt; option. Increasing this value produces more compaction jobs with fewer blocks during the split stage. This allows multiple compactors to work on these jobs, and finish the splitting stage faster. However, increasing this value also generates more intermediate blocks during the split stage, which will only be reduced later in the merge stage.&lt;/p&gt;
&lt;p&gt;If the configuration of &lt;code&gt;-compactor.split-and-merge-shards&lt;/code&gt; changes during compaction, the change will affect only the compaction of blocks which have not yet been split. Already split blocks will use the original configuration when merged. The original configuration is stored in the &lt;code&gt;meta.json&lt;/code&gt; of each split block.&lt;/p&gt;
&lt;p&gt;Splitting and merging can be horizontally scaled. Non-conflicting and non-overlapping jobs will be executed in parallel.&lt;/p&gt;
&lt;h2 id=&#34;compactor-sharding&#34;&gt;Compactor sharding&lt;/h2&gt;
&lt;p&gt;The compactor shards compaction jobs, either from a single tenant or multiple tenants. The compaction of a single tenant can be split and processed by multiple compactor instances.&lt;/p&gt;
&lt;p&gt;Whenever the pool of compactors grows or shrinks, tenants and jobs are resharded across the available compactor instances without any manual intervention.&lt;/p&gt;
&lt;p&gt;Compactor sharding uses a &lt;a href=&#34;../../hash-ring/&#34;&gt;hash ring&lt;/a&gt;. At startup, a compactor generates random tokens and registers itself in the compactor hash ring. While running, it scans the storage bucket at every interval defined by &lt;code&gt;-compactor.compaction-interval&lt;/code&gt; to discover the list of tenants in storage, and compacts blocks for each tenant whose hash matches the token ranges assigned to the instance within the hash ring.&lt;/p&gt;
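&lt;p&gt;A minimal sketch of how hash-ring ownership works: each compactor registers tokens, and a tenant is owned by the instance holding the first token at or after the tenant&amp;rsquo;s hash, wrapping around the ring. The token values, instance names, and hash function are illustrative assumptions, not Pyroscope&amp;rsquo;s actual ring implementation.&lt;/p&gt;

```python
import bisect
import zlib

# Illustrative ring: (token, instance) pairs sorted by token. Real compactors
# register many random tokens; these values are made up for the example.
ring = [(100, "compactor-1"), (400, "compactor-2"), (800, "compactor-3")]
tokens = [token for token, _ in ring]

def owner(tenant_id):
    # Hash the tenant onto the ring and find the instance holding the first
    # token at or after the hash, wrapping around past the last token.
    h = zlib.crc32(tenant_id.encode()) % 1000
    i = bisect.bisect_left(tokens, h) % len(ring)
    return ring[i][1]
```

&lt;p&gt;Because every compactor computes the same hashes over the same ring state, each tenant is claimed by exactly one instance without any coordination beyond the ring itself.&lt;/p&gt;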
&lt;p&gt;To configure the compactors&amp;rsquo; hash ring, refer to &lt;a href=&#34;../../../configure-server/configuring-memberlist/&#34;&gt;configuring memberlist&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;waiting-for-a-stable-hash-ring-at-startup&#34;&gt;Waiting for a stable hash ring at startup&lt;/h3&gt;
&lt;p&gt;A cluster cold start, or scaling up by two or more compactor instances at the same time, may result in each new compactor instance starting at a slightly different time. Each compactor then runs its first compaction based on a different state of the hash ring. This is not an error condition, but it may be inefficient, because multiple compactor instances may start compacting the same tenant at nearly the same time.&lt;/p&gt;
&lt;p&gt;To mitigate the issue, compactors can be configured to wait for a stable hash ring at startup. A ring is considered stable if no instance is added to or removed from the hash ring for at least &lt;code&gt;-compactor.ring.wait-stability-min-duration&lt;/code&gt;. The maximum time the compactor will wait is controlled by the flag &lt;code&gt;-compactor.ring.wait-stability-max-duration&lt;/code&gt; (or the respective YAML configuration option). Once the compactor has finished waiting, either because the ring stabilized or because the maximum wait time was reached, it will start up normally.&lt;/p&gt;
&lt;p&gt;The default value of zero for &lt;code&gt;-compactor.ring.wait-stability-min-duration&lt;/code&gt; disables waiting for ring stability.&lt;/p&gt;
&lt;h2 id=&#34;compaction-jobs-order&#34;&gt;Compaction jobs order&lt;/h2&gt;
&lt;p&gt;You can configure the order of compaction jobs via the &lt;code&gt;-compactor.compaction-jobs-order&lt;/code&gt; flag (or its respective YAML configuration option). The configured ordering defines which compaction jobs are executed first. The following values of &lt;code&gt;-compactor.compaction-jobs-order&lt;/code&gt; are supported:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;smallest-range-oldest-blocks-first&lt;/code&gt; (default)&lt;/p&gt;
&lt;p&gt;This ordering gives priority to smallest range, oldest blocks first.&lt;/p&gt;
&lt;p&gt;For example, with compaction ranges &lt;code&gt;1h, 4h, 8h&lt;/code&gt;, the compactor compacts the 1h ranges first, and among them gives priority to the oldest blocks. Once all blocks in the 1h range have been compacted, it moves to the 4h range, and finally to the 8h one.&lt;/p&gt;
&lt;p&gt;All split jobs are moved to the front of the work queue, because finishing all split jobs in a given time range unblocks the merge jobs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;newest-blocks-first&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This ordering gives priority to the most recent time ranges first, regardless of their compaction level.&lt;/p&gt;
&lt;p&gt;For example, with compaction ranges &lt;code&gt;1h, 4h, 8h&lt;/code&gt;, the compactor compacts the most recent blocks first (up to the 8h range), and then moves to older blocks. This policy favours the most recent blocks, assuming they are queried the most frequently.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
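&lt;p&gt;The default ordering can be sketched as a sort key: split jobs jump the queue, then smaller ranges win, then older blocks. The job representation below is a hypothetical simplification of the real scheduler.&lt;/p&gt;

```python
# Each job is tagged with its kind, its compaction range in hours, and the
# minimum timestamp of its blocks (smaller = older). Values are illustrative.
jobs = [
    {"kind": "merge", "range_h": 4, "min_time": 200},
    {"kind": "merge", "range_h": 1, "min_time": 300},
    {"kind": "merge", "range_h": 1, "min_time": 100},
    {"kind": "split", "range_h": 1, "min_time": 250},
]

def priority(job):
    # Split jobs sort first (False < True), then smallest range, then oldest.
    return (job["kind"] != "split", job["range_h"], job["min_time"])

ordered = sorted(jobs, key=priority)
```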
&lt;h2 id=&#34;blocks-deletion&#34;&gt;Blocks deletion&lt;/h2&gt;
&lt;p&gt;Following a successful compaction, the original blocks are deleted from the storage. Block deletion is not immediate; it follows a two-step process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;An original block is marked for deletion; this is a soft delete&lt;/li&gt;
&lt;li&gt;Once a block has been marked for deletion for longer than the configurable &lt;code&gt;-compactor.deletion-delay&lt;/code&gt;, the block is deleted from storage; this is a hard delete&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The compactor is responsible for both marking blocks and for hard deletion.
Soft deletion is based on a small &lt;code&gt;deletion-mark.json&lt;/code&gt; file stored within the block location in the bucket.&lt;/p&gt;
&lt;p&gt;The soft delete mechanism gives queriers and store-gateways time to discover the new compacted blocks before the original blocks are deleted. If those original blocks were immediately hard deleted, some queries involving the compacted blocks could temporarily fail or return partial results.&lt;/p&gt;
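&lt;p&gt;The two-step process can be sketched as follows, modeling the &lt;code&gt;deletion-mark.json&lt;/code&gt; file as a timestamp attached to the block. Field names and the delay value are illustrative assumptions.&lt;/p&gt;

```python
# Stand-in for -compactor.deletion-delay, in seconds (illustrative value).
deletion_delay = 3600

def mark_for_deletion(block, now):
    # Step 1, soft delete: record a deletion mark (models deletion-mark.json).
    block["deletion_mark"] = now

def hard_delete_eligible(blocks, now):
    # Step 2, hard delete: only blocks marked for longer than the delay.
    return [b["id"] for b in blocks
            if "deletion_mark" in b and now - b["deletion_mark"] > deletion_delay]

blocks = [{"id": "block-1"}, {"id": "block-2"}]
mark_for_deletion(blocks[0], now=0)
```

&lt;p&gt;During the delay window, queriers and store-gateways can still read the marked block while they pick up the compacted replacement from the bucket index.&lt;/p&gt;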
&lt;h2 id=&#34;compactor-disk-utilization&#34;&gt;Compactor disk utilization&lt;/h2&gt;
&lt;p&gt;The compactor downloads source blocks from the bucket to the local disk, and stores compacted blocks on the local disk before uploading them to the bucket. The largest tenants may require a significant amount of disk space.&lt;/p&gt;
&lt;p&gt;Assuming &lt;code&gt;max_compaction_range_blocks_size&lt;/code&gt; is the total block size for the largest tenant during the longest &lt;code&gt;-compactor.block-ranges&lt;/code&gt; period, the expression that estimates the minimum disk space required is:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;compactor.compaction-concurrency * max_compaction_range_blocks_size * 2&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
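&lt;p&gt;For example, with 4 concurrent compactions and 50 GiB of blocks in the longest compaction range of the largest tenant (illustrative numbers, not recommendations), the estimate works out as:&lt;/p&gt;

```python
# Worked example of the disk-space estimate above. Both inputs are assumed
# values for illustration only.
compaction_concurrency = 4              # -compactor.compaction-concurrency
max_compaction_range_blocks_size = 50   # GiB, largest tenant, longest range

# Factor 2: the compactor holds both the downloaded source blocks and the
# compacted result on disk at the same time.
min_disk_gib = compaction_concurrency * max_compaction_range_blocks_size * 2
print(min_disk_gib)  # 400 GiB
```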
&lt;h2 id=&#34;compactor-configuration&#34;&gt;Compactor configuration&lt;/h2&gt;
&lt;p&gt;Refer to the &lt;a href=&#34;../../../configure-server/reference-configuration-parameters/#compactor&#34;&gt;compactor&lt;/a&gt;
block section and the &lt;a href=&#34;../../../configure-server/reference-configuration-parameters/#limits&#34;&gt;limits&lt;/a&gt; block section for details of compaction-related configuration.&lt;/p&gt;
]]></content><description>&lt;h1 id="grafana-pyroscope-compactor">Grafana Pyroscope compactor&lt;/h1>
&lt;p>The compactor increases query performance and reduces long-term storage usage by combining blocks.&lt;/p>
&lt;p>The compactor is the component responsible for:&lt;/p>
&lt;ul>
&lt;li>Compacting multiple blocks of a given tenant into a single, optimized larger block. This deduplicates chunks and reduces the size of the index, resulting in reduced storage costs. Querying fewer blocks is faster, so it also increases query speed.&lt;/li>
&lt;li>Keeping the per-tenant bucket index updated. The &lt;a href="../../bucket-index/">bucket index&lt;/a> is used by &lt;a href="../querier/">queriers&lt;/a> and &lt;a href="../store-gateway/">store-gateways&lt;/a> to discover both new blocks and deleted blocks in the storage.&lt;/li>
&lt;/ul>
&lt;p>The compactor is stateless.&lt;/p></description></item><item><title>Pyroscope distributor</title><link>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/distributor/</link><pubDate>Wed, 08 Apr 2026 14:38:28 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/distributor/</guid><content><![CDATA[&lt;h1 id=&#34;pyroscope-distributor&#34;&gt;Pyroscope distributor&lt;/h1&gt;
&lt;p&gt;The distributor is a stateless component that receives profiling data from the agent.
The distributor then divides the data into batches and sends it to multiple &lt;a href=&#34;../ingester/&#34;&gt;ingesters&lt;/a&gt; in parallel, shards the series among ingesters, and replicates each series by the configured replication factor. By default, the configured replication factor is three.&lt;/p&gt;
&lt;h2 id=&#34;validation&#34;&gt;Validation&lt;/h2&gt;
&lt;p&gt;The distributor cleans and validates data that it receives before writing the data to the ingesters.
Because a single request can contain valid and invalid profiles, samples, metadata, and exemplars, the distributor only passes valid data to the ingesters. The distributor does not include invalid data in its requests to the ingesters.
If the request contains invalid data, the distributor returns a 400 HTTP status code and the details appear in the response body.
The details about the first invalid data are typically logged by the agent.&lt;/p&gt;
&lt;p&gt;The distributor data cleanup includes the following transformations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ensuring that each profile has a timestamp set. If not, the timestamp defaults to the time at which the distributor received the profile.&lt;/li&gt;
&lt;li&gt;Removing samples that have a value of &lt;code&gt;0&lt;/code&gt;, and summing samples that share the same stacktrace.&lt;/li&gt;
&lt;/ul&gt;
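&lt;p&gt;The sample cleanup can be sketched as follows, with samples modeled as (stacktrace, value) pairs. This representation is a simplification for illustration, not the real profile data model.&lt;/p&gt;

```python
from collections import defaultdict

def clean_samples(samples):
    # Drop zero-valued samples and sum samples sharing the same stacktrace.
    summed = defaultdict(int)
    for stacktrace, value in samples:
        if value == 0:
            continue  # zero-valued samples carry no information
        summed[stacktrace] += value
    return dict(summed)

raw = [("main;foo", 3), ("main;foo", 2), ("main;bar", 0), ("main;baz", 1)]
cleaned = clean_samples(raw)  # {"main;foo": 5, "main;baz": 1}
```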
&lt;h2 id=&#34;replication&#34;&gt;Replication&lt;/h2&gt;
&lt;p&gt;The distributor shards and replicates incoming series across ingesters.
You can configure the number of ingester replicas that each series is written to via the &lt;code&gt;-distributor.replication-factor&lt;/code&gt; flag, which is &lt;code&gt;1&lt;/code&gt; by default.
Distributors use consistent hashing, in conjunction with a configurable replication factor, to determine which ingesters receive a given series.&lt;/p&gt;
&lt;p&gt;Sharding and replication uses the ingesters&amp;rsquo; hash ring.
For each incoming series, the distributor computes a hash using the profile name, labels, and tenant ID.
The computed hash is called a &lt;em&gt;token&lt;/em&gt;.
The distributor looks up the token in the hash ring to determine which ingesters to write a series to.&lt;/p&gt;
&lt;p&gt;For more information, see &lt;a href=&#34;../../hash-ring/&#34;&gt;hash ring&lt;/a&gt;.&lt;/p&gt;
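&lt;p&gt;A minimal sketch of replica selection: the series hash is looked up in the ring, and the distributor walks clockwise collecting distinct ingesters until the replication factor is reached. Token values, instance names, and the hash function are illustrative assumptions, not the real implementation.&lt;/p&gt;

```python
import bisect
import zlib

# Illustrative ring: note ingester-1 owns two tokens, as real instances do.
ring = [(120, "ingester-1"), (350, "ingester-2"),
        (600, "ingester-1"), (900, "ingester-3")]
tokens = [token for token, _ in ring]

def replicas(tenant_id, series, replication_factor=3):
    # Hash tenant and series to a token, then walk the ring clockwise,
    # collecting distinct ingesters until the replication factor is met.
    token = zlib.crc32(f"{tenant_id}/{series}".encode()) % 1000
    chosen, i = [], bisect.bisect_left(tokens, token)
    while len(chosen) < replication_factor:
        ingester = ring[i % len(ring)][1]
        if ingester not in chosen:  # skip extra tokens of a chosen instance
            chosen.append(ingester)
        i += 1
    return chosen
```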
&lt;h4 id=&#34;quorum-consistency&#34;&gt;Quorum consistency&lt;/h4&gt;
&lt;p&gt;Because distributors share access to the same hash ring, write requests can be sent to any distributor. You can also set up a stateless load balancer in front of them.&lt;/p&gt;
&lt;p&gt;To ensure consistent query results, Pyroscope uses &lt;a href=&#34;https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Dynamo-style&lt;/a&gt; quorum consistency on reads and writes.
The distributor waits for a successful response from &lt;code&gt;n&lt;/code&gt;/2 &#43; 1 ingesters, where &lt;code&gt;n&lt;/code&gt; is the configured replication factor, before sending a successful response to the agent&amp;rsquo;s push request.&lt;/p&gt;
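&lt;p&gt;The quorum rule can be sketched as follows; with a replication factor of 3, the distributor needs 2 acknowledgements, so a single failing ingester does not fail the write.&lt;/p&gt;

```python
# Minimal sketch of Dynamo-style write quorum: with replication factor n,
# the distributor reports success once n/2 + 1 ingesters have acknowledged.
def quorum(replication_factor):
    return replication_factor // 2 + 1

def write_succeeds(replication_factor, successful_acks):
    return successful_acks >= quorum(replication_factor)
```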
&lt;h2 id=&#34;load-balancing-across-distributors&#34;&gt;Load balancing across distributors&lt;/h2&gt;
&lt;p&gt;We recommend randomly load balancing write requests across distributor instances.
If you&amp;rsquo;re running Pyroscope in a Kubernetes cluster, you can define a Kubernetes &lt;a href=&#34;https://kubernetes.io/docs/concepts/services-networking/service/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Service&lt;/a&gt; as ingress for the distributors.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; A Kubernetes Service balances TCP connections across Kubernetes endpoints and does not balance HTTP requests within a single TCP connection.
Because the agent uses HTTP persistent connections (HTTP keep-alive), it reuses the same TCP connection for each push HTTP request.
This can cause distributors to receive an uneven distribution of push HTTP requests.&lt;/p&gt;&lt;/blockquote&gt;
]]></content><description>&lt;h1 id="pyroscope-distributor">Pyroscope distributor&lt;/h1>
&lt;p>The distributor is a stateless component that receives profiling data from the agent.
The distributor then divides the data into batches and sends it to multiple &lt;a href="../ingester/">ingesters&lt;/a> in parallel, shards the series among ingesters, and replicates each series by the configured replication factor. By default, the configured replication factor is three.&lt;/p></description></item><item><title>Pyroscope ingester</title><link>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/ingester/</link><pubDate>Wed, 08 Apr 2026 14:38:28 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/ingester/</guid><content><![CDATA[&lt;h1 id=&#34;pyroscope-ingester&#34;&gt;Pyroscope ingester&lt;/h1&gt;
&lt;p&gt;The ingester is a stateful component that writes incoming profiles first to &lt;a href=&#34;../../about-grafana-pyroscope-architecture/#long-term-storage&#34;&gt;on-disk storage&lt;/a&gt; on the write path and returns series samples for queries on the read path.&lt;/p&gt;
&lt;p&gt;Incoming profiles from &lt;a href=&#34;../distributor/&#34;&gt;distributors&lt;/a&gt; are not immediately written to the long-term storage but are either kept in the ingester&amp;rsquo;s memory or offloaded to the ingester&amp;rsquo;s disk.
Eventually, all profiles are written to disk and periodically uploaded to the long-term storage.
For this reason, the &lt;a href=&#34;../querier/&#34;&gt;queriers&lt;/a&gt; might need to fetch samples from both ingesters and long-term storage while executing a query on the read path.&lt;/p&gt;
&lt;p&gt;Any Pyroscope component that calls the ingesters starts by first looking up ingesters registered in the &lt;a href=&#34;../../hash-ring/&#34;&gt;hash ring&lt;/a&gt; to determine which ingesters are available.
Each ingester could be in one of the following states:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;PENDING&lt;/code&gt;&lt;br /&gt;
The ingester has just started. While in this state, the ingester does not receive write or read requests.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;JOINING&lt;/code&gt;&lt;br /&gt;
The ingester starts up and joins the ring. While in this state, the ingester does not receive write or read requests.
The ingester loads tokens from disk (if &lt;code&gt;-ingester.ring.tokens-file-path&lt;/code&gt; is configured) or generates a set of new random tokens.
Finally, the ingester optionally observes the ring for token conflicts, and once resolved, moves to the &lt;code&gt;ACTIVE&lt;/code&gt; state.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ACTIVE&lt;/code&gt;&lt;br /&gt;
The ingester is up and running. While in this state, the ingester can receive both write and read requests.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;LEAVING&lt;/code&gt;&lt;br /&gt;
The ingester is shutting down and leaving the ring. While in this state, the ingester doesn&amp;rsquo;t receive write requests, but can still receive read requests.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;UNHEALTHY&lt;/code&gt;&lt;br /&gt;
The ingester has failed to heartbeat to the hash ring. While in this state, distributors bypass the ingester, which means that the ingester does not receive write or read requests.&lt;/li&gt;
&lt;/ul&gt;
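&lt;p&gt;A simplified model of the lifecycle above: which transitions are allowed and which request types each state serves. The transition table is an illustrative sketch (for example, recovery from &lt;code&gt;UNHEALTHY&lt;/code&gt; back to &lt;code&gt;ACTIVE&lt;/code&gt; is an assumption), not the exact ring implementation.&lt;/p&gt;

```python
# Allowed state transitions, simplified from the description above.
TRANSITIONS = {
    "PENDING": {"JOINING"},
    "JOINING": {"ACTIVE"},
    "ACTIVE": {"LEAVING", "UNHEALTHY"},
    "LEAVING": set(),               # terminal: the ingester is shutting down
    "UNHEALTHY": {"ACTIVE"},        # assumed recovery path after heartbeats resume
}

# Which request types each state serves.
SERVES = {
    "PENDING": set(),
    "JOINING": set(),
    "ACTIVE": {"read", "write"},
    "LEAVING": {"read"},            # still serves reads while leaving the ring
    "UNHEALTHY": set(),             # bypassed by distributors
}
```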
&lt;p&gt;To configure the ingesters&amp;rsquo; hash ring, refer to &lt;a href=&#34;../../../configure-server/configuring-memberlist/&#34;&gt;configuring memberlist&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;ingesters-write-de-amplification&#34;&gt;Ingesters write de-amplification&lt;/h2&gt;
&lt;p&gt;Ingesters store recently received samples in-memory in order to perform write de-amplification.
If the ingesters immediately write received samples to the long-term storage, the system would have difficulty scaling due to the high pressure on the long-term storage.
For this reason, the ingesters batch and compress samples in-memory and periodically upload them to the long-term storage.&lt;/p&gt;
&lt;p&gt;Write de-amplification is the main source of Pyroscope&amp;rsquo;s low total cost of ownership (TCO).&lt;/p&gt;
&lt;h2 id=&#34;ingesters-failure-and-data-loss&#34;&gt;Ingesters failure and data loss&lt;/h2&gt;
&lt;p&gt;If an ingester process crashes or exits abruptly, all the in-memory profiles
that have not yet been uploaded to the long-term storage could be lost. The
following mechanism mitigates this failure mode:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Replication&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;replication&#34;&gt;Replication&lt;/h3&gt;
&lt;p&gt;By default, each profile series is replicated to three ingesters. Writes to the
Pyroscope cluster are successful if a quorum of ingesters received the data, which
is a minimum of 2 with a replication factor of 3. If the Pyroscope cluster loses an
ingester, the in-memory profiles held by the head block of the lost ingester
are available at least in one other ingester. In the event of a single ingester
failure, no profiles are lost. If multiple ingesters fail, profiles might be
lost if the failure affects all the ingesters holding the replicas of a
specific profile series.&lt;/p&gt;
]]></content><description>&lt;h1 id="pyroscope-ingester">Pyroscope ingester&lt;/h1>
&lt;p>The ingester is a stateful component that writes incoming profiles first to &lt;a href="../../about-grafana-pyroscope-architecture/#long-term-storage">on disk storage&lt;/a> on the write path and returns series samples for queries on the read path.&lt;/p></description></item><item><title>Pyroscope querier</title><link>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/querier/</link><pubDate>Wed, 08 Apr 2026 14:38:28 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/querier/</guid><content><![CDATA[&lt;h1 id=&#34;pyroscope-querier&#34;&gt;Pyroscope querier&lt;/h1&gt;
&lt;p&gt;The querier is a stateless component that evaluates query expressions by fetching profiles series and labels on the read path.&lt;/p&gt;
&lt;p&gt;The querier uses the &lt;a href=&#34;../ingester/&#34;&gt;ingesters&lt;/a&gt; to gather recently written data and the &lt;a href=&#34;../store-gateway/&#34;&gt;store-gateways&lt;/a&gt; to fetch data from the &lt;a href=&#34;../../about-grafana-pyroscope-architecture/#long-term-storage&#34;&gt;long-term storage&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;connecting-to-ingesters&#34;&gt;Connecting to ingesters&lt;/h3&gt;
&lt;p&gt;You must configure the querier with the same &lt;code&gt;-ingester.ring.*&lt;/code&gt; flags (or their respective YAML configuration parameters) that you use to configure the ingesters so that the querier can access the ingester hash ring and discover the addresses of the ingesters.&lt;/p&gt;
&lt;h2 id=&#34;querier-configuration&#34;&gt;Querier configuration&lt;/h2&gt;
&lt;p&gt;For details about querier configuration, refer to &lt;a href=&#34;../../../configure-server/reference-configuration-parameters/#querier&#34;&gt;querier&lt;/a&gt;.&lt;/p&gt;
]]></content><description>&lt;h1 id="pyroscope-querier">Pyroscope querier&lt;/h1>
&lt;p>The querier is a stateless component that evaluates query expressions by fetching profiles series and labels on the read path.&lt;/p>
&lt;p>The querier uses the &lt;a href="../ingester/">ingesters&lt;/a> to gather recently written data and the &lt;a href="../store-gateway/">store-gateways&lt;/a> to fetch data from the &lt;a href="../../about-grafana-pyroscope-architecture/#long-term-storage">long-term storage&lt;/a>.&lt;/p></description></item><item><title>Pyroscope Store-gateway</title><link>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/store-gateway/</link><pubDate>Wed, 08 Apr 2026 14:38:28 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/store-gateway/</guid><content><![CDATA[&lt;h1 id=&#34;pyroscope-store-gateway&#34;&gt;Pyroscope Store-gateway&lt;/h1&gt;
&lt;p&gt;The store-gateways in Pyroscope are responsible for looking up profiling data in the &lt;a href=&#34;../../about-grafana-pyroscope-architecture/#long-term-storage&#34;&gt;long-term storage&lt;/a&gt; bucket. A single store-gateway is responsible for a subset of the blocks in the long-term storage and is queried by the &lt;a href=&#34;../querier/&#34;&gt;queriers&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;store-gateway-configuration&#34;&gt;Store-gateway configuration&lt;/h2&gt;
&lt;p&gt;For details about store-gateway configuration, refer to &lt;a href=&#34;../../../configure-server/reference-configuration-parameters/#store_gateway&#34;&gt;store-gateway&lt;/a&gt;.&lt;/p&gt;
]]></content><description>&lt;h1 id="pyroscope-store-gateway">Pyroscope Store-gateway&lt;/h1>
&lt;p>The store-gateways in Pyroscope are responsible for looking up profiling data in the &lt;a href="../../about-grafana-pyroscope-architecture/#long-term-storage">long-term storage&lt;/a> bucket. A single store-gateway is responsible for a subset of the blocks in the long-term storage and is queried by the &lt;a href="../querier/">queriers&lt;/a>.&lt;/p></description></item><item><title>Pyroscope query-frontend</title><link>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/query-frontend/</link><pubDate>Wed, 08 Apr 2026 14:38:28 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/query-frontend/</guid><content><![CDATA[&lt;h1 id=&#34;pyroscope-query-frontend&#34;&gt;Pyroscope query-frontend&lt;/h1&gt;
&lt;p&gt;The query-frontend is a stateless component that provides the same API as the &lt;a href=&#34;../querier/&#34;&gt;querier&lt;/a&gt; and can be used to accelerate the read path and ensure fair scheduling between tenants using the &lt;a href=&#34;../query-scheduler/&#34;&gt;query-scheduler&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this situation, queriers act as workers that pull jobs from the queue, execute them, and return the results to the query-frontend for aggregation.&lt;/p&gt;
&lt;p&gt;We recommend that you run at least two query-frontend replicas for high-availability reasons.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Because the &lt;a href=&#34;../query-scheduler/&#34;&gt;query-scheduler&lt;/a&gt; is a mandatory component when using the query-frontend, you must run at least one query-scheduler replica.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;The following steps describe how a query moves through the query-frontend.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A query-frontend receives a query.&lt;/li&gt;
&lt;li&gt;The query-frontend places the query in a queue by communicating with the query-scheduler, where it waits to be picked up by a querier.&lt;/li&gt;
&lt;li&gt;A querier picks up the query from the queue and executes it.&lt;/li&gt;
&lt;li&gt;A querier or queriers return the results to the query-frontend, which then aggregates and forwards them to the client.&lt;/li&gt;
&lt;/ol&gt;
]]></content><description>&lt;h1 id="pyroscope-query-frontend">Pyroscope query-frontend&lt;/h1>
&lt;p>The query-frontend is a stateless component that provides the same API as the &lt;a href="../querier/">querier&lt;/a> and can be used to accelerate the read path and ensure fair scheduling between tenants using the &lt;a href="../query-scheduler/">query-scheduler&lt;/a>.&lt;/p></description></item><item><title>Pyroscope query-scheduler</title><link>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/query-scheduler/</link><pubDate>Wed, 08 Apr 2026 14:38:28 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v1.18.x/reference-pyroscope-architecture/components/query-scheduler/</guid><content><![CDATA[&lt;h1 id=&#34;pyroscope-query-scheduler&#34;&gt;Pyroscope query-scheduler&lt;/h1&gt;
&lt;p&gt;The query-scheduler is a stateless component that retains a queue of queries to execute, and distributes the workload to available &lt;a href=&#34;../querier/&#34;&gt;queriers&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The query-scheduler is a required component when using the &lt;a href=&#34;../query-frontend/&#34;&gt;query-frontend&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img
  class=&#34;lazyload d-inline-block&#34;
  data-src=&#34;query-scheduler-architecture.png&#34;
  alt=&#34;Query-scheduler architecture&#34;/&gt;&lt;/p&gt;
&lt;p&gt;The following flow describes how a query moves through a Pyroscope cluster:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The &lt;a href=&#34;../query-frontend/&#34;&gt;query-frontend&lt;/a&gt; receives queries, and then either splits and shards them, or serves them from the cache.&lt;/li&gt;
&lt;li&gt;The query-frontend enqueues the queries into a query-scheduler.&lt;/li&gt;
&lt;li&gt;The query-scheduler stores the queries in an in-memory queue where they wait for a querier to pick them up.&lt;/li&gt;
&lt;li&gt;Queriers pick up the queries and execute them.&lt;/li&gt;
&lt;li&gt;The querier sends results back to query-frontend, which then forwards the results to the client.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;benefits-of-using-the-query-scheduler&#34;&gt;Benefits of using the query-scheduler&lt;/h2&gt;
&lt;p&gt;The query-scheduler enables the scaling of query-frontends. To learn more, refer to the Mimir &lt;a href=&#34;/docs/mimir/latest/operators-guide/architecture/components/query-frontend/#why-query-frontend-scalability-is-limited&#34;&gt;query-frontend&lt;/a&gt; documentation.&lt;/p&gt;
&lt;h2 id=&#34;configuration&#34;&gt;Configuration&lt;/h2&gt;
&lt;p&gt;To use the query-scheduler, query-frontends and queriers need to discover the addresses of query-scheduler instances.
To advertise itself, the query-scheduler uses ring-based service discovery, which is configured via the &lt;a href=&#34;../../../configure-server/configuring-memberlist/&#34;&gt;memberlist configuration&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;operational-considerations&#34;&gt;Operational considerations&lt;/h2&gt;
&lt;p&gt;For high availability, run at least two query-scheduler replicas.&lt;/p&gt;
]]></content><description>&lt;h1 id="pyroscope-query-scheduler">Pyroscope query-scheduler&lt;/h1>
&lt;p>The query-scheduler is a stateless component that retains a queue of queries to execute, and distributes the workload to available &lt;a href="../querier/">queriers&lt;/a>.&lt;/p>
&lt;p>The query-scheduler is a required component when using the &lt;a href="../query-frontend/">query-frontend&lt;/a>.&lt;/p></description></item></channel></rss>