<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Pyroscope v2 architecture on Grafana Labs</title><link>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/</link><description>Recent content in Pyroscope v2 architecture on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/index.xml" rel="self" type="application/rss+xml"/><item><title>Design motivation</title><link>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/design-motivation/</link><pubDate>Mon, 20 Apr 2026 09:02:32 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/design-motivation/</guid><content><![CDATA[&lt;h1 id=&#34;design-motivation&#34;&gt;Design motivation&lt;/h1&gt;
&lt;p&gt;The v2 architecture addresses fundamental scalability and resilience limitations in v1 that cannot be resolved incrementally.&lt;/p&gt;
&lt;h2 id=&#34;write-path-limitations-in-v1&#34;&gt;Write path limitations in v1&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No write-ahead log (WAL)&lt;/strong&gt;: Ingesters accumulate profiles in memory and periodically flush them to disk, but there is no WAL to durably record writes on arrival. If an ingester crashes between flushes, the in-memory profiles are lost. Replication mitigates this but cannot fully prevent data loss when multiple ingesters fail.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deduplication overhead&lt;/strong&gt;: In v1, each profile series is replicated to N ingesters, and each ingester writes its own block. At query time, these duplicates have to be merged and deduplicated. This becomes increasingly expensive as the number of ingesters grows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weak read/write isolation&lt;/strong&gt;: Ingestion latency spikes can cause distributor out-of-memory (OOM) errors. Expensive queries can increase ingestion latency due to broad locks, and can themselves cause ingester OOM.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Suboptimal data distribution&lt;/strong&gt;: The label-hash-based sharding distributes profiles of the same service across all ingesters, causing excessive duplication of symbolic information and reducing query selectivity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Slow rollouts&lt;/strong&gt;: Ingester rollouts can take hours in large deployments due to the need to flush in-memory data before shutdown.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;read-path-limitations-in-v1&#34;&gt;Read path limitations in v1&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Store-gateway instability&lt;/strong&gt;: Heavy queries can cause store-gateway OOM. The block index overhead grows with the number of blocks, putting memory pressure on store-gateways.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limited elasticity&lt;/strong&gt;: The querier and store-gateway services are difficult to scale dynamically, as store-gateways need to load block indexes before serving queries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Slow rollouts&lt;/strong&gt;: Like ingesters, store-gateway rollouts can be slow due to the need to load block indexes on startup.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;compaction-limitations-in-v1&#34;&gt;Compaction limitations in v1&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: The v1 compactor can struggle to keep up with large tenants as data is replicated during ingestion. Delays in compaction place pressure on the read path, as queries have to process and deduplicate more uncompacted blocks.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;extensibility&#34;&gt;Extensibility&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Adding a new data access method (for example, a new API endpoint for heatmaps) in v1 requires changes across many components. The tight coupling between ingesters, store-gateways, and queriers makes the codebase harder to maintain and extend.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;comparison-with-v1&#34;&gt;Comparison with v1&lt;/h2&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Aspect&lt;/th&gt;
              &lt;th&gt;v1&lt;/th&gt;
              &lt;th&gt;v2&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;strong&gt;Write path&lt;/strong&gt;&lt;/td&gt;
              &lt;td&gt;Distributor → Ingester → Object storage&lt;/td&gt;
              &lt;td&gt;Distributor → Segment writer → Object storage &#43; Metastore&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;strong&gt;Metadata&lt;/strong&gt;&lt;/td&gt;
              &lt;td&gt;Per-tenant bucket index in object storage&lt;/td&gt;
              &lt;td&gt;Metastore (Raft-based, in-memory index)&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;strong&gt;Read path&lt;/strong&gt;&lt;/td&gt;
              &lt;td&gt;Query frontend → Query scheduler → Querier → Ingester / Store-gateway&lt;/td&gt;
              &lt;td&gt;Query frontend → Metastore &#43; Query backend&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;strong&gt;Compaction&lt;/strong&gt;&lt;/td&gt;
              &lt;td&gt;Compactor (hash-ring sharded, per-tenant)&lt;/td&gt;
              &lt;td&gt;Compaction worker orchestrated by metastore&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;strong&gt;Replication&lt;/strong&gt;&lt;/td&gt;
              &lt;td&gt;Write replication to N ingesters&lt;/td&gt;
              &lt;td&gt;No write replication; durability via object storage&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;p&gt;v2 addresses these issues by eliminating write replication in favor of object storage durability, centralizing metadata in the metastore for fast query planning and enabling stateless query backends that access object storage directly, and decoupling compaction into a more scalable job-based system orchestrated by the metastore.&lt;/p&gt;
&lt;p&gt;For details on how the v2 architecture works, refer to &lt;a href=&#34;../about-pyroscope-v2-architecture/&#34;&gt;About the Pyroscope v2 architecture&lt;/a&gt;.&lt;/p&gt;
]]></content><description>&lt;h1 id="design-motivation">Design motivation&lt;/h1>
&lt;p>The v2 architecture addresses fundamental scalability and resilience limitations in v1 that cannot be resolved incrementally.&lt;/p>
&lt;h2 id="write-path-limitations-in-v1">Write path limitations in v1&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>No write-ahead log (WAL)&lt;/strong>: Ingesters accumulate profiles in memory and periodically flush them to disk, but there is no WAL to durably record writes on arrival. If an ingester crashes between flushes, the in-memory profiles are lost. Replication mitigates this but cannot fully prevent data loss when multiple ingesters fail.&lt;/li>
&lt;li>&lt;strong>Deduplication overhead&lt;/strong>: In v1, each profile series is replicated to N ingesters, and each ingester writes its own block. At query time, these duplicates have to be merged and deduplicated. This becomes increasingly expensive as the number of ingesters grows.&lt;/li>
&lt;li>&lt;strong>Weak read/write isolation&lt;/strong>: Ingestion latency spikes can cause distributor out-of-memory (OOM) errors. Expensive queries can increase ingestion latency due to broad locks, and can themselves cause ingester OOM.&lt;/li>
&lt;li>&lt;strong>Suboptimal data distribution&lt;/strong>: The label-hash-based sharding distributes profiles of the same service across all ingesters, causing excessive duplication of symbolic information and reducing query selectivity.&lt;/li>
&lt;li>&lt;strong>Slow rollouts&lt;/strong>: Ingester rollouts can take hours in large deployments due to the need to flush in-memory data before shutdown.&lt;/li>
&lt;/ul>
&lt;h2 id="read-path-limitations-in-v1">Read path limitations in v1&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Store-gateway instability&lt;/strong>: Heavy queries can cause store-gateway OOM. The block index overhead grows with the number of blocks, putting memory pressure on store-gateways.&lt;/li>
&lt;li>&lt;strong>Limited elasticity&lt;/strong>: The querier and store-gateway services are difficult to scale dynamically, as store-gateways need to load block indexes before serving queries.&lt;/li>
&lt;li>&lt;strong>Slow rollouts&lt;/strong>: Like ingesters, store-gateway rollouts can be slow due to the need to load block indexes on startup.&lt;/li>
&lt;/ul>
&lt;h2 id="compaction-limitations-in-v1">Compaction limitations in v1&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Scalability&lt;/strong>: The v1 compactor can struggle to keep up with large tenants as data is replicated during ingestion. Delays in compaction place pressure on the read path, as queries have to process and deduplicate more uncompacted blocks.&lt;/li>
&lt;/ul>
&lt;h2 id="extensibility">Extensibility&lt;/h2>
&lt;ul>
&lt;li>Adding a new data access method (for example, a new API endpoint for heatmaps) in v1 requires changes across many components. The tight coupling between ingesters, store-gateways, and queriers makes the codebase harder to maintain and extend.&lt;/li>
&lt;/ul>
&lt;h2 id="comparison-with-v1">Comparison with v1&lt;/h2>
&lt;section class="expand-table-wrapper">&lt;div class="button-div">
&lt;button class="expand-table-btn">Expand table&lt;/button>
&lt;/div>&lt;div class="responsive-table-wrapper">
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Aspect&lt;/th>
&lt;th>v1&lt;/th>
&lt;th>v2&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Write path&lt;/strong>&lt;/td>
&lt;td>Distributor → Ingester → Object Storage&lt;/td>
&lt;td>Distributor → Segment writer → Object storage + Metastore&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Metadata&lt;/strong>&lt;/td>
&lt;td>Per-tenant bucket index in object storage&lt;/td>
&lt;td>Metastore (Raft-based, in-memory index)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Read path&lt;/strong>&lt;/td>
&lt;td>Query frontend → Query scheduler → Querier → Ingester / Store-gateway&lt;/td>
&lt;td>Query frontend → Metastore + Query backend&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Compaction&lt;/strong>&lt;/td>
&lt;td>Compactor (hash-ring sharded, per-tenant)&lt;/td>
&lt;td>Compaction worker orchestrated by metastore&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Replication&lt;/strong>&lt;/td>
&lt;td>Write replication to N ingesters&lt;/td>
&lt;td>No write replication; durability via object storage&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;/div>
&lt;/section>&lt;p>v2 addresses these issues by eliminating write replication in favor of object storage durability, centralizing metadata in the metastore for fast query planning and enabling stateless query backends that access object storage directly, and decoupling compaction into a more scalable job-based system orchestrated by the metastore.&lt;/p></description></item><item><title>About the Pyroscope v2 architecture</title><link>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/about-pyroscope-v2-architecture/</link><pubDate>Mon, 20 Apr 2026 09:02:32 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/about-pyroscope-v2-architecture/</guid><content><![CDATA[&lt;h1 id=&#34;about-the-pyroscope-v2-architecture&#34;&gt;About the Pyroscope v2 architecture&lt;/h1&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;The Pyroscope v2 architecture is production-ready and powers Grafana Cloud Profiles exclusively. However, until it&amp;rsquo;s released by default as part of Pyroscope v2.0, there are no API stability guarantees.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;p&gt;Pyroscope v2 is a complete architectural redesign focused on improving scalability, performance, and cost-efficiency. The architecture is built around the following goals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Deliver high write throughput&lt;/li&gt;
&lt;li&gt;Provide cost-effective storage&lt;/li&gt;
&lt;li&gt;Enable scalable query performance&lt;/li&gt;
&lt;li&gt;Reduce operational overhead&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For background on the v1 limitations that motivated this redesign, refer to &lt;a href=&#34;../design-motivation/&#34;&gt;Design motivation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;key-design-changes&#34;&gt;Key design changes&lt;/h2&gt;
&lt;p&gt;The biggest change in Pyroscope v2 is how it handles storage: data is written directly to object storage, removing the need for local disks in ingesters. For single-node deployments, a local file system can still be used as object storage, but this setup isn&amp;rsquo;t supported in microservices mode.&lt;/p&gt;
&lt;p&gt;Pyroscope v2 also decouples the write and query paths. This means each path can scale independently, so even the heaviest queries won&amp;rsquo;t interfere with ingestion performance. The read path can scale to hundreds of instances instantly.&lt;/p&gt;
&lt;h2 id=&#34;architecture-overview&#34;&gt;Architecture overview&lt;/h2&gt;
&lt;p&gt;The high-level components of the architecture include:&lt;/p&gt;


&lt;div class=&#34;mermaid-container&#34;&gt;
  &lt;pre class=&#34;mermaid&#34;&gt;
graph TD

    subgraph entry_points[&#34; &#34;]
        ingest_entry[&#34;Ingest Path&#34;]:::entry_ingest --&gt; distributor
        query_entry[&#34;Query Path&#34;]:::entry_query --&gt; query_frontend
    end

    distributor --&gt;|writes to| segment_writer
    segment_writer --&gt;|updates| metastore
    segment_writer --&gt;|creates segments| object_storage

    metastore --&gt;|coordinates| compaction_worker
    compaction_worker --&gt;|compacts| object_storage

    query_frontend --&gt;|invokes| query_backend
    query_backend --&gt;|reads from| object_storage
    query_frontend --&gt;|queries| metastore

    distributor[&#34;distributor&#34;]
    segment_writer[&#34;segment-writer&#34;]
    metastore[&#34;metastore&#34;]
    compaction_worker[&#34;compaction-worker&#34;]
    query_backend[&#34;query-backend&#34;]
    query_frontend[&#34;query-frontend&#34;]

    subgraph object_storage[&#34;object storage&#34;]
        segments
        blocks
    end

    linkStyle 0 stroke:#a855f7,stroke-width:2px
    linkStyle 1 stroke:#3b82f6,stroke-width:2px
    linkStyle 2,3,4 stroke:#a855f7,stroke-width:2px
    linkStyle 6 stroke:#a855f7,stroke-width:2px
    linkStyle 7,8,9 stroke:#3b82f6,stroke-width:2px

    classDef entry_ingest stroke:#a855f7,stroke-width:2px,font-weight:bold
    classDef entry_query stroke:#3b82f6,stroke-width:2px,font-weight:bold
&lt;/pre&gt;
&lt;/div&gt;

&lt;h2 id=&#34;pyroscope-v2-components&#34;&gt;Pyroscope v2 components&lt;/h2&gt;
&lt;p&gt;Most components in v2 are stateless and don&amp;rsquo;t require any data persisted between process restarts. The metastore is the only stateful component, using Raft consensus for replication. For details about each component, refer to &lt;a href=&#34;../components/&#34;&gt;Components&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;the-write-path&#34;&gt;The write path&lt;/h3&gt;
&lt;p&gt;Profiles are ingested through the Push RPC API and HTTP &lt;code&gt;/ingest&lt;/code&gt; API to &lt;a href=&#34;../components/distributor/&#34;&gt;distributors&lt;/a&gt;. The write path includes distributor and &lt;a href=&#34;../components/segment-writer/&#34;&gt;segment-writer&lt;/a&gt; services: both are stateless, disk-less, and scale horizontally with high efficiency.&lt;/p&gt;
&lt;p&gt;Profile ingest requests are distributed among distributors, which then route them to segment-writers to co-locate profiles from the same application. This ensures that profiles likely to be queried together are stored together.&lt;/p&gt;
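&lt;p&gt;The routing step can be illustrated with a minimal sketch (the hash function, key shape, and shard count here are hypothetical; the actual distributor uses its own placement logic):&lt;/p&gt;

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(tenant: str, service: str) -> int:
    """Deterministically map a tenant/service pair to a shard so that
    profiles from the same application are co-located."""
    key = f"{tenant}/{service}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % NUM_SHARDS

# Profiles from the same service always land on the same shard.
assert shard_for("acme", "checkout") == shard_for("acme", "checkout")
```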
&lt;p&gt;The segment-writer service accumulates profiles in small blocks (segments) and writes them to object storage, updating the block index with metadata for each newly added object. Each writer produces a single object per shard containing the data of all tenant services in that shard; this approach minimizes the number of write operations to object storage, reducing cost.&lt;/p&gt;
&lt;p&gt;Ingestion clients are blocked until data is durably stored in object storage and an entry for the object is created in the metadata index. By default, ingestion is synchronous, with median latency expected to be less than 500ms using default settings.&lt;/p&gt;
&lt;h3 id=&#34;the-read-path&#34;&gt;The read path&lt;/h3&gt;
&lt;p&gt;Profiling data is queried through the Query API available in the &lt;a href=&#34;../components/query-frontend/&#34;&gt;query-frontend&lt;/a&gt; service.&lt;/p&gt;
&lt;p&gt;A typical flame graph query, as displayed in the UI, may require fetching many gigabytes of data from storage. Moreover, the raw profiling data needs expensive post-processing before it can be rendered as a flame graph. Pyroscope addresses this challenge through adaptive data placement, which minimizes the number of objects that must be read to satisfy a query, and through high parallelism in query execution.&lt;/p&gt;
&lt;p&gt;The query frontend is responsible for preliminary query planning and routing the query to the &lt;a href=&#34;../components/query-backend/&#34;&gt;query-backend&lt;/a&gt; service. Data objects are located using the &lt;a href=&#34;../components/metastore/&#34;&gt;metastore&lt;/a&gt; service, which maintains the metadata index.&lt;/p&gt;
&lt;p&gt;Queries are executed by the query-backend service with high parallelism. Query execution is represented as a graph where the results of sub-queries are combined and optimized. This minimizes network overhead and enables horizontal scalability of the read path without needing traditional disk-based solutions or even a caching layer.&lt;/p&gt;
&lt;p&gt;Both query-frontend and query-backend are stateless services that can scale out to hundreds of instances.&lt;/p&gt;
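&lt;p&gt;The fan-out and merge described above can be sketched as follows (the block contents and merge operation are purely illustrative; the real query-backend reads datasets from object storage and combines sub-query results in an execution graph):&lt;/p&gt;

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Illustrative: each "block" maps stack traces to CPU sample counts.
blocks = [
    {"main;foo": 10, "main;bar": 5},
    {"main;foo": 7},
    {"main;baz": 3, "main;bar": 2},
]

def query_block(block):
    # A real sub-query would read only the relevant datasets from the
    # block in object storage; here it just returns the samples.
    return Counter(block)

# Fan out sub-queries in parallel, then merge the partial results
# into a single profile for the flame graph.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(query_block, blocks))

merged = sum(partials, Counter())
assert merged == Counter({"main;foo": 17, "main;bar": 7, "main;baz": 3})
```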
&lt;h3 id=&#34;compaction&#34;&gt;Compaction&lt;/h3&gt;
&lt;p&gt;The number of objects created in storage can reach millions per hour. This can severely degrade query performance due to high read amplification and excessive calls to object storage. Additionally, a high number of metadata entries can degrade performance across the entire cluster, impacting the write path as well.&lt;/p&gt;
&lt;p&gt;To ensure high query performance, data objects are compacted in the background. The &lt;a href=&#34;../components/compaction-worker/&#34;&gt;compaction-worker&lt;/a&gt; service is responsible for merging small segments into larger blocks, which are then written back to object storage. Compaction workers compact data as soon as possible after it&amp;rsquo;s written to object storage, with median time to the first compaction not exceeding 15 seconds.&lt;/p&gt;
&lt;p&gt;Compaction workers are coordinated by the metastore service, which maintains the metadata index and schedules compaction jobs. Compaction workers are stateless and don&amp;rsquo;t require any local storage.&lt;/p&gt;
&lt;p&gt;For more details, refer to &lt;a href=&#34;../compaction/&#34;&gt;Compaction&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;object-storage&#34;&gt;Object storage&lt;/h2&gt;
&lt;p&gt;Pyroscope v2 is designed to operate without local disks, relying entirely on object storage. This approach minimizes operational overhead and cost.&lt;/p&gt;
&lt;p&gt;Pyroscope requires any of the following object stores for block files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://aws.amazon.com/s3&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Amazon S3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://cloud.google.com/storage/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Google Cloud Storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://azure.microsoft.com/en-us/services/storage/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Microsoft Azure Storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://wiki.openstack.org/wiki/Swift&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;OpenStack Swift&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Local filesystem (single node only)&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="about-the-pyroscope-v2-architecture">About the Pyroscope v2 architecture&lt;/h1>
&lt;div class="admonition admonition-note">&lt;blockquote>&lt;p class="title text-uppercase">Note&lt;/p>&lt;p>The Pyroscope v2 architecture is production-ready and powers Grafana Cloud Profiles exclusively. However, until it&amp;rsquo;s released by default as part of Pyroscope v2.0, there are no API stability guarantees.&lt;/p></description></item><item><title>Pyroscope v2 components</title><link>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/components/</link><pubDate>Mon, 20 Apr 2026 09:02:32 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/components/</guid><content><![CDATA[&lt;h1 id=&#34;pyroscope-v2-components&#34;&gt;Pyroscope v2 components&lt;/h1&gt;
&lt;p&gt;Pyroscope v2 includes a set of components that interact to form a cluster.&lt;/p&gt;
&lt;p&gt;Most components are stateless and don&amp;rsquo;t require any data persisted between process restarts. The &lt;a href=&#34;metastore/&#34;&gt;metastore&lt;/a&gt; is the only stateful component in the architecture, using Raft consensus for replication and fault tolerance.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/components/distributor/&#34;&gt;Distributor&lt;/a&gt;&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/components/segment-writer/&#34;&gt;Segment-writer&lt;/a&gt;&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/components/metastore/&#34;&gt;Metastore&lt;/a&gt;&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/components/compaction-worker/&#34;&gt;Compaction-worker&lt;/a&gt;&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/components/query-frontend/&#34;&gt;Query-frontend&lt;/a&gt;&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/components/query-backend/&#34;&gt;Query-backend&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;
]]></content><description>&lt;h1 id="pyroscope-v2-components">Pyroscope v2 components&lt;/h1>
&lt;p>Pyroscope v2 includes a set of components that interact to form a cluster.&lt;/p>
&lt;p>Most components are stateless and don&amp;rsquo;t require any data persisted between process restarts. The &lt;a href="metastore/">metastore&lt;/a> is the only stateful component in the architecture, using Raft consensus for replication and fault tolerance.&lt;/p></description></item><item><title>Block format</title><link>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/block-format/</link><pubDate>Mon, 20 Apr 2026 09:02:32 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/block-format/</guid><content><![CDATA[&lt;h1 id=&#34;block-format&#34;&gt;Block format&lt;/h1&gt;
&lt;p&gt;In Pyroscope v2, a block is a single object in object storage (&lt;code&gt;block.bin&lt;/code&gt;) containing data from one or more &lt;em&gt;datasets&lt;/em&gt;. Each dataset holds profiling data for a specific service and includes its own TSDB index, symbol data, and profile tables. Block metadata — stored in the &lt;a href=&#34;../components/metastore/&#34;&gt;metastore&lt;/a&gt; and embedded in the object itself — describes the datasets, their labels, and byte offsets within the object.&lt;/p&gt;
&lt;h2 id=&#34;object-storage-layout&#34;&gt;Object storage layout&lt;/h2&gt;
&lt;p&gt;Segments (level 0, not yet compacted) and compacted blocks are stored in separate top-level directories. Segments are not yet split by tenant and use an anonymous tenant directory. After compaction, blocks are organized by tenant:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;segments/
  {shard}/
    anonymous/
      {block_id}/
        block.bin

blocks/
  {shard}/
    {tenant}/
      {block_id}/
        block.bin

dlq/
  {shard}/
    {tenant}/
      {block_id}/
        block.bin&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;block-structure&#34;&gt;Block structure&lt;/h2&gt;
&lt;p&gt;Each &lt;code&gt;block.bin&lt;/code&gt; object contains a sequence of datasets followed by a metadata footer:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;Offset    | Content
----------|-------------------------------------------
0         | Dataset 0 data
          | Dataset 1 data
          | ...
          | Dataset N data
          | Protobuf-encoded block metadata
end-8     | uint32 (big-endian): raw metadata size
end-4     | uint32 (big-endian): CRC32 of metadata &amp;#43; size&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
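&lt;p&gt;Based on the layout above, a reader can locate the metadata by working backwards from the end of the object. A minimal Python sketch (the checksum coverage — metadata bytes followed by the size field — is taken from the table; the CRC-32 polynomial is an assumption):&lt;/p&gt;

```python
import struct
import zlib

def parse_block_footer(block: bytes) -> bytes:
    """Extract the protobuf-encoded metadata from the end of block.bin.

    Footer layout (from the table above): ... metadata, then a
    big-endian uint32 metadata size, then a big-endian uint32 CRC32
    covering metadata + size.
    """
    size = struct.unpack(">I", block[-8:-4])[0]
    crc = struct.unpack(">I", block[-4:])[0]
    meta_start = len(block) - 8 - size
    metadata = block[meta_start:len(block) - 8]
    # The checksum covers the metadata bytes followed by the size field.
    if zlib.crc32(block[meta_start:len(block) - 4]) != crc:
        raise ValueError("block metadata checksum mismatch")
    return metadata

# Build a synthetic block: fake dataset bytes + fake metadata + footer.
datasets = b"dataset-bytes"
metadata = b"fake-protobuf-metadata"
size_field = struct.pack(">I", len(metadata))
crc_field = struct.pack(">I", zlib.crc32(metadata + size_field))
block = datasets + metadata + size_field + crc_field

assert parse_block_footer(block) == metadata
```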
&lt;h2 id=&#34;datasets&#34;&gt;Datasets&lt;/h2&gt;
&lt;p&gt;A dataset is a self-contained region within the block that stores profiling data for a specific service. Each dataset contains:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &lt;a href=&#34;https://ganeshvernekar.com/blog/prometheus-tsdb-persistent-block-and-its-index/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;TSDB index&lt;/a&gt; mapping series labels to profiles&lt;/li&gt;
&lt;li&gt;Symbol data (&lt;code&gt;symbols.symdb&lt;/code&gt;) for stack traces and function names&lt;/li&gt;
&lt;li&gt;A &lt;a href=&#34;https://parquet.apache.org/docs/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Parquet&lt;/a&gt; table of profile samples&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Datasets are annotated with labels (such as &lt;code&gt;service_name&lt;/code&gt; and &lt;code&gt;profile_type&lt;/code&gt;) that allow the query path to select only the relevant datasets without reading the entire block.&lt;/p&gt;
&lt;p&gt;A separate tenant-wide dataset index allows queries that don&amp;rsquo;t target a specific service to locate the relevant datasets.&lt;/p&gt;
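&lt;p&gt;Dataset selection by label can be sketched as follows (the field names and byte ranges are hypothetical; the real metadata is the protobuf structure described in the next section):&lt;/p&gt;

```python
# Hypothetical dataset metadata entries: each dataset carries labels
# and a byte range within the block object.
datasets = [
    {"labels": {"service_name": "checkout", "profile_type": "cpu"},
     "offset": 0, "size": 4096},
    {"labels": {"service_name": "billing", "profile_type": "cpu"},
     "offset": 4096, "size": 2048},
]

def select_datasets(datasets, **matchers):
    """Return only the datasets whose labels match, so a query can
    read just those byte ranges instead of the entire block."""
    return [d for d in datasets
            if all(d["labels"].get(k) == v for k, v in matchers.items())]

hit = select_datasets(datasets, service_name="checkout")
assert [d["offset"] for d in hit] == [0]
```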
&lt;h2 id=&#34;block-metadata&#34;&gt;Block metadata&lt;/h2&gt;
&lt;p&gt;Block metadata is a protobuf-encoded structure that describes the block&amp;rsquo;s contents:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Block ID (&lt;a href=&#34;https://github.com/ulid/spec&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;ULID&lt;/a&gt;), tenant, shard, compaction level, and time range&lt;/li&gt;
&lt;li&gt;A list of datasets with their byte offsets (table of contents), labels, and sizes&lt;/li&gt;
&lt;li&gt;A string table for deduplicating strings across the metadata entry&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The metadata is stored both in the &lt;a href=&#34;../components/metastore/&#34;&gt;metastore&lt;/a&gt; index and embedded in the block object itself.&lt;/p&gt;
]]></content><description>&lt;h1 id="block-format">Block format&lt;/h1>
&lt;p>In Pyroscope v2, a block is a single object in object storage (&lt;code>block.bin&lt;/code>) containing data from one or more &lt;em>datasets&lt;/em>. Each dataset holds profiling data for a specific service and includes its own TSDB index, symbol data, and profile tables. Block metadata — stored in the &lt;a href="../components/metastore/">metastore&lt;/a> and embedded in the object itself — describes the datasets, their labels, and byte offsets within the object.&lt;/p></description></item><item><title>Pyroscope v2 deployment modes</title><link>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/deployment-modes/</link><pubDate>Mon, 20 Apr 2026 09:02:32 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/deployment-modes/</guid><content><![CDATA[&lt;h1 id=&#34;pyroscope-v2-deployment-modes&#34;&gt;Pyroscope v2 deployment modes&lt;/h1&gt;
&lt;p&gt;Pyroscope v2 can be deployed in different configurations depending on your scale and operational requirements.&lt;/p&gt;
&lt;h2 id=&#34;microservices-mode&#34;&gt;Microservices mode&lt;/h2&gt;
&lt;p&gt;In microservices mode, each component runs as a separate process. This is the recommended deployment for production environments at scale.&lt;/p&gt;
&lt;h3 id=&#34;benefits&#34;&gt;Benefits&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Independent scaling&lt;/strong&gt;: Scale each component based on its specific load&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fault isolation&lt;/strong&gt;: Component failures don&amp;rsquo;t affect other components&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource optimization&lt;/strong&gt;: Allocate resources based on component needs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rolling updates&lt;/strong&gt;: Update components independently&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;components-to-deploy&#34;&gt;Components to deploy&lt;/h3&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Component&lt;/th&gt;
              &lt;th&gt;Instances&lt;/th&gt;
              &lt;th&gt;Stateful&lt;/th&gt;
              &lt;th&gt;Notes&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;Distributor&lt;/td&gt;
              &lt;td&gt;2&#43;&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
              &lt;td&gt;Scale based on ingestion rate&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Segment-writer&lt;/td&gt;
              &lt;td&gt;2&#43;&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
              &lt;td&gt;Scale based on ingestion rate&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Metastore&lt;/td&gt;
              &lt;td&gt;3 or 5&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
              &lt;td&gt;Odd number for Raft consensus&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Compaction-worker&lt;/td&gt;
              &lt;td&gt;2&#43;&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
              &lt;td&gt;Scale based on compaction backlog&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Query-frontend&lt;/td&gt;
              &lt;td&gt;2&#43;&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
              &lt;td&gt;Scale based on query load&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;Query-backend&lt;/td&gt;
              &lt;td&gt;2&#43;&lt;/td&gt;
              &lt;td&gt;No&lt;/td&gt;
              &lt;td&gt;Scale based on query load&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;object-storage-requirement&#34;&gt;Object storage requirement&lt;/h3&gt;
&lt;p&gt;Microservices mode requires object storage (Amazon S3, Google Cloud Storage, Azure Blob Storage, or OpenStack Swift). Local filesystem storage is not supported in this mode.&lt;/p&gt;
&lt;h2 id=&#34;single-node-mode&#34;&gt;Single-node mode&lt;/h2&gt;
&lt;p&gt;For evaluation, development, or small-scale deployments, Pyroscope v2 can run as a single process with all components enabled.&lt;/p&gt;
&lt;h3 id=&#34;benefits-1&#34;&gt;Benefits&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Simple deployment&lt;/strong&gt;: Single binary to run&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lower resource requirements&lt;/strong&gt;: Suitable for smaller workloads&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local storage option&lt;/strong&gt;: Can use local filesystem for storage&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;limitations&#34;&gt;Limitations&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No high availability&lt;/strong&gt;: Single point of failure&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limited scalability&lt;/strong&gt;: Cannot scale individual components&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Not recommended for production&lt;/strong&gt;: Use microservices mode for production workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;kubernetes-deployment&#34;&gt;Kubernetes deployment&lt;/h2&gt;
&lt;p&gt;For Kubernetes deployments, use the Helm chart with v2 storage enabled.&lt;/p&gt;
&lt;h3 id=&#34;single-binary-mode&#34;&gt;Single-binary mode&lt;/h3&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm install pyroscope grafana/pyroscope --version 1.20.3 \
  --set architecture.storage.v1=false \
  --set architecture.storage.v2=true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;microservices-mode-1&#34;&gt;Microservices mode&lt;/h3&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm install pyroscope grafana/pyroscope --version 1.20.3 \
  --set architecture.microservices.enabled=true \
  --set architecture.storage.v1=false \
  --set architecture.storage.v2=true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For migrating an existing v1 deployment, refer to the &lt;a href=&#34;../migrate-from-v1/&#34;&gt;migration guide&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;helm-chart-considerations&#34;&gt;Helm chart considerations&lt;/h3&gt;
&lt;p&gt;When deploying on Kubernetes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Configure persistent volumes for metastore nodes&lt;/li&gt;
&lt;li&gt;Set up object storage credentials&lt;/li&gt;
&lt;li&gt;Configure resource requests and limits for each component&lt;/li&gt;
&lt;li&gt;Set up ingress for distributor and query-frontend&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;storage-configuration&#34;&gt;Storage configuration&lt;/h2&gt;
&lt;h3 id=&#34;object-storage&#34;&gt;Object storage&lt;/h3&gt;
&lt;p&gt;Pyroscope v2 supports the following object storage backends:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt;: Recommended for AWS deployments&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Google Cloud Storage&lt;/strong&gt;: Recommended for GCP deployments&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Azure Blob Storage&lt;/strong&gt;: Recommended for Azure deployments&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenStack Swift&lt;/strong&gt;: For OpenStack environments&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;local-filesystem&#34;&gt;Local filesystem&lt;/h3&gt;
&lt;p&gt;Local filesystem storage is only supported for single-node deployments and is not suitable for production use in microservices mode.&lt;/p&gt;
&lt;h2 id=&#34;resource-planning&#34;&gt;Resource planning&lt;/h2&gt;
&lt;h3 id=&#34;metastore&#34;&gt;Metastore&lt;/h3&gt;
&lt;p&gt;The metastore is the only component requiring persistent storage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Disk space&lt;/strong&gt;: A few gigabytes, even at large scale&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory&lt;/strong&gt;: Benefits from keeping the index in memory&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CPU&lt;/strong&gt;: Moderate usage for Raft consensus operations&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;stateless-components&#34;&gt;Stateless components&lt;/h3&gt;
&lt;p&gt;All other components are stateless and primarily need:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CPU&lt;/strong&gt;: For data processing&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory&lt;/strong&gt;: For in-flight data and query execution&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Network&lt;/strong&gt;: For object storage access&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;object-storage-1&#34;&gt;Object storage&lt;/h3&gt;
&lt;p&gt;Plan for object storage costs based on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Write operations&lt;/strong&gt;: Segment flushes and compaction uploads&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Read operations&lt;/strong&gt;: Query execution&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Storage&lt;/strong&gt;: Retained profile data&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="pyroscope-v2-deployment-modes">Pyroscope v2 deployment modes&lt;/h1>
&lt;p>Pyroscope v2 can be deployed in different configurations depending on your scale and operational requirements.&lt;/p>
&lt;h2 id="microservices-mode">Microservices mode&lt;/h2>
&lt;p>In microservices mode, each component runs as a separate process. This is the recommended deployment for production environments at scale.&lt;/p></description></item><item><title>Data distribution</title><link>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/data-distribution/</link><pubDate>Mon, 20 Apr 2026 09:02:32 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/data-distribution/</guid><content><![CDATA[&lt;h1 id=&#34;data-distribution&#34;&gt;Data distribution&lt;/h1&gt;
&lt;p&gt;Pyroscope v2 uses a sophisticated data distribution algorithm to place profiles across segment-writers. The algorithm ensures that profiles from the same application are co-located while maintaining even load distribution across the cluster.&lt;/p&gt;
&lt;h2 id=&#34;design-goals&#34;&gt;Design goals&lt;/h2&gt;
&lt;p&gt;The distribution algorithm is designed to achieve:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data co-location&lt;/strong&gt;: Profiles from the same service within a tenant are stored together&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query performance&lt;/strong&gt;: Co-located data reduces the number of objects needed for queries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compaction efficiency&lt;/strong&gt;: Related data can be compacted more effectively&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Even distribution&lt;/strong&gt;: Load is balanced across segment-writers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Minimal re-balancing&lt;/strong&gt;: Changes to the cluster minimize data movement&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;three-step-placement&#34;&gt;Three-step placement&lt;/h2&gt;
&lt;p&gt;Placing a profile is a three-step process:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Tenant shards&lt;/strong&gt;: Find &lt;em&gt;m&lt;/em&gt; suitable locations from the total &lt;em&gt;N&lt;/em&gt; shards using the &lt;code&gt;tenant_id&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dataset shards&lt;/strong&gt;: Find &lt;em&gt;n&lt;/em&gt; suitable locations from &lt;em&gt;m&lt;/em&gt; options using the &lt;code&gt;service_name&lt;/code&gt; label.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Final placement&lt;/strong&gt;: Select the exact shard &lt;em&gt;s&lt;/em&gt; from &lt;em&gt;n&lt;/em&gt; options.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;N&lt;/strong&gt; is the total number of shards in the deployment&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;m&lt;/strong&gt; (tenant shard limit) is configured explicitly&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;n&lt;/strong&gt; (dataset shard limit) is selected dynamically based on observed ingestion rate&lt;/li&gt;
&lt;/ul&gt;

  &lt;script type=&#34;text/javascript&#34; src=&#34;/web/mermaid.867770685db36193a268b63e8c85a9291badc92208742df0df7a384e9ff1619d.js&#34; integrity=&#34;sha256-hndwaF2zYZOiaLY&amp;#43;jIWpKRutySIIdC3w33o4Tp/xYZ0=&#34; defer&gt;&lt;/script&gt;
  

&lt;div class=&#34;mermaid-container&#34;&gt;
  &lt;pre class=&#34;mermaid&#34;&gt;
block-beta
    columns 15

    shards[&#34;ring&#34;]:2
    space
    shard_0[&#34;0&#34;]
    shard_1[&#34;1&#34;]
    shard_2[&#34;2&#34;]
    shard_3[&#34;3&#34;]
    shard_4[&#34;4&#34;]
    shard_5[&#34;5&#34;]
    shard_6[&#34;6&#34;]
    shard_7[&#34;7&#34;]
    shard_8[&#34;8&#34;]
    shard_9[&#34;9&#34;]
    shard_10[&#34;10&#34;]
    shard_11[&#34;11&#34;]

    tenant[&#34;tenant&#34;]:2
    space:4
    ts_3[&#34;3&#34;]
    ts_4[&#34;4&#34;]
    ts_5[&#34;5&#34;]
    ts_6[&#34;6&#34;]
    ts_7[&#34;7&#34;]
    ts_8[&#34;8&#34;]
    ts_9[&#34;9&#34;]
    ts_10[&#34;10&#34;]
    space

    dataset[&#34;dataset&#34;]:2
    space:5
    ds_4[&#34;4&#34;]
    ds_5[&#34;5&#34;]
    ds_6[&#34;6&#34;]
    ds_7[&#34;7&#34;]
    space:4
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;In this example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The tenant&amp;rsquo;s shard range starts at offset 3 with size 8&lt;/li&gt;
&lt;li&gt;The dataset&amp;rsquo;s shard range is a subset within the tenant&amp;rsquo;s range, starting at offset 1 with 4 shards&lt;/li&gt;
&lt;/ul&gt;
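&lt;p&gt;The narrowing in this example can be sketched as follows. The &lt;code&gt;subring&lt;/code&gt; helper and its fixed offsets are illustrative stand-ins; the actual implementation derives offsets with Jump consistent hash.&lt;/p&gt;

```python
# Illustrative sketch of the three-step placement narrowing; the fixed
# offsets stand in for consistent-hash choices.
def subring(offset: int, total: int, size: int) -> list:
    # A contiguous range of `size` positions starting at `offset`.
    return [(offset + i) % total for i in range(size)]

N = 12  # total shards in the deployment
m = 8   # tenant shard limit (configured explicitly)
n = 4   # dataset shard limit (chosen from the observed ingestion rate)

# Step 1: tenant subring at offset 3, as in the example above.
tenant_shards = subring(3, N, m)                               # [3, 4, ..., 10]
# Step 2: dataset subring at offset 1 within the tenant's range.
dataset_shards = [tenant_shards[p] for p in subring(1, m, n)]  # [4, 5, 6, 7]
# Step 3: pick the exact shard, e.g. by series fingerprint.
s = dataset_shards[0]
```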
&lt;h2 id=&#34;consistent-hashing&#34;&gt;Consistent hashing&lt;/h2&gt;
&lt;p&gt;Pyroscope uses &lt;a href=&#34;https://arxiv.org/pdf/1406.2294&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;Jump consistent hash&lt;/a&gt; to select positions within each subring. This algorithm ensures:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Balance&lt;/strong&gt;: Objects are evenly distributed among buckets&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Monotonicity&lt;/strong&gt;: When buckets are added, objects only move from old to new buckets&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This minimizes data re-balancing when the cluster size changes.&lt;/p&gt;
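&lt;p&gt;The algorithm from the referenced paper is only a few lines; a direct Python port:&lt;/p&gt;

```python
def jump_hash(key: int, num_buckets: int) -> int:
    # Jump consistent hash (Lamping and Veach, 2014), ported to Python
    # with an explicit 64-bit wrap-around on the key.
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b
```

&lt;p&gt;Growing the ring from &lt;em&gt;n&lt;/em&gt; to &lt;em&gt;n&lt;/em&gt;+1 buckets only ever moves a key to the new bucket, which is the monotonicity property described above.&lt;/p&gt;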
&lt;h2 id=&#34;hot-spot-mitigation&#34;&gt;Hot spot mitigation&lt;/h2&gt;
&lt;p&gt;To prevent hot spots where many datasets end up on the same node, shards are mapped to instances through a separate mapping table. This mapping:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ensures even distribution across nodes&lt;/li&gt;
&lt;li&gt;Is updated when nodes are added or removed&lt;/li&gt;
&lt;li&gt;Preserves existing mappings as much as possible&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&#34;mermaid-container&#34;&gt;
  &lt;pre class=&#34;mermaid&#34;&gt;
graph LR
    Distributor==&gt;SegmentWriter
    PlacementAgent-.-PlacementRules
    SegmentWriter--&gt;|metadata|PlacementManager
    SegmentWriter==&gt;|data|Segments
    PlacementManager-.-&gt;PlacementRules

    subgraph Distributor[&#34;distributor&#34;]
        PlacementAgent
    end

    subgraph Metastore[&#34;metastore&#34;]
        PlacementManager
    end

    subgraph ObjectStore[&#34;object store&#34;]
        PlacementRules(placement rules)
        Segments(segments)
    end

    subgraph SegmentWriter[&#34;segment-writer&#34;]
    end
&lt;/pre&gt;
&lt;/div&gt;

&lt;h2 id=&#34;adaptive-load-balancing&#34;&gt;Adaptive load balancing&lt;/h2&gt;
&lt;p&gt;Due to the nature of continuous profiling, data can be distributed unevenly across profile series. To mitigate this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;By default: &lt;code&gt;fingerprint mod n&lt;/code&gt; is used as the distribution key&lt;/li&gt;
&lt;li&gt;When skew is detected: Switches to &lt;code&gt;random(n)&lt;/code&gt; distribution&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This adaptive approach handles uneven data distribution while maintaining locality when possible.&lt;/p&gt;
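&lt;p&gt;A minimal sketch of that switch (the function and flag names are assumptions, not Pyroscope identifiers):&lt;/p&gt;

```python
import random

def pick_position(fingerprint: int, n: int, skew_detected: bool) -> int:
    # Default: deterministic placement keeps a series on the same shard.
    if not skew_detected:
        return fingerprint % n
    # Under skew, spread the load uniformly at the cost of locality.
    return random.randrange(n)
```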
&lt;h2 id=&#34;placement-management&#34;&gt;Placement management&lt;/h2&gt;
&lt;p&gt;The Placement Manager runs on the metastore leader and:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tracks dataset statistics from segment-writer metadata&lt;/li&gt;
&lt;li&gt;Builds placement rules at regular intervals&lt;/li&gt;
&lt;li&gt;Determines the number of shards for each dataset&lt;/li&gt;
&lt;li&gt;Decides the load-balancing strategy (&lt;code&gt;fingerprint mod n&lt;/code&gt; versus &lt;code&gt;random(n)&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Placement rules are stored in object storage and fetched by distributors. Since actual data re-balancing is not performed, placement rules don&amp;rsquo;t need to be synchronized in real time.&lt;/p&gt;
&lt;h2 id=&#34;failure-handling&#34;&gt;Failure handling&lt;/h2&gt;
&lt;p&gt;If a segment-writer fails:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The distributor selects the next suitable segment-writer from available options.&lt;/li&gt;
&lt;li&gt;The shard identifier is specified explicitly in the request.&lt;/li&gt;
&lt;li&gt;Data locality is maintained even during transient failures.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Two requests with the same distribution key may occasionally end up in different shards, but this is expected to be rare.&lt;/p&gt;
&lt;h2 id=&#34;implementation-details&#34;&gt;Implementation details&lt;/h2&gt;
&lt;p&gt;For detailed implementation information, including the full algorithm specification and shard mapping procedures, refer to the &lt;a href=&#34;https://github.com/grafana/pyroscope/blob/main/pkg/segmentwriter/client/distributor/README.md&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;internal documentation&lt;/a&gt;.&lt;/p&gt;
]]></content><description>&lt;h1 id="data-distribution">Data distribution&lt;/h1>
&lt;p>Pyroscope v2 uses a sophisticated data distribution algorithm to place profiles across segment-writers. The algorithm ensures that profiles from the same application are co-located while maintaining even load distribution across the cluster.&lt;/p></description></item><item><title>Compaction</title><link>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/compaction/</link><pubDate>Mon, 20 Apr 2026 09:02:32 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/compaction/</guid><content><![CDATA[&lt;h1 id=&#34;compaction&#34;&gt;Compaction&lt;/h1&gt;
&lt;p&gt;Compaction is the process of merging multiple small segments into larger, optimized blocks. This is essential for maintaining query performance and controlling metadata index size.&lt;/p&gt;
&lt;h2 id=&#34;why-compaction-matters&#34;&gt;Why compaction matters&lt;/h2&gt;
&lt;p&gt;The ingestion pipeline creates many small segments—potentially millions of objects per hour at scale. Without compaction:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Read amplification&lt;/strong&gt;: Queries must fetch many small objects&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;API costs&lt;/strong&gt;: More calls to object storage&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Metadata bloat&lt;/strong&gt;: The metastore index grows without bound&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance degradation&lt;/strong&gt;: Impacts both read and write paths&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-it-works&#34;&gt;How it works&lt;/h2&gt;
&lt;p&gt;Compaction in Pyroscope v2 is coordinated by the &lt;a href=&#34;../components/metastore/&#34;&gt;metastore&lt;/a&gt; and executed by &lt;a href=&#34;../components/compaction-worker/&#34;&gt;compaction-workers&lt;/a&gt;.&lt;/p&gt;

  &lt;script type=&#34;text/javascript&#34; src=&#34;/web/mermaid.867770685db36193a268b63e8c85a9291badc92208742df0df7a384e9ff1619d.js&#34; integrity=&#34;sha256-hndwaF2zYZOiaLY&amp;#43;jIWpKRutySIIdC3w33o4Tp/xYZ0=&#34; defer&gt;&lt;/script&gt;
  

&lt;div class=&#34;mermaid-container&#34;&gt;
  &lt;pre class=&#34;mermaid&#34;&gt;
sequenceDiagram
    participant W as Compaction Worker
    participant M as Metastore
    participant S as Object Storage

    loop Continuous
        W-&gt;&gt;M: Poll for jobs
        M-&gt;&gt;W: Assign job with source blocks
        W-&gt;&gt;S: Download source segments
        W-&gt;&gt;W: Merge segments into block
        W-&gt;&gt;S: Upload compacted block
        W-&gt;&gt;M: Report completion
        M-&gt;&gt;M: Update metadata index
    end
&lt;/pre&gt;
&lt;/div&gt;

&lt;h2 id=&#34;compaction-service&#34;&gt;Compaction service&lt;/h2&gt;
&lt;p&gt;The compaction service runs within the metastore and is responsible for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Job planning&lt;/strong&gt;: Creating compaction jobs when enough segments are available&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Job scheduling&lt;/strong&gt;: Assigning jobs to workers based on capacity&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Job tracking&lt;/strong&gt;: Monitoring progress and handling failures&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Index updates&lt;/strong&gt;: Replacing source block entries with compacted block entries&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;raft-consistency&#34;&gt;Raft consistency&lt;/h3&gt;
&lt;p&gt;The compaction service relies on Raft to guarantee consistency:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Plan preparation&lt;/strong&gt;: The leader prepares job state changes (read-only).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Plan proposal&lt;/strong&gt;: Changes are committed to the Raft log.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;State update&lt;/strong&gt;: All replicas apply the changes atomically.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This ensures all replicas maintain consistent views of compaction state.&lt;/p&gt;
&lt;h2 id=&#34;job-planner&#34;&gt;Job planner&lt;/h2&gt;
&lt;p&gt;The job planner maintains a queue of blocks eligible for compaction:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Queue structure&lt;/strong&gt;: FIFO queue, segmented by tenant, shard, and level&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Job creation&lt;/strong&gt;: Jobs are created when enough blocks are queued&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Boundaries&lt;/strong&gt;: Compaction never crosses tenant, shard, or level boundaries&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;data-layout&#34;&gt;Data layout&lt;/h3&gt;
&lt;p&gt;Profiling data from each service is stored as a separate dataset within a block. During compaction:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Matching datasets from source blocks are merged&lt;/li&gt;
&lt;li&gt;TSDB indexes are combined&lt;/li&gt;
&lt;li&gt;Symbols and profile tables are merged and rewritten&lt;/li&gt;
&lt;li&gt;Output block contains optimized, non-overlapping datasets&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;job-scheduler&#34;&gt;Job scheduler&lt;/h2&gt;
&lt;p&gt;The scheduler uses a &lt;strong&gt;Small Job First&lt;/strong&gt; strategy:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Lower-level blocks are prioritized (they are smaller and affect read amplification more).&lt;/li&gt;
&lt;li&gt;Within a level, unassigned jobs are processed first.&lt;/li&gt;
&lt;li&gt;Jobs with fewer failures are prioritized.&lt;/li&gt;
&lt;li&gt;Jobs with earlier lease expiration are considered first.&lt;/li&gt;
&lt;/ol&gt;
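&lt;p&gt;The four rules above compose naturally into a single sort key; this sketch uses assumed field names, not the actual job schema:&lt;/p&gt;

```python
# Tuples compare element by element, so this orders jobs by level,
# then unassigned before assigned, then fewest failures, then the
# earliest lease expiration.
def priority(job: dict) -> tuple:
    return (job['level'], job['assigned'], job['failures'], job['lease_expires'])

queue = [
    {'level': 1, 'assigned': False, 'failures': 0, 'lease_expires': 0},
    {'level': 0, 'assigned': True,  'failures': 0, 'lease_expires': 90},
    {'level': 0, 'assigned': False, 'failures': 0, 'lease_expires': 0},
]
queue.sort(key=priority)  # the level-0 unassigned job comes first
```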
&lt;h3 id=&#34;adaptive-capacity&#34;&gt;Adaptive capacity&lt;/h3&gt;
&lt;p&gt;Workers specify available capacity when polling for jobs. The scheduler:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Creates jobs based on reported worker capacity&lt;/li&gt;
&lt;li&gt;Balances queue size with worker utilization&lt;/li&gt;
&lt;li&gt;Adapts to available resources automatically&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;job-ownership&#34;&gt;Job ownership&lt;/h2&gt;
&lt;p&gt;Jobs are assigned using a lease-based model:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Lease duration&lt;/strong&gt;: Workers are granted ownership for a limited time&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fencing tokens&lt;/strong&gt;: Raft log index serves as a unique token&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lease refresh&lt;/strong&gt;: Workers must refresh leases before expiration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reassignment&lt;/strong&gt;: Expired leases allow job reassignment&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;failure-handling&#34;&gt;Failure handling&lt;/h3&gt;
&lt;p&gt;When a worker fails:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The job lease expires.&lt;/li&gt;
&lt;li&gt;The metastore detects the expired lease.&lt;/li&gt;
&lt;li&gt;The job is reassigned to another worker.&lt;/li&gt;
&lt;li&gt;Source blocks remain until compaction succeeds.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Jobs that repeatedly fail are deprioritized to prevent blocking the queue.&lt;/p&gt;
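&lt;p&gt;The lease-and-fencing mechanics can be modeled as follows (a toy sketch, not the metastore API; in Pyroscope the fencing token is the Raft log index of the assignment):&lt;/p&gt;

```python
class JobLease:
    # Toy model of lease-based job ownership with a fencing token.
    def __init__(self):
        self.token = 0      # Raft log index of the latest (re)assignment
        self.expires = 0.0

    def assign(self, raft_index: int, now: float, ttl: float = 15.0) -> int:
        self.token = raft_index
        self.expires = now + ttl
        return self.token

    def accept_report(self, token: int, now: float) -> bool:
        # A stale worker's token predates the reassignment, so its
        # results are rejected even if it comes back after a pause.
        return token == self.token and now < self.expires
```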
&lt;h2 id=&#34;job-status-lifecycle&#34;&gt;Job status lifecycle&lt;/h2&gt;

&lt;div class=&#34;mermaid-container&#34;&gt;
  &lt;pre class=&#34;mermaid&#34;&gt;
stateDiagram-v2
    [*] --&gt; Unassigned : Create Job
    Unassigned --&gt; InProgress : Assign Job
    InProgress --&gt; Success : Job Completed
    InProgress --&gt; LeaseExpired: Job Lease Expires
    LeaseExpired: Abandoned Job

    LeaseExpired --&gt; Excluded: Failure Threshold Exceeded
    Excluded: Faulty Job

    Success --&gt; [*] : Remove Job from Schedule
    LeaseExpired --&gt; InProgress : Reassign Job
&lt;/pre&gt;
&lt;/div&gt;

&lt;h2 id=&#34;performance-characteristics&#34;&gt;Performance characteristics&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Median time to first compaction&lt;/strong&gt;: Less than 15 seconds&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Continuous operation&lt;/strong&gt;: Workers constantly poll for new jobs&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Horizontal scaling&lt;/strong&gt;: Add more workers to handle compaction backlog&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Priority-based&lt;/strong&gt;: Smaller blocks compacted first for fastest impact&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;block-deletion&#34;&gt;Block deletion&lt;/h2&gt;
&lt;p&gt;After successful compaction:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Tombstone creation&lt;/strong&gt;: Source blocks are marked for deletion.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Delay period&lt;/strong&gt;: Blocks are retained to allow in-flight queries to complete.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hard deletion&lt;/strong&gt;: After the delay, source blocks are removed from storage.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This two-phase deletion prevents query failures during compaction.&lt;/p&gt;
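&lt;p&gt;A compact sketch of the two phases (the delay value is illustrative, not the configured default):&lt;/p&gt;

```python
DELETION_DELAY_S = 30 * 60  # illustrative; long enough for in-flight queries

tombstones = {}  # block ID -> time the block was superseded

def mark_for_deletion(block_id: str, now: float) -> None:
    # Phase 1: tombstone the source block instead of deleting it.
    tombstones[block_id] = now

def due_for_hard_delete(now: float) -> list:
    # Phase 2: only remove blocks whose delay period has elapsed.
    return [b for b, t in tombstones.items() if now - t >= DELETION_DELAY_S]
```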
&lt;h2 id=&#34;implementation-details&#34;&gt;Implementation details&lt;/h2&gt;
&lt;p&gt;For detailed implementation information, including job scheduling algorithms and lease management, refer to the &lt;a href=&#34;https://github.com/grafana/pyroscope/blob/main/pkg/metastore/compaction/README.md&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;internal documentation&lt;/a&gt;.&lt;/p&gt;
]]></content><description>&lt;h1 id="compaction">Compaction&lt;/h1>
&lt;p>Compaction is the process of merging multiple small segments into larger, optimized blocks. This is essential for maintaining query performance and controlling metadata index size.&lt;/p></description></item><item><title>Metadata index</title><link>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/metadata-index/</link><pubDate>Mon, 20 Apr 2026 09:02:32 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/metadata-index/</guid><content><![CDATA[&lt;h1 id=&#34;metadata-index&#34;&gt;Metadata index&lt;/h1&gt;
&lt;p&gt;The metadata index stores information about all data objects (blocks and segments) in object storage. It is maintained by the &lt;a href=&#34;../components/metastore/&#34;&gt;metastore&lt;/a&gt; service and provides fast lookups for query planning.&lt;/p&gt;
&lt;h2 id=&#34;purpose&#34;&gt;Purpose&lt;/h2&gt;
&lt;p&gt;The metadata index enables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Block discovery&lt;/strong&gt;: Finding blocks that match a query&amp;rsquo;s time range and filters&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query planning&lt;/strong&gt;: Identifying exactly which objects need to be read&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compaction coordination&lt;/strong&gt;: Tracking which blocks can be compacted together&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retention enforcement&lt;/strong&gt;: Managing block lifecycle and cleanup&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;implementation&#34;&gt;Implementation&lt;/h2&gt;
&lt;p&gt;The index is implemented using:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;BoltDB&lt;/strong&gt;: Key-value store for metadata entries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Raft&lt;/strong&gt;: Consensus protocol for replication and consistency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;BoltDB was chosen for its simplicity and efficiency with a single writer and concurrent readers. For better performance, the index can be stored on an in-memory volume since it&amp;rsquo;s recovered from the Raft log on startup.&lt;/p&gt;
&lt;h2 id=&#34;block-metadata&#34;&gt;Block metadata&lt;/h2&gt;
&lt;p&gt;Each block in object storage has a corresponding metadata entry containing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Block ID&lt;/strong&gt;: Unique identifier (ULID) based on creation time&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tenant&lt;/strong&gt;: The tenant that owns the data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shard&lt;/strong&gt;: The shard assignment for data distribution&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Time range&lt;/strong&gt;: Start and end timestamps of the data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Datasets&lt;/strong&gt;: Information about contained datasets (services)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;dataset-information&#34;&gt;Dataset information&lt;/h3&gt;
&lt;p&gt;Each dataset within a block includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Service name&lt;/strong&gt;: The &lt;code&gt;service_name&lt;/code&gt; label identifying the application&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Labels&lt;/strong&gt;: Additional metadata labels for filtering&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Table of contents&lt;/strong&gt;: Offsets to data sections within the dataset&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;index-structure&#34;&gt;Index structure&lt;/h2&gt;
&lt;p&gt;The index is partitioned by time, with each partition covering a 6-hour window:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;Partition (6h window)
├── Tenant A
│   ├── Shard 0
│   ├── Shard 1
│   └── Shard N
└── Tenant B
    ├── Shard 0
    └── Shard N&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Within each shard:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Block entries&lt;/strong&gt;: Key-value pairs (block ID → metadata)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;String table&lt;/strong&gt;: Deduplicated strings for space efficiency&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Shard index&lt;/strong&gt;: Time range for efficient filtering&lt;/li&gt;
&lt;/ul&gt;
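&lt;p&gt;Mapping a block timestamp to its partition is a simple truncation to the 6-hour window; this helper is illustrative, not the metastore&amp;rsquo;s actual key layout:&lt;/p&gt;

```python
PARTITION_WINDOW_MS = 6 * 60 * 60 * 1000  # 6-hour partitions

def partition_start(timestamp_ms: int) -> int:
    # All blocks within the same 6-hour window share a partition.
    return timestamp_ms - timestamp_ms % PARTITION_WINDOW_MS
```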
&lt;h2 id=&#34;index-writes&#34;&gt;Index writes&lt;/h2&gt;
&lt;p&gt;Index writes are performed by &lt;a href=&#34;../components/segment-writer/&#34;&gt;segment-writers&lt;/a&gt; when new segments are created:&lt;/p&gt;

  &lt;script type=&#34;text/javascript&#34; src=&#34;/web/mermaid.867770685db36193a268b63e8c85a9291badc92208742df0df7a384e9ff1619d.js&#34; integrity=&#34;sha256-hndwaF2zYZOiaLY&amp;#43;jIWpKRutySIIdC3w33o4Tp/xYZ0=&#34; defer&gt;&lt;/script&gt;
  

&lt;div class=&#34;mermaid-container&#34;&gt;
  &lt;pre class=&#34;mermaid&#34;&gt;
sequenceDiagram
    participant SW as segment-writer
    participant M as Metastore
    participant R as Raft
    participant I as Index

    SW-&gt;&gt;M: AddBlock(metadata)
    M-&gt;&gt;R: Propose ADD_BLOCK
    R-&gt;&gt;R: Commit to log
    R-&gt;&gt;I: Insert block
    I--&gt;&gt;R: Success
    R--&gt;&gt;M: Committed
    M--&gt;&gt;SW: Success
&lt;/pre&gt;
&lt;/div&gt;

&lt;h3 id=&#34;tombstone-protection&#34;&gt;Tombstone protection&lt;/h3&gt;
&lt;p&gt;Before adding a block, the index checks for tombstones to prevent re-adding blocks that were already compacted. This handles cases where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A writer&amp;rsquo;s response was lost but the block was added&lt;/li&gt;
&lt;li&gt;The block was already compacted before the retry&lt;/li&gt;
&lt;/ul&gt;
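&lt;p&gt;The check itself is a straightforward guard on the write path; the names here are illustrative:&lt;/p&gt;

```python
tombstoned = set()  # IDs of blocks already compacted away
index = {}          # block ID -> metadata entry

def add_block(block_id: str, metadata: dict) -> bool:
    # A retried AddBlock for a block that was since compacted is a no-op.
    if block_id in tombstoned:
        return False
    index[block_id] = metadata
    return True
```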
&lt;h2 id=&#34;index-queries&#34;&gt;Index queries&lt;/h2&gt;
&lt;p&gt;Queries use the linearizable read pattern to ensure consistency:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Read index request&lt;/strong&gt;: Query asks for the current commit index.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Leader check&lt;/strong&gt;: Verifies the current leader.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Wait for commit&lt;/strong&gt;: Waits until the commit index is applied locally.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Read state&lt;/strong&gt;: Reads from the local state machine.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This allows both leader and follower replicas to serve queries while ensuring they see the latest committed state.&lt;/p&gt;
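&lt;p&gt;The four steps can be modeled with a toy replica (a sketch of the read-index pattern, not the actual Raft implementation):&lt;/p&gt;

```python
class Replica:
    # Toy state machine: a committed log and a locally applied prefix.
    def __init__(self, log: list):
        self.log = log
        self.commit_index = len(log)
        self.applied = []

    def apply_next(self) -> None:
        self.applied.append(self.log[len(self.applied)])

def linearizable_read(follower: 'Replica', leader: 'Replica') -> list:
    commit_index = leader.commit_index           # 1. read index request
    # 2. in real Raft, the leader confirms leadership before replying
    while len(follower.applied) < commit_index:  # 3. wait for local apply
        follower.apply_next()
    return list(follower.applied)                # 4. read local state
```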
&lt;h3 id=&#34;query-types&#34;&gt;Query types&lt;/h3&gt;
&lt;p&gt;The index supports two main query patterns:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Metadata queries&lt;/strong&gt;: Find blocks matching criteria&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;Query:
  - Time range: [start, end]
  - Tenant: [&amp;#34;tenant-1&amp;#34;]
  - Labels: {service_name=&amp;#34;frontend&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Label queries&lt;/strong&gt;: List available labels without reading data&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;Query:
  - Return: distinct values for &amp;#34;profile_type&amp;#34; label
  - Filter: {service_name=&amp;#34;frontend&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;retention&#34;&gt;Retention&lt;/h2&gt;
&lt;h3 id=&#34;compaction-based-retention&#34;&gt;Compaction-based retention&lt;/h3&gt;
&lt;p&gt;When blocks are compacted:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Source block entries are replaced with the compacted block entry.&lt;/li&gt;
&lt;li&gt;Tombstones are created for source blocks.&lt;/li&gt;
&lt;li&gt;Tombstones trigger eventual deletion of source objects.&lt;/li&gt;
&lt;/ol&gt;
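&lt;p&gt;Sketched with plain dictionaries (hypothetical shapes, not the real data structures), the three steps look like this:&lt;/p&gt;

```python
def apply_compaction(index, tombstones, source_ids, compacted_id, meta):
    """Steps 1 and 2: swap source entries for the compacted entry."""
    for sid in source_ids:
        index.pop(sid, None)     # remove the source block entry
        tombstones.add(sid)      # tombstone it for later deletion
    index[compacted_id] = meta   # register the compacted block

def sweep_tombstones(tombstones, object_store):
    """Step 3: eventually delete the underlying source objects."""
    for sid in list(tombstones):
        object_store.discard(sid)
        tombstones.remove(sid)

index = {"s1": {}, "s2": {}}
tombstones = set()
objects = {"s1", "s2"}           # stand-in for object storage contents
apply_compaction(index, tombstones, ["s1", "s2"], "c1", {"level": 1})
sweep_tombstones(tombstones, objects)
```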
&lt;h3 id=&#34;time-based-retention&#34;&gt;Time-based retention&lt;/h3&gt;
&lt;p&gt;Retention policies delete entire partitions based on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Block creation time&lt;/strong&gt;: The primary retention criterion&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data timestamps&lt;/strong&gt;: Blocks are deleted only if their data is also past retention&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Retention policies are configurable per tenant.&lt;/p&gt;
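&lt;p&gt;The two criteria combine as a conjunction: a block is eligible for deletion only when both its creation time and its newest data timestamp are older than the retention cutoff. A sketch with assumed field names:&lt;/p&gt;

```python
from datetime import datetime, timedelta

def past_retention(block, now, retention):
    """Deletable only if creation time AND data are both past retention."""
    cutoff = now - retention
    return block["created_at"] < cutoff and block["max_data_time"] < cutoff

now = datetime(2026, 4, 20)
retention = timedelta(days=30)
# Both creation and data older than 30 days: deletable.
stale = {"created_at": now - timedelta(days=90),
         "max_data_time": now - timedelta(days=60)}
# Compacted recently but containing old data: creation time keeps it.
recent = {"created_at": now - timedelta(days=2),
          "max_data_time": now - timedelta(days=60)}
```

&lt;p&gt;This is why a recently compacted block that contains only old data survives until its own creation time also passes the cutoff.&lt;/p&gt;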
&lt;h2 id=&#34;cleanup-process&#34;&gt;Cleanup process&lt;/h2&gt;
&lt;p&gt;The cleaner runs on the Raft leader and:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Lists partitions and applies retention policy.&lt;/li&gt;
&lt;li&gt;Identifies partitions to delete.&lt;/li&gt;
&lt;li&gt;Proposes deletion to Raft.&lt;/li&gt;
&lt;li&gt;Creates tombstones for affected blocks.&lt;/li&gt;
&lt;li&gt;Tombstones are processed during compaction.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&#34;mermaid-container&#34;&gt;
  &lt;pre class=&#34;mermaid&#34;&gt;
sequenceDiagram
    participant C as Cleaner
    participant M as Metastore
    participant R as Raft
    participant I as Index

    C-&gt;&gt;M: TruncateIndex(policy)
    M-&gt;&gt;I: List partitions
    I--&gt;&gt;M: Partition list
    M-&gt;&gt;M: Apply retention policy
    M-&gt;&gt;R: Propose TRUNCATE_INDEX
    R-&gt;&gt;I: Delete partitions
    R-&gt;&gt;I: Add tombstones
    R--&gt;&gt;M: Committed
    M--&gt;&gt;C: Success
&lt;/pre&gt;
&lt;/div&gt;
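&lt;p&gt;The policy-application step, before the &lt;code&gt;TRUNCATE_INDEX&lt;/code&gt; proposal, can be viewed as a pure function over the partition list. A sketch with assumed fields and per-tenant policies (the Raft proposal itself is out of scope here):&lt;/p&gt;

```python
from datetime import datetime, timedelta

def partitions_to_delete(partitions, policies, now):
    """Return IDs of partitions whose newest data is past the tenant's retention."""
    doomed = []
    for p in partitions:
        retention = policies.get(p["tenant"])
        if retention is None:
            continue  # no policy configured: keep the tenant's data
        if p["max_time"] < now - retention:
            doomed.append(p["id"])
    return doomed

now = datetime(2026, 4, 20)
policies = {"tenant-1": timedelta(days=30)}
partitions = [
    {"id": "p-old", "tenant": "tenant-1", "max_time": now - timedelta(days=45)},
    {"id": "p-new", "tenant": "tenant-1", "max_time": now - timedelta(days=5)},
    {"id": "p-x",   "tenant": "tenant-2", "max_time": now - timedelta(days=400)},
]
doomed = partitions_to_delete(partitions, policies, now)
```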

&lt;h2 id=&#34;performance&#34;&gt;Performance&lt;/h2&gt;
&lt;h3 id=&#34;caching&#34;&gt;Caching&lt;/h3&gt;
&lt;p&gt;The index uses several caches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Shard cache&lt;/strong&gt;: Keeps shard indexes and string tables in memory&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Block cache&lt;/strong&gt;: Stores decoded metadata entries&lt;/li&gt;
&lt;/ul&gt;
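&lt;p&gt;Both caches follow the same recency-based eviction idea. A generic LRU sketch, illustrative rather than the actual implementation:&lt;/p&gt;

```python
from collections import OrderedDict

class LRUCache:
    """Bounded cache that evicts the least recently used entry, as a shard
    or block cache might (illustrative; not Pyroscope's implementation)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)     # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the LRU entry

shards = LRUCache(capacity=2)
shards.put("shard-1", "index-1")
shards.put("shard-2", "index-2")
shards.get("shard-1")                # touch shard-1 so it stays hot
shards.put("shard-3", "index-3")     # capacity exceeded: shard-2 is evicted
```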
&lt;h3 id=&#34;scalability&#34;&gt;Scalability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Storage requirements&lt;/strong&gt;: A few gigabytes even at large scale&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query performance&lt;/strong&gt;: Sub-millisecond lookups with caching&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Write throughput&lt;/strong&gt;: Limited by Raft consensus, typically sufficient for ingestion rates&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;implementation-details&#34;&gt;Implementation details&lt;/h2&gt;
&lt;p&gt;For detailed implementation information, including the protobuf schema and internal structures, refer to the &lt;a href=&#34;https://github.com/grafana/pyroscope/blob/main/pkg/metastore/index/README.md&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;internal documentation&lt;/a&gt;.&lt;/p&gt;
]]></content><description>&lt;h1 id="metadata-index">Metadata index&lt;/h1>
&lt;p>The metadata index stores information about all data objects (blocks and segments) in object storage. It is maintained by the &lt;a href="../components/metastore/">metastore&lt;/a> service and provides fast lookups for query planning.&lt;/p></description></item><item><title>Migrate from v1 to v2 storage using Helm</title><link>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/migrate-from-v1/</link><pubDate>Mon, 20 Apr 2026 09:02:32 +0000</pubDate><guid>https://grafana.com/docs/pyroscope/v2.0.x/reference-pyroscope-v2-architecture/migrate-from-v1/</guid><content><![CDATA[&lt;h1 id=&#34;migrate-from-v1-to-v2-storage-using-helm&#34;&gt;Migrate from v1 to v2 storage using Helm&lt;/h1&gt;
&lt;p&gt;This guide walks you through migrating a Pyroscope installation from v1 to v2 storage architecture using the Helm chart. The migration uses a phased approach that lets you run both storage backends simultaneously before fully cutting over to v2.&lt;/p&gt;
&lt;p&gt;For an overview of what changed in v2 and why, refer to &lt;a href=&#34;../about-pyroscope-v2-architecture/&#34;&gt;About the v2 architecture&lt;/a&gt; and &lt;a href=&#34;../design-motivation/&#34;&gt;Design motivation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;prerequisites&#34;&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;Before starting the migration, make sure you have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Helm chart version 1.19.2 or later&lt;/strong&gt;. Verify with:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm list -n pyroscope -f pyroscope&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Check that the &lt;code&gt;CHART&lt;/code&gt; column shows &lt;code&gt;pyroscope-1.19.2&lt;/code&gt; or higher. If your chart is older, upgrade it first.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pyroscope running on v1 storage via Helm.&lt;/strong&gt; Verify with:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm get values -n pyroscope pyroscope -o yaml --all | grep -A8 &amp;#39;storage:&amp;#39; | grep -E &amp;#39;v1:|v2:&amp;#39;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You should see &lt;code&gt;v1: true&lt;/code&gt; and &lt;code&gt;v2: false&lt;/code&gt;. If you see &lt;code&gt;v2: true&lt;/code&gt;, your installation is already using v2 or is mid-migration.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Object storage configured.&lt;/strong&gt; v2 writes directly to object storage — it doesn&amp;rsquo;t use local disk for block storage. If you haven&amp;rsquo;t configured object storage yet, add it to your Helm values. For example, for S3:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;pyroscope:
  structuredConfig:
    storage:
      backend: s3
      s3:
        endpoint: s3.us-east-1.amazonaws.com
        bucket_name: pyroscope-data
        access_key_id: &amp;#34;${AWS_ACCESS_KEY_ID}&amp;#34;
        secret_access_key: &amp;#34;${AWS_SECRET_ACCESS_KEY}&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;For other backends (GCS, Azure, Swift), refer to 
    &lt;a href=&#34;/docs/pyroscope/v2.0.x/configure-server/storage/configure-object-storage-backend/&#34;&gt;Configure object storage backend&lt;/a&gt;. You can also use the &lt;code&gt;filesystem&lt;/code&gt; backend, but in Kubernetes, this requires a &lt;code&gt;ReadWriteMany&lt;/code&gt; volume that is shared across all pods.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;kubectl&lt;/code&gt; and &lt;code&gt;helm&lt;/code&gt; CLI access&lt;/strong&gt; to your cluster.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;The examples in this guide assume Pyroscope is installed in the &lt;code&gt;pyroscope&lt;/code&gt; namespace with the release name &lt;code&gt;pyroscope&lt;/code&gt;. Adjust the &lt;code&gt;-n&lt;/code&gt; namespace flag and release name in &lt;code&gt;helm&lt;/code&gt; and &lt;code&gt;kubectl&lt;/code&gt; commands if your installation differs.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h2 id=&#34;migration-overview&#34;&gt;Migration overview&lt;/h2&gt;
&lt;p&gt;The migration has three phases:&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;button-div&#34;&gt;
      &lt;button class=&#34;expand-table-btn&#34;&gt;Expand table&lt;/button&gt;
    &lt;/div&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Phase&lt;/th&gt;
              &lt;th&gt;What happens&lt;/th&gt;
              &lt;th&gt;Reversible?&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;1. Dual ingest&lt;/td&gt;
              &lt;td&gt;v2 components deploy alongside v1. Writes go to both storage backends.&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;2. Validate&lt;/td&gt;
              &lt;td&gt;Run both backends for at least 24 hours. Verify v2 data and compaction.&lt;/td&gt;
              &lt;td&gt;Yes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;3. Remove v1&lt;/td&gt;
              &lt;td&gt;Remove v1 components. Only v2 serves reads and writes.&lt;/td&gt;
              &lt;td&gt;Partial&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;p&gt;The steps below are specific to your deployment mode. Follow the section that matches your installation.&lt;/p&gt;
&lt;h2 id=&#34;single-binary-mode&#34;&gt;Single-binary mode&lt;/h2&gt;
&lt;h3 id=&#34;phase-1-enable-dual-ingest&#34;&gt;Phase 1: Enable dual ingest&lt;/h3&gt;
&lt;p&gt;In this phase, the single-binary process enables the v2 storage modules alongside v1. Writes go to both storage backends simultaneously, and the read path serves data from both v1 and v2.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm upgrade -n pyroscope pyroscope grafana/pyroscope --version 2.0.0 \
  --reset-then-reuse-values \
  --set architecture.storage.v1=true \
  --set architecture.storage.v2=true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;--reset-then-reuse-values&lt;/code&gt; flag resets to the new chart&amp;rsquo;s defaults and then reapplies your previously set values. Alternatively, you can pass your values file with &lt;code&gt;-f values.yaml&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id=&#34;verify-phase-1&#34;&gt;Verify Phase 1&lt;/h4&gt;
&lt;p&gt;After the upgrade completes, check that the pod has restarted and is running:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl get pods -n pyroscope -l app.kubernetes.io/instance=pyroscope&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Check the Helm release notes for migration status:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm get notes -n pyroscope pyroscope&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You should see output similar to:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;# Pyroscope v2 Migration is active

Write traffic will be written to:
- 100% v1: ingester
- 100% v2: segment-writer

Read traffic is served from v2 read path from as soon as data was first ingested to v2.&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Also verify:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The &lt;strong&gt;metastore&lt;/strong&gt; raft has initialized. Check the pod logs for a message like &lt;code&gt;entering leader state&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl logs -n pyroscope -l app.kubernetes.io/instance=pyroscope --tail=500 | grep -i &amp;#34;entering leader state&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;strong&gt;segment-writer&lt;/strong&gt; ring is healthy:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl port-forward -n pyroscope svc/pyroscope 4040:4040 &amp;amp;
PF_PID=$!
sleep 2
curl -s http://localhost:4040/ring-segment-writer | grep -o &amp;#39;ACTIVE&amp;#39; | wc -l
kill $PF_PID&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In single-binary mode, the count should be 1.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;phase-2-validate-v2-is-working&#34;&gt;Phase 2: Validate v2 is working&lt;/h3&gt;
&lt;p&gt;Run both storage backends simultaneously for at least 24 hours before proceeding. During this time, you should be able to query data ingested to v2.&lt;/p&gt;
&lt;h4 id=&#34;verify-data-is-being-written-to-v2&#34;&gt;Verify data is being written to v2&lt;/h4&gt;
&lt;p&gt;Query recent profiling data. The v2 read path should serve data ingested after Phase 1. You can use &lt;code&gt;profilecli&lt;/code&gt;, the Pyroscope UI, or the API to query profiles from the last hour and confirm results are returned:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl port-forward -n pyroscope svc/pyroscope 4040:4040 &amp;amp;
PF_PID=$!
sleep 2
profilecli query series --url http://localhost:4040 --from &amp;#34;now-1h&amp;#34; --to &amp;#34;now&amp;#34;
kill $PF_PID&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You should see series labels for the profiling data being ingested. If no results are returned, check the distributor and segment-writer logs for errors.&lt;/p&gt;
&lt;h4 id=&#34;verify-v2-compaction-is-running&#34;&gt;Verify v2 compaction is running&lt;/h4&gt;
&lt;p&gt;The compaction-worker compacts segments through the L0 → L1 → L2 levels. Verify that compaction jobs are completing:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl logs -n pyroscope -l app.kubernetes.io/instance=pyroscope --tail=500 | grep &amp;#34;compaction finished successfully&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You should see log lines like:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;msg=&amp;#34;compaction finished successfully&amp;#34; input_blocks=20 output_blocks=1&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Compaction typically starts within minutes of ingestion; the first block is created once enough segments have accumulated for a shard.&lt;/p&gt;
&lt;h4 id=&#34;verify-error-rates-are-stable&#34;&gt;Verify error rates are stable&lt;/h4&gt;
&lt;p&gt;Check that write and read error rates haven&amp;rsquo;t increased since enabling v2. If you have Prometheus metrics configured:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;sum(rate(pyroscope_request_duration_seconds_count{status_code=~&amp;#34;5..&amp;#34;}[5m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Error rates should be zero or negligible. Compare against pre-migration baselines to confirm no regression.&lt;/p&gt;
&lt;h3 id=&#34;phase-3-switch-to-v2-storage&#34;&gt;Phase 3: Switch to v2 storage&lt;/h3&gt;
&lt;p&gt;Once you&amp;rsquo;re confident that v2 is working correctly and you no longer need to query data ingested before Phase 1, you can disable the v1 storage.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-warning&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Warning&lt;/p&gt;&lt;p&gt;After this step, data ingested before Phase 1 is no longer queryable through Pyroscope. The data still exists in object storage, but the v1 storage modules that serve it will be disabled. Make sure you don&amp;rsquo;t need to query historical data from before the migration started.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;


&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm upgrade -n pyroscope pyroscope grafana/pyroscope --version 2.0.0 \
  --reset-then-reuse-values \
  --set architecture.storage.v1=false \
  --set architecture.storage.v2=true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h4 id=&#34;verify-phase-3&#34;&gt;Verify Phase 3&lt;/h4&gt;
&lt;p&gt;Verify that the Pyroscope pod has restarted:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl get pods -n pyroscope -l app.kubernetes.io/instance=pyroscope&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Verify that queries still return data:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl port-forward -n pyroscope svc/pyroscope 4040:4040 &amp;amp;
PF_PID=$!
sleep 2
profilecli query series --url http://localhost:4040 --from &amp;#34;now-1h&amp;#34; --to &amp;#34;now&amp;#34;
kill $PF_PID&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You should see series labels for recent profiling data. You can also open the Pyroscope UI at &lt;code&gt;http://localhost:4040&lt;/code&gt; and verify that you can query recent profiles. An empty or errored UI indicates a problem — see &lt;a href=&#34;#rollback&#34;&gt;Rollback&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;microservices-mode&#34;&gt;Microservices mode&lt;/h2&gt;
&lt;p&gt;If you deployed Pyroscope using the &lt;code&gt;values-micro-services.yaml&lt;/code&gt; file as described in 
    &lt;a href=&#34;/docs/pyroscope/v2.0.x/deploy-kubernetes/helm/&#34;&gt;Deploy on Kubernetes&lt;/a&gt;, follow the steps below.&lt;/p&gt;
&lt;h3 id=&#34;phase-1-deploy-v2-components-alongside-v1&#34;&gt;Phase 1: Deploy v2 components alongside v1&lt;/h3&gt;
&lt;p&gt;In this phase, you deploy the v2 components (segment-writer, metastore, compaction-worker, query-backend) alongside your existing v1 installation. The distributor starts writing to both storage backends simultaneously. The read path serves data from both v1 and v2.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;span class=&#34;code-clipboard&#34;&gt;
      &lt;button x-data=&#34;app_code_snippet()&#34; x-init=&#34;init()&#34; @click=&#34;copy()&#34;&gt;
        &lt;img class=&#34;code-clipboard__icon&#34; src=&#34;/media/images/icons/icon-copy-small-2.svg&#34; alt=&#34;Copy code to clipboard&#34; width=&#34;14&#34; height=&#34;13&#34;&gt;
        &lt;span&gt;Copy&lt;/span&gt;
      &lt;/button&gt;
    &lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm upgrade -n pyroscope pyroscope grafana/pyroscope \
  --reuse-values \
  --set architecture.microservices.enabled=true \
  --set architecture.storage.v1=true \
  --set architecture.storage.v2=true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;--reuse-values&lt;/code&gt; flag preserves your existing configuration. Alternatively, you can pass your values file with &lt;code&gt;-f values.yaml&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id=&#34;verify-phase-1-1&#34;&gt;Verify Phase 1&lt;/h4&gt;
&lt;p&gt;After the upgrade completes, check that the new components are running:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl get pods -n pyroscope -l app.kubernetes.io/instance=pyroscope&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Check the Helm release notes for migration status:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm get notes -n pyroscope pyroscope&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You should see output similar to:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;# Pyroscope v2 Migration is active

Write traffic will be written to:
- 100% v1: ingester
- 100% v2: segment-writer

Read traffic is served from the v2 read path as soon as data is first ingested to v2.&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Also verify:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The &lt;strong&gt;metastore&lt;/strong&gt; raft cluster has elected a leader:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl logs -n pyroscope -l app.kubernetes.io/component=metastore --tail=500 | grep -i &amp;#34;entering leader state&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &lt;strong&gt;segment-writer&lt;/strong&gt; ring is healthy. All instances should show as &lt;code&gt;ACTIVE&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl port-forward -n pyroscope svc/pyroscope-distributor 4040:4040 &amp;amp;
PF_PID=$!
sleep 2
curl -s http://localhost:4040/ring-segment-writer | grep -o &amp;#39;ACTIVE&amp;#39; | wc -l
kill $PF_PID&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The count should match the number of segment-writer instances.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
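&lt;p&gt;The ring check above boils down to counting &lt;code&gt;ACTIVE&lt;/code&gt; entries. A self-contained sketch of that parsing step, using illustrative sample content rather than a real ring page:&lt;/p&gt;

```shell
# Count ACTIVE ring members in a saved copy of the ring page and compare
# against the expected number of segment-writer replicas.
# The sample page content below is illustrative, not real Pyroscope output.
expected_replicas=3

ring_page='<td>pyroscope-segment-writer-0</td><td>ACTIVE</td>
<td>pyroscope-segment-writer-1</td><td>ACTIVE</td>
<td>pyroscope-segment-writer-2</td><td>ACTIVE</td>'

# grep -c counts matching lines; each ring member occupies one line here
active=$(printf '%s\n' "$ring_page" | grep -c 'ACTIVE')

if [ "$active" -eq "$expected_replicas" ]; then
  echo "ring healthy: $active/$expected_replicas ACTIVE"
else
  echo "ring degraded: only $active/$expected_replicas ACTIVE"
fi
```

&lt;p&gt;With all three sample members &lt;code&gt;ACTIVE&lt;/code&gt;, this prints &lt;code&gt;ring healthy: 3/3 ACTIVE&lt;/code&gt;.&lt;/p&gt;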
&lt;h3 id=&#34;phase-2-validate-v2-is-working-1&#34;&gt;Phase 2: Validate v2 is working&lt;/h3&gt;
&lt;p&gt;Run both storage backends simultaneously for at least 24 hours before proceeding. During this time, you should be able to query data ingested to v2.&lt;/p&gt;
&lt;h4 id=&#34;verify-data-is-being-written-to-v2-1&#34;&gt;Verify data is being written to v2&lt;/h4&gt;
&lt;p&gt;Query recent profiling data. The v2 read path should serve data ingested after Phase 1. You can use &lt;code&gt;profilecli&lt;/code&gt;, the Pyroscope UI, or the API to query profiles from the last hour and confirm results are returned:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl port-forward -n pyroscope svc/pyroscope-query-frontend 4040:4040 &amp;amp;
PF_PID=$!
sleep 2
profilecli query series --url http://localhost:4040 --from &amp;#34;now-1h&amp;#34; --to &amp;#34;now&amp;#34;
kill $PF_PID&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You should see series labels for the profiling data being ingested. If no results are returned, check the distributor and segment-writer logs for errors.&lt;/p&gt;
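&lt;p&gt;The port-forward, query, and cleanup steps above repeat throughout this guide. They can be wrapped in a small hypothetical helper (&lt;code&gt;with_port_forward&lt;/code&gt; is not part of the chart or &lt;code&gt;profilecli&lt;/code&gt;; it is just shell):&lt;/p&gt;

```shell
# Hypothetical convenience wrapper for the port-forward / query / cleanup
# pattern used throughout this guide.
with_port_forward() {
  svc=$1; shift
  kubectl port-forward -n pyroscope "svc/$svc" 4040:4040 &
  pf_pid=$!
  sleep 2                  # give the forward a moment to establish
  "$@"                     # run the query command
  rc=$?
  kill "$pf_pid" 2>/dev/null
  return "$rc"
}

# Example (requires a running cluster and profilecli):
# with_port_forward pyroscope-query-frontend \
#   profilecli query series --url http://localhost:4040 --from "now-1h" --to "now"
```

&lt;p&gt;The wrapper always kills the forward, even if the query command fails, and passes the command&amp;rsquo;s exit code through.&lt;/p&gt;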
&lt;h4 id=&#34;verify-v2-compaction-is-running-1&#34;&gt;Verify v2 compaction is running&lt;/h4&gt;
&lt;p&gt;The compaction-worker compacts segments through the L0 → L1 → L2 levels. Verify that compaction jobs are completing:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl logs -n pyroscope -l app.kubernetes.io/component=compaction-worker --tail=500 | grep &amp;#34;compaction finished successfully&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You should see log lines like:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;msg=&amp;#34;compaction finished successfully&amp;#34; input_blocks=20 output_blocks=1&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Compaction typically starts within minutes of ingestion; the first block is created once enough segments accumulate for a shard.&lt;/p&gt;
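&lt;p&gt;The &lt;code&gt;input_blocks&lt;/code&gt; and &lt;code&gt;output_blocks&lt;/code&gt; fields in these log lines can be aggregated to gauge how much compaction is reducing block counts. A sketch using illustrative sample lines (the real log format may include additional fields):&lt;/p&gt;

```shell
# Summarize "compaction finished" log lines: total jobs and the
# input -> output block reduction. Sample log lines are illustrative.
logs='msg="compaction finished successfully" input_blocks=20 output_blocks=1
msg="compaction finished successfully" input_blocks=18 output_blocks=1'

summary=$(printf '%s\n' "$logs" | awk '
  /compaction finished successfully/ {
    jobs++
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^input_blocks=/)  { split($i, kv, "="); in_total  += kv[2] }
      if ($i ~ /^output_blocks=/) { split($i, kv, "="); out_total += kv[2] }
    }
  }
  END { printf "jobs=%d input_blocks=%d output_blocks=%d", jobs, in_total, out_total }
')
echo "$summary"
```

&lt;p&gt;For the two sample lines this prints &lt;code&gt;jobs=2 input_blocks=38 output_blocks=2&lt;/code&gt;; a high input-to-output ratio indicates compaction is keeping up.&lt;/p&gt;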
&lt;h4 id=&#34;verify-error-rates-are-stable-1&#34;&gt;Verify error rates are stable&lt;/h4&gt;
&lt;p&gt;Check that write and read error rates haven&amp;rsquo;t increased since enabling v2. If you have Prometheus metrics configured, query error rates per component:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;PromQL&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;# Server-side errors by component (distributor, segment-writer, query-frontend, query-backend, etc.)
sum by (component) (rate(pyroscope_request_duration_seconds_count{status_code=~&amp;#34;5..&amp;#34;}[5m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;All components should show zero or negligible error rates. Compare against pre-migration baselines to confirm no regression.&lt;/p&gt;
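&lt;p&gt;An error ratio is often easier to compare against a baseline than an absolute rate. A sketch using the same metric as above (assumes &lt;code&gt;pyroscope_request_duration_seconds_count&lt;/code&gt; carries the &lt;code&gt;component&lt;/code&gt; and &lt;code&gt;status_code&lt;/code&gt; labels shown in the previous query):&lt;/p&gt;

```promql
# Fraction of requests returning 5xx, per component, over 5 minutes
sum by (component) (rate(pyroscope_request_duration_seconds_count{status_code=~"5.."}[5m]))
/
sum by (component) (rate(pyroscope_request_duration_seconds_count[5m]))
```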
&lt;h3 id=&#34;phase-3-remove-v1-components&#34;&gt;Phase 3: Remove v1 components&lt;/h3&gt;
&lt;p&gt;Once you&amp;rsquo;re confident that v2 is working correctly and you no longer need to query data ingested before Phase 1, you can remove the v1 components.&lt;/p&gt;


&lt;div class=&#34;admonition admonition-warning&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Warning&lt;/p&gt;&lt;p&gt;After this step, data ingested before Phase 1 is no longer queryable through Pyroscope. The data still exists in object storage, but the v1 read path components (ingester, store-gateway, querier) that serve it will be removed. Make sure you don&amp;rsquo;t need to query historical data from before the migration started.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;


&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm upgrade -n pyroscope pyroscope grafana/pyroscope \
  --reuse-values \
  --set architecture.storage.v1=false \
  --set architecture.storage.v2=true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The Helm chart automatically removes v1-only components (ingester, compactor, store-gateway, querier, query-scheduler) when &lt;code&gt;architecture.storage.v1&lt;/code&gt; is set to &lt;code&gt;false&lt;/code&gt;, even if your values file or &lt;code&gt;--reuse-values&lt;/code&gt; state still contains overrides for those components.&lt;/p&gt;
&lt;h4 id=&#34;verify-phase-3-1&#34;&gt;Verify Phase 3&lt;/h4&gt;
&lt;p&gt;Check that v1 components have been removed and v2 is serving all traffic:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;# v1 components (ingester, store-gateway, querier, compactor, query-scheduler) should be gone
kubectl get pods -n pyroscope -l app.kubernetes.io/instance=pyroscope&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Verify that queries still return data:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl port-forward -n pyroscope svc/pyroscope-query-frontend 4040:4040 &amp;amp;
PF_PID=$!
sleep 2
profilecli query series --url http://localhost:4040 --from &amp;#34;now-1h&amp;#34; --to &amp;#34;now&amp;#34;
kill $PF_PID&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;You should see series labels for recent profiling data. You can also open the Pyroscope UI at &lt;code&gt;http://localhost:4040&lt;/code&gt; and verify that you can query recent profiles. If the UI shows no data or returns errors, see &lt;a href=&#34;#rollback&#34;&gt;Rollback&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;rollback&#34;&gt;Rollback&lt;/h2&gt;
&lt;h3 id=&#34;during-phase-1-or-phase-2&#34;&gt;During Phase 1 or Phase 2&lt;/h3&gt;
&lt;p&gt;Rolling back is straightforward: set &lt;code&gt;architecture.storage.v2=false&lt;/code&gt; to remove the v2 components and return to v1-only operation:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm upgrade -n pyroscope pyroscope grafana/pyroscope \
  --reuse-values \
  --set architecture.storage.v1=true \
  --set architecture.storage.v2=false&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Data written to v2 during the dual-ingest period is orphaned but doesn&amp;rsquo;t affect v1 operation.&lt;/p&gt;
&lt;h3 id=&#34;during-or-after-phase-3&#34;&gt;During or after Phase 3&lt;/h3&gt;
&lt;p&gt;If you removed v1 components (Phase 3), rolling back requires redeploying them:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;helm upgrade -n pyroscope pyroscope grafana/pyroscope \
  --reuse-values \
  --set architecture.storage.v1=true \
  --set architecture.storage.v2=true&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This returns you to dual-ingest mode (Phase 1). Note that any data ingested between Phase 3 and the rollback was only written to v2 and won&amp;rsquo;t be visible through the v1 read path.&lt;/p&gt;
&lt;h2 id=&#34;helm-values-reference&#34;&gt;Helm values reference&lt;/h2&gt;
&lt;p&gt;The following Helm values control the v1/v2 storage configuration and migration behavior.&lt;/p&gt;
&lt;h3 id=&#34;storage-layer-toggles&#34;&gt;Storage layer toggles&lt;/h3&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Value&lt;/th&gt;
              &lt;th&gt;Type&lt;/th&gt;
              &lt;th&gt;Default&lt;/th&gt;
              &lt;th&gt;Description&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;architecture.storage.v1&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;bool&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Enable v1 storage and its components (ingester, store-gateway, querier, compactor).&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;architecture.storage.v2&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;bool&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Enable v2 storage and its components (segment-writer, metastore, compaction-worker, query-backend).&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;h3 id=&#34;migration-tuning&#34;&gt;Migration tuning&lt;/h3&gt;
&lt;p&gt;These values only apply when both &lt;code&gt;v1&lt;/code&gt; and &lt;code&gt;v2&lt;/code&gt; are enabled (dual-ingest mode). All values are under &lt;code&gt;architecture.storage.migration&lt;/code&gt;.&lt;/p&gt;
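&lt;p&gt;For example, a hypothetical dual-ingest values fragment combining these settings (key names are taken from the table below, nested under &lt;code&gt;architecture.storage.migration&lt;/code&gt;):&lt;/p&gt;

```yaml
# Hypothetical dual-ingest tuning: write 100% of traffic to both
# backends and let the read path cut over to v2 automatically.
architecture:
  storage:
    v1: true
    v2: true
    migration:
      ingesterWeight: 1.0        # fraction of writes to v1 ingesters
      segmentWriterWeight: 1.0   # fraction of writes to v2 segment-writers
      queryBackend: true
      queryBackendFrom: "auto"
```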
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Value&lt;/th&gt;
              &lt;th&gt;Type&lt;/th&gt;
              &lt;th&gt;Default&lt;/th&gt;
              &lt;th&gt;Description&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;&amp;lt;prefix&amp;gt;.ingesterWeight&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;float&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;1.0&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Fraction &lt;code&gt;[0, 1]&lt;/code&gt; of write traffic sent to v1 ingesters.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;&amp;lt;prefix&amp;gt;.segmentWriterWeight&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;float&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;1.0&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Fraction &lt;code&gt;[0, 1]&lt;/code&gt; of write traffic sent to v2 segment-writers.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;&amp;lt;prefix&amp;gt;.queryBackend&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;bool&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;Enable the v2 query backend for reads.&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;&amp;lt;prefix&amp;gt;.queryBackendFrom&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;string&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;&amp;quot;auto&amp;quot;&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;RFC 3339 timestamp (e.g. &lt;code&gt;2025-01-01T00:00:00Z&lt;/code&gt;) from which the v2 read path serves traffic. When set to &lt;code&gt;auto&lt;/code&gt;, the query frontend consults the metastore per tenant to determine when v2 data first appeared. If no v2 data exists for a tenant, queries fall back to v1.&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;]]></content><description>&lt;h1 id="migrate-from-v1-to-v2-storage-using-helm">Migrate from v1 to v2 storage using Helm&lt;/h1>
&lt;p>This guide walks you through migrating a Pyroscope installation from v1 to v2 storage architecture using the Helm chart. The migration uses a phased approach that lets you run both storage backends simultaneously before fully cutting over to v2.&lt;/p></description></item></channel></rss>