This is documentation for the next version of Grafana Pyroscope documentation. For the latest stable release, go to the latest version.
Metadata index
The metadata index stores information about all data objects (blocks and segments) in object storage. It is maintained by the metastore service and provides fast lookups for query planning.
Purpose
The metadata index enables:
- Block discovery: Finding blocks that match a query’s time range and filters
- Query planning: Identifying exactly which objects need to be read
- Compaction coordination: Tracking which blocks can be compacted together
- Retention enforcement: Managing block lifecycle and cleanup
Implementation
The index is implemented using:
- BoltDB: Key-value store for metadata entries
- Raft: Consensus protocol for replication and consistency
BoltDB was chosen for its simplicity and efficiency with a single writer and concurrent readers. For better performance, the index can be stored on an in-memory volume since it’s recovered from the Raft log on startup.
Block metadata
Each block in object storage has a corresponding metadata entry containing:
- Block ID: Unique identifier (ULID) based on creation time
- Tenant: The tenant that owns the data
- Shard: The shard assignment for data distribution
- Time range: Start and end timestamps of the data
- Datasets: Information about contained datasets (services)
Dataset information
Each dataset within a block includes:
- Service name: The
service_namelabel identifying the application - Labels: Additional metadata labels for filtering
- Table of contents: Offsets to data sections within the dataset
Index structure
The index is partitioned by time, with each partition covering a 6-hour window:
Partition (6h window)
├── Tenant A
│ ├── Shard 0
│ ├── Shard 1
│ └── Shard N
└── Tenant B
├── Shard 0
└── Shard NWithin each shard:
- Block entries: Key-value pairs (block ID → metadata)
- String table: Deduplicated strings for space efficiency
- Shard index: Time range for efficient filtering
Index writes
Index writes are performed by segment-writers when new segments are created:
sequenceDiagram
participant SW as segment-writer
participant M as Metastore
participant R as Raft
participant I as Index
SW->>M: AddBlock(metadata)
M->>R: Propose ADD_BLOCK
R->>R: Commit to log
R->>I: Insert block
I-->>R: Success
R-->>M: Committed
M-->>SW: Success
Tombstone protection
Before adding a block, the index checks for tombstones to prevent re-adding blocks that were already compacted. This handles cases where:
- A writer’s response was lost but the block was added
- The block was already compacted before the retry
Index queries
Queries use the linearizable read pattern to ensure consistency:
- Read index request: Query asks for the current commit index.
- Leader check: Verifies the current leader.
- Wait for commit: Waits until the commit index is applied locally.
- Read state: Reads from the local state machine.
This allows both leader and follower replicas to serve queries while ensuring they see the latest committed state.
Query types
The index supports two main query patterns:
Metadata queries: Find blocks matching criteria
Query:
- Time range: [start, end]
- Tenant: ["tenant-1"]
- Labels: {service_name="frontend"}Label queries: List available labels without reading data
Query:
- Return: distinct values for "profile_type" label
- Filter: {service_name="frontend"}Retention
Compaction-based retention
When blocks are compacted:
- Source block entries are replaced with compacted block entry.
- Tombstones are created for source blocks.
- Tombstones trigger eventual deletion of source objects.
Time-based retention
Retention policies delete entire partitions based on:
- Block creation time: Primary retention criteria
- Data timestamps: Blocks are only deleted if data is also past retention
Retention policies are tenant-specific and configurable per tenant.
Cleanup process
The cleaner runs on the Raft leader and:
- Lists partitions and applies retention policy.
- Identifies partitions to delete.
- Proposes deletion to Raft.
- Creates tombstones for affected blocks.
- Tombstones are processed during compaction.
sequenceDiagram
participant C as Cleaner
participant M as Metastore
participant R as Raft
participant I as Index
C->>M: TruncateIndex(policy)
M->>I: List partitions
I-->>M: Partition list
M->>M: Apply retention policy
M->>R: Propose TRUNCATE_INDEX
R->>I: Delete partitions
R->>I: Add tombstones
R-->>M: Committed
M-->>C: Success
Performance
Caching
The index uses several caches:
- Shard cache: Keeps shard indexes and string tables in memory
- Block cache: Stores decoded metadata entries
Scalability
- Storage requirements: A few gigabytes even at large scale
- Query performance: Sub-millisecond lookups with caching
- Write throughput: Limited by Raft consensus, typically sufficient for ingestion rates
Implementation details
For detailed implementation information, including the protobuf schema and internal structures, refer to the internal documentation.


