# Migrate from SSD to distributed
This guide provides instructions for migrating from a simple scalable deployment (SSD) to a distributed microservices deployment of Loki. Before starting the migration, make sure you have read the considerations section.
**Note**

This guide uses an AWS deployment as an example. However, the migration process is the same for other cloud providers, because no changes are required to the underlying data storage.
## Considerations
Migrating from a simple scalable deployment to a distributed deployment with zero downtime is possible but requires careful planning. The following considerations should be taken into account:
- Helm Deployment: This guide assumes that you have deployed Loki using Helm. Other migration methods are possible but are not covered in this guide.
- Kubernetes Resources: This migration method requires you to spin up distributed Loki pods before shutting down the SSD pods. This means that you need to have enough resources in your Kubernetes cluster to run both the SSD and distributed Loki pods at the same time.
- Data: No changes are required to your underlying data storage. Although data loss or corruption is unlikely, it is always recommended to back up your data before starting the migration process. If you are using a cloud provider you can take a snapshot/backup.
- Configuration: This guide does not account for all configuration parameters; it only covers the parameters that need to change. Other parameters can remain the same. However, if `pattern_ingesters: true` is set, you will need to spin up `patternIngesters` before shutting down the SSD ingesters. This is primarily needed for the Grafana Logs Drilldown feature.
- Zone Aware Ingesters: This guide does not currently account for Zone Aware Ingesters. Our current recommendation is to either disable Zone Aware Ingesters or to consult the Mimir migration guide. Take note that not all parameters are equivalent between Mimir and Loki.
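As a sketch, keeping pattern ingestion running through the migration might look like the following values fragment. The key names here are illustrative and can differ between Loki and chart versions, so verify them against your chart's values reference before use:

```yaml
# Sketch only -- verify the exact key names against your Loki Helm chart version.
loki:
  pattern_ingester:
    enabled: true     # Loki config flag backing Grafana Logs Drilldown

# Distributed pattern ingester component:
# spin this up before scaling the SSD ingesters down.
patternIngester:
  replicas: 1
```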
## Prerequisites
Before starting the migration process, make sure you have the following prerequisites:
- Access to your Kubernetes cluster via `kubectl`.
- Helm installed.
## Example SSD deployment
This example will use the following SSD deployment as a reference:
**Note**

This example is only a reference for the parameters that need to be changed. Your own config will contain other parameters, such as `limits_config`, `gateway`, and `compactor`. These can remain the same.
```yaml
---
loki:
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  storage_config:
    aws:
      region: eu-central-1
      bucketnames: aws-chunks-bucket
      s3forcepathstyle: false
  ingester:
    chunk_encoding: snappy
  ruler:
    enable_api: true
    storage:
      type: s3
      s3:
        region: eu-central-1
        bucketnames: aws-ruler-bucket
        s3forcepathstyle: false
    alertmanager_url: http://prom:9093
  querier:
    max_concurrent: 4
  storage:
    type: s3
    bucketNames:
      chunks: "aws-chunks-bucket"
      ruler: "aws-ruler-bucket"
    s3:
      region: eu-central-1

deploymentMode: SimpleScalable

# SSD
backend:
  replicas: 2
read:
  replicas: 3
write:
  replicas: 3

# Distributed Loki
ingester:
  replicas: 0
  zoneAwareReplication:
    enabled: false
querier:
  replicas: 0
  maxUnavailable: 0
queryFrontend:
  replicas: 0
  maxUnavailable: 0
queryScheduler:
  replicas: 0
distributor:
  replicas: 0
  maxUnavailable: 0
compactor:
  replicas: 0
indexGateway:
  replicas: 0
  maxUnavailable: 0
ruler:
  replicas: 0
  maxUnavailable: 0

# Single binary Loki
singleBinary:
  replicas: 0

minio:
  enabled: false
```
## Stage 1: Deploying the Loki distributed components
In this stage, we will deploy the distributed Loki components alongside the SSD components. We will also change the `deploymentMode` to `SimpleScalable<->Distributed`. The `SimpleScalable<->Distributed` migration mode allows for a zero-downtime transition between the Simple Scalable and fully distributed architectures. During migration, both deployment types run simultaneously, sharing the same object storage backend.
The following table outlines which components take over the responsibilities of the SSD components:
| Simple Scalable Components | Distributed Components |
|---|---|
| write (Deployment) | Distributor + Ingester |
| read (StatefulSet) | Query Frontend + Querier |
| backend (StatefulSet) | Compactor + Ruler + Index Gateway |
How Loki handles request routing during the migration:

The Gateway (nginx) handles request routing based on endpoint type:

- Write path (`/loki/api/v1/push`):
  - Initially routes to the Simple Scalable write component
  - Gradually shifts to the Distributor
  - Both write paths share the same object storage, ensuring data consistency
- Read path (`/loki/api/v1/query`):
  - Routes to either the Simple Scalable read component or the distributed Query Frontend
  - Query results are consistent, since both architectures read from the same storage
- Admin/background operations:
  - Compaction, retention, and rule evaluation are handled by either the backend component or the respective distributed components
  - Operations are coordinated through object storage locks
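As a rough illustration of the routing described above, the gateway behaves as if it contained nginx location blocks like the following. This is a sketch only: the Helm chart generates the real gateway config, and the upstream service names here are hypothetical:

```nginx
# Hypothetical sketch -- the Loki Helm chart generates the actual gateway config.
location = /loki/api/v1/push {
    # Write path: shifted from the SSD write component to the Distributor.
    proxy_pass http://loki-distributor.loki.svc.cluster.local:3100$request_uri;
}
location ~ /loki/api/v1/query.* {
    # Read path: SSD read component or the distributed Query Frontend.
    proxy_pass http://loki-query-frontend.loki.svc.cluster.local:3100$request_uri;
}
```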
To start the migration process:
Create a copy of your existing `values.yaml` file and name it `values-migration.yaml`:

```bash
cp values.yaml values-migration.yaml
```
Next, modify the following parameters based on the annotations below: `deploymentMode`, `ingester`, and the component replica counts.

```yaml
---
loki:
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  storage_config:
    aws:
      region: eu-central-1
      bucketnames: aws-chunks-bucket
      s3forcepathstyle: false
  ingester:
    chunk_encoding: snappy
    # Add this to ingester; this will force ingesters to flush before shutting down
    wal:
      flush_on_shutdown: true
  ruler:
    enable_api: true
    storage:
      type: s3
      s3:
        region: eu-central-1
        bucketnames: aws-ruler-bucket
        s3forcepathstyle: false
    alertmanager_url: http://prom:9093
  querier:
    max_concurrent: 4
  storage:
    type: s3
    bucketNames:
      chunks: "aws-chunks-bucket"
      ruler: "aws-ruler-bucket"
    s3:
      region: eu-central-1

# Important: Make sure to change this to SimpleScalable<->Distributed
deploymentMode: SimpleScalable<->Distributed

# SSD
backend:
  replicas: 2
read:
  replicas: 3
write:
  replicas: 3

# Distributed Loki
# Spin up the distributed components
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: false
querier:
  replicas: 3
  maxUnavailable: 0
queryFrontend:
  replicas: 2
  maxUnavailable: 0
queryScheduler:
  replicas: 2
distributor:
  replicas: 2
  maxUnavailable: 0
compactor:
  replicas: 1
indexGateway:
  replicas: 2
  maxUnavailable: 0
ruler:
  replicas: 1
  maxUnavailable: 0

# Single binary Loki
singleBinary:
  replicas: 0

minio:
  enabled: false
```
Here is a breakdown of the changes:

- `ingester.wal.flush_on_shutdown: true`: Forces the ingesters to flush before shutting down. This is important to prevent data loss.
- `deploymentMode: SimpleScalable<->Distributed`: Allows the SSD and distributed components to run simultaneously.
- Spin up all distributed components with the desired replicas.
Deploy the distributed components using the following command:

```bash
helm upgrade --values values-migration.yaml loki grafana/loki -n loki
```
**Caution**

It is important to allow all components to fully spin up before proceeding to the next stage. You can check the status of the components using the following command:

```bash
kubectl get pods -n loki
```

Let all components reach the `Running` state before proceeding to the next stage.
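Rather than eyeballing the pod list, the readiness check can be scripted. The following is a minimal sketch: the here-document below stands in for live output so the example is self-contained; against a real cluster you would pipe in `kubectl get pods -n loki --no-headers` instead:

```shell
# Count pods whose STATUS column is not "Running".
# In a real cluster, replace the here-document with:
#   pods=$(kubectl get pods -n loki --no-headers)
pods=$(cat <<'EOF'
loki-distributor-7c9f   1/1   Running   0   2m
loki-ingester-0         1/1   Running   0   2m
loki-write-0            1/1   Running   0   30m
EOF
)
not_ready=$(printf '%s\n' "$pods" | awk '$3 != "Running"' | wc -l)
if [ "$not_ready" -eq 0 ]; then
  echo "all pods Running"
else
  echo "$not_ready pod(s) not Running yet"
fi
```

Run in a loop (or under `watch`), this gives a simple gate before moving on to Stage 2.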
## Stage 2: Transitioning to distributed components
The final stage of the migration involves transitioning all traffic to the distributed components. This is done by scaling down the SSD components and changing the `deploymentMode` to `Distributed`. To do this:
Create a copy of `values-migration.yaml` and name it `values-distributed.yaml`:

```bash
cp values-migration.yaml values-distributed.yaml
```
Next, modify the following parameters based on the annotations below: `deploymentMode` and the component replica counts.

```yaml
---
loki:
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  storage_config:
    aws:
      region: eu-central-1
      bucketnames: aws-chunks-bucket
      s3forcepathstyle: false
  ingester:
    chunk_encoding: snappy
    wal:
      flush_on_shutdown: true
  ruler:
    enable_api: true
    storage:
      type: s3
      s3:
        region: eu-central-1
        bucketnames: aws-ruler-bucket
        s3forcepathstyle: false
    alertmanager_url: http://prom:9093
  querier:
    max_concurrent: 4
  storage:
    type: s3
    bucketNames:
      chunks: "aws-chunks-bucket"
      ruler: "aws-ruler-bucket"
    s3:
      region: eu-central-1

# Important: Make sure to change this to Distributed
deploymentMode: Distributed

# SSD
# Scale down the SSD components
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0

# Distributed Loki
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: false
querier:
  replicas: 3
  maxUnavailable: 0
queryFrontend:
  replicas: 2
  maxUnavailable: 0
queryScheduler:
  replicas: 2
distributor:
  replicas: 2
  maxUnavailable: 0
compactor:
  replicas: 1
indexGateway:
  replicas: 2
  maxUnavailable: 0
ruler:
  replicas: 1
  maxUnavailable: 0

# Single binary Loki
singleBinary:
  replicas: 0

minio:
  enabled: false
```
Here is a breakdown of the changes:

- `deploymentMode: Distributed`: Runs the distributed components in isolation.
- Scale down all SSD components to `0`.
Deploy the final configuration using the following command:

```bash
helm upgrade --values values-distributed.yaml loki grafana/loki -n loki
```
Once the deployment is complete, you can verify that all components are running using the following command:

```bash
kubectl get pods -n loki
```
You should see all distributed components running, and the SSD components should now be removed.
## What’s next?
Loki in distributed mode is inherently more complex than SSD mode. It is recommended to meta-monitor your Loki deployment to ensure that everything is running smoothly. You can do this by following the meta-monitoring guide.