Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.
Migrate from loki-distributed Helm chart
This guide will walk you through migrating to the `loki` Helm chart, v3.0 or higher, from the `loki-distributed` Helm chart (v0.63.2 at time of writing). The process consists of deploying the new `loki` Helm chart alongside the existing `loki-distributed` installation. By joining the new cluster to the existing cluster's ring, you will create one large cluster. This allows you to bring down the `loki-distributed` components manually and safely, without data loss.
Before you begin:
We recommend having a Grafana instance available to monitor both the existing and new clusters, to make sure there is no data loss during the migration process. The `loki` chart ships with self-monitoring features, including dashboards. These are useful for monitoring the health of the new cluster as it spins up.
Start by updating your existing Grafana Agent or Promtail config (whatever is scraping logs from your environment) to exclude the new deployment. The new `loki` chart ships with its own self-monitoring mechanisms, and we want to make sure it's not scraped twice, which would produce duplicate logs. The best way to do this is with a relabel config that drops logs from the new deployment, for example:
```yaml
- source_labels:
    - "__meta_kubernetes_pod_label_app_kubernetes_io_component"
  regex: "(canary|read|write)"
  action: "drop"
```
This leverages the fact that the new deployment adds an `app.kubernetes.io/component` label with the value `read` for the Read pods, `write` for the Write pods, and `canary` for the Loki Canary pods. The `loki-distributed` deployment does not use these label values, so this rule should only match logs from the new deployment.
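For context, here is a minimal sketch of where such a rule might sit in a Promtail `scrape_configs` entry that discovers pods through Kubernetes service discovery. The job name `kubernetes-pods` is hypothetical, and your existing scrape config will likely contain more relabel rules. Because relabel rules are applied in order, placing the drop rule first removes the new deployment's pods before any of your other rules run.

```yaml
# Sketch of a Promtail scrape config (job_name is hypothetical).
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Drop pods from the new loki chart (canary, read, and write components)
      # so their logs are not collected twice.
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_component
        regex: (canary|read|write)
        action: drop
      # Your existing relabel rules would follow here.
```

Relabel regexes are fully anchored, so `(canary|read|write)` matches only those exact label values, not values that merely contain them.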
To Migrate from loki-distributed to loki
Deploy new Loki Cluster
Next, deploy the new `loki` chart in the same namespace as your existing `loki-distributed` installation. Make sure to use the same buckets and storage configuration as your existing installation. You will also need to set some special `migrate` values:
```yaml
migrate:
  fromDistributed:
    enabled: true
    memberlistService: loki-loki-distributed-memberlist
```
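To find the right service name, look at the memberlist section of the config your `loki-distributed` installation is running. With that chart's default settings it looks roughly like the excerpt below. The service name is derived from the Helm release name, so treat the one shown (copied from the example above) as an illustration to check against your own config.

```yaml
# Excerpt of the Loki config rendered by the loki-distributed chart.
# The service name is illustrative; it is derived from your release name.
memberlist:
  join_members:
    - loki-loki-distributed-memberlist
```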
The value of `memberlistService` must be the Kubernetes service created for Memberlist DNS in the `loki-distributed` deployment. It should match the value of `memberlist.join_members` in the config of the `loki-distributed` deployment. This is what makes the new cluster join the existing cluster's ring. It is important to join all existing rings: if you are using different memberlist DNS names for different rings, you must manually set each applicable `join_members` configuration for each ring. If you use the same memberlist DNS for all rings, as is the default in the `loki-distributed` chart, setting `migrate.fromDistributed.memberlistService` should be enough.
Once the new cluster is up, add the appropriate data source in Grafana for the new cluster. Check that the following queries return results:
- Confirm new and old logs are in the new deployment. Using the new deployment's Loki data source in Grafana, look for:
  - Logs with a job that is unique to your existing Promtail or Grafana Agent (the one you adjusted above to exclude logs from the new deployment). That client is not yet pushing logs to the new deployment, so if you can query its logs through the new deployment, it shows that historical logs have not been lost.
  - Logs with the label `job="loki/loki-read"`. The read component does not exist in `loki-distributed`, so this shows that the new Loki cluster's self-monitoring is working correctly.
- Confirm new logs are in the old deployment. Using the old deployment's Loki data source in Grafana, look for:
  - Logs with the label `job="loki/loki-read"`. Since you have excluded logs from the new deployment from going to the `loki-distributed` deployment, being able to query them through the `loki-distributed` Loki data source shows that the new deployment's ingesters have joined the same ring and are queryable from the `loki-distributed` queriers.
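If you manage Grafana data sources through provisioning files, the two data sources used in the checks above could be declared as follows. This is a sketch: the data source names and both service URLs are hypothetical, so point each one at the query endpoint (for example the gateway or query frontend service) of the corresponding cluster.

```yaml
# Grafana data source provisioning file (names and URLs are hypothetical).
apiVersion: 1
datasources:
  # Existing loki-distributed cluster
  - name: Loki (loki-distributed)
    type: loki
    access: proxy
    url: http://loki-loki-distributed-gateway.loki.svc.cluster.local
  # New loki chart deployment
  - name: Loki (new)
    type: loki
    access: proxy
    url: http://loki-gateway.loki.svc.cluster.local
```

Keeping both data sources side by side lets you run the same query against each cluster and compare the results while the migration is in progress.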
Convert all Clients to Push Logs to New loki Deployment
Assuming everything is working as expected, you can now modify the `clients` section of your Grafana Agent or Promtail configuration to push logs to the new deployment. After this change is made, the `loki-distributed` cluster will no longer receive new logs and can be carefully scaled down.
Once this has deployed, query the new `loki` cluster's Loki data source for new logs to make sure they're still being ingested.
Tear Down the Old Loki Canary
If you had previously deployed the canary via the `loki-canary` Helm chart, you can now tear it down. The new chart ships the canary by default and is automatically configured to scrape it.
Update Flush Config On loki-distributed Deployment
You are almost ready to start scaling down the old `loki-distributed` cluster. Before you start, however, it is important to make sure the `loki-distributed` ingesters are configured to flush on shutdown. Since these ingesters will not be coming back, there will be no opportunity for them to replay their WALs, so they need to flush all in-memory logs before shutting down to prevent data loss.
The easiest way to do this is via the `extraArgs` value in the `ingester` section of the `loki-distributed` chart. You may also want to set the ingester's log level to `debug` to see messages about the flushing process.
```yaml
ingester:
  replicas: 3
  extraArgs:
    - '-ingester.flush-on-shutdown=true'
    - '-log.level=debug'
```
Deploy this change, and make sure all ingesters restart and are running the latest configuration.
Scale Down Ingesters One at a Time
It is now time to start scaling down `loki-distributed`. Scale down the ingester StatefulSet or Deployment (depending on how your `loki-distributed` chart is deployed) one replica at a time. If `debug` logs were enabled, you can monitor the logs of each ingester as it terminates to make sure the flushing process succeeded. Once the ingester pod has fully terminated, scale down by one more replica. Continue until no ingester instances are running.
You can use the following command to edit the ingester StatefulSet and change its number of replicas (replace `<NAMESPACE>` with the correct namespace):
```shell
kubectl -n <NAMESPACE> edit statefulsets.apps loki-distributed-ingester
```
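If your `loki-distributed` installation is managed entirely through Helm, you could instead lower the ingester count in its values file and apply each step with `helm upgrade`, rather than editing the StatefulSet directly. A sketch of the first step, going from the 3 replicas used in the flush example to 2, might look like this. Keep the flush arguments in place so that each removed ingester still flushes on shutdown.

```yaml
# loki-distributed values.yaml, first scale-down step (3 -> 2 replicas).
ingester:
  replicas: 2
  extraArgs:
    # Keep these from the previous step so each ingester flushes before exiting.
    - '-ingester.flush-on-shutdown=true'
    - '-log.level=debug'
```

You would apply this with, for example, `helm upgrade -n loki loki-distributed grafana/loki-distributed --values values.yaml` (the release name and namespace here are assumptions taken from the uninstall example in this guide), wait for the terminating ingester to finish flushing, and then repeat with the next lower replica count.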
Remove loki-distributed cluster
Double-check that logs are still coming into the new cluster. If something is wrong, it is much easier to scale the `loki-distributed` ingesters back up and investigate before you tear down the whole cluster. If everything looks good, you can tear down `loki-distributed` using `helm uninstall`. For example:
```shell
helm uninstall -n loki loki-distributed
```
Now edit the new `loki` cluster to remove the `migrate` options you added when first installing. Remove all of the following from your `values.yaml`:
```yaml
migrate:
  fromDistributed:
    enabled: true
    memberlistService: loki-loki-distributed-memberlist
```
To apply the new configuration (assuming you installed the new chart as `loki` in the `loki` namespace):
```shell
helm upgrade -n loki loki grafana/loki --values values.yaml
```
The `migrate.fromDistributed.memberlistService` value was added as an additional memberlist join member, so once this new config is applied, the components should roll out without interruption.
Check the Dashboards
Now that the migration is finished, you can check the dashboards included with the new `loki` chart to make sure everything is working as expected. You can also check the Loki Canary metrics to make sure nothing was lost during the migration. Assuming everything was in the `loki` namespace, the following query, run over a time range that starts before the migration and ends after it completed, should clearly show when metrics started coming from the new canary, and whether and when either canary detected missing entries during the process:
```promql
(
  sum(increase(loki_canary_missing_entries_total{namespace="loki"}[$__range])) by (job)
  /
  sum(increase(loki_canary_entries_total{namespace="loki"}[$__range])) by (job)
) * 100
```