
How to collect and query Kubernetes logs with Grafana Loki, Grafana, and Grafana Agent

2023-04-12 14 min

Logging in Kubernetes can help you track the health of your cluster and its applications. Logs can be used to identify and debug any issues that occur. Logging can also be used to gain insights into application and system performance. Moreover, collecting and analyzing application and cluster logs can help identify bottlenecks and optimize your deployment for better performance.

That said, Kubernetes lacks a native solution for managing and analyzing logs. Instead, Kubernetes sends the logs provided by the container engine to standard streams. Fortunately, you can integrate a separate backend to store, analyze, and query logs. For example, you can use a centralized logging solution, like Grafana Loki, to collect and aggregate all of the logs from your Kubernetes cluster in one place.

A centralized logging solution allows you to do the following:

  • Audit actions taken on your cluster
  • Troubleshoot problems with your applications and clusters
  • Gain insights for your performance monitoring strategy

This tutorial first takes a closer look at what Kubernetes logging is. Then we’ll walk you through the following processes:

  1. Implementing Loki, Grafana, and the Grafana Agent
  2. Querying and viewing Kubernetes log data using Grafana
  3. Using metric queries to create metrics from logs
  4. Setting up alerts from logs

What is Kubernetes logging?

In simple terms, Kubernetes logging architecture allows DevOps teams to collect and view data about what is going on inside the applications and services running on the cluster. However, Kubernetes logging is different from logging on traditional servers and virtual machines in a few ways.

  • The logs that Kubernetes collects are not provided directly by the application but by the container engine or runtime. These logs are captured by kubelets, which are agents installed on each node. By default, the kubelet agent keeps logs on the node even after pod termination. This makes it crucial to have a log rotation mechanism that prevents node resources from being overloaded.
  • You can use the kubectl logs command to view logs from pods (see the example after this list). However, that command only shows a snapshot of the logs at the moment it is run. For in-depth analysis later on, you need a backend that collects and stores logs over time.
  • Kubernetes handles different types of logs, namely application logs, control plane logs, and events. The latter is actually a Kubernetes object that’s intended to provide insights into what’s happening inside nodes and pods and is usually accompanied by logs that expand the output message of the event.
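
For quick, ad hoc checks, the kubectl logs command mentioned above is still handy. The commands below are a minimal sketch; the pod name, container name, and label selector are placeholders you would replace with your own:

# View logs from a pod in the current namespace (pod name is a placeholder)
kubectl logs my-app-pod

# View logs from a specific container, including the previous restart
kubectl logs my-app-pod -c my-container --previous

# Tail and follow the last 100 lines from all pods matching a label selector
kubectl logs -l app=my-app --tail=100 -f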

Despite the differences between traditional logs and Kubernetes logs, it’s still important to set up a logging solution to optimize the performance of your Kubernetes fleet. But before you start setting up Grafana Loki in your cluster, it’s worth reviewing the different deployment modes that Loki, the log aggregator component, offers. That way, you’ll understand the pros and cons of each.

Loki deployment modes

Loki has three different deployment modes to choose from. Basically, these modes give DevOps teams control over how Loki scales.

  • Monolithic mode: As the name suggests, in this mode, all Loki components — the distributor, ingester, ruler, querier, and query frontend — run as a single binary or single process inside a container. The advantage of this approach is its simplicity in both deployment and configuration, which makes it ideal for those who want to try out Loki. This mode is also appropriate for use cases with read/write volumes not exceeding 100GB per day.
  • Microservices mode: This mode allows you to manage each Loki component as an independent target, giving your team maximum control over their scalability. As expected, this approach allows Loki to process huge amounts of logs, which makes this mode ideal for large-scale projects. However, the greater log processing capacity comes at the price of increased complexity.
  • Scalable mode: This mode allows Loki to scale to several TBs of logs per day. It does this by separating read and write concerns into independent targets, which makes it easy to scale the deployment as needed. Furthermore, separating read and write targets improves Loki’s performance to levels that are difficult to achieve with a monolithic deployment. However, you should note that this mode requires a load balancer or gateway to manage the traffic between read and write nodes. As you can imagine, the simple scalable deployment mode is ideal for use cases requiring a logging solution that handles large volumes of logs daily.

You’ll use the simple scalable mode in this tutorial, as it will let you see how easy it is to implement Loki to handle a large volume of logs.
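
For reference, scaling each target of a simple scalable deployment usually comes down to replica counts in the chart values. The snippet below is only an illustrative sketch based on the read, write, and backend sections of the grafana/loki Helm chart; check the chart’s values reference for the exact keys in your chart version:

# values.yml (illustrative): scale each simple scalable target independently
read:
  replicas: 3
write:
  replicas: 3
backend:
  replicas: 3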

Now let’s move on to how to actually query Kubernetes logs with Grafana Loki, Grafana, and Grafana Agent.

Setting up logging in Kubernetes with the Grafana LGTM Stack

The Grafana LGTM Stack (Loki for logs, Grafana for visualizations, Tempo for traces, and Mimir for metrics) is a comprehensive open source observability ecosystem. In this tutorial, you’ll use the following components:

  • Grafana Loki: Loki is a horizontally scalable, highly available, multitenant log aggregation system inspired by Prometheus. For this reason, you can think of Loki as the heart of this deployment since it will be responsible for handling the logs.
  • Grafana: Grafana is the leading open source visualization tool. This tutorial will use Grafana dashboards to visualize logs, run queries, and set alerts.
  • Grafana Agent: You can think of the Grafana Agent as a telemetry collector for the LGTM stack. Although you’ll use it to collect logs, the Grafana Agent is also known for its versatility in collecting metrics and traces of applications and services.

Prerequisites

To complete this step-by-step tutorial, you’ll need the following:

  1. A Kubernetes cluster with a control plane and at least two worker nodes to see how Loki works in a scalable mode
  2. A local machine with kubectl and Helm installed and configured (a quick way to verify both is shown below)
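
If you want to confirm that both tools are in place and that kubectl can reach your cluster, the following checks should be enough:

# Check that kubectl is configured and can reach the cluster
kubectl get nodes

# Check the installed Helm client version
helm version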

Setting up Loki and the Grafana Agent with Helm

From your local machine, add the Grafana Helm chart using the following command:

helm repo add grafana https://grafana.github.io/helm-charts
"grafana" has been added to your repositories

Update your Helm repositories using the following command:

helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "grafana" chart repository
Update Complete. ⎈Happy Helming!⎈

Before continuing, take a moment to review the charts available for Loki:

helm search repo loki
NAME                        	CHART VERSION   APP VERSION 	DESCRIPTION                                  	 
grafana/loki                	4.7.0       	2.7.3       	Helm chart for Grafana Loki in simple, scalable...
grafana/loki-canary         	0.10.0      	2.6.1       	Helm chart for Grafana Loki Canary           	 
grafana/loki-distributed    	0.69.6      	2.7.3       	Helm chart for Grafana Loki in microservices mode
grafana/loki-simple-scalable	1.8.11      	2.6.1       	Helm chart for Grafana Loki in simple, scalable...
grafana/loki-stack          	2.9.9       	v2.6.1      	Loki: like Prometheus, but for logs.         	 
grafana/fluent-bit          	2.3.2       	v2.1.0      	Uses fluent-bit Loki go plugin for gathering lo...
grafana/promtail            	6.9.0       	2.7.3       	Promtail is an agent which ships the contents o...

As you can see, you have several alternatives to choose from. In this case, the best choice is grafana/loki. (grafana/loki-simple-scalable has been deprecated.)

You’ll now install the grafana/loki chart. It’s good practice to create a separate environment for your logging stack as this gives you more control over permissions and helps you keep resource consumption in check. Create a new namespace called loki:

kubectl create ns loki

Loki requires an S3-compatible object storage solution to store Kubernetes cluster logs. For simplicity, you’ll use MinIO in this tutorial. To enable it, you’ll override the chart’s default values with your own values file.

Create a file named values.yml and paste the following config in the file:

minio:
  enabled: true
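
MinIO is convenient for a tutorial, but in production you would typically point Loki at an external S3-compatible bucket instead. The snippet below is only an illustrative sketch of what that might look like with the grafana/loki chart; the keys follow the chart’s documented loki.storage block, and the bucket names, endpoint, and credentials are placeholders, so consult the chart’s storage documentation before relying on it:

# Illustrative only: external S3-compatible storage instead of MinIO
minio:
  enabled: false
loki:
  storage:
    type: s3
    bucketNames:
      chunks: loki-chunks        # placeholder bucket names
      ruler: loki-ruler
      admin: loki-admin
    s3:
      endpoint: s3.example.com   # placeholder endpoint
      region: us-east-1
      accessKeyId: YOUR_ACCESS_KEY
      secretAccessKey: YOUR_SECRET_KEY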

Save the file and install grafana/loki in the loki namespace using the following command:

helm upgrade --install --namespace loki logging grafana/loki -f values.yml --set loki.auth_enabled=false

The output should be similar to this:

Release "logging" does not exist. Installing it now.
NAME: logging
LAST DEPLOYED: Mon Feb 27 09:22:00 2023
NAMESPACE: loki
STATUS: deployed
REVISION: 1
NOTES:
***********************************************************************
 Welcome to Grafana Loki
 Chart version: 4.4.0
 Loki version: 2.7.0
***********************************************************************

Installed components:
* grafana-agent-operator
* gateway
* minio
* read
* write
* backend

The command installs Loki, the Grafana Agent Operator, and MinIO from the grafana/loki chart, using the values file you just created. To simplify this tutorial, the flag --set loki.auth_enabled=false was used to skip setting up an authentication proxy in front of Grafana Loki.

With Loki and the Grafana Agent installed, the next step is setting up Grafana’s frontend.

Setting up Grafana with Helm

Grafana is a powerful platform to create queries, set up alerts, view logs and metrics, and more. Since you’ve already added the corresponding Helm chart, you can install Grafana in the loki namespace using the following command:

helm upgrade --install --namespace=loki loki-grafana grafana/grafana
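
If you want to customize the Grafana deployment, the grafana/grafana chart accepts a values file just like the Loki chart did. The snippet below is only an illustrative sketch; persistence and adminPassword are common options for this chart, but verify them against the chart’s values reference, and treat the password shown here as a placeholder:

# grafana-values.yml (illustrative): persist Grafana data and set a known admin password
persistence:
  enabled: true
adminPassword: choose-a-strong-password

# Then install with:
# helm upgrade --install --namespace=loki loki-grafana grafana/grafana -f grafana-values.yml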

To verify the installation of the logging stack components (Loki, Agent, and Grafana), you can use the following command:

kubectl get all -n loki

The output should be similar to the following:

NAME                                              	READY   STATUS	RESTARTS   	AGE
pod/logging-minio-0                               	1/1 	Running   1 (155m ago)   40h
pod/logging-grafana-agent-operator-5c8d7bbfd9-g7m2c   1/1 	Running   1 (155m ago)   40h
pod/loki-grafana-79bc49978f-9cffd                 	1/1 	Running   1 (155m ago)   40h
pod/logging-loki-logs-5d9xg                       	2/2 	Running   2 (155m ago)   40h
pod/logging-loki-logs-97786                       	2/2 	Running   2 (155m ago)   40h
pod/logging-loki-logs-8n22r                       	2/2 	Running   2 (155m ago)   40h
pod/loki-gateway-5c778d6557-jfxjj                 	1/1 	Running   0          	18m
pod/loki-canary-xpp6v                             	1/1 	Running   0          	18m
pod/loki-read-575457cc44-85kdb                    	1/1 	Running   0          	18m
pod/loki-read-575457cc44-lnbh8                    	1/1 	Running   0          	18m
pod/loki-backend-1                                	1/1 	Running   0          	18m
pod/loki-backend-0                                	1/1 	Running   0          	18m
pod/loki-read-575457cc44-jldvt                    	1/1 	Running   0          	18m
pod/loki-backend-2                                	1/1 	Running   0          	18m
pod/loki-write-2                                  	1/1 	Running   0          	18m
pod/loki-canary-wv5x5                             	1/1 	Running   0          	16m
pod/loki-canary-qjjgn                             	1/1 	Running   0          	16m
pod/loki-write-1                                  	1/1 	Running   0          	16m
pod/loki-write-0                                  	1/1 	Running   0          	15m

NAME                        	TYPE    	CLUSTER-IP  	EXTERNAL-IP   PORT(S)         	AGE
service/loki-write-headless 	ClusterIP   None        	<none>    	3100/TCP,9095/TCP   40h
service/loki-read-headless  	ClusterIP   None        	<none>    	3100/TCP,9095/TCP   40h
service/loki-grafana        	ClusterIP   10.43.68.47 	<none>    	80/TCP          	40h
service/logging-minio-console   ClusterIP   10.43.86.206	<none>    	9001/TCP        	40h
service/logging-minio       	ClusterIP   10.43.94.154	<none>    	9000/TCP        	40h
service/logging-minio-svc   	ClusterIP   None        	<none>    	9000/TCP        	40h
service/loki-backend-headless   ClusterIP   None        	<none>    	3100/TCP,9095/TCP   18m
service/loki-backend        	ClusterIP   10.43.114.191   <none>    	3100/TCP,9095/TCP   18m
service/loki-gateway        	ClusterIP   10.43.138.230   <none>    	80/TCP          	40h
service/loki-canary         	ClusterIP   10.43.85.250	<none>    	3500/TCP        	40h
service/loki-read           	ClusterIP   10.43.69.223	<none>    	3100/TCP,9095/TCP   40h
service/loki-memberlist     	ClusterIP   None        	<none>    	7946/TCP        	40h
service/loki-write          	ClusterIP   10.43.39.76 	<none>    	3100/TCP,9095/TCP   40h

NAME                           	DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/logging-loki-logs   3     	3     	3   	3        	3       	<none>      	40h
daemonset.apps/loki-canary     	3     	3     	3   	3        	3       	<none>      	40h

NAME                                         	READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/logging-grafana-agent-operator   1/1 	1        	1       	40h
deployment.apps/loki-grafana                 	1/1 	1        	1       	40h
deployment.apps/loki-gateway                 	1/1 	1        	1       	40h
deployment.apps/loki-read                    	3/3 	3        	3       	18m

NAME                                                    	DESIRED   CURRENT   READY   AGE
replicaset.apps/logging-grafana-agent-operator-5c8d7bbfd9   1     	1     	1   	40h
replicaset.apps/loki-grafana-79bc49978f                 	1     	1     	1   	40h
replicaset.apps/loki-gateway-5c778d6557                 	1     	1     	1   	18m
replicaset.apps/loki-gateway-5d69c5f585                 	0     	0     	0   	40h
replicaset.apps/loki-read-575457cc44                    	3     	3     	3   	18m

NAME                         	READY   AGE
statefulset.apps/logging-minio   1/1 	40h
statefulset.apps/loki-backend	3/3 	18m
statefulset.apps/loki-write  	3/3 	40h

Now that all of your logging stack components are in place, we’ll configure Loki as the Grafana data source, which is necessary to visualize Kubernetes logs.

Configuring a data source in Grafana

To access the Grafana UI, you first need the admin password that was generated during installation:

kubectl get secret --namespace loki loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

Copy the password to a safe place as you’ll need it shortly. Next, allow Grafana traffic to be forwarded to your local machine using the following command:

kubectl port-forward --namespace loki service/loki-grafana 3000:80

Now, in your browser, navigate to http://localhost:3000 and enter admin as the username and the password you just retrieved.

Grafana dashboard login

After accessing the main page, click Data Sources.

Grafana main page

Scroll down until you find Loki and click it to add it as a data source.

Add data source

The next screen is very important. You’ll need to enter the URL that Grafana will use to access the Loki backend. This URL must be a fully qualified domain name (FQDN) following the format <service-name>.<namespace-of-the-service>.svc.cluster.local.

For this tutorial, that URL would be http://loki-gateway.loki.svc.cluster.local. Once the URL has been entered, you can leave the other fields as they are and press the Save and test button. You’ll know that everything is working as expected if you see the message “Data source connected and labels found” as shown below:

Data sources / Loki screen
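
If the connection test fails, it can help to confirm that the gateway service resolves and answers from inside the cluster. One way to do that is with a throwaway curl pod hitting Loki’s labels API through the gateway (a sketch; the pod name and image are arbitrary):

# Run a temporary pod and query Loki's labels API through the gateway
kubectl run tmp-curl --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://loki-gateway.loki.svc.cluster.local/loki/api/v1/labels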

With Loki configured as Grafana’s data source, you’re ready to manage your logs from the UI.

Managing Logs in Grafana

In this section, you’ll learn how to use Grafana Loki to query logs, create metrics from those logs, and finally set up alerts based on metrics.

How to query logs in Grafana

In Loki, you use LogQL, a query language inspired by PromQL, to select a specific log stream using Kubernetes labels and label selectors. Furthermore, LogQL allows you to perform two types of queries: log queries and metric queries.

Let’s start by performing log queries. To do this, go to the Grafana Explore view by clicking the compass icon to your left. By default, the Query Builder is selected. The query builder is intended to make it easier for users who are not familiar with LogQL syntax to build queries. However, in this tutorial, you’ll use LogQL, so click the Code tab as shown below.

Grafana explore screen

You’ll use the basic stream selector {label="value"} for this tutorial, which is one of the simpler LogQL expressions. To test it, you’ll query the log output of the standard error stream stderr, as that will make it easier to create metrics and alerts later.

To do that, paste the following LogQL code in the Label browser box:

{stream="stderr"} |= ``

Ignore the |= filter at the end for a moment; you’ll come back to it shortly. Instead, press the Run query button located in the upper right corner. The output should be similar to the following:

Query logs output

At the top, the log volume is shown, and at the bottom, you can see the actual logs, the equivalent of using the kubectl logs command. Also, at the top, there is a help dialog box explaining the LogQL code used.

The screen above shows only a fraction of what you can achieve in Grafana Explore. You can add more queries, view query history, customize the view, and more.
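
Stream selectors also aren’t limited to the stream label. Depending on which labels your Grafana Agent configuration attaches (namespace, pod, and container are typical with this chart’s defaults, though your label set may differ), you can narrow queries like this:

# Logs from a single namespace (label names depend on your Agent config)
{namespace="loki"}

# Combine labels and keep only lines that contain a string
{namespace="loki", container="loki"} |= `error`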

Something else you can do from Grafana Explore is perform metric queries. This type of query lets you visualize your logs in graphs, which is useful for spotting patterns more easily.

How to perform metric queries in Grafana

Let’s start by reviewing the code used in the previous section:

{stream="stderr"} |= ``

You’ll remember that you ignored the |= filter. In LogQL, you can pipe line filter expressions to narrow down logs: |= keeps lines that contain a string, != drops them, and |~ and !~ do the same with regular expressions. For example, you could use the following query to get only logs that contain level=error from the stderr stream:

{stream="stderr"} |= `level=error`
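
For instance, filters can be chained, and |~ accepts a regular expression. The lines below are illustrative sketches you can adapt to your own streams:

# Lines containing level=error but not "timeout"
{stream="stderr"} |= `level=error` != `timeout`

# Lines whose status code matches a regular expression
{stream="stderr"} |~ `status=(4|5)\d\d`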

The ability to chain selectors and filter expressions allows you to create complex queries; however, they are still log queries. To get metric queries, you need to use a function that quantifies the occurrence of a certain condition.

Consider the following code:

count_over_time({stream="stderr"} |= `level=error` [5m])

As you might expect, the purpose of count_over_time is to count how many log lines match the query within the specified range window, in this example the previous five minutes.

Give it a try, and enter the code shown above in the Label browser box and press the Run query button. The result should be similar to the following:

Metric query

At the top, the LogQL code is briefly explained, and the logs are shown as a line graph at the bottom. This example shows three lines corresponding to errors in the three loki-read pods.
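
If you prefer a single series per pod rather than one per log stream, you can wrap the query in an aggregation. The example below assumes the Agent attaches a pod label (the chart’s default configuration does), so adjust the label name to whatever your setup uses:

# Total number of error lines per pod over 5-minute windows
sum by (pod) (count_over_time({stream="stderr"} |= `level=error` [5m]))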

As before, you can customize this view to your liking. You can also add this panel to a dashboard by pressing the Add to dashboard button, as shown below:

Add to dashboard

In the image below, an example of a dashboard showing error metrics from a logging stream is shown:

New panel

As you can see, Grafana Loki can help your team achieve better observability for your cluster’s logs. But what if an issue arises?

In the final part of this tutorial, we’ll show you how to set up alerts from logs to add another level of insight into the health and performance of your fleet.

How to set up alerts from logs

One of the most useful features of Grafana is its powerful alerting system. To set up an alert, click the bell icon on your left to access the Alerting view.

Grafana Alerting View

Next, click the New alert rule button, which takes you to a screen similar to Explore, as shown below. You’ll use the same LogQL query as in the previous section to set up alerts. A brief explanation of the code appears again, along with the graph derived from the metric query. However, you’ll notice that a red horizontal line is also displayed. This line corresponds to the threshold that triggers the alert.

Grafana alert rules

The screen below shows the result of setting the threshold to 400, which in this example only triggers an alert corresponding to the pod represented by the blue line.

Grafana alert trigger

If you scroll down, you’ll see that you have many other options to configure your alert, including notifications.

Alert notifications

Conclusion

Kubernetes has transformed cloud native development; however, one shortcoming of the system is that it does not provide a native storage solution for logs.

This tutorial reviewed how to implement an open source logging solution for Kubernetes with Grafana Loki in a single scalable deployment mode. It also explored how to use Grafana to not only visualize your logs, but also expand your understanding of your logging data by creating queries, correlating metrics, and establishing alerts based on logs.

If you’re interested in a fully managed logging solution, check out Kubernetes Monitoring in Grafana Cloud, which is available to all Grafana Cloud users, including those in our generous free tier. If you don’t already have a Grafana Cloud account, you can sign up for a free account today!