How Grafana Labs Effectively Pairs Loki and Kubernetes Events

Published: 21 Aug 2019 by Goutham Veeramachaneni

As we’ve rolled out Loki internally at Grafana Labs, we wanted to collect more than just application logs. In particular, while debugging outages caused by config changes, Kubernetes itself, or node restarts, we’ve found Kubernetes events to be super useful.

How It Works

The Kubernetes events feature allows you to see all of the changes in a cluster, and you can get a simple overview by just retrieving them (here, kc is a shell alias for kubectl):

➜  ~ kc get events
LAST SEEN   TYPE 	REASON          	KIND     	MESSAGE
38m     	Normal   Killing         	Pod      	Killing container with id docker://grafana:Need to kill Pod
38m     	Normal   SuccessfulDelete	ReplicaSet   Deleted pod: grafana-54f599867-xqdw7
38m     	Normal   Scheduled       	Pod      	Successfully assigned default/grafana-5c6c645897-s4c2b to gke-ops-tools1-gke-u-ops-tools1-gke-u-14d4793c-6kc4
38m     	Normal   Pulling         	Pod      	pulling image "grafana/grafana-dev:master-d54851f8e21347da81a74b60bae0601d53184439"
38m     	Normal   Pulled          	Pod      	Successfully pulled image "grafana/grafana-dev:master-d54851f8e21347da81a74b60bae0601d53184439"
38m     	Normal   Created         	Pod      	Created container
38m     	Normal   Started         	Pod      	Started container
14m     	Normal   Killing         	Pod      	Killing container with id docker://grafana:Need to kill Pod
38m     	Normal   SuccessfulCreate	ReplicaSet   Created pod: grafana-5c6c645897-s4c2b
14m     	Normal   SuccessfulDelete	ReplicaSet   Deleted pod: grafana-5c6c645897-s4c2b
14m     	Normal   Scheduled       	Pod      	Successfully assigned default/grafana-844858cf5f-fqhn6 to gke-ops-tools1-gke-u-ops-tools1-gke-u-14d4793c-ks8l
14m     	Normal   Pulling         	Pod      	pulling image "grafana/grafana-dev:master-81c42fc912cba9c3e553d5ac433147a04638a045"
14m     	Normal   Pulled          	Pod      	Successfully pulled image "grafana/grafana-dev:master-81c42fc912cba9c3e553d5ac433147a04638a045"
14m     	Normal   Created         	Pod      	Created container
14m     	Normal   Started         	Pod      	Started container
14m     	Normal   SuccessfulCreate	ReplicaSet   Created pod: grafana-844858cf5f-fqhn6
38m     	Normal   ScalingReplicaSet   Deployment   Scaled up replica set grafana-5c6c645897 to 1
38m     	Normal   ScalingReplicaSet   Deployment   Scaled down replica set grafana-54f599867 to 0
14m     	Normal   ScalingReplicaSet   Deployment   Scaled up replica set grafana-844858cf5f to 1
14m     	Normal   ScalingReplicaSet   Deployment   Scaled down replica set grafana-5c6c645897 to 0

Events also capture when nodes become unresponsive and when a pod has been killed, along with the reason.
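For a quick tally rather than a raw listing, the same data can be summarized in a few lines of Python. This is only an illustrative sketch: it assumes the events were exported with kubectl get events -o json, and both the trimmed sample document and the count_by_reason helper are hypothetical. Only the reason and involvedObject.kind fields come from the Kubernetes Event object itself.

```python
import json
from collections import Counter

# Hypothetical, trimmed-down sample of `kubectl get events -o json` output.
SAMPLE = """
{
  "items": [
    {"type": "Normal", "reason": "Killing", "involvedObject": {"kind": "Pod"}},
    {"type": "Normal", "reason": "Pulled", "involvedObject": {"kind": "Pod"}},
    {"type": "Normal", "reason": "Killing", "involvedObject": {"kind": "Pod"}},
    {"type": "Normal", "reason": "ScalingReplicaSet", "involvedObject": {"kind": "Deployment"}}
  ]
}
"""


def count_by_reason(raw: str) -> Counter:
    """Count events per (kind, reason) pair in an event list document."""
    events = json.loads(raw).get("items", [])
    return Counter(
        (e.get("involvedObject", {}).get("kind", "?"), e.get("reason", "?"))
        for e in events
    )


counts = count_by_reason(SAMPLE)
for (kind, reason), n in counts.most_common():
    print(f"{n:3d}  {kind:<12} {reason}")
```

In practice the input would come from a file or a pipe instead of the inline sample.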

How Grafana Labs Pairs Loki and Kubernetes Events

Most recently, Kubernetes events proved to be effective in debugging our last outage:

15m 15m 1 ingester-6f9b57ccbd-rq9qs.15b2d20d55e14865 Pod Normal Preempted default-scheduler by <namespace>/querier-9467b8d85-7kwf5 on node gke-us-central1-us-central1-bigger-no-6dc155a4-jsqx

Persisting the events and being able to query them is important, but unfortunately, Kubernetes by default only keeps events for one hour to reduce the load on etcd. Loki, however, is a good fit to store and query the events.

I started exploring different ways Grafana Labs could get the events into Loki, including adding a source to Promtail itself. Luckily, I found that Heptio, which was acquired by VMware in 2018, had already built eventrouter for this exact use case: extracting events from Kubernetes and sending them to a third-party service.

One of the good things about eventrouter is that it’s pluggable, so one could write a Loki sink. But it’s also possible to write the events to stdout as JSON and use Promtail to scrape them, which is the route I went for.

After deploying eventrouter, seeing the Kubernetes events in Loki was very simple with the query {name="eventrouter"}. But then I started to notice that there were so many events, and decided that we should be able to select or sort events based on the namespace at the very least.

To do so, I leveraged Promtail’s pipeline configuration to add namespace as an additional label to the logs exported to Loki:

- match:
    selector: '{name="eventrouter"}'
    stages:
    - json:
        expressions:
          namespace: event.metadata.namespace
    - labels:
        namespace: ""

This extracts the namespace from each JSON log line using the JMESPath expression event.metadata.namespace and adds it as a label.
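To make the effect of the two stages concrete, here is a rough Python sketch of the same logic. It is not Promtail’s implementation, just an illustration under stated assumptions: the extract_namespace_label helper and the sample log line are hypothetical, and only the event.metadata.namespace path is taken from the config above.

```python
import json


def extract_namespace_label(line: str, labels: dict) -> dict:
    """Mimic the json + labels stages: read event.metadata.namespace
    from a JSON log line and attach it as a `namespace` label."""
    new_labels = dict(labels)  # never mutate the caller's label set
    try:
        node = json.loads(line)
    except json.JSONDecodeError:
        return new_labels  # not JSON: leave the labels untouched
    # Walk the path event -> metadata -> namespace, bailing out if a key is missing.
    for key in ("event", "metadata", "namespace"):
        if not isinstance(node, dict) or key not in node:
            return new_labels
        node = node[key]
    if isinstance(node, str) and node:
        new_labels["namespace"] = node
    return new_labels


# Hypothetical log line as eventrouter might write it to stdout.
line = '{"event": {"metadata": {"namespace": "grafana-com"}, "reason": "Killing"}}'
labels = extract_namespace_label(line, {"name": "eventrouter"})
print(labels)
```

Lines that are not JSON, or that lack the path, keep their original labels, so they remain queryable under {name="eventrouter"}.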

As a result, I can query all the events from just the grafana-com namespace, for example with a selector such as {name="eventrouter", namespace="grafana-com"}.
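The same namespace-scoped query can also be issued outside Grafana. The sketch below only builds the request URL and sends nothing. Its assumptions: the base URL loki.example:3100 and the build_query_url helper are hypothetical, and the /loki/api/v1/query_range path follows the current Loki HTTP API, which may differ from the Loki version in use when this post was written.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, labels: dict, limit: int = 100) -> str:
    """Build a Loki range-query URL for a set of exact label matches."""
    # Assemble a stream selector such as {name="eventrouter", namespace="grafana-com"}.
    # Label values containing double quotes are not escaped in this sketch.
    selector = "{" + ", ".join(f'{k}="{v}"' for k, v in sorted(labels.items())) + "}"
    params = urlencode({"query": selector, "limit": limit})
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{params}"


url = build_query_url(
    "http://loki.example:3100",  # hypothetical Loki address
    {"name": "eventrouter", "namespace": "grafana-com"},
)
print(url)
```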

What’s Next

While we’re quite happy that we can now store and retrieve events, we’ve found the UI is quite lacking when dealing with JSON logs. I personally find the entire JSON blob distracting when we’re only looking for the values of a few keys, so we’re looking at ways we can improve handling JSON logs.
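Until then, one stopgap is to trim each event down to the handful of keys that matter once the lines have been fetched. The following is a minimal sketch and not part of Loki or Grafana: the summarize_event helper and the sample line are hypothetical, and the field names assume the standard Kubernetes Event object nested under an event key, as in the pipeline config above.

```python
import json


def summarize_event(line: str) -> str:
    """Reduce one eventrouter JSON line to a short, readable summary."""
    event = json.loads(line).get("event", {})
    obj = event.get("involvedObject", {})
    namespace = event.get("metadata", {}).get("namespace", "-")
    return "{ns} {kind}/{name} {reason}: {msg}".format(
        ns=namespace,
        kind=obj.get("kind", "-"),
        name=obj.get("name", "-"),
        reason=event.get("reason", "-"),
        msg=event.get("message", ""),
    )


# Hypothetical log line modeled on the events shown earlier in this post.
line = json.dumps({
    "event": {
        "metadata": {"namespace": "default"},
        "involvedObject": {"kind": "ReplicaSet", "name": "grafana-844858cf5f"},
        "reason": "SuccessfulCreate",
        "message": "Created pod: grafana-844858cf5f-fqhn6",
    }
})
summary = summarize_event(line)
print(summary)
```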

More About Loki

Launched at KubeCon North America last December, Loki is a Prometheus-inspired service that optimizes storage, search, and aggregation while making logs easy to explore natively in Grafana. Loki is designed to work easily both as microservices and as monoliths, and correlates logs and metrics to save users money.

Less than a year later, Loki has almost 6,500 stars on GitHub and is now quickly approaching GA.

At Grafana Labs, we’ve been working hard on developing key features to make that possible, including loki-canary early detection for missing logs, the Docker logging driver plugin and support for systemd, and adding structure to unstructured logs with the pipeline stage.

You can also read about query optimization in Loki in our three-part series which covers topics such as the use of Go, iterators, as well as ingestion retention and label queries.

Be sure to check back for more content about Loki.