How Grafana Labs Effectively Pairs Loki and Kubernetes Events

2019-08-21 4 min

As we’ve rolled out Loki internally at Grafana Labs, we’ve wanted to collect more than just simple application logs. In particular, when debugging outages caused by config changes, Kubernetes itself, or node restarts, we’ve found Kubernetes events to be extremely useful.

How It Works

Kubernetes events record many of the changes happening in a cluster, such as pods being scheduled, images being pulled, and replica sets being scaled, and you can get a quick overview simply by listing them:

$ kubectl get events
LAST SEEN   TYPE     REASON             KIND        MESSAGE
38m         Normal   Killing            Pod         Killing container with id docker://grafana:Need to kill Pod
38m         Normal   SuccessfulDelete   ReplicaSet  Deleted pod: grafana-54f599867-xqdw7
38m         Normal   Scheduled          Pod         Successfully assigned default/grafana-5c6c645897-s4c2b to gke-ops-tools1-gke-u-ops-tools1-gke-u-14d4793c-6kc4
38m         Normal   Pulling            Pod         pulling image "grafana/grafana-dev:master-d54851f8e21347da81a74b60bae0601d53184439"
38m         Normal   Pulled             Pod         Successfully pulled image "grafana/grafana-dev:master-d54851f8e21347da81a74b60bae0601d53184439"
38m         Normal   Created            Pod         Created container
38m         Normal   Started            Pod         Started container
14m         Normal   Killing            Pod         Killing container with id docker://grafana:Need to kill Pod
38m         Normal   SuccessfulCreate   ReplicaSet  Created pod: grafana-5c6c645897-s4c2b
14m         Normal   SuccessfulDelete   ReplicaSet  Deleted pod: grafana-5c6c645897-s4c2b
14m         Normal   Scheduled          Pod         Successfully assigned default/grafana-844858cf5f-fqhn6 to gke-ops-tools1-gke-u-ops-tools1-gke-u-14d4793c-ks8l
14m         Normal   Pulling            Pod         pulling image "grafana/grafana-dev:master-81c42fc912cba9c3e553d5ac433147a04638a045"
14m         Normal   Pulled             Pod         Successfully pulled image "grafana/grafana-dev:master-81c42fc912cba9c3e553d5ac433147a04638a045"
14m         Normal   Created            Pod         Created container
14m         Normal   Started            Pod         Started container
14m         Normal   SuccessfulCreate   ReplicaSet  Created pod: grafana-844858cf5f-fqhn6
38m         Normal   ScalingReplicaSet  Deployment  Scaled up replica set grafana-5c6c645897 to 1
38m         Normal   ScalingReplicaSet  Deployment  Scaled down replica set grafana-54f599867 to 0
14m         Normal   ScalingReplicaSet  Deployment  Scaled up replica set grafana-844858cf5f to 1
14m         Normal   ScalingReplicaSet  Deployment  Scaled down replica set grafana-5c6c645897 to 0

Events also record when nodes become unresponsive and when pods are killed, along with the reason.
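To get from a listing like the one above to a quick overview, it can help to tally events by object kind and reason. The following Python sketch is illustrative only: it assumes the five-column layout shown above (LAST SEEN, TYPE, REASON, KIND, MESSAGE), and parse_events and summarize_events are hypothetical helpers, not part of kubectl.

```python
from collections import Counter

# A few rows copied from the `kubectl get events` output above.
SAMPLE = """\
LAST SEEN   TYPE     REASON             KIND        MESSAGE
38m         Normal   Killing            Pod         Killing container with id docker://grafana:Need to kill Pod
38m         Normal   SuccessfulDelete   ReplicaSet  Deleted pod: grafana-54f599867-xqdw7
14m         Normal   Killing            Pod         Killing container with id docker://grafana:Need to kill Pod
14m         Normal   ScalingReplicaSet  Deployment  Scaled up replica set grafana-844858cf5f to 1
14m         Normal   ScalingReplicaSet  Deployment  Scaled down replica set grafana-5c6c645897 to 0
"""


def parse_events(text):
    """Parse the assumed 5-column table into dicts, skipping the header row."""
    rows = []
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        # MESSAGE may contain spaces, so split on whitespace at most four times.
        last_seen, etype, reason, kind, message = line.split(None, 4)
        rows.append({"last_seen": last_seen, "type": etype,
                     "reason": reason, "kind": kind, "message": message})
    return rows


def summarize_events(rows):
    """Count events per (kind, reason) pair."""
    return Counter((r["kind"], r["reason"]) for r in rows)


if __name__ == "__main__":
    for (kind, reason), n in summarize_events(parse_events(SAMPLE)).most_common():
        print(f"{n:3d}  {kind:<12} {reason}")
```

Run as a script, it prints each (kind, reason) pair with its count, most frequent first. Newer kubectl releases may print an OBJECT column (such as pod/<name>) instead of KIND, so the parsing would need adjusting there; for anything beyond a quick look, kubectl get events -o json is a sturdier input than the table.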

How Grafana Labs Pairs Loki and Kubernetes Events

Most recently, Kubernetes events proved effective in debugging our last outage, where one of our ingester pods was preempted by a querier pod:

15m 15m 1 ingester-6f9b57ccbd-rq9qs.15b2d20d55e14865 Pod Normal Preempted default-scheduler by <namespace>/querier-9467b8d85-7kwf5 on node gke-us-central1-us-central1-bigger-no-6dc155a4-jsqx

Being able to persist and query events is important, but unfortunately Kubernetes keeps events for only one hour by default (the kube-apiserver --event-ttl flag) to reduce load on etcd. Loki, however, is a good fit for storing and querying them.

I started exploring different ways to get the events into Loki, including adding an events source to Promtail itself. Luckily, I found that Heptio, which was acquired by VMware in 2018, had already built eventrouter for this exact use case: extracting events from Kubernetes and sending them to a third-party service.

One nice thing about eventrouter is that its sinks are pluggable, so one could write a Loki sink. But it can also write the events to stdout as JSON, which Promtail can then scrape like any other container log, and that is the route I went with.
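The stdout approach works because whatever a container writes to stdout ends up in the container log files that Promtail, typically deployed on every node, already tails. Below is a minimal Python sketch of that pattern, writing one event per line as compact JSON. It is not eventrouter's code: the wrapper with verb and event keys is an assumption (consistent with the event.metadata.namespace path in the Promtail config later in this post), the ADDED verb value is assumed, and event_line and emit_event are hypothetical helpers.

```python
import json
import sys

# Illustrative, trimmed core/v1 Event: field names follow the Kubernetes API,
# values come from the `kubectl get events` listing earlier in this post.
SAMPLE_EVENT = {
    "metadata": {"namespace": "default"},
    "involvedObject": {"kind": "Pod", "name": "grafana-844858cf5f-fqhn6"},
    "type": "Normal",
    "reason": "Scheduled",
    "message": "Successfully assigned default/grafana-844858cf5f-fqhn6 to "
               "gke-ops-tools1-gke-u-ops-tools1-gke-u-14d4793c-ks8l",
}


def event_line(verb, event):
    """Serialize one event as a single compact JSON line (hypothetical helper).

    The {"verb": ..., "event": ...} wrapper is an assumption; the Promtail
    pipeline in this post only relies on the event living under "event".
    """
    return json.dumps({"verb": verb, "event": event}, separators=(",", ":")) + "\n"


def emit_event(verb, event, out=sys.stdout):
    # One object per line: by default each line of the container log
    # becomes one log entry in Loki.
    out.write(event_line(verb, event))


if __name__ == "__main__":
    emit_event("ADDED", SAMPLE_EVENT)
```

Keeping each event on a single line is the important design choice here: a pretty-printed JSON object would be split into many separate log entries.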

After deploying eventrouter, viewing the events in Loki was as simple as querying {name="eventrouter"}. But I soon noticed how many events there were, and decided that we should at the very least be able to filter them by namespace.

To do so, I used Promtail’s pipeline stages to add each event’s namespace as an additional label on the log lines sent to Loki:

- match:
    selector: '{name="eventrouter"}'
    stages:
    - json:
        expressions:
          namespace: event.metadata.namespace
    - labels:
        namespace: ""

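The match stage restricts this pipeline to the eventrouter stream. The json stage extracts the namespace from the event using the JMESPath expression event.metadata.namespace, and the labels stage then adds it as a label (an empty value tells Promtail to use the extracted field of the same name).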
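To show what the three stages do to a single log line, here is a small Python model of the pipeline. It is a simplified sketch, not Promtail's implementation: the selector {name="eventrouter"} is reduced to an exact label check, and the expression is followed as a plain dotted path, which is enough for event.metadata.namespace but is not full JMESPath. The function names are hypothetical.

```python
import json


def dotted_get(obj, path):
    """Follow a dotted path such as 'event.metadata.namespace'; None if absent."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def run_pipeline(labels, line):
    """Simplified model of the match -> json -> labels stages (hypothetical)."""
    labels = dict(labels)  # work on a copy; the caller's labels stay unchanged

    # match stage: '{name="eventrouter"}' modelled as an exact label check.
    if labels.get("name") != "eventrouter":
        return labels

    # json stage: parse the line and store expression results in "extracted".
    try:
        record = json.loads(line)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return labels
    extracted = {}
    value = dotted_get(record, "event.metadata.namespace")
    if value is not None:
        extracted["namespace"] = value

    # labels stage: `namespace: ""` means "use the extracted key of the same name".
    label_config = {"namespace": ""}
    for label, source in label_config.items():
        key = source or label
        if key in extracted:
            labels[label] = str(extracted[key])
    return labels


if __name__ == "__main__":
    line = '{"verb":"ADDED","event":{"metadata":{"namespace":"grafana-com"}}}'
    print(run_pipeline({"name": "eventrouter"}, line))
    # {'name': 'eventrouter', 'namespace': 'grafana-com'}
```

Because the new label takes its value from the event rather than from the eventrouter pod, Loki stores a separate stream per event namespace, which is what lets a query select a single namespace by label alone.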

As a result, I can query just the events from the grafana-com namespace, for example with {name="eventrouter", namespace="grafana-com"}:

Events from grafana-com namespace
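The same selection can be made outside Grafana through Loki's HTTP API. The sketch below only builds a request URL and does not send it. The Loki address is a placeholder, selector and query_range_url are hypothetical helpers, and the /loki/api/v1/query_range endpoint with its query, start, end, and limit parameters should be checked against the API documentation for your Loki version (older releases served queries from /api/prom/query instead).

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

LOKI_URL = "http://localhost:3100"  # placeholder; point this at your Loki


def _quote(value):
    # LogQL label values are double-quoted; escape backslashes and quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def selector(**labels):
    """Build a LogQL stream selector, e.g. {name="eventrouter"} (hypothetical helper)."""
    return "{" + ", ".join(k + "=" + _quote(v) for k, v in sorted(labels.items())) + "}"


def to_ns(dt):
    """Unix epoch in nanoseconds (sub-second precision is dropped)."""
    return int(dt.timestamp()) * 1_000_000_000


def query_range_url(query, start, end, limit=100, base=LOKI_URL):
    """Build a GET URL for Loki's query_range endpoint (verify for your version)."""
    params = {"query": query, "start": to_ns(start), "end": to_ns(end), "limit": limit}
    return base + "/loki/api/v1/query_range?" + urlencode(params)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    query = selector(name="eventrouter", namespace="grafana-com")
    print(query)  # {name="eventrouter", namespace="grafana-com"}
    print(query_range_url(query, now - timedelta(hours=1), now))
```

Sorting the labels keeps the generated selector stable, and escaping the values avoids producing an invalid query when a label value contains a quote.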

What’s Next

While we’re happy that we can now store and retrieve events, we’ve found the UI lacking when it comes to JSON logs. Personally, I find the full JSON blob distracting when I’m only looking for the values of a few keys, so we’re looking at ways to improve how JSON logs are handled.

More About Loki

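Launched at KubeCon North America in December 2018, Loki is a Prometheus-inspired service that optimizes storage, search, and aggregation while making logs easy to explore natively in Grafana. Loki is designed to run either as a set of microservices or as a single monolithic process. It indexes only a small set of labels rather than the full log contents, which keeps it cheaper to run, and because those labels are the same ones you use with Prometheus, logs and metrics are easy to correlate.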

Less than a year later, Loki has almost 6,500 stars on GitHub and is now quickly approaching GA.

At Grafana Labs, we’ve been working hard on the key features to get there, including loki-canary for early detection of missing logs, the Docker logging driver plugin, support for systemd, and pipeline stages for adding structure to unstructured logs.

You can also read about query optimization in Loki in our three-part series, which covers topics such as the use of Go, iterators, ingestion retention, and label queries.

Be sure to check back for more content about Loki.