
The open source journey: Enabling real-time Kafka data visualization in Grafana

Have you ever wanted to visualize live Apache Kafka data directly in Grafana, eliminating the need for complex intermediate storage or external services?

In this session, Hamed Karbasi, a data engineer at 0+X, chronicles the open source development journey of the Grafana Kafka data source plugin, which allows users to query and visualize real-time Kafka data streams instantly.

See a live demonstration of how to get started with the plugin to integrate Kafka data for observability, including the steps you take to:

  • Configure the plugin to ingest data from specific Kafka topics
  • Handle flexible data formats, including Protobuf, JSON, and Avro messages
  • Build dynamic, real-time dashboards for comprehensive streaming data visualization

By highlighting the strong community collaboration and support available for building new Grafana plugins that address real-world needs, Hamed hopes to inspire other developers to contribute their own integrations to the Grafana ecosystem.

Hamed Karbasi (00:00):

Hi everyone, I'm Hamed, glad to have you here. Today I'm going to take you through my journey as an open source contributor to Grafana. Here's the plan: first we'll get acquainted, then I'll talk about my first plugin, then about my second plugin, which is the Kafka plugin, and then we'll do a live demo so we can see how the Kafka plugin for Grafana works in action and what we plan to do with it in the future. I started my career as a data scientist about 10 years ago and was gradually pulled into data engineering. I worked at Snapp!, the largest ride-hailing company in the Middle East, and I'm now at 0+X in Sweden.

(01:02):

So let's talk about the journey, but first we should see what Grafana plugins are. Grafana plugins fall into three main categories. The first category is panel plugins. Panel plugins are mainly for visualizing things in Grafana; they live entirely in the frontend, and to develop them you need to know TypeScript and React. Everything you see in the visualizations, like the gauge panels, those are all panel plugins. The next category is data source plugins. If you want to connect Grafana to some sort of database, broker, or similar system, that is handled by data source plugins. Most current data source plugins are backend plugins, so you need to know Go to use the Grafana plugin SDK.
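To make that concrete, here is a minimal sketch of a Go backend data source built on recent versions of the Grafana plugin SDK; the plugin ID and types are placeholders for illustration, not the real Kafka plugin's code:

```go
package main

import (
	"context"
	"os"

	"github.com/grafana/grafana-plugin-sdk-go/backend"
	"github.com/grafana/grafana-plugin-sdk-go/backend/datasource"
	"github.com/grafana/grafana-plugin-sdk-go/backend/instancemgmt"
)

// kafkaDatasource is a placeholder data source instance.
type kafkaDatasource struct{}

// newDatasource is called by the SDK once per configured data source.
func newDatasource(_ context.Context, _ backend.DataSourceInstanceSettings) (instancemgmt.Instance, error) {
	return &kafkaDatasource{}, nil
}

// QueryData handles panel queries; this stub returns empty responses.
func (d *kafkaDatasource) QueryData(_ context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
	resp := backend.NewQueryDataResponse()
	for _, q := range req.Queries {
		resp.Responses[q.RefID] = backend.DataResponse{}
	}
	return resp, nil
}

// CheckHealth backs the "Save & test" button in the settings page.
func (d *kafkaDatasource) CheckHealth(_ context.Context, _ *backend.CheckHealthRequest) (*backend.CheckHealthResult, error) {
	return &backend.CheckHealthResult{Status: backend.HealthStatusOk, Message: "Data source is working"}, nil
}

func main() {
	// Manage wires up the plumbing between Grafana and the plugin process.
	if err := datasource.Manage("example-kafka-datasource", newDatasource, datasource.ManageOpts{}); err != nil {
		backend.Logger.Error(err.Error())
		os.Exit(1)
	}
}
```

Grafana launches this binary as a subprocess and talks to it over gRPC; the SDK hides that plumbing behind `datasource.Manage`.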

(02:02):

And of course you still need frontend knowledge to develop the plugin's frontend side in React and TypeScript. Everything you see under data sources, for example connecting to Prometheus, is handled by data source plugins. And the final category is app plugins. App plugins extend the capabilities of the other plugins: you can bundle panel plugins and data source plugins, or a bunch of them, to create something bigger. What you see as, for example, the Kubernetes, Loki, or Redis apps, those are app plugins. So now that we know what plugins are: my first plugin was around 2021. The problem was that Grafana had released the Node Graph panel, I believe in Grafana 8, but there was no data source you could use to show your own data as a node graph in Grafana.

(03:11):

The only plugin out there was the AWS X-Ray plugin, which was commercial, so there was nothing open source that people could use. So I told myself, okay, let's go and develop a plugin for that, and it became my first plugin, called Node Graph API. And if you want to see what journey you can take to become a contributor to Grafana, this is a journey you can follow. The first step is to detect the problem: okay, maybe Grafana lacks something that I can add as a plugin. Then you do the research: has something already been done, and is it a real, current problem? You can bring your questions to the Grafana community forum, ask in Slack, and consult the Grafana developers; they are very supportive and will come back to you and discuss your idea.

(04:06):

Then you choose the type of plugin: is it going to be an app plugin, a data source, or a panel? Right after that you can start developing using the good SDKs Grafana has provided, and then you can validate the plugin using the Grafana plugin validator. Later you submit your plugin to Grafana. Once you submit it, reviewers from Grafana will come back to you and check that your plugin works properly. And then, finally, you can see it there in the Grafana plugin catalog, used by many people. So what about the Kafka plugin's development? First we'll talk about Apache Kafka, but before that, let me ask: how many of you know Kafka or use Kafka daily? Perfect, thank you.

(04:59):

Kafka is a distributed, fault-tolerant, and scalable pub-sub message broker. Its key features are real-time data streaming, high throughput, fault tolerance, and very high scalability, and typical uses are event-driven architectures, data pipelines, and real-time analytics. The architecture is like any other event-driven pub-sub system: you have producers, you have the broker cluster, and you have the consumers. Older versions of Kafka required ZooKeeper, but recent versions don't need ZooKeeper anymore; Kafka is now based on KRaft. So here is how Kafka works: producers write messages to a topic on a Kafka broker, and each topic is divided into partitions.

(06:00):

Partitioning is one of the most powerful features of Kafka and what makes it scalable: based on the message keys, messages with the same key end up on the same partition. Then we have the consumers, which use a concept called consumer groups. Within a single consumer group, the partitions of a topic are divided among the consumers, so you can scale your consumers up to the number of partitions. And across different consumer groups, every group receives all the messages from the same topic, so you can broadcast the same stream to different services in your system.
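As a concrete illustration of keyed partitioning, here is a minimal producer sketch using the segmentio/kafka-go client (broker address and topic are placeholders); with a hash balancer, messages sharing a key always land on the same partition, which preserves per-key ordering:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Hash balancer: the same key always maps to the same partition.
	w := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"), // placeholder broker
		Topic:    "sensor-readings",           // placeholder topic
		Balancer: &kafka.Hash{},
	}
	defer w.Close()

	// Both messages carry key "sensor-42", so they land on one partition.
	err := w.WriteMessages(context.Background(),
		kafka.Message{Key: []byte("sensor-42"), Value: []byte(`{"temp": 21.5}`)},
		kafka.Message{Key: []byte("sensor-42"), Value: []byte(`{"temp": 21.7}`)},
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("produced 2 keyed messages")
}
```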

(07:05):

So the situation was this: we have Grafana, and we have our Apache Kafka cluster, but around that time, four years ago, you could only see the metrics of the Kafka cluster, I mean the metadata, in Grafana: the number of nodes, the size of the cluster, the number of topics, the number of messages, the number of consumers. Those metrics got into Grafana through Prometheus and the exporters. The problem was that you couldn't see the data itself inside Kafka in Grafana. What if I'm interested in visualizing that data in cool dashboards in Grafana? That was the idea: developing a data source plugin for exactly that, the Kafka data source plugin, my second plugin. It works just like other plugins; as I will show you shortly, you can simply customize it.

(08:02):

You just give it your bootstrap servers, whether they are hosted with a cloud vendor or on your on-premise servers, and with SASL or TLS configured it will connect; you already have your messages on Kafka, and you can just visualize them in Grafana. As for the features of this plugin: you get real-time monitoring of Kafka. It is completely streaming, not based on polled data queries, so you don't need to refresh, and you can query specific partitions. You have autocomplete, because normally you will have lots of topic names, so autocomplete helps you search while building queries. And you have flexible offset options: you can start from the latest offset, from the last N messages, say the last 100, or from the earliest.
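For context, this refresh-free behavior is what the plugin SDK's streaming interface enables. A rough, self-contained sketch of the idea follows; the Kafka consumer is faked with a message generator, and the SubscribeStream/PublishStream methods the interface also requires are omitted:

```go
package kafkaplugin

import (
	"context"
	"time"

	"github.com/grafana/grafana-plugin-sdk-go/backend"
	"github.com/grafana/grafana-plugin-sdk-go/data"
)

// kafkaStreamer stands in for the plugin's data source instance;
// the real Kafka consumer is replaced by a fake message generator.
type kafkaStreamer struct{}

// nextMessage is a hypothetical blocking read from a Kafka consumer.
func (s *kafkaStreamer) nextMessage(ctx context.Context) (time.Time, float64, error) {
	select {
	case <-ctx.Done():
		return time.Time{}, 0, ctx.Err()
	case <-time.After(time.Second): // pretend a Kafka message just arrived
		return time.Now(), 42.0, nil
	}
}

// RunStream is invoked by Grafana once per unique channel; every frame
// sent here is pushed live to all subscribed panels, with no refresh.
func (s *kafkaStreamer) RunStream(ctx context.Context, _ *backend.RunStreamRequest, sender *backend.StreamSender) error {
	for {
		ts, value, err := s.nextMessage(ctx)
		if err != nil {
			return nil // context cancelled: the last subscriber left
		}
		// One frame per message: timestamp plus a decoded numeric value.
		frame := data.NewFrame("response",
			data.NewField("time", nil, []time.Time{ts}),
			data.NewField("value", nil, []float64{value}),
		)
		if err := sender.SendFrame(frame, data.IncludeAll); err != nil {
			return err
		}
	}
}
```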

(08:59):

You have timestamp options, to use either the Kafka event time or the dashboard receive time. You have advanced JSON support, whether the payloads are flat, nested, arrays, or mixed types, plus Protobuf and Avro support, where you can use either a schema registry or an inline schema. And you have Kafka authentication and encryption: SASL, SSL/TLS, mTLS, and SCRAM. So let's talk about case studies and how you can use the plugin for your own cases. One, of course, is IoT sensor monitoring. Let's assume we have two buildings with legacy sensors; one building has a producer with schema version one, and the other has a producer with schema version two, so they have different schemas, and Kafka handles those sensor metrics on its topics.

(09:53):

You also have the schema registry hosting the different versions of the Avro schemas, and the plugin in your Grafana decodes those Avro messages and shows them live: temperature, humidity, anything. Or, for example, analytics: let's say we have app events, user actions, clicks, purchase counts, serialized as JSON on a Kafka topic. Then, whether there is a schema registry or not, you can use the plugin's JSON decoder to build those analytics dashboards on top of Kafka. Perfect. So let's see it in action.
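Before the demo, here is what the Avro decoding step involves, as a hedged sketch in Go using the linkedin/goavro library and a made-up sensor schema: schema-registry-framed messages carry a Confluent wire-format header (one zero magic byte plus a big-endian four-byte schema ID) in front of the Avro body:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"log"

	"github.com/linkedin/goavro/v2"
)

// Made-up sensor schema; a real decoder would fetch the schema from the
// registry using the ID embedded in each message.
const sensorSchema = `{
  "type": "record", "name": "Reading",
  "fields": [
    {"name": "temperature", "type": "double"},
    {"name": "humidity", "type": "double"}
  ]
}`

func decodeConfluentAvro(payload []byte) (interface{}, error) {
	// Confluent wire format: magic byte 0x00, then a big-endian 4-byte schema ID.
	if len(payload) < 5 || payload[0] != 0 {
		return nil, fmt.Errorf("not a schema-registry framed message")
	}
	schemaID := binary.BigEndian.Uint32(payload[1:5])
	_ = schemaID // a real decoder would look this ID up in the registry

	codec, err := goavro.NewCodec(sensorSchema)
	if err != nil {
		return nil, err
	}
	native, _, err := codec.NativeFromBinary(payload[5:])
	return native, err
}

func main() {
	// Build a framed test message: 5-byte header plus an Avro-encoded body.
	codec, err := goavro.NewCodec(sensorSchema)
	if err != nil {
		log.Fatal(err)
	}
	body, err := codec.BinaryFromNative(nil, map[string]interface{}{
		"temperature": 21.5, "humidity": 40.0,
	})
	if err != nil {
		log.Fatal(err)
	}
	framed := append([]byte{0, 0, 0, 0, 1}, body...) // pretend schema ID is 1

	decoded, err := decodeConfluentAvro(framed)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(decoded) // map[humidity:40 temperature:21.5]
}
```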

(11:03):

Okay, perfect, we have it there. Alright, so here we have Grafana. First I'll show you how you can find the data source: if you go to the data sources and then add a connection, you can simply find the plugin here, Kafka, and just install it; it has been downloaded more than 4 million times, surprisingly for me. If you then go back to the data sources, you have it added already. So you have your plugin here, and you can just add your bootstrap servers, separated by semicolons, and, if you need them, a client ID and the security protocol: PLAINTEXT, SSL, SASL_PLAINTEXT, or SASL_SSL. For SASL you have different options: the mechanism, whether PLAIN or SCRAM-SHA, the username and password, and different TLS options.
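Those form fields map onto standard Kafka client security settings; for example, connecting a Go client over SASL_SSL with SCRAM might look like this sketch (host, topic, and credentials are placeholders):

```go
package main

import (
	"context"
	"crypto/tls"
	"log"

	"github.com/segmentio/kafka-go"
	"github.com/segmentio/kafka-go/sasl/scram"
)

func main() {
	// SCRAM-SHA-512 mechanism with placeholder credentials.
	mech, err := scram.Mechanism(scram.SHA512, "user", "secret")
	if err != nil {
		log.Fatal(err)
	}

	// SASL_SSL: SASL authentication on top of a TLS connection.
	transport := &kafka.Transport{
		SASL: mech,
		TLS:  &tls.Config{}, // system root CAs; add client certs for mTLS
	}

	w := &kafka.Writer{
		Addr:      kafka.TCP("broker.example.com:9093"), // placeholder bootstrap server
		Topic:     "test-topic",                         // placeholder topic
		Transport: transport,
	}
	defer w.Close()

	if err := w.WriteMessages(context.Background(),
		kafka.Message{Value: []byte("hello over SASL_SSL")},
	); err != nil {
		log.Fatal(err)
	}
}
```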

(12:03):

And of course, once you provide the URL here and everything is fine, the data source will work. So now let's say that I want to use Explore first, before going to a Grafana dashboard. I'm just going to produce some test messages; these are just test data in JSON. Once those messages are there, you can use autocomplete to pick the topic and fetch different partitions, and then you have the data coming in, which is a nested JSON. You can choose the offset, based on latest, last N messages, or earliest; the message format, JSON, Avro, Protobuf, or plain text; and the key format, string, JSON, Base64, and so on. So that's it, you have your data there. I chose the last messages, for example the last 100 messages, and it stays live. But I'm already producing some nice payments data simultaneously in Avro, JSON, and Protobuf.
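If you want to reproduce the experiment, producing a nested JSON test message like the one above takes only a few lines in any Kafka client; a sketch with kafka-go, where the broker, topic, and payload fields are placeholders:

```go
package main

import (
	"context"
	"encoding/json"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

func main() {
	w := &kafka.Writer{
		Addr:  kafka.TCP("localhost:9092"), // placeholder broker
		Topic: "test-json",                 // placeholder topic
	}
	defer w.Close()

	// A nested JSON payload like the one in the demo; the plugin's JSON
	// decoder flattens nested fields into dashboard-friendly columns.
	payload, err := json.Marshal(map[string]interface{}{
		"device":  map[string]interface{}{"id": "sensor-1", "room": "lab"},
		"metrics": map[string]float64{"temperature": 22.1, "humidity": 38.5},
		"ts":      time.Now().UnixMilli(),
	})
	if err != nil {
		log.Fatal(err)
	}

	if err := w.WriteMessages(context.Background(),
		kafka.Message{Value: payload},
	); err != nil {
		log.Fatal(err)
	}
}
```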

(13:11):

So let's look at our dashboard. This is a live dashboard, and it is visualizing the Kafka data right now: transaction values, and how many authorization outcomes are pending, authorized, or failed. The green dot here shows that it is streaming, with no need to refresh the page. So that's the data coming in, the live feed we have there. And based on your creativity and your requirements, you can use any Grafana transformation to visualize the data you are interested in.

(13:54):

And here we have, of course, the transaction ledger data, which is in Avro; the payment analytics data, which is in Protobuf; and the plain text feed. Alright, that's it for the dashboards. And if you are interested in which schema is being used: those are the Avro values, and if you go to the schema registry you can see which schema is utilized there. It's good to see how the schema handling works, so let me walk you through an example. I'm going to produce some data, this time in Avro, and come back to our use case. To see exactly how the schema works, I use the test Avro topic here, which carries that data. If you look at it first with the message format set to JSON, the plugin of course cannot decode it. But if I switch the format to Avro, I can now see the data, which is nested; the plugin flattens it, and you can see it is using the schema registry. You can also test the connection to check that it's working,

(15:14):

and if you are not using a schema registry, you can simply switch to an inline schema and copy-paste your schema there, or choose a file to upload your schema, both for Avro and Protobuf. Okay, let's get back to the slides.

(15:34):

So what future work is planned? Right now we have the streaming of anything in real time, but it would be even cooler to have data queries too. With data queries you could query Kafka for historical data, for example the messages a topic held one month ago, at the offsets you are interested in; you could effectively time-travel through the Kafka messages and fetch whatever you like. And that option gives us another capability: alerting. It would be great to have alerts on top of Kafka in Grafana. Today Grafana handles many alerts on top of Prometheus and other data sources, but what if I'm interested in defining alerts on Kafka, so that if some pattern appears in a Kafka topic, Grafana sends me an alert?
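Client libraries already expose the primitive such a feature could build on; for instance, kafka-go lets a reader seek to the offset closest to a wall-clock timestamp, which is essentially the time travel described above (broker, topic, and partition are placeholders):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:   []string{"localhost:9092"}, // placeholder broker
		Topic:     "payments",                 // placeholder topic
		Partition: 0,
	})
	defer r.Close()

	// Seek to the first offset at or after "one month ago".
	oneMonthAgo := time.Now().AddDate(0, -1, 0)
	if err := r.SetOffsetAt(context.Background(), oneMonthAgo); err != nil {
		log.Fatal(err)
	}

	// Read a few historical messages from that point forward.
	for i := 0; i < 3; i++ {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("offset %d @ %s: %s\n", msg.Offset, msg.Time, msg.Value)
	}
}
```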

(16:35):

And we're going to add support for other authentication methods, like Kerberos and OAuth, which might cover other requirements from the community. Thank you for your attention. You can scan that QR code for my contact details; you can find the repository on my GitHub, and you can submit an issue or a PR if you're interested in contributing. Perfect. Thank you.
