Kafka integration for Grafana Cloud
Kafka is an open-source distributed event streaming platform used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
The Kafka integration for Grafana Cloud provides a set of dashboards and alerts to monitor the main components of a Kafka cluster deployment: Kafka instances, Zookeeper instances, Kafka Connect, Schema Registry, and KSqlDB. The integration also provides dashboards for message lag and topic monitoring, as well as 8 alerts for critical situations.
Install Kafka Integration for Grafana Cloud
- In your Grafana Cloud instance, click Integrations and Connections (lightning bolt icon), then search or navigate to the Kafka tile.
- Click the Kafka tile, then click Install Integration.
- Once the integration is installed, follow the steps on the Configuration Details page to set up Grafana Agent to automatically scrape and send Kafka metrics to your Grafana Cloud instance.
Post-install configuration for the Kafka integration
It’s recommended that you configure a separate user for the Grafana Agent and give it only the security privileges strictly necessary for monitoring your node. For more information, see the Kafka lag exporter documentation.
Configure JMX exporters
For the integration to work, you must configure a JMX exporter on each instance in your Kafka cluster, including all brokers, Zookeeper nodes, KSqlDB instances, Schema Registry instances, and Kafka Connect nodes.
Most of the provided dashboards rely on data collected by a JMX exporter that runs as a Java agent alongside the JVM of each Kafka component. The integration is tested with version 0.12.0 of this exporter.
For more information about how to configure your Kafka JVM, refer to the JMX Exporter documentation.
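As a rough illustration of the Java agent setup described above, the following shell sketch builds the `-javaagent` option and hands it to a broker through `KAFKA_OPTS`. The JAR location, the rules file path, and port 7071 are placeholder assumptions, not values required by the integration; substitute the ones used in your own deployment.

```shell
# Sketch only: all paths and the port below are hypothetical placeholders.
JMX_EXPORTER_JAR="/opt/jmx_exporter/jmx_prometheus_javaagent-0.12.0.jar"
JMX_EXPORTER_CONFIG="/opt/jmx_exporter/kafka-broker.yml"
JMX_EXPORTER_PORT=7071

# The JMX exporter agent takes its argument as <port>:<config file>.
# Kafka's start scripts pass KAFKA_OPTS on to the broker JVM.
export KAFKA_OPTS="-javaagent:${JMX_EXPORTER_JAR}=${JMX_EXPORTER_PORT}:${JMX_EXPORTER_CONFIG}"

echo "${KAFKA_OPTS}"
```

Once the broker is restarted with this option, the exporter should serve metrics over HTTP on the chosen port. Other components may read their JVM options from a different environment variable, so check each component's start script before reusing this pattern.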
See the following configuration files for the Kafka integration:
Configuring the Grafana Agent
To monitor consumer lag, make sure your Grafana Agent is at version 0.17.0 or higher; using the latest version is recommended. The lag dashboard is fed by an external exporter, which is embedded in the Grafana Agent for ease of use. Refer to the following example agent configuration:
integrations:
  kafka_exporter:
    enabled: true
    instance: your_instance_name
    kafka_uris:
      - your_kafka_instance_dns:9092
    scrape_integration: true
    scrape_interval: 15s
If you modify the Agent ConfigMap, you will need to restart the Agent Pod for configuration changes to take effect. Use kubectl rollout to restart the Agent:
$ kubectl rollout restart deployment/grafana-agent
For more information, see the kafka_exporter_config Grafana Agent documentation.
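The restart can be followed by a wait for the rollout to finish, so you know the new configuration is actually running. The sketch below wraps both steps in a helper function. The function name `restart_agent`, the default deployment name, and the 120-second timeout are illustrative assumptions.

```shell
# Sketch: restart the Agent deployment, then block until the rollout completes.
# restart_agent is a hypothetical helper name; adjust the deployment name
# (and add a namespace flag if your Agent runs outside the current namespace).
restart_agent() {
  deployment="${1:-grafana-agent}"
  kubectl rollout restart "deployment/${deployment}" &&
    kubectl rollout status "deployment/${deployment}" --timeout=120s
}
```

Call `restart_agent` with no arguments to use the default deployment name, or pass your own deployment name as the first argument. The function returns a non-zero status if either step fails or the rollout does not finish within the timeout.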
Dashboards
The Kafka integration for Grafana Cloud provides seven pre-configured dashboards to help you monitor your Kafka clusters. The following dashboards are provided with this integration:
Kafka overview
This dashboard provides a comprehensive view of the overall health of your Kafka cluster, including the number of brokers alive in the cluster, partition metrics, JVM metrics, throughput, requests, response queue sizes, Zookeeper connections, and producer and consumer metrics.
Kafka Connect overview
This dashboard is focused on Kafka Connect tasks. You can view how many tasks are running, paused, failed, unassigned, and destroyed. There are additional panels that show more technical details about your tasks, such as network, IO, authentication, and connection statistics, along with batch size, offset, and task error metrics. The dashboard also shows the overall health of your Kafka Connect Cluster JVM.
KSqlDB overview
This is a comprehensive dashboard that provides a range of metrics for your KSqlDB cluster: the number of active, running, stopped, and idle queries; the status of each query; the life of your cluster; message throughput; JVM metrics; and more.
Kafka Schema Registry overview
This dashboard is focused on your Kafka Schema Registry. It provides metrics related to the number of registered, created, and deleted schemas, along with JVM and throughput metrics.
Kafka Topics overview
This dashboard provides a deep dive into each topic's health and shows the throughput in bytes and number of messages, as well as the offsets.
Kafka lag overview
This dashboard shows the consumer lag of each topic, including offset lag as a message count, estimated lag time in seconds, and message throughput per minute and per second. It is fed by the external exporter, which is embedded in the Grafana Agent.
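To make the lag figures concrete: offset lag for a partition is the gap between the latest offset written to the partition and the offset the consumer group has committed. The sketch below computes that gap and a naive catch-up estimate from made-up numbers. The function names and the simple lag-divided-by-rate estimate are assumptions for illustration; the exporter may estimate time lag differently.

```shell
# Offset lag = latest (log end) offset - committed consumer group offset.
offset_lag() {
  echo $(( $1 - $2 ))
}

# Naive estimate of seconds needed to catch up: lag / consumption rate
# (messages per second). Uses integer division and assumes a constant,
# non-zero rate with no new messages arriving in the meantime.
estimated_lag_seconds() {
  echo $(( $1 / $2 ))
}

# Hypothetical values: end offset 1500, committed offset 1200, 50 msgs/s.
lag="$(offset_lag 1500 1200)"
echo "offset lag: ${lag}"
echo "estimated seconds: $(estimated_lag_seconds "${lag}" 50)"
```

A lag that keeps growing over time usually matters more than any single reading, which is why the dashboard plots these values as time series rather than as point-in-time numbers.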
Zookeeper overview
This dashboard provides a general overview of your Zookeeper cluster. It focuses on JVM metrics, number of nodes online, active connections, and throughput.
Configure alerts
This integration provides 8 alerts, at either warning or critical level:
Alert | Description |
---|---|
OfflinePartitonCount | Critical: Partition is in OfflinePartition state. Offline partitions are not available for reading and writing. |
UnderReplicatedPartitionCount | Critical: One or more replicas are not available. This is usually because a broker is down. |
ActiveController | Critical: No active controller reported in the last 5-minute interval. |
UncleanLeaderElection | Critical: Unclean partition leader elections in the cluster reported in the last 1-minute interval. Set broker configuration parameter unclean.leader.election.enable to false. |
ISRExpandRate | Warning: ISR expansion rate > 0. If ISR is expanding and shrinking frequently, adjust Allowed replica lag. |
ISRShrinkRate | Warning: ISR shrink rate > 0. If ISR is expanding and shrinking frequently, adjust Allowed replica lag. |
BrokerCount | Critical: Broker count is 0. |
ZookeeperSyncConnect | Warning: Zookeeper Sync Disconnected. |
Cost
By connecting your Kafka integration to Grafana Cloud, you might incur charges from an increase in the number of active series that your Grafana Cloud account uses. For more information about usage and the metrics included in each Cloud tier, see Active series and dpm usage and Cloud tier pricing.