Kafka integration for Grafana Cloud

Kafka is an open-source distributed event streaming platform used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

The Kafka integration for Grafana Cloud provides a set of dashboards and alerts to monitor the main resources of a Kafka cluster deployment, such as Kafka instances, Zookeeper instances, Kafka Connect, Schema Registry, and KSqlDB. The integration also provides additional dashboards for message lag and topic monitoring, as well as eight alerts for critical situations.

Install Kafka Integration for Grafana Cloud

  1. In your Grafana Cloud instance, click Integrations and Connections (lightning bolt icon), then search for or navigate to the Kafka tile.
  2. Click the Kafka tile, then click Install Integration.
  3. Once the integration is installed, follow the steps on the Configuration Details page to set up Grafana Agent to automatically scrape and send Kafka metrics to your Grafana Cloud instance.

Post-install configuration for the Kafka integration

We recommend that you configure a separate user for the Grafana Agent and grant it only the privileges required to monitor your nodes. For more information, see the Kafka lag exporter documentation.

Configure JMX exporters

For the integration to work, you must configure a JMX exporter on every instance in your Kafka cluster, including all brokers, Zookeeper nodes, KSqlDB servers, Schema Registry instances, and Kafka Connect nodes.

Most of the provided dashboards rely on data collected by a JMX exporter that runs as a Java agent alongside the JVM of each Kafka component. This integration uses one JMX exporter per monitored component and was tested with version 0.12.0 of the exporter.

For more information about how to configure your Kafka JVM, refer to the JMX Exporter documentation.
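
As a rough illustration of what a JMX exporter configuration file looks like, the sketch below uses a single catch-all rule. It is not the configuration shipped with this integration: the dashboards expect the metric names produced by the integration's own configuration files, so treat this only as a way to check that the Java agent starts and exposes metrics. The port and file name mentioned in the comments are placeholders.

```yaml
# Minimal JMX exporter configuration (illustrative sketch only, not the
# integration's own file).
#
# The exporter is attached to a component's JVM as a Java agent with a flag
# of this form:
#   -javaagent:<path-to-jar>/jmx_prometheus_javaagent-0.12.0.jar=<port>:<path-to-this-file>
# For example, <port> could be 9404 and this file could be saved as
# kafka-jmx.yml (both are placeholders).

# Emit metric names and label names in lowercase.
lowercaseOutputName: true
lowercaseOutputLabelNames: true

# Catch-all rule: export every MBean attribute using the default name format.
rules:
  - pattern: ".*"
```

Once the component has restarted with the agent attached, the metrics should be available over HTTP on the chosen port at the /metrics path.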

See the following configuration files for the Kafka integration:

Configuring the Grafana Agent

To monitor consumer lag, make sure your Grafana Agent is version 0.17.0 or later. The consumer lag dashboard is fed by an external exporter, which is embedded in the Grafana Agent for ease of use and is enabled through the Agent configuration. Refer to the following example agent configuration:

integrations:
  kafka_exporter:
    enabled: true
    instance: your_instance_name
    kafka_uris:
      - your_kafka_instance_dns:9092
    scrape_integration: true
    scrape_interval: 15s

If you modify the Agent ConfigMap, you will need to restart the Agent Pod for configuration changes to take effect. Use kubectl rollout to restart the Agent:

$ kubectl rollout restart deployment/grafana-agent

For more information, see the kafka_exporter_config Grafana Agent documentation.
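
If the Agent connects as the dedicated monitoring user recommended in the post-install notes, the credentials can be supplied in the same kafka_exporter block. The sketch below assumes a cluster that uses SASL authentication over TLS. The option names (use_sasl, sasl_username, sasl_password, use_tls, ca_file) are taken from the kafka_exporter_config reference as best understood here and should be checked against the Agent version you run. The user name, password, port, and CA file path are placeholders.

```yaml
integrations:
  kafka_exporter:
    enabled: true
    instance: your_instance_name
    kafka_uris:
      # Use the listener port that has SASL/TLS enabled (placeholder port).
      - your_kafka_instance_dns:9093
    # Authenticate as a dedicated, least-privilege monitoring user
    # (placeholder credentials).
    use_sasl: true
    sasl_username: your_monitoring_user
    sasl_password: your_monitoring_password
    # Encrypt the connection; the CA file path is a placeholder.
    use_tls: true
    ca_file: /path/to/kafka-ca.pem
    scrape_integration: true
    scrape_interval: 15s
```

Keeping the monitoring credentials separate from application credentials means they can be rotated or revoked without affecting producers and consumers.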

Dashboards

The Kafka integration for Grafana Cloud provides seven pre-configured dashboards to help you monitor your Kafka clusters. The following dashboards are provided with this integration:

Kafka overview

This dashboard provides a comprehensive view of the overall health of your Kafka cluster. It shows how many brokers are alive, partition metrics, JVM metrics, throughput, requests, response queue sizes, Zookeeper connections, and producer and consumer metrics.

Kafka Connect overview

This dashboard is focused on Kafka Connect tasks. You can view how many tasks are running, paused, failed, unassigned, and destroyed. There are additional panels that show more technical details about your tasks, such as network, IO, authentication, and connection statistics, along with batch size, offset, and task error metrics. The dashboard also shows the overall health of your Kafka Connect Cluster JVM.

KSqlDB overview

This comprehensive dashboard covers a range of KSqlDB cluster metrics: the number of active, running, stopped, and idle queries; the status of each query; the life of your cluster; message throughput; JVM metrics; and more.

Kafka Schema Registry overview

This dashboard is focused on your Kafka Schema Registry. It provides metrics related to the number of registered, created, and deleted schemas, along with JVM and throughput metrics.

Kafka Topics overview

This dashboard provides a deep dive into each topic's health and shows the throughput in bytes and number of messages, as well as the offsets.

Kafka lag overview

This dashboard shows the consumer lag for each topic, including the offset lag as a message count, the estimated lag time in seconds, and message throughput per minute and per second. It is fed by the external exporter that is embedded in the Grafana Agent.

Zookeeper overview

This dashboard provides a general overview of your Zookeeper cluster. It focuses on JVM metrics, number of nodes online, active connections, and throughput.

Configure alerts

This integration provides eight alerts at warning and critical severity levels:

Alert | Description
OfflinePartitonCount | Critical: Partition is in OfflinePartition state. Offline partitions are not available for reading and writing.
UnderReplicatedPartitionCount | Critical: One or more replicas are not available. This is usually because a broker is down.
ActiveController | Critical: No active controller reported in last 5 minute interval.
UncleanLeaderElection | Critical: Unclean partition leader elections in the cluster reported in the last 1 minute interval. Set broker configuration parameter unclean.leader.election.enable to false.
ISRExpandRate | Warning: ISR expansion rate > 0. If ISR is expanding and shrinking frequently, adjust Allowed replica lag.
ISRShrinkRate | Warning: ISR shrink rate > 0. If ISR is expanding and shrinking frequently, adjust Allowed replica lag.
BrokerCount | Critical: Broker count is 0.
ZookeeperSyncConnect | Warning: Zookeeper Sync Disconnected.
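
To give a sense of how an alert of this kind can be expressed, here is a minimal sketch of a Prometheus-style alerting rule for the offline partitions case. It is not the rule shipped with the integration. The group name, alert name, metric name, and the 5m hold time are all assumptions made for illustration; in particular, the metric name depends on your JMX exporter configuration and must be replaced with the series your cluster actually exposes.

```yaml
groups:
  - name: kafka-example-alerts          # hypothetical group name
    rules:
      - alert: ExampleOfflinePartitions # hypothetical alert name
        # Assumed metric name: replace it with the offline partition count
        # series emitted by your JMX exporter configuration.
        expr: sum(kafka_controller_kafkacontroller_offlinepartitionscount) > 0
        # Assumed hold time before the alert fires.
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: Kafka has offline partitions.
          description: Offline partitions are not available for reading and writing.
```

The same shape applies to the other alerts in the table: an expression over the relevant metric, a severity label of warning or critical, and a short description.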

Cost

Connecting your Kafka integration to Grafana Cloud might incur charges. The integration can increase the number of active series that your Grafana Cloud account uses. To learn how usage is measured and what is included in each Cloud tier, see Active series and dpm usage and Cloud tier pricing.