
Caution

Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.

Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.


Clustering (beta)

Clustering enables a fleet of agents to work together for workload distribution and high availability. It helps create horizontally scalable deployments with minimal resource and operational overhead.

To achieve this, Grafana Agent uses an eventually consistent model that assumes all participating agents are interchangeable and converge on the same configuration file.

A standalone, non-clustered agent behaves the same as a single-node cluster.

You configure clustering by passing cluster command-line flags to the run command.

Use cases

Target auto-distribution

Target auto-distribution is the most basic use case of clustering. It allows scraping components running on all peers to distribute the scrape load among themselves. For target auto-distribution to work correctly, all agents in the same cluster must be able to reach the same service discovery APIs and must be able to scrape the same targets.

You must explicitly enable target auto-distribution on components by defining a clustering block, such as:

```river
prometheus.scrape "default" {
    clustering {
        enabled = true
    }

    ...
}
```
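For context, a fuller pipeline might look like the following sketch. It assumes a Kubernetes deployment in which every agent in the cluster loads this same file. The component labels ("pods", "default") and the remote write URL are hypothetical placeholders, not values taken from this page.

```river
// Discover pods through the Kubernetes API. Every agent in the cluster
// must be able to reach this API and see the same set of targets.
discovery.kubernetes "pods" {
    role = "pod"
}

// Scrape the discovered targets. With clustering enabled, each agent
// scrapes only the subset of targets it currently owns.
prometheus.scrape "default" {
    targets    = discovery.kubernetes.pods.targets
    forward_to = [prometheus.remote_write.default.receiver]

    clustering {
        enabled = true
    }
}

// Hypothetical remote write endpoint; replace the URL with your own.
prometheus.remote_write "default" {
    endpoint {
        url = "http://mimir.example.com/api/v1/push"
    }
}
```

Every agent runs this identical file, so no agent is told which targets to scrape. The split is worked out by the clustering logic. Clustering itself must still be turned on for each agent through the cluster command-line flags of the run command.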

A cluster state change is detected when a new node joins or an existing node goes away. All participating components locally recalculate target ownership and rebalance the number of targets they’re scraping without explicitly communicating ownership over the network.

Target auto-distribution allows you to dynamically scale the number of agents to distribute workload during peaks. It also provides resiliency: if a node goes away, its targets are automatically picked up by the remaining peers.

The agent uses a fully local consistent hashing algorithm to distribute targets. As a result, when a node joins or leaves, only about 1/N of the targets are redistributed on average, where N is the number of nodes in the cluster.
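As a rough worked example, the following assumes targets hash evenly across nodes. Let T be the total number of targets and N the number of nodes before the change. The numbers are illustrative, not measured.

$$
\begin{aligned}
\text{targets moved when a node joins} &\approx \frac{T}{N+1} \\
\text{targets moved when a node leaves} &\approx \frac{T}{N}
\end{aligned}
$$

Take T = 1000 targets on N = 4 agents. Adding a fifth agent moves about 1000/5 = 200 targets to the new agent, and the other ~800 stay where they are. Removing one of the four agents instead redistributes that agent's ~250 targets among the remaining three. By contrast, a naive hash(target) mod N assignment would move about 80% of targets when scaling from 4 to 5 agents. This is because a target only stays put when its hash leaves the same remainder modulo 4 and modulo 5.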

Refer to the reference documentation of each component to discover whether it supports clustering.

Cluster monitoring and troubleshooting

To monitor your cluster status, you can check the Flow UI clustering page. The debugging topic contains some clues to help pin down probable clustering issues.