Clustering (beta)

Clustering enables a fleet of Grafana Agents to work together for workload distribution and high availability. It helps create horizontally scalable deployments with minimal resource and operational overhead.

To achieve this, Grafana Agent Flow makes use of an eventually consistent model that assumes all participating Grafana Agents are interchangeable and converge on using the same configuration file.

The behavior of a standalone, non-clustered Grafana Agent is the same as if it were a single-node cluster.

You configure clustering by passing cluster command-line flags to the run command.

Use cases

Target auto-distribution

Target auto-distribution is the most basic use case of clustering. It allows scraping components running on all peers to distribute the scrape load between themselves. Target auto-distribution requires that all Grafana Agents in the same cluster can reach the same service discovery APIs and scrape the same targets.

You must explicitly enable target auto-distribution on components by defining a clustering block.

river
prometheus.scrape "default" {
    clustering {
        enabled = true
    }

    ...
}

A cluster state change is detected when a new node joins or an existing node leaves. All participating components locally recalculate target ownership and re-balance the number of targets they’re scraping without explicitly communicating ownership over the network.

Target auto-distribution allows you to dynamically scale the number of Grafana Agents to distribute workload during peaks. It also provides resiliency because if a node leaves, its targets are automatically picked up by one of the remaining peers.
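The local recalculation of target ownership can be illustrated with a minimal Python sketch. It assumes every peer sees the same peer list and the same targets, and it uses rendezvous hashing as a stand-in; the algorithm Grafana Agent Flow actually uses may differ. The peer names, target addresses, and the functions score, owner, and owned_targets are hypothetical and exist only for illustration.

```python
import hashlib


def score(peer: str, target: str) -> int:
    """Deterministic pseudo-random weight for a (peer, target) pair."""
    digest = hashlib.sha256(f"{peer}|{target}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def owner(target: str, peers: list[str]) -> str:
    """The peer with the highest score owns the target (rendezvous hashing)."""
    return max(sorted(peers), key=lambda peer: score(peer, target))


def owned_targets(me: str, peers: list[str], targets: list[str]) -> list[str]:
    """What one peer keeps after filtering the shared target list locally."""
    return [t for t in targets if owner(t, peers) == me]


# Hypothetical peer names and scrape targets, for illustration only.
peers = ["agent-0", "agent-1", "agent-2"]
targets = [f"10.0.0.{i}:9100" for i in range(12)]

# Every peer runs the same calculation on its own copy of the cluster state,
# so no ownership messages need to be exchanged.
before = {p: owned_targets(p, peers, targets) for p in peers}

# agent-2 leaves: the survivors recalculate and pick up its targets.
survivors = ["agent-0", "agent-1"]
after = {p: owned_targets(p, survivors, targets) for p in survivors}

for p in survivors:
    print(p, "scrapes", len(before[p]), "targets before and", len(after[p]), "after")
```

Because the result depends only on the peer list and the target list, two peers with the same view of the cluster always agree on who scrapes what. In this sketch each surviving peer keeps the targets it already owned and only gains targets from the departed peer.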

Grafana Agent Flow uses a local consistent hashing algorithm to distribute targets, meaning that, on average, only ~1/N of the targets are redistributed when a node joins or leaves a cluster of N nodes.
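The ~1/N figure can be checked with a small experiment. The following Python sketch builds a generic consistent hash ring with virtual nodes and measures how many targets change owner when a fourth peer joins a three-peer cluster. It is an assumption-laden illustration, not Grafana Agent Flow's implementation: the Ring class, the helper h, the number of virtual nodes, and the peer and target names are all hypothetical.

```python
import bisect
import hashlib


def h(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    """Generic consistent hash ring; each peer is placed at many virtual points."""

    def __init__(self, peers, vnodes=256):
        self._points = sorted(
            (h(f"{peer}#{i}"), peer) for peer in peers for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def owner(self, target):
        # The first point clockwise from the target's hash owns the target.
        idx = bisect.bisect_right(self._hashes, h(target)) % len(self._points)
        return self._points[idx][1]


def modulo_owner(target, peers):
    """Naive alternative for comparison: hash modulo the number of peers."""
    return peers[h(target) % len(peers)]


# Hypothetical peers and targets, for illustration only.
old_peers = ["agent-0", "agent-1", "agent-2"]
new_peers = old_peers + ["agent-3"]
targets = [f"target-{i}" for i in range(10_000)]

old, new = Ring(old_peers), Ring(new_peers)
moved = [t for t in targets if old.owner(t) != new.owner(t)]
moved_fraction = len(moved) / len(targets)

naive_fraction = sum(
    modulo_owner(t, old_peers) != modulo_owner(t, new_peers) for t in targets
) / len(targets)

print(f"consistent hashing moved {moved_fraction:.1%} of targets")
print(f"modulo hashing moved {naive_fraction:.1%} of targets")
```

With N = 4 peers after the join, the ring should move roughly a quarter of the targets, and every moved target goes to the new peer. The modulo scheme reassigns most targets, which is the behavior consistent hashing is meant to avoid.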

To find out whether a component supports clustering, refer to its reference documentation. For example, prometheus.scrape supports it through the clustering block shown above.

Cluster monitoring and troubleshooting

You can use the Grafana Agent Flow UI clustering page to monitor your cluster status. Refer to Debugging clustering issues for additional troubleshooting information.