Caution
Grafana Alloy is the new name for our distribution of the OTel collector. Grafana Agent has been deprecated and is in Long-Term Support (LTS) through October 31, 2025. Grafana Agent will reach an End-of-Life (EOL) on November 1, 2025. Read more about why we recommend migrating to Grafana Alloy.
Operator architecture
This guide gives a high-level overview of how the Grafana Agent Operator works.
The Grafana Agent Operator works in two phases:
- Discover a hierarchy of custom resources
- Reconcile that hierarchy into a Grafana Agent deployment
Custom resource hierarchy
The root of the custom resource hierarchy is the GrafanaAgent resource. It is the primary resource the Operator looks for, and is called the “root” because it discovers many other sub-resources.
The full hierarchy of custom resources is as follows:
- GrafanaAgent
  - MetricsInstance
    - PodMonitor
    - Probe
    - ServiceMonitor
  - LogsInstance
    - PodLogs
Most of the resources above have the ability to reference a ConfigMap or a Secret. All referenced ConfigMaps or Secrets are added into the resource hierarchy.
When a hierarchy is established, each item is watched for changes. Any changed item will cause a reconcile of the root GrafanaAgent resource, either creating, modifying, or deleting the corresponding Grafana Agent deployment.
A single resource can belong to multiple hierarchies. For example, if two GrafanaAgents use the same Probe, modifying that Probe will cause both GrafanaAgents to be reconciled.
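The fan-out from a changed resource to its roots can be pictured with a small index. This is only an illustrative sketch, not the Operator's actual implementation: the `HierarchyIndex` class, its method names, and the `Kind/name` strings are hypothetical and chosen for the example.

```python
from collections import defaultdict


class HierarchyIndex:
    """Tracks which root resources each discovered resource belongs to.

    Hypothetical helper for illustration; the Operator's real data
    structures may differ.
    """

    def __init__(self):
        # member resource -> set of roots whose hierarchy contains it
        self._roots_by_member = defaultdict(set)

    def set_hierarchy(self, root, members):
        """Record the full set of resources discovered under `root`."""
        # Drop stale membership left over from an earlier discovery.
        for roots in self._roots_by_member.values():
            roots.discard(root)
        for member in members:
            self._roots_by_member[member].add(root)
        # The root is part of its own hierarchy.
        self._roots_by_member[root].add(root)

    def roots_to_reconcile(self, changed):
        """Return every root whose hierarchy contains `changed`."""
        return sorted(self._roots_by_member.get(changed, set()))


index = HierarchyIndex()
index.set_hierarchy(
    "GrafanaAgent/team-a",
    ["MetricsInstance/a", "Probe/shared", "Secret/creds-a"],
)
index.set_hierarchy(
    "GrafanaAgent/team-b",
    ["MetricsInstance/b", "Probe/shared"],
)

# A Probe shared by two hierarchies triggers a reconcile of both roots.
print(index.roots_to_reconcile("Probe/shared"))
# A Secret referenced from only one hierarchy triggers a single reconcile.
print(index.roots_to_reconcile("Secret/creds-a"))
```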
Reconcile
When a resource hierarchy is created, updated, or deleted, a reconcile occurs. When a GrafanaAgent resource is deleted, the corresponding Grafana Agent deployment will also be deleted.
Reconciling creates a few cluster resources:
- A Secret is generated holding the configuration of the Grafana Agent.
- Another Secret is created holding all referenced Secrets or ConfigMaps from the resource hierarchy. This ensures that Secrets referenced from a custom resource in another namespace can still be read.
- A Service is created to govern the created StatefulSets.
- One StatefulSet per Prometheus shard is created.
PodMonitors, Probes, and ServiceMonitors are turned into individual scrape jobs which all use Kubernetes SD.
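As a rough summary of the list above, the set of resources produced by one reconcile can be written down as a function of the shard count. The function name and the description strings are hypothetical; real resource names are not specified here, so none are invented.

```python
def planned_resources(num_shards):
    """List the (kind, purpose) pairs a reconcile is described as creating.

    Illustrative only: this mirrors the prose description, not the
    Operator's code, and it does not model resource names.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    resources = [
        ("Secret", "Grafana Agent configuration"),
        ("Secret", "copies of referenced Secrets and ConfigMaps"),
        ("Service", "governs the created StatefulSets"),
    ]
    # One StatefulSet per shard.
    resources += [
        ("StatefulSet", f"metrics shard {shard}") for shard in range(num_shards)
    ]
    return resources


for kind, purpose in planned_resources(num_shards=3):
    print(f"{kind}: {purpose}")
```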
Sharding and replication
The GrafanaAgent resource can specify a number of shards. Each shard results in the creation of a StatefulSet with a hashmod + keep relabel_config per job:
    - source_labels: [__address__]
      target_label: __tmp_hash
      modulus: NUM_SHARDS
      action: hashmod
    - source_labels: [__tmp_hash]
      regex: CURRENT_STATEFULSET_SHARD
      action: keep
This allows for horizontal scaling, where each shard handles roughly 1/N of the total scrape load. Note that this does not use consistent hashing, which means changing the number of shards will cause anywhere between 1/N and N targets to reshuffle.
The sharding mechanism is borrowed from the Prometheus Operator.
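To get a feel for how hashmod + keep splits targets and what happens when the shard count changes, the following sketch simulates the two relabel rules in Python. It assumes the hash is an MD5 digest of the `__address__` value reduced modulo the shard count, which approximates Prometheus's hashmod action; the exact bytes Prometheus uses may differ, and the target addresses are made up for the example.

```python
import hashlib


def hashmod(value: str, modulus: int) -> int:
    """Approximation of the hashmod relabel action (an assumption, see above).

    MD5 the value, interpret the last 8 bytes as a big-endian integer,
    and reduce it modulo `modulus`.
    """
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def assign(targets, num_shards):
    """Map each target to the shard whose keep rule would match it."""
    return {target: hashmod(target, num_shards) for target in targets}


# Hypothetical scrape targets.
targets = [f"10.0.{i // 256}.{i % 256}:9090" for i in range(1000)]

before = assign(targets, 3)
after = assign(targets, 4)

for shard in range(3):
    count = sum(1 for s in before.values() if s == shard)
    print(f"shard {shard}: {count} targets")

# Without consistent hashing, changing the modulus moves many targets.
moved = sum(1 for t in targets if before[t] != after[t])
print(f"{moved} of {len(targets)} targets changed shard going from 3 to 4 shards")
```

Each shard keeps only the targets whose hash equals its own shard number, so every target is scraped by exactly one shard per replica.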
The number of replicas can be defined in the same way as the number of shards. Each replica creates a duplicate of every shard, so replication must be paired with a remote_write system that can perform HA deduplication. Grafana Cloud and Cortex provide this out of the box, and the Grafana Agent Operator defaults support these two systems.
The total number of created metrics pods is the product numShards * numReplicas.
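As a quick worked example of that product, the sketch below enumerates one (shard, replica) pair per metrics pod. The function name is hypothetical, and the pairs are plain tuples rather than real pod names.

```python
from itertools import product


def metrics_pods(num_shards: int, num_replicas: int):
    """Enumerate one (shard, replica) pair per metrics pod."""
    if num_shards < 1 or num_replicas < 1:
        raise ValueError("num_shards and num_replicas must be at least 1")
    return list(product(range(num_shards), range(num_replicas)))


pods = metrics_pods(num_shards=3, num_replicas=2)
print(len(pods))  # 6
print(pods)
```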
Labels
Two labels are added by default to every metric:
- cluster, representing the GrafanaAgent deployment. Holds the value of <GrafanaAgent.metadata.namespace>/<GrafanaAgent.metadata.name>.
- __replica__, representing the replica number of the Agent. This label works out of the box with Grafana Cloud and Cortex’s HA deduplication.
The shard number is not added as a label, as sharding is designed to be transparent on the receiver end.
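The default labels can be sketched as a small function of the GrafanaAgent's namespace and name plus the replica number. The function name is hypothetical, and the exact string format of the `__replica__` value is an assumption; only the `cluster` format is stated above.

```python
def default_labels(namespace: str, name: str, replica: int) -> dict:
    """Build the two default labels described above.

    The shard number is intentionally left out, since sharding is meant
    to be invisible to the receiver.
    """
    return {
        "cluster": f"{namespace}/{name}",
        # Assumption: the replica number rendered as a plain string.
        "__replica__": str(replica),
    }


print(default_labels("monitoring", "grafana-agent", replica=0))
```

Because replicas of the same deployment share the `cluster` value and differ only in `__replica__`, an HA-aware receiver can keep one copy of each series and drop the duplicates.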