Scraping service (Beta)
The Grafana Agent scraping service allows you to cluster a set of Agent processes and distribute the scrape load.
You determine what to scrape by writing instance configuration files to an API, which then stores them in a KV store backend. All agents in the cluster must use the same KV store to see the same set of configuration files.
Each process of the Grafana Agent can be running multiple independent “instances” at once, where an “instance” refers to the combination of:
- Service discovery for all scrape_configs within that loaded configuration
- Scraping metrics from all discovered targets
- Storing data in its own Write-Ahead Log specific to the loaded configuration
- Remote writing scraped metrics to the remote_write destinations specified within the loaded configuration
The “instance configuration file,” then, is the configuration file that specifies the set of scrape_configs and remote_write endpoints. For example, a small instance configuration file looks like this:
scrape_configs:
  - job_name: self-scrape
    static_configs:
      - targets: ['localhost:9090']
        labels:
          process: 'agent'
remote_write:
  - url: http://cortex:9009/api/prom/push
The full set of supported options for an instance configuration file is
available in the
Multiple instance configuration files are necessary for sharding. Each config file is distributed to a particular agent in the cluster based on the hash of its name.
When the scraping service is enabled, Agents disallow specifying instance configurations locally in the configuration file; instance configurations must be stored in the KV store. agentctl can be used to manually sync instance configuration files to the Agent’s API server.
Distributed hash ring
The scraping service uses a Distributed Hash Ring (commonly called “the ring”) to cluster agents and to shard configurations within that ring. Each Agent joins the ring with a random distinct set of tokens that are used for sharding. The default number of generated tokens is 128.
The Distributed Hash Ring is also stored in a KV store. Since a KV store is also needed for storing configuration files, it is encouraged to re-use the same KV store for the ring.
When sharding, the Agent currently uses the name of a configuration file stored in the KV store for load distribution. Configuration names are guaranteed to be unique keys. The hash of the name is used as the lookup key in the ring and determines which agent (based on token) should be responsible for that configuration. “Price is Right” rules are used for the Agent lookup; the Agent owning the token with the closest value to the key without going over is responsible for the configuration.
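The lookup rule above can be sketched in a few lines of Python. This is only an illustration of the "closest token without going over" idea, not the Agent's implementation: the 32-bit token space, the use of CRC32 as the name hash, the wraparound behavior for keys below every token, and all function names are assumptions made for the example.

```python
import bisect
import random
import zlib

TOKEN_SPACE = 2**32  # assumption: tokens are unsigned 32-bit integers
TOKENS_PER_AGENT = 128  # default token count mentioned above


def make_tokens(rng, count=TOKENS_PER_AGENT):
    """Pick `count` distinct random tokens for one agent."""
    return rng.sample(range(TOKEN_SPACE), count)


def build_ring(agent_tokens):
    """Flatten {agent: [tokens]} into a list of (token, agent) sorted by token."""
    return sorted((tok, agent) for agent, toks in agent_tokens.items() for tok in toks)


def key_for(config_name):
    """Hash a configuration name into the token space (CRC32 is a stand-in)."""
    return zlib.crc32(config_name.encode("utf-8"))


def owner_of(ring, key):
    """Return the agent holding the largest token <= key ("without going over")."""
    tokens = [tok for tok, _ in ring]
    idx = bisect.bisect_right(tokens, key) - 1
    # idx == -1 means every token exceeds the key; Python's negative index
    # then selects the last (largest) token, which is the assumed wraparound.
    return ring[idx][1]


if __name__ == "__main__":
    rng = random.Random(42)
    ring = build_ring({name: make_tokens(rng) for name in ("agent-a", "agent-b", "agent-c")})
    for cfg in ("team-a-nodes", "team-b-apps", "self-scrape"):
        print(cfg, "->", owner_of(ring, key_for(cfg)))
```

Because every agent evaluates the same function over the same ring, all agents reach the same answer about who owns a given configuration name without coordinating directly.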
All Agents simultaneously watch the KV store for changes to the set of configuration files. When a configuration file is added or updated in the configuration store, each Agent runs the configuration name hash through its copy of the Hash Ring to determine whether it is responsible for that config.
When an Agent receives a new configuration that it is responsible for, it launches a new instance from the instance configuration. If a configuration is deleted from the KV store, this will be detected by the owning Agent, and it will stop the metric collection process for that configuration file.
When an Agent receives an event for an updated configuration file that it previously owned but no longer owns, it stops the associated instance for that configuration file. This can happen when the cluster size changes.
The scraping service currently does not support replication. Only one agent at a time will be responsible for scraping a certain configuration.
When an Agent joins or leaves the cluster, the changed set of tokens in the ring may cause configurations to hash to a different Agent. The process of responding to this change is called “resharding.”
Resharding is run:
- When an Agent joins the ring
- When an Agent leaves the ring
- When the KV store sends a notification indicating a configuration has changed.
- On a specified interval if KV change events have not fired.
The resharding process involves each Agent retrieving the full set of configurations stored in the KV store and determining if:
- The configuration owned by the current resharding Agent has changed and needs to be reloaded.
- The configuration is no longer owned by the current resharding Agent and the associated instance should be stopped.
- The configuration has been deleted, and the associated instance should be stopped.
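The decisions in the list above can be sketched as a pure function that compares the KV store contents with what one Agent is currently running. This is a simplified model under stated assumptions: configurations are represented as plain strings keyed by name, `owns` is a hypothetical callback standing in for the ring lookup, and the `start` action for newly owned configurations is inferred from the earlier description of launching a new instance.

```python
def reshard(stored, running, owns):
    """Decide what one agent should start, reload, or stop.

    stored:  {name: content} for every configuration in the KV store
    running: {name: content} for the instances this agent currently runs
    owns:    callable(name) -> bool, True if this agent owns the name
    """
    start, reload, stop = [], [], []
    for name, content in stored.items():
        if owns(name):
            if name not in running:
                start.append(name)   # newly owned: launch an instance
            elif running[name] != content:
                reload.append(name)  # owned and changed: reload it
        elif name in running:
            stop.append(name)        # ownership moved to another agent
    for name in running:
        if name not in stored:
            stop.append(name)        # deleted from the KV store
    return {"start": sorted(start), "reload": sorted(reload), "stop": sorted(stop)}


if __name__ == "__main__":
    stored = {"a": "v2", "b": "v1", "c": "v1", "e": "v1"}
    running = {"a": "v1", "b": "v1", "c": "v1", "d": "v1"}
    print(reshard(stored, running, lambda name: name in {"a", "b", "e"}))
```

Working from the full set of stored configurations, rather than from individual change events, is what lets the interval-based reshard recover from missed KV notifications.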
Because distribution is determined by the number of configuration files and not by how many targets exist per configuration file, load is spread most evenly when each configuration file has as few targets as possible. The best distribution is achieved if each configuration file stored in the KV store is limited to one static configuration with only one target.
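As a rough illustration of that advice, the sketch below splits one parsed instance configuration into several, each holding a single static target. It works on plain dictionaries shaped like the example instance configuration above; the function name and the `<job_name>-<n>` naming scheme are hypothetical, and scrape configs that rely on service discovery instead of static_configs are not handled.

```python
import copy


def split_by_target(config):
    """Return {name: instance_config} with one static target per config."""
    shards = {}
    for job in config.get("scrape_configs", []):
        n = 0
        for static in job.get("static_configs", []):
            for target in static.get("targets", []):
                # Keep labels and other keys, but narrow to a single target.
                one_static = {**copy.deepcopy(static), "targets": [target]}
                one_job = {**copy.deepcopy(job), "static_configs": [one_static]}
                shard = {"scrape_configs": [one_job]}
                if "remote_write" in config:
                    shard["remote_write"] = copy.deepcopy(config["remote_write"])
                # Hypothetical naming scheme: job name plus a running index.
                shards[f"{job['job_name']}-{n}"] = shard
                n += 1
    return shards


if __name__ == "__main__":
    config = {
        "scrape_configs": [
            {
                "job_name": "node",
                "static_configs": [{"targets": ["a:9100", "b:9100"]}],
            }
        ],
        "remote_write": [{"url": "http://cortex:9009/api/prom/push"}],
    }
    for name in split_by_target(config):
        print(name)
```

Each resulting configuration gets its own name, so each one hashes to the ring independently and can land on a different Agent.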
Here’s an example agent.yaml configuration file that uses the same etcd server for both configuration storage and the distributed hash ring storage:
server:
  log_level: debug

metrics:
  global:
    scrape_interval: 1m
  scraping_service:
    enabled: true
    kvstore:
      store: etcd
      etcd:
        endpoints:
          - etcd:2379
    lifecycler:
      ring:
        replication_factor: 1
        kvstore:
          store: etcd
          etcd:
            endpoints:
              - etcd:2379
Note that there are no instance configurations present in this example; instance configurations must be passed to the API for the Agent to start scraping metrics.
agentctl is a tool included with this repository that helps users interact with the new Config Management API. The agentctl config-sync subcommand uses local YAML files as a source of truth and syncs their contents with the API. Entries in the API not in the synced directory will be deleted.
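The "source of truth" behavior can be modeled as a small planning step. The sketch below does not call agentctl or the Agent API; it only computes which configuration names a sync would upload and which it would delete. Deriving the configuration name from the file stem, limiting the scan to .yaml and .yml files, and skipping uploads of unchanged files are all assumptions made for this example.

```python
from pathlib import Path


def load_local(directory):
    """Map config name (file stem, an assumption) to the contents of each YAML file."""
    return {
        path.stem: path.read_text()
        for path in sorted(Path(directory).iterdir())
        if path.suffix in (".yaml", ".yml")
    }


def plan_sync(local, remote):
    """Return (upload, delete) name lists that make `remote` match `local`.

    local:  {name: content} read from the synced directory
    remote: {name: content} currently stored behind the API
    """
    # Upload anything missing remotely or whose content differs.
    upload = sorted(name for name, body in local.items() if remote.get(name) != body)
    # Delete anything the API has that the directory does not.
    delete = sorted(name for name in remote if name not in local)
    return upload, delete


if __name__ == "__main__":
    local = {"a": "x: 1\n", "b": "y: 2\n"}
    remote = {"a": "x: 1\n", "c": "z: 3\n"}
    print(plan_sync(local, remote))
```

The delete list is the part to watch: anything created through the API by other means disappears on the next sync unless it also exists in the directory.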
agentctl is distributed in binary form with each release and as a Docker container with the grafana/agentctl image. Tanka configurations that use grafana/agentctl and sync a set of configurations to the API are planned for the future.