Use Grafana Alloy as a proxy or aggregation layer

In larger deployments, you can run one or more Alloy instances as proxies in front of other Alloy instances. This pattern reduces direct connections to backends such as Mimir, Loki, and Tempo, while centralizing egress traffic. You can apply consistent relabeling, filtering, or routing logic at the proxy layer, isolating edge instances from backend changes. This architecture also supports sharding and load distribution across multiple proxy instances.

In OpenTelemetry terminology, this deployment model is often referred to as gateway mode.

Note

The proxy configuration described here refers to using Alloy as a telemetry proxy that aggregates and forwards telemetry between instances. It doesn’t cover configuring Alloy to send outbound traffic through a corporate HTTP proxy, for example with the proxy_url or proxy_from_environment arguments of prometheus.remote_write or loki.write.

Before you begin

Ensure you have the following:

  • A working Alloy installation on your edge nodes.
  • Access to deploy additional Alloy instances as proxies.
  • A load balancer or ingress controller for routing traffic to proxy instances.
  • Network connectivity between edge instances, proxy instances, and backend services.

Architectural patterns

You can use two primary topologies when deploying Alloy as a proxy layer: push to proxy and pull from edge.

Push to proxy

In the push-to-proxy pattern, edge Alloy instances push telemetry to a pool of proxy Alloy instances. This is the most common and recommended pattern because it provides a straightforward mental model, scales cleanly in dynamic environments, and works across networks with NAT or segmented connectivity. You can centralize authentication and routing at the proxy layer, and the pattern is compatible with both Kubernetes and VM environments.

flowchart LR

  EdgeAlloy[Edge Alloy]
  LoadBalancer[Load Balancer]
  ProxyAlloy[Proxy Alloy x N]
  Backend[Backend]

  EdgeAlloy -->|remote_write| LoadBalancer
  LoadBalancer --> ProxyAlloy
  ProxyAlloy --> Backend

  %% Grafana styling
  classDef grafana fill:#ffffff,stroke:#F05A28,stroke-width:2px,rx:8,ry:8,color:#1f2937,font-weight:600;

  class EdgeAlloy,LoadBalancer,ProxyAlloy,Backend grafana

For metrics, edge instances push data using prometheus.remote_write to proxy instances running prometheus.receive_http. For logs, edge instances push data using loki.write to proxy instances running loki.source.api.

Sticky load balancing for metrics

Sticky load balancing ensures that requests with the same identifier, such as a time series or trace ID, are consistently routed to the same instance behind the load balancer. In this architecture, that instance is a proxy Alloy instance.

For prometheus.remote_write traffic, you must ensure consistent routing per time series. When different proxy instances receive samples for the same series, you can encounter out-of-order sample errors, increased ingestion load, and write-ahead log (WAL) churn.

Warning

Without sticky load balancing, metrics proxying can result in data loss or ingestion errors.

To avoid these issues, configure your load balancer with sticky sessions, consistent hashing, or L4 hash-based load balancing.

Pull from edge

In the pull-from-edge pattern, proxy Alloy instances scrape targets directly, using sharding such as hashmod to distribute targets across instances.

This pattern works for metrics because Prometheus-style scraping supports deterministic target sharding. For more information on distributing scrape load, refer to Distribute Prometheus metrics scrape load.
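
As a minimal sketch of hashmod sharding, the following configuration hashes each target address into one of three buckets and keeps only bucket 0 on this instance. The target addresses, the shard count of 3, the __tmp_shard label name, and the component labels are illustrative assumptions, not values from a real deployment. Each of the other two instances would keep a different bucket value.

Alloy
discovery.relabel "shard" {
  // Hypothetical edge targets. Replace with your own discovery output.
  targets = [
    {__address__ = "edge-1:9100"},
    {__address__ = "edge-2:9100"},
    {__address__ = "edge-3:9100"},
  ]

  // Hash each target address into one of 3 buckets.
  rule {
    source_labels = ["__address__"]
    modulus       = 3
    target_label  = "__tmp_shard"
    action        = "hashmod"
  }

  // Keep only the targets that fall into bucket 0 on this instance.
  rule {
    source_labels = ["__tmp_shard"]
    regex         = "0"
    action        = "keep"
  }
}

prometheus.scrape "sharded" {
  targets    = discovery.relabel.shard.output
  forward_to = [prometheus.remote_write.to_backend.receiver]
}

prometheus.remote_write "to_backend" {
  endpoint {
    url = "https://<MIMIR_ENDPOINT>/api/v1/push"
  }
}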

Note

The pull model doesn’t apply to logs. Logs must use a push model.

While technically possible for metrics, using proxy instances to scrape other Alloy instances isn’t recommended as a primary aggregation strategy. Push-based aggregation using prometheus.remote_write provides clearer scaling characteristics, simpler configuration management, and better compatibility with dynamic environments.

Configure metrics proxying

You can use the push pattern to proxy metrics between edge and proxy instances.

Configure edge instances for metrics

Edge instances use prometheus.scrape to scrape metrics locally and prometheus.remote_write to push them to proxy instances. The following example configuration scrapes a local Node Exporter and pushes metrics to a proxy:

Alloy
prometheus.scrape "node" {
  targets = [{
    __address__ = "localhost:9100"
  }]
  forward_to = [prometheus.remote_write.to_proxy.receiver]
}

prometheus.remote_write "to_proxy" {
  endpoint {
    url = "https://<PROXY_LOAD_BALANCER>/api/v1/metrics/write"
  }
}

Replace the following:

  • <PROXY_LOAD_BALANCER>: The hostname, and port if required, of your load balancer in front of the proxy Alloy instances.

Note

<PROXY_LOAD_BALANCER> is the address where edge instances send data. Your load balancer must forward each request path to the port exposed by your proxy Alloy instances.

For example:

  • Metrics: forward https://<PROXY_LOAD_BALANCER>/api/v1/metrics/write to proxy instances listening on port 12345
  • Logs: forward https://<PROXY_LOAD_BALANCER>/loki/api/v1/push to proxy instances listening on port 3100

Configure proxy instances for metrics

Proxy instances use prometheus.receive_http to receive metrics from edge instances and prometheus.remote_write to forward them to the backend. The following example configuration receives metrics and forwards them to Mimir:

Alloy
prometheus.receive_http "ingest" {
  http {
    listen_address = "0.0.0.0"
    listen_port    = 12345
  }
  forward_to = [prometheus.remote_write.to_backend.receiver]
}

prometheus.remote_write "to_backend" {
  endpoint {
    url = "https://<MIMIR_ENDPOINT>/api/v1/push"
  }
}

Replace the following:

  • <MIMIR_ENDPOINT>: The hostname, and port if required, of your Mimir instance.

You can add relabeling, filtering, or tenant routing at the proxy layer by inserting a prometheus.relabel component between the receiver and prometheus.remote_write.
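
The following sketch shows one way to place prometheus.relabel between the receiver and the writer on a proxy instance. The dropped metric name pattern and the ingest_path label are hypothetical examples chosen for illustration. Adapt the rules to your own filtering and routing needs.

Alloy
prometheus.receive_http "ingest" {
  http {
    listen_address = "0.0.0.0"
    listen_port    = 12345
  }
  // Send received samples through the relabel step first.
  forward_to = [prometheus.relabel.filter.receiver]
}

prometheus.relabel "filter" {
  forward_to = [prometheus.remote_write.to_backend.receiver]

  // Drop series whose metric name matches a hypothetical noisy prefix.
  rule {
    source_labels = ["__name__"]
    regex         = "go_gc_.*"
    action        = "drop"
  }

  // Add a static label to every series that passes through the proxy.
  rule {
    target_label = "ingest_path"
    replacement  = "proxy"
  }
}

prometheus.remote_write "to_backend" {
  endpoint {
    url = "https://<MIMIR_ENDPOINT>/api/v1/push"
  }
}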

Configure logs proxying

Logs must use a push model because you can’t pull logs from other Alloy instances. Use loki.write on edge instances and loki.source.api on proxy instances.

Configure edge instances for logs

Edge instances use loki.source.file to collect logs and loki.write to push them to proxy instances. The following example configuration collects logs from files and pushes them to a proxy:

Alloy
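// local.file_match expands the glob pattern into concrete file targets
// for loki.source.file to tail.
local.file_match "varlogs" {
  path_targets = [{
    __path__ = "/var/log/*.log"
  }]
}

loki.source.file "varlogs" {
  targets    = local.file_match.varlogs.targets
  forward_to = [loki.write.to_proxy.receiver]
}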

loki.write "to_proxy" {
  endpoint {
    url = "https://<PROXY_LOAD_BALANCER>/loki/api/v1/push"
  }
}

Replace the following:

  • <PROXY_LOAD_BALANCER>: The hostname, and port if required, of your load balancer in front of the proxy Alloy instances.

Configure proxy instances for logs

Proxy instances use loki.source.api to receive logs from edge instances and loki.write to forward them to the backend. The following example configuration receives logs and forwards them to Loki:

Alloy
loki.source.api "ingest" {
  http {
    listen_address = "0.0.0.0"
    listen_port    = 3100
  }
  forward_to = [loki.write.to_backend.receiver]
}

loki.write "to_backend" {
  endpoint {
    url = "https://<LOKI_ENDPOINT>/loki/api/v1/push"
  }
}

Replace the following:

  • <LOKI_ENDPOINT>: The hostname, and port if required, of your Loki instance.

Configure load balancing

For metrics proxying, configure your load balancer to provide consistent routing so that samples for the same time series always reach the same proxy instance.

The following example shows a simplified NGINX configuration for consistent routing:

nginx
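upstream alloy_proxies {
    # Consistent hashing on the client address pins each edge
    # instance to the same proxy instance.
    hash $remote_addr consistent;
    server proxy1:12345;
    server proxy2:12345;
    server proxy3:12345;
}

server {
    # TLS certificate directives are omitted for brevity.
    listen 443 ssl;

    # Forward remote write requests to the proxy pool.
    location /api/v1/metrics/write {
        proxy_pass http://alloy_proxies;
    }
}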

In production, prefer hashing based on series-identifying headers or use an L4 load balancer with source hashing for better distribution.

Signal support

The following table shows what patterns each signal type supports:

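| Signal   | Push through proxy | Pull with sharding | Notes                                                                         |
| -------- | ------------------ | ------------------ | ----------------------------------------------------------------------------- |
| Metrics  | Supported          | Supported          | Sticky routing required for push                                              |
| Logs     | Supported          | Not supported      | Push only                                                                     |
| Traces   | Depends            | Generally no       | Use OpenTelemetry-compatible receivers                                        |
| Profiles | Supported          | No                 | Edge pyroscope.write, proxy pyroscope.receive_http, backend pyroscope.write   |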

For traces, you typically configure edge instances to send data to an OpenTelemetry-compatible receiver, such as otelcol.receiver.otlp, on proxy instances. The proxy instances then export to the backend using an appropriate exporter. Basic trace forwarding doesn’t require sticky routing, but if proxy instances run trace-derived components such as otelcol.connector.spanmetrics or otelcol.connector.servicegraph, you need consistent routing so all spans for a trace or service reach the same instance. You can use otelcol.exporter.loadbalancing on the edge instances to route by trace ID or service name. Alternatively, you can add a unique label per proxy instance and aggregate the resulting metrics in PromQL or Adaptive Metrics.
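
As a sketch of basic trace forwarding, a proxy instance could receive OTLP and export it to the backend as follows. The <TEMPO_ENDPOINT> placeholder, the listen ports, and the component labels are assumptions for illustration.

Alloy
otelcol.receiver.otlp "ingest" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }
  output {
    traces = [otelcol.exporter.otlp.to_backend.input]
  }
}

otelcol.exporter.otlp "to_backend" {
  client {
    // Assumed placeholder for an OTLP gRPC endpoint on the backend.
    endpoint = "<TEMPO_ENDPOINT>:4317"
  }
}

When proxy instances run trace-derived components, an edge instance might route by trace ID with otelcol.exporter.loadbalancing. In this sketch the proxy hostnames are hypothetical, the edge instances connect to the proxies directly rather than through the load balancer, and TLS is turned off on the assumption that the proxies accept plaintext gRPC on a trusted network.

Alloy
otelcol.exporter.loadbalancing "to_proxy" {
  // Send all spans that share a trace ID to the same proxy instance.
  routing_key = "traceID"

  resolver {
    static {
      hostnames = ["proxy1:4317", "proxy2:4317", "proxy3:4317"]
    }
  }

  protocol {
    otlp {
      client {
        tls {
          insecure = true
        }
      }
    }
  }
}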

For profiles, edge instances use pyroscope.write to push to proxy instances running pyroscope.receive_http. Refer to the pyroscope.receive_http documentation for supported ingest endpoints and how the component forwards to receivers such as pyroscope.write. For chained pyroscope.write traffic, load balancing across multiple receivers, and timeout configuration, refer to the troubleshooting sections of the pyroscope.receive_http and pyroscope.write documentation.
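
A proxy instance for profiles could look like the following sketch. The listen port and the <PYROSCOPE_ENDPOINT> placeholder are assumptions, not values taken from this guide.

Alloy
pyroscope.receive_http "ingest" {
  http {
    listen_address = "0.0.0.0"
    // Assumed port. Use the port your load balancer forwards to.
    listen_port    = 4040
  }
  forward_to = [pyroscope.write.to_backend.receiver]
}

pyroscope.write "to_backend" {
  endpoint {
    url = "https://<PYROSCOPE_ENDPOINT>"
  }
}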

High availability and replication

When you run multiple proxy instances, ensure consistent routing for prometheus.remote_write traffic to prevent out-of-order errors. Avoid double-writing unless you intentionally want data replicated across backends.

For high availability pairs, configure proper external labels such as cluster and replica so your backend can deduplicate data correctly. Refer to your backend documentation for specific high availability deduplication requirements. For example, Mimir requires specific label configurations to handle replica traffic.
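
As an illustration, the following sketch sets external labels on an edge instance that is part of a high availability pair. It assumes the Mimir default label names for HA deduplication, cluster and __replica__, and uses a hypothetical <CLUSTER_NAME> placeholder. Confirm the label names your backend expects before you rely on them.

Alloy
prometheus.remote_write "to_proxy" {
  external_labels = {
    // Both replicas in the pair share the same cluster value.
    cluster     = "<CLUSTER_NAME>",
    // Each replica reports its own hostname as the replica identifier.
    __replica__ = constants.hostname,
  }

  endpoint {
    url = "https://<PROXY_LOAD_BALANCER>/api/v1/metrics/write"
  }
}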

Operational considerations

Capacity planning

Proxy instances handle ingestion, WAL writes for metrics, retries, and fan-out to the backend. Monitor CPU usage, memory usage, queue depth, remote write retries, and out-of-order sample errors to ensure your proxy instances have adequate capacity.
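
One way to collect these signals is to have each proxy instance scrape its own metrics and forward them alongside the proxied data. The following is a minimal sketch. The component labels are assumptions, and you may prefer to send self-monitoring data to a separate endpoint.

Alloy
// Expose this Alloy instance's own metrics as scrape targets.
prometheus.exporter.self "proxy" { }

prometheus.scrape "proxy_self" {
  targets    = prometheus.exporter.self.proxy.targets
  forward_to = [prometheus.remote_write.to_backend.receiver]
}

prometheus.remote_write "to_backend" {
  endpoint {
    url = "https://<MIMIR_ENDPOINT>/api/v1/push"
  }
}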

For metrics proxying, memory usage scales with the number of active time series passing through the proxy, even if the proxy doesn’t scrape targets directly. Each proxy instance maintains series state, WAL segments, and retry queues. High-cardinality workloads can require significant memory, and you may need to scale proxy replicas to handle large active series counts.

Resource requirements vary significantly depending on active series count, sample rate, log volume, relabeling complexity, and retry behavior. No fixed ratio of series to memory or CPU applies universally. Before you roll out to production, test with realistic production write volume to validate your sizing assumptions and establish baseline resource requirements.

Failure modes

When a proxy fails, edge instances retry sending data, which causes WAL growth on the edge instances. Load shifts to the remaining healthy proxies, which may increase their resource usage.
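
To bound retry behavior and WAL retention on edge instances, you can tune the queue_config and wal blocks of prometheus.remote_write. The values in this sketch are illustrative assumptions, not recommendations. Choose values based on how long you expect proxy outages to last and how much disk space edge nodes can spare.

Alloy
prometheus.remote_write "to_proxy" {
  wal {
    // Upper bound on how long unsent data is kept in the WAL.
    max_keepalive_time = "4h"
  }

  endpoint {
    url = "https://<PROXY_LOAD_BALANCER>/api/v1/metrics/write"

    queue_config {
      // Retry delays grow from min_backoff up to max_backoff.
      min_backoff = "100ms"
      max_backoff = "10s"
    }
  }
}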

When load balancing isn’t sticky, samples for the same series arrive at different proxy instances, which causes out-of-order errors and ingestion amplification.

In environments with high ingestion rates, the resulting retries and duplicate handling also amplify ingestion load on the backend and increase overall system pressure. Always validate your load balancer configuration before rolling out proxying in production.

Fleet management compared to proxying

Proxying is an architecture pattern for runtime data flow. Fleet management, which includes centralized configuration distribution, rollout control, and secret management, helps you operate large numbers of Alloy instances but is separate from the proxy behavior.

You can use fleet tooling to deploy proxy instances, manage their configurations, rotate credentials, and scale horizontally. However, proxying itself doesn’t require a fleet management solution.

If you use fleet management to deploy or manage proxy instances, configure prometheus.remote_write endpoints and self-monitoring pipelines consistently across edge and proxy layers. Fleet tooling controls configuration distribution and rollout, but it doesn’t automatically create or enforce a proxy topology. You must explicitly design the data flow, including which instances push to proxies and how load balancing and routing are configured.

For information about configuring a proxy for Fleet Management API traffic in restricted network environments, refer to Custom proxy setup in the Fleet Management documentation.

When to use a proxy layer

A proxy layer is especially useful when you operate large fleets of Alloy instances. Without aggregation, each instance maintains its own outbound connections to backends such as Grafana Cloud, Mimir, Loki, or Tempo. In high-scale environments, this can lead to large numbers of TCP connections from a single network boundary, increasing firewall session load, ephemeral port usage, and operational risk. A proxy layer consolidates outbound connections and reduces connection pressure on shared network infrastructure.

Use proxy Alloy instances when you need to limit backend exposure, centralize relabeling or filtering, or isolate edge instances from backend authentication changes. A proxy layer also helps when you want to reduce outbound internet access from edge nodes or operate in segmented or air-gapped environments.
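
For example, to keep backend credentials off edge nodes, only the proxy instances need to hold them. The following sketch assumes basic authentication, a hypothetical <USERNAME> placeholder, and a hypothetical BACKEND_PASSWORD environment variable read with sys.env, which requires an Alloy version that provides that function.

Alloy
prometheus.remote_write "to_backend" {
  endpoint {
    url = "https://<MIMIR_ENDPOINT>/api/v1/push"

    // Backend credentials live only on the proxy layer.
    basic_auth {
      username = "<USERNAME>"
      password = sys.env("BACKEND_PASSWORD")
    }
  }
}

With this layout, rotating the backend credentials only requires updating the proxy instances.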

Avoid adding a proxy layer if you don’t need centralized control, already use a gateway such as the Mimir or Loki gateway, or want the simplest architecture possible.

Next steps