Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have since been updated or replaced. Please view the current version.

Grafana Agent

The Grafana Agent is a telemetry collector for sending metrics, logs, and trace data to the opinionated Grafana observability stack.

It is commonly used as a tracing pipeline, offloading traces from the application and forwarding them to a storage backend. The Grafana Agent tracing stack is built using OpenTelemetry.

The Grafana Agent supports receiving traces in multiple formats: OTLP (OpenTelemetry), Jaeger, Zipkin and OpenCensus.

On top of receiving and exporting traces, the Grafana Agent contains many features that make your distributed tracing system more robust and let you make use of all the data processed in the pipeline.

Architecture

The Grafana Agent can be configured to run a set of tracing pipelines that collect data from your applications and write it to Tempo. Pipelines are built using OpenTelemetry and consist of receivers, processors, and exporters; the architecture mirrors the OpenTelemetry Collector's design. See the configuration reference for all available config options. For a quick start, refer to this blog post.

Figure: Tracing pipeline architecture

This design allows you to configure multiple distinct tracing pipelines, each of which collects its own spans and can send them to a different backend.
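As a minimal sketch, the fragment below defines two independent pipelines under `traces.configs`, each with its own receiver and its own backend. The pipeline names, listen addresses, and Tempo endpoints are hypothetical placeholders; check the configuration reference for the exact fields your Agent release supports.

```yaml
traces:
  configs:
  # First pipeline: receives OTLP over gRPC and writes to one backend.
  - name: team-a                            # hypothetical pipeline name
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    remote_write:
    - endpoint: tempo-a.example.com:4317    # hypothetical backend
  # Second pipeline: receives Jaeger Thrift over HTTP and writes elsewhere.
  - name: team-b                            # hypothetical pipeline name
    receivers:
      jaeger:
        protocols:
          thrift_http:
            endpoint: 0.0.0.0:14268
    remote_write:
    - endpoint: tempo-b.example.com:4317    # hypothetical backend
```

Because each entry in `configs` is a separate pipeline, spans received by `team-a` never reach the `team-b` backend, and vice versa.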

Receiving traces

The Grafana Agent supports multiple ingestion receivers: OTLP (OpenTelemetry), Jaeger, Zipkin, OpenCensus and Kafka.

Each tracing pipeline can be configured to receive traces in all these formats. Traces that arrive at a pipeline go through the receivers, processors, and exporters defined in that pipeline.
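Receiver settings are passed through to the corresponding OpenTelemetry Collector receivers, so they follow the Collector's receiver syntax. The sketch below, which assumes default ports are acceptable, enables four of the formats in a single pipeline. Kafka is left out because its receiver needs broker settings specific to your environment.

```yaml
traces:
  configs:
  - name: default
    receivers:
      # OTLP over gRPC and HTTP; an empty key means "use this protocol's defaults".
      otlp:
        protocols:
          grpc:
          http:
      # Jaeger over gRPC and Thrift HTTP.
      jaeger:
        protocols:
          grpc:
          thrift_http:
      # Zipkin and OpenCensus with their default settings.
      zipkin:
      opencensus:
    # remote_write and any processors omitted for brevity.
```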

Pipeline processing

The Grafana Agent processes tracing data as it flows through the pipeline to make the distributed tracing system more reliable and leverage the data for other purposes such as trace discovery, tail-based sampling, and generating metrics.

Batching

The Agent supports batching of traces. Batching improves data compression, reduces the number of outgoing connections, and is a recommended best practice. To configure it, refer to the batch block in the config reference.
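A minimal sketch of a batch block is shown below. The two values are illustrative, not recommendations; a batch is sent when it reaches `send_batch_size` spans or when `timeout` elapses, whichever comes first.

```yaml
traces:
  configs:
  - name: default
    # receivers and remote_write omitted for brevity.
    batch:
      # Send a batch after 5 seconds even if it is not full...
      timeout: 5s
      # ...or as soon as it holds 100 spans.
      send_batch_size: 100
```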

Attributes manipulation

The Grafana Agent allows for general manipulation of attributes on spans that pass through the Agent. A common use is adding an environment or cluster attribute to every span. To configure it, refer to the attributes block in the config reference.
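As an illustration, the attributes block below uses the OpenTelemetry attributes processor's `actions` list to tag spans. The attribute values `production` and `us-east-1` are hypothetical; `upsert` overwrites an existing value, while `insert` only adds the attribute when the span does not already have it.

```yaml
traces:
  configs:
  - name: default
    # receivers and remote_write omitted for brevity.
    attributes:
      actions:
      # Set (or overwrite) an "environment" attribute on every span.
      - key: environment
        value: production          # hypothetical value
        action: upsert
      # Add a "cluster" attribute only if the span does not already have one.
      - key: cluster
        value: us-east-1           # hypothetical value
        action: insert
```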

Attaching metadata with Prometheus Service Discovery

Prometheus Service Discovery mechanisms enable you to attach the same metadata to your traces as to your metrics. For example, Kubernetes users can dynamically attach the namespace, pod, and container name of the workload sending spans.

```yaml
traces:
  ...
  scrape_configs:
  - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-pods
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - source_labels: [__meta_kubernetes_namespace]
      target_label: namespace
    - source_labels: [__meta_kubernetes_pod_name]
      target_label: pod
    - source_labels: [__meta_kubernetes_pod_container_name]
      target_label: container
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: false
```

This feature isn't only useful for Kubernetes users, however: all of Prometheus' service discovery mechanisms are supported. This means you can reuse the same scrape_configs across your metrics, logs, and traces to get the same set of labels, which makes it easy to move between the different kinds of observability data.

To configure it, refer to the scrape_configs block in the config reference.

Trace discovery through automatic logging

Automatic logging writes well-formatted log lines to help with trace discovery.

For a closer look into the feature, visit Automatic logging.
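A minimal sketch, assuming you only want to try the feature locally, writes one log line per root span to the Agent's standard output. The config reference also describes sending these lines to a logs instance instead; check the Automatic logging page for the exact options in your release.

```yaml
traces:
  configs:
  - name: default
    # receivers and remote_write omitted for brevity.
    automatic_logging:
      # Write log lines to the Agent's standard output.
      backend: stdout
      # Log one line for every root span of a trace.
      roots: true
```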

Tail-based sampling

The Agent implements tail-based sampling for distributed tracing systems and multi-instance Agent deployments. With this feature, sampling decisions can be made based on data from a trace, rather than exclusively with probabilistic methods.

For a detailed description, go to Tail-based sampling.
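The sketch below illustrates sampling on trace data rather than on probability: it keeps slow traces and traces containing errors. The policy types come from the OpenTelemetry tail sampling processor; the exact shape the Agent expects for each policy entry is an assumption here, so verify it against the Tail-based sampling page.

```yaml
traces:
  configs:
  - name: default
    # receivers and remote_write omitted for brevity.
    tail_sampling:
      policies:
      # Keep traces whose total duration exceeds 100 ms.
      - latency:
          threshold_ms: 100
      # Keep traces that contain at least one span with an ERROR status.
      - status_code:
          status_codes:
          - ERROR
```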

Generating metrics from spans

The Agent can take advantage of the span data flowing through the pipeline to generate Prometheus metrics.

Go to Span metrics for a more detailed explanation of the feature.
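As a minimal sketch, the spanmetrics block below aggregates request, error, and duration metrics from spans, adds the span's `http.method` attribute as an extra label, and exposes the result on an address you can scrape. The listen address is an illustrative assumption; the config reference also describes writing the metrics through a metrics instance instead.

```yaml
traces:
  configs:
  - name: default
    # receivers and remote_write omitted for brevity.
    spanmetrics:
      # Use the span's http.method attribute as an additional metric label.
      dimensions:
      - name: http.method
      # Expose the generated Prometheus metrics on this address.
      handler_endpoint: 0.0.0.0:8889
```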

Service graph metrics

Service graph metrics represent the relationships between services within a distributed system.

The service graphs processor builds a map of services by analyzing traces to find edges. An edge is a pair of spans with a parent-child relationship that represents a jump (for example, a request) between two services. The number of requests and their durations are recorded as metrics, which are used to represent the graph.

To read more about this processor, go to its section.
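In the simplest case the processor only needs to be switched on for a pipeline, as in the sketch below. Any tuning options beyond `enabled` are described in the processor's own section and are not assumed here.

```yaml
traces:
  configs:
  - name: default
    # receivers and remote_write omitted for brevity.
    service_graphs:
      # Turn on the service graphs processor for this pipeline.
      enabled: true
```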

Exporting spans

The Grafana Agent can export traces to multiple different backends from every tracing pipeline. Exporting is built on the OpenTelemetry Collector's OTLP exporter, so the Agent exports traces in OTLP format.

Aside from endpoint and authentication, the exporter also provides mechanisms for retrying on failure, and implements a queue buffering mechanism for transient failures, such as networking issues.

To see all available options, refer to the remote_write block in the config reference.
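The sketch below combines an endpoint, basic authentication, retries, and an in-memory queue in one remote_write entry. The host, username, and password file path are hypothetical, and the retry and queue values are illustrative rather than tuned; because `remote_write` is a list, further entries can be added to send the same spans to more than one backend.

```yaml
traces:
  configs:
  - name: default
    # receivers omitted for brevity.
    remote_write:
    - endpoint: tempo.example.com:443            # hypothetical host:port of a gRPC receiver
      basic_auth:
        username: my-tenant                      # hypothetical credentials
        password_file: /etc/agent/tempo-password # hypothetical path
      # Retry failed exports with exponential backoff, giving up after 5 minutes.
      retry_on_failure:
        enabled: true
        initial_interval: 5s
        max_interval: 30s
        max_elapsed_time: 300s
      # Hold up to 5000 batches in memory while the backend is unreachable.
      sending_queue:
        enabled: true
        num_consumers: 10
        queue_size: 5000
```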