
Head sampling for the OpenTelemetry Collector

Overview

With a head sampling strategy, the decision to sample a trace is usually made as early as possible and does not take the whole trace into account. It is a simple but effective sampling strategy.

Head sampling can be implemented at the application level with SDKs, or at the collector level to keep the sampling configuration separate from the application.

Prerequisites

  1. Grafana Cloud metrics generation should be disabled. Contact Grafana Support if Grafana Cloud metrics generation is enabled on the stack.
  2. Head sampling should not already be implemented at the application level.
  3. Use the OpenTelemetry Collector to collect traces from the application, generate metrics from traces, and apply sampling.
  4. Send all traces to the collector, without sampling them first, so that the collector can generate accurate metrics.
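To satisfy prerequisites 2 and 4, the application's SDK should keep every trace and export it to the collector. The sketch below assumes the application runs under Docker Compose and that its OpenTelemetry SDK honors the standard SDK environment variables. The service names `app` and `otel-collector` are hypothetical.

```yaml
# Hypothetical Docker Compose fragment for the instrumented application
services:
  app:
    environment:
      # Keep every trace in the SDK; sampling happens later, in the collector.
      # A ratio-based sampler here (for example parentbased_traceidratio) would be
      # application-level head sampling, which this setup avoids.
      OTEL_TRACES_SAMPLER: always_on
      # Send spans to the collector's OTLP/HTTP receiver (default port 4318)
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4318
      OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf
```

Setting the protocol explicitly avoids depending on each SDK's default, which differs between languages.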

Setup

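The following OpenTelemetry Collector components are used for metrics generation and sampling:

  • the spanmetrics and servicegraph connectors generate metrics from the un-sampled traces
  • the probabilistic_sampler processor samples the traces that are sent to Grafana Cloud Tempo
  • the prometheusremotewrite and otlp exporters send the metrics and the sampled traces to Grafana Cloud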

The collector receives un-sampled traces, generates metrics, and sends metrics to Grafana Cloud Prometheus. In parallel, the collector applies a probabilistic sampling strategy to the traces and sends sampled data to Grafana Cloud Tempo.
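In a collector configuration, listing the same receiver in two pipelines fans the same incoming data out to both of them. That is how metrics generation and sampling run in parallel here. The fragment below is trimmed from the full configuration to show only that split; it is not a complete configuration on its own.

```yaml
service:
  pipelines:
    # Un-sampled traces feed the connectors that generate metrics
    traces:
      receivers: [otlp]
      exporters: [servicegraph, spanmetrics]
    # The same traces, after sampling, are sent to Grafana Cloud Tempo
    traces/grafana_cloud_traces:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [otlp/grafana_cloud_traces]
```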

The configuration for the collector is similar to the configuration for the tail sampling strategy. The only difference is that a probabilistic sampler processor is used instead of a tail sampling processor:

```yaml
# Tested with OpenTelemetry Collector Contrib v0.88.0
receivers:
  otlp:
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
    protocols:
      grpc:
      http:
  hostmetrics:
    # Optional. Host Metrics Receiver added as an example of Infra Monitoring capabilities of the OpenTelemetry Collector
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver
    scrapers:
      load:
      memory:

processors:
  batch:
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
  resourcedetection:
    # Enriches telemetry data with resource information from the host
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
    detectors: ["env", "system"]
    override: false
  transform/add_resource_attributes_as_metric_attributes:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["deployment.environment"], resource.attributes["deployment.environment"])
          - set(attributes["service.version"], resource.attributes["service.version"])
  filter/drop_unneeded_span_metrics:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/filterprocessor
    error_mode: ignore
    metrics:
      datapoint:
        - 'IsMatch(metric.name, "traces.spanmetrics.calls|traces.spanmetrics.duration") and IsMatch(attributes["span.kind"], "SPAN_KIND_INTERNAL|SPAN_KIND_CLIENT|SPAN_KIND_PRODUCER")'
  transform/use_grafana_metric_names:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor
    error_mode: ignore
    metric_statements:
      - context: metric
        statements:
          - set(name, "traces.spanmetrics.latency") where name == "traces.spanmetrics.duration"
          - set(name, "traces.spanmetrics.calls.total") where name == "traces.spanmetrics.calls"
  probabilistic_sampler:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/probabilisticsamplerprocessor
    # Example: keep 10% of traces
    sampling_percentage: 10

connectors:
  servicegraph:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/servicegraphconnector
    dimensions:
      - service.namespace
      - service.version
      - deployment.environment

  spanmetrics:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/spanmetricsconnector
    namespace: traces.spanmetrics
    histogram:
      unit: s
    dimensions:
      - name: service.namespace
      - name: service.version
      - name: deployment.environment

exporters:
  otlp/grafana_cloud_traces:
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter
    endpoint: "${env:GRAFANA_CLOUD_TEMPO_ENDPOINT}"
    auth:
      authenticator: basicauth/grafana_cloud_traces

  loki/grafana_cloud_logs:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/lokiexporter
    endpoint: "${env:GRAFANA_CLOUD_LOKI_URL}"
    auth:
      authenticator: basicauth/grafana_cloud_logs

  prometheusremotewrite/grafana_cloud_metrics:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusremotewriteexporter
    endpoint: "${env:GRAFANA_CLOUD_PROMETHEUS_URL}"
    auth:
      authenticator: basicauth/grafana_cloud_metrics
    add_metric_suffixes: false

extensions:
  basicauth/grafana_cloud_traces:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/basicauthextension
    client_auth:
      username: "${env:GRAFANA_CLOUD_TEMPO_USERNAME}"
      password: "${env:GRAFANA_CLOUD_API_KEY}"
  basicauth/grafana_cloud_metrics:
    client_auth:
      username: "${env:GRAFANA_CLOUD_PROMETHEUS_USERNAME}"
      password: "${env:GRAFANA_CLOUD_API_KEY}"
  basicauth/grafana_cloud_logs:
    client_auth:
      username: "${env:GRAFANA_CLOUD_LOKI_USERNAME}"
      password: "${env:GRAFANA_CLOUD_API_KEY}"

service:
  extensions:
    [
      basicauth/grafana_cloud_traces,
      basicauth/grafana_cloud_metrics,
      basicauth/grafana_cloud_logs,
    ]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [servicegraph, spanmetrics]
    traces/grafana_cloud_traces:
      receivers: [otlp]
      processors: [resourcedetection, probabilistic_sampler, batch]
      exporters: [otlp/grafana_cloud_traces]
    metrics:
      receivers: [otlp, hostmetrics]
      processors:
        [
          resourcedetection,
          transform/add_resource_attributes_as_metric_attributes,
          batch,
        ]
      exporters: [prometheusremotewrite/grafana_cloud_metrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      processors:
        [
          filter/drop_unneeded_span_metrics,
          transform/use_grafana_metric_names,
          batch,
        ]
      exporters: [prometheusremotewrite/grafana_cloud_metrics]
    metrics/servicegraph:
      receivers: [servicegraph]
      processors: [batch]
      exporters: [prometheusremotewrite/grafana_cloud_metrics]
    logs:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [loki/grafana_cloud_logs]
```
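For comparison, the swap between the two strategies comes down to one processor. The `tail_sampling` block below is an illustrative assumption, not the configuration from the tail sampling guide. Its probabilistic policy keeps a similar share of traces, but it buffers spans and only decides after `decision_wait` has passed. The `probabilistic_sampler` decides immediately for each span based on its trace ID.

```yaml
processors:
  # Tail sampling (illustrative): buffers spans per trace, then applies policies
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-10-percent
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

  # Head sampling (this guide): decides per span from the trace ID, no buffering
  probabilistic_sampler:
    sampling_percentage: 10
```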

The instrumented application sends un-sampled traces to the collector via OTLP. The collector receives data and processes it with defined pipelines.
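One way to run the collector with this configuration is a Docker Compose service based on the Contrib image version the configuration was tested with. This sketch assumes the configuration is saved as `otel-collector-config.yaml` next to the Compose file. The file name and the placeholder values are hypothetical. Substitute the endpoints, usernames, and API key from your Grafana Cloud stack.

```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.88.0
    command: ["--config=/etc/otelcol-contrib/config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
    environment:
      # Placeholders, resolved by the ${env:...} references in the configuration
      GRAFANA_CLOUD_TEMPO_ENDPOINT: "<tempo-endpoint>"
      GRAFANA_CLOUD_TEMPO_USERNAME: "<tempo-username>"
      GRAFANA_CLOUD_PROMETHEUS_URL: "<prometheus-remote-write-url>"
      GRAFANA_CLOUD_PROMETHEUS_USERNAME: "<prometheus-username>"
      GRAFANA_CLOUD_LOKI_URL: "<loki-push-url>"
      GRAFANA_CLOUD_LOKI_USERNAME: "<loki-username>"
      GRAFANA_CLOUD_API_KEY: "<api-key>"
```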

The traces pipeline receives traces with the otlp receiver and exports them to the servicegraph and spanmetrics connectors. The resourcedetection processor, used here and in the other pipelines, enriches telemetry data with resource information from the host.

Consult the resource detection processor README.md for a list of configuration options.
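As an example of such options, the `system` detector can be told where to read the host name from. The `env` detector picks up attributes set in the collector's `OTEL_RESOURCE_ATTRIBUTES` environment variable. This sketch assumes that the `os` hostname source suits your hosts.

```yaml
processors:
  resourcedetection:
    detectors: ["env", "system"]
    system:
      # Take host.name from the operating system instead of trying a DNS lookup first
      hostname_sources: ["os"]
    # Do not overwrite resource attributes already present on incoming data
    override: false
```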

The traces/grafana_cloud_traces pipeline receives traces with the otlp receiver, samples them with the probabilistic_sampler processor, and exports them to Grafana Cloud Tempo with the otlp exporter.
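The probabilistic sampler bases its decision on a hash of the trace ID, so all spans of a trace get the same keep-or-drop decision. When several collector instances in the same tier share the traffic, they should use the same `hash_seed` so they agree on each trace. The values below are assumptions for illustration.

```yaml
processors:
  probabilistic_sampler:
    # Keep roughly 10% of traces, selected by a hash of the trace ID
    sampling_percentage: 10
    # Assumed value; use the same seed on every collector in the same tier
    hash_seed: 22
```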

The metrics pipeline receives metrics from the otlp and hostmetrics receivers, applies a transform processor to add deployment.environment and service.version labels to the metrics, and exports them to Grafana Cloud Metrics with the prometheusremotewrite exporter. The hostmetrics receiver is optional. It is added here as an example of the infrastructure monitoring capabilities of the OpenTelemetry Collector.
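The same transform processor can promote other resource attributes to metric labels. This sketch assumes you also want `service.namespace` on these metrics, matching the dimension that the spanmetrics and servicegraph connectors already use. The third statement is the assumed addition.

```yaml
processors:
  transform/add_resource_attributes_as_metric_attributes:
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["deployment.environment"], resource.attributes["deployment.environment"])
          - set(attributes["service.version"], resource.attributes["service.version"])
          # Assumed addition: copy service.namespace from the resource as well
          - set(attributes["service.namespace"], resource.attributes["service.namespace"])
```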

The metrics/spanmetrics pipeline receives the metrics generated by the spanmetrics connector, applies filter and transform processors, and exports the metrics to Grafana Cloud Metrics with the prometheusremotewrite exporter. The filter processor reduces cardinality by dropping metric data points that Application Observability does not require. The transform processor renames the metrics produced by the spanmetrics connector to match the metric names that Tempo produces.

The metrics/servicegraph pipeline receives the metrics generated by the servicegraph connector and exports them to Grafana Cloud Metrics with the prometheusremotewrite exporter.
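The servicegraph connector builds its edges by pairing client spans with the matching server spans. It counts requests between services, so it needs the un-sampled traces to report accurate rates. Its latency buckets and its store for unpaired spans can be tuned. The values below are assumptions for illustration.

```yaml
connectors:
  servicegraph:
    # Assumed bucket boundaries for the request latency histogram
    latency_histogram_buckets: [100ms, 250ms, 1s, 5s, 10s]
    dimensions:
      - service.namespace
      - service.version
      - deployment.environment
    store:
      # How long an unpaired edge waits for its counterpart span
      ttl: 2s
      # Maximum number of edges held in the store
      max_items: 1000
```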

The logs pipeline receives logs with the otlp receiver and exports them to Grafana Cloud Loki with the loki exporter.

All pipelines use the batch processor. Batching improves data compression and reduces the number of outgoing connections required to transmit the data. It is also recommended to use:

  • the memory limiter processor to prevent out-of-memory situations on the OpenTelemetry Collector
  • the health check extension to check the status of the OpenTelemetry Collector
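Both recommendations can be sketched as a fragment to merge into the configuration above. The limits are assumptions to adjust to the memory available to the collector. The memory limiter goes first in a pipeline's processor list so it can refuse data before other processors do any work. Only the traces pipeline is shown. The other receiver-fed pipelines would get `memory_limiter` first in the same way.

```yaml
extensions:
  health_check:
    # Serves the collector's health status over HTTP
    endpoint: 0.0.0.0:13133

processors:
  memory_limiter:
    check_interval: 1s
    # Assumed limits, as a percentage of total available memory
    limit_percentage: 80
    spike_limit_percentage: 25

service:
  extensions:
    [
      health_check,
      basicauth/grafana_cloud_traces,
      basicauth/grafana_cloud_metrics,
      basicauth/grafana_cloud_logs,
    ]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [servicegraph, spanmetrics]
```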

Scaling

Consult the scaling stateless Collectors guide to learn how to scale the OpenTelemetry Collector head sampling architecture.