Tail sampling for the OpenTelemetry Collector
Overview
With a tail sampling strategy, the decision to sample a trace is made after considering all or most of its spans. For example, tail sampling is a good option when you want to keep only traces that contain errors or traces with a long request duration.
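For example, a minimal sketch of a tail sampling processor policy that keeps only traces containing at least one error span could look like the following (the policy name is illustrative; a complete collector configuration is shown in the Setup section below):
processors:
  tail_sampling:
    policies:
      [
        # Keep a trace only if at least one of its spans has an error status
        {
          name: errors-only,
          type: status_code,
          status_code: { status_codes: [ERROR] },
        },
      ]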
Prerequisites
- Grafana Cloud metrics generation should be disabled. Contact Grafana Support if Grafana Cloud metrics generation is enabled on the stack.
- Head sampling should not already be implemented at the application level.
- Use the OpenTelemetry Collector to collect traces from the application, generate metrics from traces, and apply sampling.
- Send all traces to the collector so that it can generate accurate metrics. All traces need to be sent because not all SDKs set the required trace state values.
Setup
The following OpenTelemetry Collector components are used for metrics generation and sampling.
The collector receives unsampled traces, generates metrics, and sends the metrics to Grafana Cloud Prometheus. In parallel, the collector applies a tail sampling strategy to the traces and sends the sampled data to Grafana Cloud Tempo.
The configuration for the collector is similar to the configuration for the head sampling strategy. The only difference is that a tail sampling processor is used instead of a probabilistic sampler processor:
# Tested with OpenTelemetry Collector Contrib v0.88.0
receivers:
  otlp:
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver
    protocols:
      grpc:
      http:
  hostmetrics:
    # Optional. Host Metrics Receiver added as an example of Infra Monitoring capabilities of the OpenTelemetry Collector
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver
    scrapers:
      load:
      memory:
processors:
  batch:
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
  resourcedetection:
    # Enriches telemetry data with resource information from the host
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
    detectors: ["env", "system"]
    override: false
  transform/add_resource_attributes_as_metric_attributes:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor
    error_mode: ignore
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["deployment.environment"], resource.attributes["deployment.environment"])
          - set(attributes["service.version"], resource.attributes["service.version"])
  filter/drop_unneeded_span_metrics:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/filterprocessor
    error_mode: ignore
    metrics:
      datapoint:
        - 'IsMatch(metric.name, "traces.spanmetrics.calls|traces.spanmetrics.duration") and IsMatch(attributes["span.kind"], "SPAN_KIND_INTERNAL|SPAN_KIND_CLIENT|SPAN_KIND_PRODUCER")'
  transform/use_grafana_metric_names:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor
    error_mode: ignore
    metric_statements:
      - context: metric
        statements:
          - set(name, "traces.spanmetrics.latency") where name == "traces.spanmetrics.duration"
          - set(name, "traces.spanmetrics.calls.total") where name == "traces.spanmetrics.calls"
  tail_sampling:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor
    policies:
      # Examples: keep all traces that take more than 5000 ms
      [
        {
          name: all_traces_above_5000ms,
          type: latency,
          latency: { threshold_ms: 5000 },
        },
      ]
connectors:
  servicegraph:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/servicegraphconnector
    dimensions:
      - service.namespace
      - service.version
      - deployment.environment
  spanmetrics:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/connector/spanmetricsconnector
    namespace: traces.spanmetrics
    histogram:
      unit: s
    dimensions:
      - name: service.namespace
      - name: service.version
      - name: deployment.environment
exporters:
  otlp/grafana_cloud_traces:
    # https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter
    endpoint: "${env:GRAFANA_CLOUD_TEMPO_ENDPOINT}"
    auth:
      authenticator: basicauth/grafana_cloud_traces
  loki/grafana_cloud_logs:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/lokiexporter
    endpoint: "${env:GRAFANA_CLOUD_LOKI_URL}"
    auth:
      authenticator: basicauth/grafana_cloud_logs
  prometheusremotewrite/grafana_cloud_metrics:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusremotewriteexporter
    endpoint: "${env:GRAFANA_CLOUD_PROMETHEUS_URL}"
    auth:
      authenticator: basicauth/grafana_cloud_metrics
    add_metric_suffixes: false
extensions:
  basicauth/grafana_cloud_traces:
    # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/basicauthextension
    client_auth:
      username: "${env:GRAFANA_CLOUD_TEMPO_USERNAME}"
      password: "${env:GRAFANA_CLOUD_API_KEY}"
  basicauth/grafana_cloud_metrics:
    client_auth:
      username: "${env:GRAFANA_CLOUD_PROMETHEUS_USERNAME}"
      password: "${env:GRAFANA_CLOUD_API_KEY}"
  basicauth/grafana_cloud_logs:
    client_auth:
      username: "${env:GRAFANA_CLOUD_LOKI_USERNAME}"
      password: "${env:GRAFANA_CLOUD_API_KEY}"
service:
  extensions:
    [
      basicauth/grafana_cloud_traces,
      basicauth/grafana_cloud_metrics,
      basicauth/grafana_cloud_logs,
    ]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [servicegraph, spanmetrics]
    traces/grafana_cloud_traces:
      receivers: [otlp]
      processors: [resourcedetection, tail_sampling, batch]
      exporters: [otlp/grafana_cloud_traces]
    metrics:
      receivers: [otlp, hostmetrics]
      processors:
        [
          resourcedetection,
          transform/add_resource_attributes_as_metric_attributes,
          batch,
        ]
      exporters: [prometheusremotewrite/grafana_cloud_metrics]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      processors:
        [
          filter/drop_unneeded_span_metrics,
          transform/use_grafana_metric_names,
          batch,
        ]
      exporters: [prometheusremotewrite/grafana_cloud_metrics]
    metrics/servicegraph:
      receivers: [servicegraph]
      processors: [batch]
      exporters: [prometheusremotewrite/grafana_cloud_metrics]
    logs:
      receivers: [otlp]
      processors: [resourcedetection, batch]
      exporters: [loki/grafana_cloud_logs]
The instrumented application sends unsampled traces to the collector via OTLP. The collector receives the data and processes it with the defined pipelines.
The traces pipeline receives traces with the otlp receiver and exports them to the servicegraph and spanmetrics connectors.
The resourcedetection processor, used here and in the other pipelines, enriches telemetry data with resource information from the host.
Consult the resource detection processor README.md for a list of configuration options.
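As an illustration, a sketch of the resourcedetection processor with a few additional options could look like this (the ec2 detector and the timeout value are assumptions, not part of the configuration above):
processors:
  resourcedetection:
    # Query environment variables, the local system, and the EC2 metadata endpoint
    detectors: ["env", "system", "ec2"]
    # Give slow metadata endpoints more time before giving up
    timeout: 5s
    # Do not overwrite resource attributes that are already set on the telemetry
    override: false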
The traces/grafana_cloud_traces pipeline receives traces with the otlp receiver, uses a tail sampling processor to sample traces, and exports them to Grafana Cloud Tempo with the otlp exporter.
The default configuration of the tail sampling processor holds traces in memory for 30 seconds before making a sampling decision. Once a decision has been made, spans of the same trace that arrive later are treated as a new trace.
Consult the tail sampling processor README.md for a list of configuration options.
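As a sketch, a tail sampling processor that extends the decision wait time and combines several policies could look like the following (the policy names, the 60s wait, and the 10% sampling percentage are assumptions for this example; policies are evaluated independently, and a trace is kept if any of them matches):
processors:
  tail_sampling:
    # Hold spans in memory for 60 seconds before evaluating the policies (the default is 30s)
    decision_wait: 60s
    policies:
      [
        # Keep every trace that contains a span with an error status
        {
          name: error-traces,
          type: status_code,
          status_code: { status_codes: [ERROR] },
        },
        # Keep every trace that takes more than 5000 ms
        {
          name: slow-traces,
          type: latency,
          latency: { threshold_ms: 5000 },
        },
        # In addition, keep a random 10% sample of all traces
        {
          name: random-sample,
          type: probabilistic,
          probabilistic: { sampling_percentage: 10 },
        },
      ]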
The metrics pipeline receives metrics with the otlp and hostmetrics receivers, applies a transform processor to add deployment.environment and service.version labels to the metrics, and exports them to Grafana Cloud Metrics with the prometheusremotewrite exporter.
The hostmetrics receiver is optional. It is added here as an example of the infrastructure monitoring capabilities of the OpenTelemetry Collector.
The metrics/spanmetrics pipeline receives metrics generated by the spanmetrics connector, applies the filter and transform processors, and exports the metrics to Grafana Cloud Metrics with the prometheusremotewrite exporter.
The filter processor reduces cardinality by dropping metric data points that are not required for Application Observability.
The transform processor aligns the metric names produced by the spanmetrics connector with the metrics produced in Tempo.
The metrics/servicegraph pipeline receives metrics generated by the servicegraph connector and exports them to Grafana Cloud Metrics with the prometheusremotewrite exporter.
The logs pipeline receives logs with the otlp receiver and exports them to Grafana Cloud Loki with the loki exporter.
All the pipelines use the batch processor. Batching helps to compress the data better and reduces the number of outgoing connections required to transmit it. It is also recommended to use the following components, sketched below:
- the memory limiter processor, to prevent out-of-memory situations on the OpenTelemetry Collector
- the health check extension, to check the status of the OpenTelemetry Collector
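A sketch of both components is shown here, assuming a 400 MiB memory limit is appropriate for the host (the values are illustrative and should be tuned to the environment; only the traces pipeline is repeated for brevity):
extensions:
  health_check:
    # Expose an HTTP endpoint that reports the health of the collector
    endpoint: "0.0.0.0:13133"
processors:
  memory_limiter:
    # How often memory usage is checked
    check_interval: 1s
    # Hard memory limit for the collector process
    limit_mib: 400
    # Expected spike between checks; the soft limit is limit_mib - spike_limit_mib
    spike_limit_mib: 100
service:
  extensions:
    [
      health_check,
      basicauth/grafana_cloud_traces,
      basicauth/grafana_cloud_metrics,
      basicauth/grafana_cloud_logs,
    ]
  pipelines:
    traces:
      receivers: [otlp]
      # Place memory_limiter first so data can be refused before other processing happens
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [servicegraph, spanmetrics]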
Scaling
Scaling a tail sampling setup requires additional planning, because all spans of a trace must reach the same collector instance for the sampling decision to consider the complete trace. This typically involves the use of a load-balancing exporter.
Scaling the load-balancing exporter is easy, as an off-the-shelf layer 4 load balancer is sufficient.
With a Kubernetes setup, the OpenTelemetry Collector configuration for the load-balancing exporter could look like this:
receivers:
  otlp:
    protocols:
      grpc:
processors:
exporters:
  loadbalancing:
    protocol:
      otlp:
    resolver:
      static:
        hostnames:
          - otel-collector-1:4317
          - otel-collector-2:4317
          - otel-collector-3:4317
service:
  pipelines:
    traces:
      receivers:
        - otlp
      exporters:
        - loadbalancing
The load-balancing exporter has three resolvers: static, dns, and k8s.
- static: A static list of backends is provided in the configuration. This is suitable when the backends are static and scaling isn’t expected.
- dns: A hostname is provided as a parameter, which the resolver periodically queries to discover IPs and update the load-balancer ring. When multiple instances are used, there is a chance they momentarily have a different view of the system while they sync after a refresh. This can result in some spans for the same trace ID being sent to multiple hosts. Determine if this is acceptable for the system, and use a longer refresh interval to reduce the effect of being out of sync.
- k8s: A new resolver that implements a watcher using Kubernetes APIs to get notifications when the list of pods backing a service is changed. This should reduce the amount of time when cluster views differ between nodes, effectively being a better solution than the DNS resolver when Kubernetes is used.
The static and dns resolvers are recommended for use in production environments. The k8s resolver is new and experimental.
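For example, a sketch of the load-balancing exporter using the dns resolver could look like the following (the otel-collector-headless.observability.svc.cluster.local hostname and the 30s interval are assumptions for a Kubernetes headless service setup):
exporters:
  loadbalancing:
    protocol:
      otlp:
    resolver:
      dns:
        # Headless Kubernetes service that resolves to the IPs of the backing collector pods
        hostname: otel-collector-headless.observability.svc.cluster.local
        # OTLP gRPC port of the backing collectors
        port: 4317
        # How often DNS is re-queried; a longer interval reduces the time instances spend out of sync
        interval: 30s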