Enable tail-based sampling
Tempo provides an inexpensive solution that aims to reduce the need for sampling. However, sometimes constraints make a lower sampling percentage necessary or desirable, for example due to runtime or egress traffic costs. Probabilistic sampling strategies are easy to implement, but they also risk discarding relevant data that you’ll later want.
Tail-based sampling works with Grafana Alloy. Alloy configuration files are written in Alloy configuration syntax.
How tail-based sampling works
In tail-based sampling, the sampling decision is made at the end of the workflow, which allows for a more accurate decision.
Alloy groups spans by trace ID and checks the trace’s data to see whether it meets one of the defined policies (for example, latency or status_code).
For instance, a policy can check if a trace contains an error or if it took
longer than a certain duration.
A trace is sampled if it meets at least one policy.
To group spans by trace ID, Alloy buffers spans for a configurable amount of time, after which it considers the trace complete. Traces that run longer than this window are split into more than one trace. However, a longer wait time increases the memory overhead of buffering.
Grouping trace data is particularly challenging in multi-instance Alloy deployments, where spans that belong to the same trace can arrive at different instances. To solve this, you can configure Alloy to load balance traces across Alloy instances by exporting all spans that belong to the same trace to the same instance.
This is achieved by redistributing spans by trace ID once they arrive from the application. Alloy must be able to discover and connect to other Alloy instances where spans for the same trace can arrive. Kubernetes users should use a headless service.
Redistributing spans by trace ID means that spans are sent and received twice, which can cause a significant increase in CPU usage. This overhead increases with the number of Alloy instances that share the same traces.
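The following is a minimal sketch of this redistribution step, assuming spans arrive over OTLP and that the other Alloy instances are reachable through a headless Kubernetes Service; the hostname alloy-traces.monitoring.svc.cluster.local is a placeholder, not a value from this guide. The otelcol.exporter.loadbalancing component routes spans by trace ID so that every span of a trace ends up on the same instance:
otelcol.receiver.otlp "default" {
  grpc {}

  output {
    traces = [otelcol.exporter.loadbalancing.default.input]
  }
}

// Redistribute incoming spans so that all spans sharing a trace ID are sent
// to the same Alloy instance.
otelcol.exporter.loadbalancing "default" {
  // Route spans by trace ID. This is also the default routing key.
  routing_key = "traceID"

  resolver {
    // The DNS resolver periodically resolves the headless service to the
    // current set of Alloy instance IPs. The hostname is a placeholder.
    dns {
      hostname = "alloy-traces.monitoring.svc.cluster.local"
    }
  }

  protocol {
    otlp {
      client {
        // Assumes plaintext OTLP/gRPC between Alloy instances inside the cluster.
        tls {
          insecure = true
        }
      }
    }
  }
}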
Sampling load balancing
Tail sampling load balancing is usually carried out by running two layers of collectors: the first layer receives the telemetry data (in this case, trace spans) and distributes it to the second layer, which applies the sampling policies.
Alloy includes a load balancing exporter that routes data to further collector targets based on a routing key (in the case of trace sampling, the traceID key).
Alloy uses the OpenTelemetry load balancing exporter.
The routing key ensures that spans with the same trace ID are always handled by the same collector in the second layer, so sampling decisions are made on complete traces. You can configure the exporter’s targets in several ways, including static IPs, multi-IP DNS A record entries, and a Kubernetes headless service resolver. This lets you scale the number of second-layer collectors up and down.
There are some important points to note about the load balancer exporter around scaling and resilience, mostly related to its eventual consistency model. For more information, refer to Resilience and scaling considerations. The most important point for tail sampling is that routing is based on an algorithm that takes into account the number of backends available to the load balancer, so the target for a given trace ID’s spans can change before eventual consistency is reached.
For an example manifest of a two-layer OTel Collector deployment based around Kubernetes services, refer to the K8s resolver README.
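As a sketch, the first-layer collectors could use the Kubernetes resolver variant of the load balancing exporter to discover the second layer. The Service name sampling-layer.tracing is a placeholder for a headless Service in front of the second-layer collectors:
// First layer: route spans by trace ID to the second-layer collectors,
// which run the tail sampling policies.
otelcol.exporter.loadbalancing "second_layer" {
  routing_key = "traceID"

  resolver {
    // The Kubernetes resolver watches the endpoints of the headless Service and
    // updates the backend list as second-layer collectors scale up or down.
    kubernetes {
      service = "sampling-layer.tracing"
    }
  }

  protocol {
    otlp {
      client {
        // Assumes plaintext OTLP/gRPC between the two layers.
        tls {
          insecure = true
        }
      }
    }
  }
}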
Configure tail-based sampling
To start using tail-based sampling, define a sampling policy in your configuration file.
If you’re using a multi-instance deployment of Alloy, add load balancing and specify the resolving mechanism to find other Alloy instances in the setup.
To see all the available configuration options for load balancing, refer to the Alloy component reference.
Example for Alloy
Alloy uses the otelcol.processor.tail_sampling component for tail-based sampling.
otelcol.receiver.otlp "default" {
http {}
grpc {}
output {
traces = [otelcol.processor.tail_sampling.policies.input]
}
}
// The Tail Sampling processor will use a set of policies to determine which received
// traces to keep and send to Tempo.
otelcol.processor.tail_sampling "policies" {
// Total wait time from the start of a trace before making a sampling decision.
// Note that smaller time periods can potentially cause a decision to be made
// before the end of a trace has occurred.
decision_wait = "30s"
// The following policies follow a logical OR pattern, meaning that if any of the
// policies match, the trace will be kept. For logical AND, you can use the `and`
// policy. Every span of a trace is examined by each policy in turn. A match will
// cause a short-circuit.
// This policy defines that traces that contain errors should be kept.
policy {
// The name of the policy can be used for logging purposes.
name = "sample-erroring-traces"
// The type must match the type of policy to be used, in this case examining
// the status code of every span in the trace.
type = "status_code"
// This block determines the error codes that should match in order to keep
// the trace, in this case the OpenTelemetry 'ERROR' code.
status_code {
status_codes = [ "ERROR" ]
}
}
// This policy defines that only traces that are longer than 200ms in total
// should be kept.
policy {
// The name of the policy can be used for logging purposes.
name = "sample-long-traces"
// The type must match the policy to be used, in this case the total latency
// of the trace.
type = "latency"
// This block determines the total length of the trace in milliseconds.
latency {
threshold_ms = 200
}
}
// The output block forwards the kept traces onto the batch processor, which
// will marshall them for exporting to the Grafana OTLP gateway.
output {
traces = [otelcol.exporter.otlp.default.input]
}
}
otelcol.exporter.otlp "default" {
client {
endpoint = env("OTLP_ENDPOINT")
}
}
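In this example, sampled traces go straight from the tail sampling processor to the OTLP exporter. If you prefer to batch spans before export, one option (a sketch, not required by the example) is to add an otelcol.processor.batch component and point the tail sampling output block at otelcol.processor.batch.default.input instead:
// Optional: batch sampled spans before exporting them to reduce the number
// of outgoing requests.
otelcol.processor.batch "default" {
  output {
    traces = [otelcol.exporter.otlp.default.input]
  }
}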