Important: This documentation is about an older version. It's relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.

Span metrics

The span metrics processor generates metrics from ingested tracing data, including request, error, and duration (RED) metrics.

Span metrics generate two metrics:

A counter that counts requests
A histogram that tracks the distribution of durations of all requests

Span metrics are of particular interest if your system is not monitored with metrics, but it has distributed tracing implemented. You get out-of-the-box metrics from your tracing pipeline.

Note
Metrics generation is disabled by default. Contact Grafana Support to enable metrics generation in your organization.

Even if you already have metrics, span metrics can provide in-depth monitoring of your system. The generated metrics give application-level insight, to the extent that tracing is propagated through your applications.

Last but not least, span metrics lower the entry barrier for using exemplars. An exemplar is a specific trace that is representative of a measurement taken in a given time interval. Since traces and metrics co-exist in the metrics-generator, exemplars can be added automatically, providing additional value to these metrics.

How to run

To enable span metrics in Tempo/GET, enable the metrics generator and add an overrides section that enables the span-metrics processor. See the metrics-generator configuration documentation for details.

How it works

The span metrics processor works by inspecting every received span and computing the total count and the duration of spans for every unique combination of dimensions. Dimensions can be the service name, the operation, the span kind, the status code and any attribute present in the span.

This processor is designed to mirror the OpenTelemetry Collector processor of the same name.
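The per-combination aggregation described above can be sketched in a few lines of Go. This is a minimal illustration, not the processor's actual code: the `Span` and `Series` types and the `Aggregate` function are hypothetical, only four dimensions are used, and a running duration sum stands in for the real histogram buckets.

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a simplified, hypothetical stand-in for an ingested span.
type Span struct {
	Service    string
	Name       string
	Kind       string
	StatusCode string
	DurationS  float64 // span duration in seconds
}

// Series holds the running aggregates for one combination of dimensions.
type Series struct {
	Calls       uint64  // number of spans seen for this combination
	DurationSum float64 // sum of their durations in seconds
}

// seriesKey joins the dimension values into a single map key.
func seriesKey(s Span) string {
	return strings.Join([]string{s.Service, s.Name, s.Kind, s.StatusCode}, "|")
}

// Aggregate counts spans and sums durations per unique dimension combination.
func Aggregate(spans []Span) map[string]*Series {
	out := make(map[string]*Series)
	for _, s := range spans {
		k := seriesKey(s)
		if out[k] == nil {
			out[k] = &Series{}
		}
		out[k].Calls++
		out[k].DurationSum += s.DurationS
	}
	return out
}

func main() {
	spans := []Span{
		{"checkout", "GET /cart", "SPAN_KIND_SERVER", "STATUS_CODE_OK", 0.25},
		{"checkout", "GET /cart", "SPAN_KIND_SERVER", "STATUS_CODE_OK", 0.75},
		{"checkout", "GET /cart", "SPAN_KIND_SERVER", "STATUS_CODE_ERROR", 0.5},
	}
	for k, v := range Aggregate(spans) {
		fmt.Printf("%s calls=%d duration_sum=%.2fs\n", k, v.Calls, v.DurationSum)
	}
}
```

The three spans collapse into two series because two of them share every dimension value. Each extra dimension multiplies the number of possible combinations, which is why cardinality matters when you add dimensions.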

Note
To learn more about cardinality and how to perform a dry run of the metrics generator, see the Cardinality documentation.

Metrics

The following metrics are exported:

Metric	Type	Labels	Description
traces_spanmetrics_latency	Histogram	Dimensions	Duration of the span
traces_spanmetrics_calls_total	Counter	Dimensions	Total count of spans
traces_spanmetrics_size_total	Counter	Dimensions	Total size of spans ingested

Note
In Tempo 1.4 and 1.4.1, the histogram metric was called traces_spanmetrics_duration_seconds. This was changed later to be consistent with the metrics generated by the Grafana Agent and the OpenTelemetry Collector.

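The span metrics processor adds the following labels to each metric: service, span_name, span_kind, status_code, status_message, job, and instance. The first four are added by default; status_message, job, and instance are added only when enabled, as noted below.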

service - The name of the service that generated the span
span_name - The unique name of the span
span_kind - The type of span; one of five values:
- SPAN_KIND_SERVER - The span was generated by a call from another service
- SPAN_KIND_CLIENT - The span made a call to another service
- SPAN_KIND_INTERNAL - The span does not have interaction outside of the service it was generated in
- SPAN_KIND_PRODUCER - The span created data that was pushed onto a bus or message broker
- SPAN_KIND_CONSUMER - The span consumed data that was on a bus or messaging system
status_code - The result of the span; one of three values:
- STATUS_CODE_UNSET - Result of the span was unset/unknown
- STATUS_CODE_OK - The span operation completed successfully
- STATUS_CODE_ERROR - The span operation completed with an error
status_message (optionally enabled) - The message that details the reason for the status_code label
job - The name of the job, a combination of namespace and service; only added if metrics_generator.processor.span_metrics.enable_target_info: true
instance - The instance ID; only added if metrics_generator.processor.span_metrics.enable_target_info: true

Additional user-defined labels can be created using the dimensions configuration option. When a configured dimension collides with one of the default labels (for example, status_code), the label for that dimension is prefixed with a double underscore (that is, __status_code).
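The collision rule can be illustrated with a short Go sketch. It assumes that the reserved names are exactly the default labels listed above; `LabelFor` and `defaultLabels` are hypothetical names used only for this example, and any other label-name processing the generator may perform is left out.

```go
package main

import "fmt"

// defaultLabels holds the default label names listed in this document.
// Assumption: these are the only names that trigger the prefix rule.
var defaultLabels = map[string]bool{
	"service":        true,
	"span_name":      true,
	"span_kind":      true,
	"status_code":    true,
	"status_message": true,
	"job":            true,
	"instance":       true,
}

// LabelFor returns the label name used for a configured dimension.
// A dimension that collides with a default label gets a "__" prefix.
func LabelFor(dimension string) string {
	if defaultLabels[dimension] {
		return "__" + dimension
	}
	return dimension
}

func main() {
	fmt.Println(LabelFor("status_code")) // collides with a default label
	fmt.Println(LabelFor("region"))      // no collision, kept as-is
}
```

In this sketch a dimension named status_code becomes the label __status_code, so the default status_code label stays intact alongside it.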

Custom labeling of dimensions is also supported using the dimension_mapping configuration option.

An optional metric called traces_target_info, which uses all resource-level attributes as dimensions, can be enabled with the enable_target_info configuration option.

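If you use a ratio-based sampler, you can use the custom sampler below to avoid losing metric information. It records the sampling ratio on every sampled span in the X-SampleRatio attribute. For the metrics-generator to use it, you must also set metrics_generator.processor.span_metrics.span_multiplier_key to "X-SampleRatio".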

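package tracer

import (
	"go.opentelemetry.io/otel/attribute"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
)

// RatioBasedSampler wraps a trace-ID ratio sampler and tags every sampled
// span with the sampling ratio.
type RatioBasedSampler struct {
	innerSampler        tracesdk.Sampler
	sampleRateAttribute attribute.KeyValue
}

// NewRatioBasedSampler returns a sampler that samples the given fraction of
// traces and records that fraction in the "X-SampleRatio" span attribute.
func NewRatioBasedSampler(fraction float64) RatioBasedSampler {
	innerSampler := tracesdk.TraceIDRatioBased(fraction)
	return RatioBasedSampler{
		innerSampler:        innerSampler,
		sampleRateAttribute: attribute.Float64("X-SampleRatio", fraction),
	}
}

// ShouldSample delegates the decision to the inner sampler and, when the span
// is recorded and sampled, appends the sampling-ratio attribute to the result.
func (ds RatioBasedSampler) ShouldSample(parameters tracesdk.SamplingParameters) tracesdk.SamplingResult {
	sampler := ds.innerSampler
	result := sampler.ShouldSample(parameters)
	if result.Decision == tracesdk.RecordAndSample {
		result.Attributes = append(result.Attributes, ds.sampleRateAttribute)
	}
	return result
}

// Description returns a human-readable description of the sampler.
func (ds RatioBasedSampler) Description() string {
	return "Ratio Based Sampler which gives information about sampling ratio"
}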
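To see why the ratio attribute helps, consider how a sampled span could be weighted when it is counted. The sketch below assumes that each span is weighted by the inverse of its X-SampleRatio value, and that a span without a usable ratio counts once. `Multiplier` and `EstimateCalls` are hypothetical helpers for illustration, not part of Tempo or the OpenTelemetry SDK.

```go
package main

import "fmt"

// Multiplier returns the weight of one sampled span. ratio is the value of
// the X-SampleRatio attribute; 0 stands for a span without that attribute.
// Assumption: the weight is the inverse of the sampling ratio.
func Multiplier(ratio float64) float64 {
	if ratio <= 0 {
		return 1 // no usable ratio: count the span once
	}
	return 1 / ratio
}

// EstimateCalls sums the weights of the sampled spans to estimate how many
// requests took place before sampling.
func EstimateCalls(ratios []float64) float64 {
	total := 0.0
	for _, r := range ratios {
		total += Multiplier(r)
	}
	return total
}

func main() {
	// Three spans survived sampling: two at a ratio of 0.25, one at 0.5.
	fmt.Println(EstimateCalls([]float64{0.25, 0.25, 0.5})) // prints 10
}
```

Under this assumption, a span sampled at a ratio of 0.25 stands in for four requests, so the counter approximates the unsampled request count instead of the number of spans that happened to be kept.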

Filtering

In some cases, you may want to reduce the number of metrics produced by the spanmetrics processor. You can configure the processor to use an include filter to match criteria that must be present in the span in order to be included. Following the include filter, you can use an exclude filter to reject portions of what was previously included by the filter policy.

Currently, filtering is supported only on resource and span attributes with the following value types:

bool
double
int
string

Additionally, these intrinsic span attributes may be filtered upon:

name
status (code)
kind

The following intrinsic kinds are available for filtering:

SPAN_KIND_SERVER
SPAN_KIND_INTERNAL
SPAN_KIND_CLIENT
SPAN_KIND_PRODUCER
SPAN_KIND_CONSUMER

Intrinsic keys can be acted on directly when implementing a filter policy. For example:

---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: strict
            attributes:
              - key: kind
                value: SPAN_KIND_SERVER

In this example, spans which are of kind “server” are included for metrics export.

When selecting spans based on non-intrinsic attributes, you must specify the scope of the attribute, similar to how it is specified in TraceQL. For example, if the resource contains a location attribute that you want to use in a filter policy, reference it as resource.location. This requires you to know and specify the scope in which an attribute is found, and it avoids the ambiguity of conflicting values at different scopes. For example:

---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: strict
            attributes:
              - key: resource.location
                value: earth

The examples above use a match_type of strict, which is a direct comparison of values. The other option for match_type is regex, which matches values against a regular expression.

---
metrics_generator:
  processor:
    span_metrics:
      filter_policies:
        - include:
            match_type: regex
            attributes:
              - key: resource.location
                value: eu-.*
        - exclude:
            match_type: regex
            attributes:
              - key: resource.tier
                value: dev-.*

In the example above, the include statement first selects all spans whose resource.location begins with eu-, and the exclude statement then rejects those whose resource.tier begins with dev-. This gives you a flexible way to ensure that only the metrics that matter are generated.
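The include-then-exclude behavior can be modeled with a small Go sketch. It rests on several assumptions that are not confirmed by this document: every attribute in a policy must match, a regex pattern must match the whole value, and a span is kept only if it passes every entry in filter_policies. All type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// AttrMatch is one key/value entry under "attributes".
type AttrMatch struct{ Key, Value string }

// Policy mirrors an include or exclude block; MatchType is "strict" or "regex".
type Policy struct {
	MatchType  string
	Attributes []AttrMatch
}

// FilterPolicy is one entry of filter_policies.
type FilterPolicy struct {
	Include *Policy
	Exclude *Policy
}

// matches reports whether every attribute in the policy matches the span.
func (p Policy) matches(attrs map[string]string) bool {
	for _, a := range p.Attributes {
		got, present := attrs[a.Key]
		if !present {
			return false
		}
		if p.MatchType == "regex" {
			// Assumption: the pattern is anchored to the whole value.
			matched, err := regexp.MatchString("^(?:"+a.Value+")$", got)
			if err != nil || !matched {
				return false
			}
		} else if got != a.Value {
			return false
		}
	}
	return true
}

// Keep reports whether a span with the given attributes produces metrics.
func Keep(policies []FilterPolicy, attrs map[string]string) bool {
	for _, fp := range policies {
		if fp.Include != nil && !fp.Include.matches(attrs) {
			return false // not selected by the include filter
		}
		if fp.Exclude != nil && fp.Exclude.matches(attrs) {
			return false // selected, then rejected by the exclude filter
		}
	}
	return true
}

func main() {
	policies := []FilterPolicy{
		{Include: &Policy{"regex", []AttrMatch{{"resource.location", "eu-.*"}}}},
		{Exclude: &Policy{"regex", []AttrMatch{{"resource.tier", "dev-.*"}}}},
	}
	span := map[string]string{"resource.location": "eu-west", "resource.tier": "dev-1"}
	fmt.Println(Keep(policies, span))
}
```

With the policies from the example above, a span from eu-west in tier prod-1 is kept, while a span from eu-west in tier dev-1 or a span from us-east is dropped.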