Metrics-generator in Grafana Cloud Traces

The Tempo metrics-generator can derive metrics from traces as they are ingested. When used in Grafana Cloud, the metrics-generator writes metrics directly to the hosted Prometheus instance in the same stack.

Metrics-generator in Grafana Cloud Traces architecture.

For more information about the metrics-generator and the metrics it creates, see Grafana Tempo | Metrics-generator. This document describes the Grafana Cloud-specific capabilities.

Note

Metrics generation is disabled by default. Contact Grafana Support to enable metrics generation for your organization!

Constraints and good to know

  • Active series sent to the hosted Prometheus instance are billed like regular metrics.
  • Metrics can only be sent to a hosted Prometheus instance in the same region.
  • If traces are down-sampled before reaching Tempo, the generated metrics undercount the actual traffic.
  • All generated metrics are aggregated by default.
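
As a rough illustration of the down-sampling constraint above, the following sketch scales a rate observed in the generated metrics back up by the sampling probability. It assumes uniform probabilistic sampling applied before the traces reach Tempo. The function name and the numbers are hypothetical, and tail-based or rule-based sampling can't be corrected this simply.

```python
def estimate_actual_rate(observed_rate: float, sampling_probability: float) -> float:
    """Scale a rate derived from sampled traces back to an estimate of real traffic."""
    if not 0 < sampling_probability <= 1:
        raise ValueError("sampling_probability must be in (0, 1]")
    return observed_rate / sampling_probability


# Hypothetical numbers: the generated metrics show 25 calls/s,
# and only 1 in 4 traces is kept before ingestion.
estimated = estimate_actual_rate(25.0, 0.25)
print(estimated)
```

The estimate is only as good as the sampling assumption, so treat it as an approximation rather than a replacement for unsampled metrics.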

Aggregated metrics

Grafana Cloud uses Adaptive Metrics to aggregate away operational labels added by the open source Tempo metrics-generator. This reduces the number of time series the metrics-generator produces, and therefore the cost of enabling metrics generation for Grafana Cloud users.

In most cases, this aggregation should be completely unnoticeable to users.

There are some notable points to take into account:

  • Both the traces_spanmetrics_* and traces_service_graph_* metric families are aggregated.
  • The label that is aggregated away is the __metrics_gen_instance label. The aggregation function used is sum:counter.
  • PromQL queries to metrics generated from traces must follow the same rules as queries to any aggregated metric. For more information, see below.
  • The metrics are produced at a resolution consistent with the resolution you’ve purchased. If you are on a 1 data-point-per-minute plan, the metrics-generator metrics have 1 data point per minute.
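
The effect of aggregating away the __metrics_gen_instance label can be sketched as follows. This is a simplified model rather than how Adaptive Metrics is implemented: it sums plain values, whereas sum:counter also has to account for counter resets. The series, label values, and function name are hypothetical.

```python
from collections import defaultdict

DROPPED_LABEL = "__metrics_gen_instance"


def aggregate_sum(series):
    """Sum series values after removing the operational label."""
    totals = defaultdict(float)
    for labels, value in series:
        # Series that differ only in the dropped label collapse into one key.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != DROPPED_LABEL))
        totals[key] += value
    return dict(totals)


# Hypothetical input: two generator instances report the same span metric.
raw = [
    ({"service": "checkout", "span_name": "GET /cart", DROPPED_LABEL: "generator-0"}, 120.0),
    ({"service": "checkout", "span_name": "GET /cart", DROPPED_LABEL: "generator-1"}, 80.0),
    ({"service": "payments", "span_name": "POST /pay", DROPPED_LABEL: "generator-0"}, 15.0),
]

aggregated = aggregate_sum(raw)
for key, total in aggregated.items():
    print(dict(key), total)
```

Three input series become two output series, which is where the reduction in active series comes from.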

For PromQL, form queries that take into account the aggregation. For example, this query isn’t valid:

traces_spanmetrics_calls_total

If you run this query, Grafana returns the following error message:

Invalid PromQL query for traces_spanmetrics_calls_total

Instead, rewrite the query assuming aggregation, such as the sum of rate over time:

sum(rate(traces_spanmetrics_calls_total[4m]))

Corrected example of a PromQL aggregated metrics query

Queries such as the following are also invalid because they divide two unaggregated rate() results and only then apply the sum aggregation:

sum by (service, span_name)(rate(traces_spanmetrics_calls_total{status_code="STATUS_CODE_ERROR"}[4m]) / rate(traces_spanmetrics_calls_total{status_code!=""}[4m]))

However, you can modify these queries to explicitly use aggregated metrics by wrapping each rate() in its own sum aggregation before dividing:

(sum by (service, span_name)(rate(traces_spanmetrics_calls_total{status_code="STATUS_CODE_ERROR"}[4m])) / (sum by (service, span_name)(rate(traces_spanmetrics_calls_total{status_code!=""}[4m]))))
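
If you build such queries programmatically, a small helper can keep the aggregate-then-divide shape consistent. The following is a minimal sketch: the function name and parameters are hypothetical, and it only assembles the query string shown above without sending it anywhere.

```python
def error_ratio_query(group_by, window="4m", metric="traces_spanmetrics_calls_total"):
    """Build an error-ratio query that aggregates each rate() before dividing."""
    by = ", ".join(group_by)
    # Numerator: error spans only, summed per group.
    errors = f'sum by ({by})(rate({metric}{{status_code="STATUS_CODE_ERROR"}}[{window}]))'
    # Denominator: all spans that carry a status code, summed per group.
    total = f'sum by ({by})(rate({metric}{{status_code!=""}}[{window}]))'
    return f"({errors} / ({total}))"


query = error_ratio_query(["service", "span_name"])
print(query)
```

Because both sides are summed over the same labels, the division matches series one-to-one on service and span_name.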

Refer to Troubleshoot your aggregated metrics query for more help on how to query aggregated metrics.

Finally, if you require the unaggregated metrics generated by Grafana Cloud Traces, contact Grafana Support for help removing the aggregation rules from Adaptive Metrics.

Monitor the metrics-generator

The grafanacloud-usage data source exposes several metrics about the metrics-generator.

Number of active series:

grafanacloud_traces_instance_metrics_generator_active_series{}

Number of series dropped per second because the active series limit was reached:

grafanacloud_traces_instance_metrics_generator_series_dropped_per_second{}

Number of spans per second that the metrics-generator discards before processing them:

grafanacloud_traces_instance_metrics_generator_discarded_spans_per_second

This metric has a reason label:

  • outside_metrics_ingestion_slack: The time between the creation of the span and when it was ingested was too large, so the span is deemed outdated. Processing this span and including it in a current metrics sample would skew the data.
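
The slack check behind this reason can be pictured with the following sketch. The 30-second window, the function name, and the timestamps are assumptions for illustration only. The value used in Grafana Cloud Traces may differ, and the actual check runs inside Tempo.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical slack window; the configured value may differ.
INGESTION_SLACK = timedelta(seconds=30)


def discard_reason(span_time: datetime, ingested_at: datetime,
                   slack: timedelta = INGESTION_SLACK) -> Optional[str]:
    """Return the discard reason for an outdated span, or None if it is processed."""
    if ingested_at - span_time > slack:
        return "outside_metrics_ingestion_slack"
    return None


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(discard_reason(now - timedelta(seconds=5), now))  # recent span, processed
print(discard_reason(now - timedelta(minutes=2), now))  # outdated span, discarded
```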

How this works

When the number of active series in Tempo reaches a configurable limit, no new active series are added. Grafana Cloud Traces keeps updating the existing series. New series that would exceed the limit are dropped.
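
The limiting behavior described above can be modeled with a small registry. This is a minimal sketch, not Tempo's implementation: the class name, method names, and the limit of two series are hypothetical.

```python
class SeriesRegistry:
    """Toy registry that stops admitting new series once a limit is reached."""

    def __init__(self, max_active_series: int):
        self.max_active_series = max_active_series
        self.series = {}
        self.dropped = 0

    def increment(self, labels: dict, value: float = 1.0) -> bool:
        key = tuple(sorted(labels.items()))
        if key not in self.series:
            if len(self.series) >= self.max_active_series:
                # Limit reached: the new series is dropped, not stored.
                self.dropped += 1
                return False
            self.series[key] = 0.0
        # Existing series keep being updated even when the limit is reached.
        self.series[key] += value
        return True


registry = SeriesRegistry(max_active_series=2)
results = [registry.increment({"service": name}) for name in ["a", "b", "c", "a"]]
print(results, registry.dropped)
```

In this run the series for services a and b are admitted, the series for c is dropped, and the second update to a still succeeds.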

Configuration options

You can configure the following settings for metrics-generator in Grafana Cloud Traces. Contact Grafana Support to modify any of these settings.

Configuration | Description
Enabled processor | The metrics processors to enable; options include service graphs and/or span metrics.
Max active series | The maximum number of active series.
Collection interval | How often samples are collected from the active series. Defaults to every 60s (1 DPM).
Histogram buckets | The buckets used for the histograms generated by the metrics-generator. This can be configured per processor.
Dimensions | Additional dimensions to be added to the generated metrics. If a dimension is present in the span attributes, it's included as a label in the metrics. This can be configured per processor.

Note

Characters that aren’t valid in Prometheus label names are sanitized. For example, the trace attribute k8s.namespace becomes the Prometheus label k8s_namespace.
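
The following sketch approximates this sanitization by replacing every character outside the set allowed in Prometheus label names with an underscore. It's an assumption about the general behavior rather than the exact routine the metrics-generator uses, and it doesn't cover edge cases such as names that start with a digit.

```python
import re

# Prometheus label names may contain ASCII letters, digits, and underscores.
_INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def sanitize_label_name(attribute: str) -> str:
    """Replace characters that are invalid in a Prometheus label name with '_'."""
    return _INVALID_LABEL_CHARS.sub("_", attribute)


print(sanitize_label_name("k8s.namespace"))
```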