<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Troubleshoot Tempo on Grafana Labs</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/</link><description>Recent content in Troubleshoot Tempo on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/tempo/v3.0.x/troubleshooting/index.xml" rel="self" type="application/rss+xml"/><item><title>Issues with sending traces</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/send-traces/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/send-traces/</guid><content><![CDATA[&lt;h1 id=&#34;issues-with-sending-traces&#34;&gt;Issues with sending traces&lt;/h1&gt;
&lt;p&gt;Learn about issues related to sending traces.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/tempo/v3.0.x/troubleshooting/send-traces/max-trace-limit-reached/&#34;&gt;Distributor refusing spans&lt;/a&gt;&lt;br&gt;Troubleshoot distributor refusing spans&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/tempo/v3.0.x/troubleshooting/send-traces/alloy/&#34;&gt;Troubleshoot Grafana Alloy&lt;/a&gt;&lt;br&gt;Gain visibility into how many traces are pushed to Grafana Alloy and whether they reach the Tempo backend.&lt;/li&gt;&lt;/ul&gt;
]]></content><description>&lt;h1 id="issues-with-sending-traces">Issues with sending traces&lt;/h1>
&lt;p>Learn about issues related to sending traces.&lt;/p>
&lt;ul>&lt;li>
&lt;a href="/docs/tempo/v3.0.x/troubleshooting/send-traces/max-trace-limit-reached/">Distributor refusing spans&lt;/a>&lt;br>Troubleshoot distributor refusing spans&lt;/li>&lt;li>
&lt;a href="/docs/tempo/v3.0.x/troubleshooting/send-traces/alloy/">Troubleshoot Grafana Alloy&lt;/a>&lt;br>Gain visibility into how many traces are pushed to Grafana Alloy and whether they reach the Tempo backend.&lt;/li>&lt;/ul></description></item><item><title>Issues with querying</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/</guid><content><![CDATA[&lt;h1 id=&#34;issues-with-querying&#34;&gt;Issues with querying&lt;/h1&gt;
&lt;p&gt;Learn about issues related to querying.&lt;/p&gt;
&lt;ul&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/tempo/v3.0.x/troubleshooting/querying/unable-to-see-trace/&#34;&gt;Unable to find traces&lt;/a&gt;&lt;br&gt;Troubleshoot missing traces&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/tempo/v3.0.x/troubleshooting/querying/too-many-jobs-in-queue/&#34;&gt;Too many jobs in the queue&lt;/a&gt;&lt;br&gt;Troubleshoot too many jobs in the queue&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/tempo/v3.0.x/troubleshooting/querying/bad-blocks/&#34;&gt;Bad blocks&lt;/a&gt;&lt;br&gt;Troubleshoot queries failing with an error message indicating bad blocks.&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/tempo/v3.0.x/troubleshooting/querying/search-tag/&#34;&gt;Tag search&lt;/a&gt;&lt;br&gt;Troubleshoot the &amp;ldquo;No options found&amp;rdquo; message in Grafana tag search&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/tempo/v3.0.x/troubleshooting/querying/response-too-large/&#34;&gt;Response larger than the max&lt;/a&gt;&lt;br&gt;Troubleshoot the &amp;ldquo;response larger than the max&amp;rdquo; error message&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/tempo/v3.0.x/troubleshooting/querying/long-running-traces/&#34;&gt;Long-running traces&lt;/a&gt;&lt;br&gt;Troubleshoot search results when using long-running traces&lt;/li&gt;&lt;li&gt;
    &lt;a href=&#34;/docs/tempo/v3.0.x/troubleshooting/querying/too-many-requests-error/&#34;&gt;Too many requests error&lt;/a&gt;&lt;br&gt;Troubleshoot the &amp;ldquo;Too many requests&amp;rdquo; error for a Tempo query&lt;/li&gt;&lt;/ul&gt;
]]></content><description>&lt;h1 id="issues-with-querying">Issues with querying&lt;/h1>
&lt;p>Learn about issues related to querying.&lt;/p>
&lt;ul>&lt;li>
&lt;a href="/docs/tempo/v3.0.x/troubleshooting/querying/unable-to-see-trace/">Unable to find traces&lt;/a>&lt;br>Troubleshoot missing traces&lt;/li>&lt;li>
&lt;a href="/docs/tempo/v3.0.x/troubleshooting/querying/too-many-jobs-in-queue/">Too many jobs in the queue&lt;/a>&lt;br>Troubleshoot too many jobs in the queue&lt;/li>&lt;li>
&lt;a href="/docs/tempo/v3.0.x/troubleshooting/querying/bad-blocks/">Bad blocks&lt;/a>&lt;br>Troubleshoot queries failing with an error message indicating bad blocks.&lt;/li>&lt;li>
&lt;a href="/docs/tempo/v3.0.x/troubleshooting/querying/search-tag/">Tag search&lt;/a>&lt;br>Troubleshoot the &amp;ldquo;No options found&amp;rdquo; message in Grafana tag search&lt;/li>&lt;li>
&lt;a href="/docs/tempo/v3.0.x/troubleshooting/querying/response-too-large/">Response larger than the max&lt;/a>&lt;br>Troubleshoot the &amp;ldquo;response larger than the max&amp;rdquo; error message&lt;/li>&lt;li>
&lt;a href="/docs/tempo/v3.0.x/troubleshooting/querying/long-running-traces/">Long-running traces&lt;/a>&lt;br>Troubleshoot search results when using long-running traces&lt;/li>&lt;li>
&lt;a href="/docs/tempo/v3.0.x/troubleshooting/querying/too-many-requests-error/">Too many requests error&lt;/a>&lt;br>Troubleshoot the &amp;ldquo;Too many requests&amp;rdquo; error for a Tempo query&lt;/li>&lt;/ul></description></item><item><title>Troubleshoot metrics-generator</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/metrics-generator/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/metrics-generator/</guid><content><![CDATA[&lt;h1 id=&#34;troubleshoot-metrics-generator&#34;&gt;Troubleshoot metrics-generator&lt;/h1&gt;
&lt;p&gt;If you&amp;rsquo;re concerned about data quality issues in the metrics-generator, start by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reviewing your telemetry pipeline to determine how many spans it drops. At this stage, you&amp;rsquo;re only looking for major issues.&lt;/li&gt;
&lt;li&gt;Reviewing the 
    &lt;a href=&#34;/docs/tempo/v3.0.x/metrics-generator/service_graphs/&#34;&gt;service graph documentation&lt;/a&gt; to understand how service graphs are built.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If both checks look acceptable, use the following sections to resolve general issues with all metrics, and with span metrics specifically.&lt;/p&gt;
&lt;h2 id=&#34;kafka-consumption&#34;&gt;Kafka consumption&lt;/h2&gt;
&lt;p&gt;In Tempo 3.0 microservices mode, metrics-generators consume trace data directly from Kafka rather than receiving pushes from distributors. In monolithic mode, the distributor still pushes directly to the in-process metrics-generator. If the generator is not producing metrics in a microservices deployment, start by verifying that it&amp;rsquo;s consuming data from Kafka using the metrics below.&lt;/p&gt;
&lt;h3 id=&#34;consumer-lag&#34;&gt;Consumer lag&lt;/h3&gt;
&lt;p&gt;Use the following metrics to monitor the generator&amp;rsquo;s Kafka consumer lag:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;tempo_ingest_group_partition_lag{group=&amp;#34;metrics-generator&amp;#34;}
tempo_ingest_group_partition_lag_seconds{group=&amp;#34;metrics-generator&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;code&gt;tempo_ingest_group_partition_lag&lt;/code&gt; tracks the lag in number of records per partition, while &lt;code&gt;tempo_ingest_group_partition_lag_seconds&lt;/code&gt; tracks the lag in seconds. High or growing lag indicates the generator is falling behind.&lt;/p&gt;
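&lt;p&gt;As a rough illustration of what &amp;ldquo;high or growing&amp;rdquo; lag means, the following Python sketch flags partitions from lag samples, for example values scraped from &lt;code&gt;tempo_ingest_group_partition_lag_seconds&lt;/code&gt;. The input format (a list of &lt;code&gt;(timestamp, lag_seconds)&lt;/code&gt; pairs per partition) and the 60-second threshold are hypothetical; pick a threshold that matches your own latency tolerance.&lt;/p&gt;

```python
from typing import Dict, List, Tuple

# Hypothetical input: partition ID -> (unix_timestamp, lag_seconds) samples in time order.
Samples = Dict[int, List[Tuple[float, float]]]


def lagging_partitions(samples: Samples, max_lag_seconds: float = 60.0) -> Dict[int, str]:
    """Flag partitions whose latest lag is high, or whose lag grew over the window."""
    findings: Dict[int, str] = {}
    for partition, series in samples.items():
        if len(series) < 2:
            continue  # at least two samples are needed to judge a trend
        (t_first, lag_first), (t_last, lag_last) = series[0], series[-1]
        if t_last == t_first:
            continue  # no elapsed time, so no growth rate
        growth_per_second = (lag_last - lag_first) / (t_last - t_first)
        if lag_last > max_lag_seconds:
            findings[partition] = "high"
        elif growth_per_second > 0:
            findings[partition] = "growing"
    return findings


samples = {
    0: [(0, 2.0), (60, 2.5), (120, 1.8)],      # small, stable lag
    1: [(0, 5.0), (60, 20.0), (120, 45.0)],    # lag climbing steadily
    2: [(0, 90.0), (60, 95.0), (120, 100.0)],  # already above the threshold
}
print(lagging_partitions(samples))  # {1: 'growing', 2: 'high'}
```

&lt;p&gt;Comparing only the first and last samples keeps the sketch short; in practice you would more likely express the same check in PromQL, for example with &lt;code&gt;deriv()&lt;/code&gt; over a range.&lt;/p&gt;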
&lt;h3 id=&#34;kafka-client-errors&#34;&gt;Kafka client errors&lt;/h3&gt;
&lt;p&gt;The generator uses the &lt;code&gt;tempo_ingest_storage_reader&lt;/code&gt; family of metrics (provided by the Kafka client library) to expose detailed information about fetch operations, errors, and throughput. Look for error and failure metrics in this family to diagnose connectivity or protocol issues with Kafka.&lt;/p&gt;
&lt;h2 id=&#34;all-metrics&#34;&gt;All metrics&lt;/h2&gt;
&lt;p&gt;This section covers additional metrics related to the metrics-generator.&lt;/p&gt;
&lt;h3 id=&#34;discarded-spans-in-the-generator&#34;&gt;Discarded spans in the generator&lt;/h3&gt;
&lt;p&gt;The metrics-generator rejects spans that fall outside a configurable ingestion slack time or that are excluded by
user-configurable filters. You can see the number of rejected spans by reason using this metric:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_spans_discarded_total{}[1m])) by (reason)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If your filters cause the metrics-generator to drop a lot of spans, adjust the filters. If spans are dropped
because of the ingestion slack time, consider adjusting this setting:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;metrics_generator:
  metrics_ingestion_time_range_slack: 30s&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If spans regularly exceed this value, review your tracing pipeline for excessive buffering.
Increasing this value allows the generator to consume more spans, but it reduces the accuracy of metrics because spans farther
from &amp;ldquo;now&amp;rdquo; are included.&lt;/p&gt;
&lt;p&gt;Spans can also be discarded if their attributes contain invalid UTF-8 when those attributes are converted to metric labels.&lt;/p&gt;
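&lt;p&gt;The discard reasons above can be sketched in Python as follows. The sketch assumes the slack check compares a span&amp;rsquo;s end time against a window of plus or minus &lt;code&gt;metrics_ingestion_time_range_slack&lt;/code&gt; around the current time, and it models the UTF-8 check as a failed decode of raw attribute bytes. Both are simplifying assumptions rather than Tempo&amp;rsquo;s implementation, and the reason strings are hypothetical, not the actual values of the &lt;code&gt;reason&lt;/code&gt; label.&lt;/p&gt;

```python
from typing import Dict, Optional

SLACK_SECONDS = 30.0  # mirrors metrics_ingestion_time_range_slack: 30s


def discard_reason(end_time: float, attributes: Dict[str, bytes], now: float,
                   slack: float = SLACK_SECONDS) -> Optional[str]:
    """Return a hypothetical discard reason for a span, or None if it is kept."""
    # Assumption: only spans ending within [now - slack, now + slack] are used.
    if not (now - slack <= end_time <= now + slack):
        return "outside_slack_window"
    # Attribute values become metric labels, so they must be valid UTF-8.
    for value in attributes.values():
        try:
            value.decode("utf-8")
        except UnicodeDecodeError:
            return "invalid_utf8"
    return None


now = 1_000.0
print(discard_reason(990.0, {"http.route": b"/users"}, now))    # None
print(discard_reason(900.0, {"http.route": b"/users"}, now))    # outside_slack_window
print(discard_reason(995.0, {"http.route": b"\xff\xfe"}, now))  # invalid_utf8
```

&lt;p&gt;Raising the slack to 120 seconds in this sketch keeps the span that ended at 900.0, which illustrates the trade-off: more spans are counted, but older ones are folded into the current metrics.&lt;/p&gt;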
&lt;h3 id=&#34;max-active-series&#34;&gt;Max active series&lt;/h3&gt;
&lt;p&gt;The generator protects itself and your remote-write target by capping the number of series it produces.
Use the following query to determine whether series are being limited:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_registry_series_limited_total{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Use the following setting to update the limit:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
  defaults:
    metrics_generator:
      max_active_series: 0&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This value applies per metrics-generator. The maximum number of series remote-written is &lt;code&gt;&amp;lt;number of metrics-generators&amp;gt; * &amp;lt;metrics_generator.max_active_series&amp;gt;&lt;/code&gt;.&lt;/p&gt;
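&lt;p&gt;As a worked example with hypothetical numbers, the Python snippet below computes the upper bound for three generators and, conversely, the largest per-generator value that keeps a total series budget. It assumes a positive limit and ignores any other limits you may have configured.&lt;/p&gt;

```python
def max_remote_written_series(num_generators: int, max_active_series: int) -> int:
    """Upper bound on series remote-written when each generator enforces the limit."""
    return num_generators * max_active_series


def per_generator_limit(total_budget: int, num_generators: int) -> int:
    """Largest per-generator max_active_series that keeps the total within budget."""
    return total_budget // num_generators


print(max_remote_written_series(3, 100_000))  # 300000
print(per_generator_limit(1_000_000, 3))      # 333333
```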
&lt;h3 id=&#34;overflow-series&#34;&gt;Overflow series&lt;/h3&gt;
&lt;p&gt;When the active series limit is reached, the metrics-generator produces overflow series instead of dropping new data. These series have the label &lt;code&gt;metric_overflow=&amp;quot;true&amp;quot;&lt;/code&gt; and capture all data that would otherwise be lost.&lt;/p&gt;
&lt;p&gt;To identify overflow series in your metrics:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;{metric_overflow=&amp;#34;true&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;As existing series become stale and are removed, new series are split out from the overflow bucket until the limit is reached again. To reduce overflow, either increase &lt;code&gt;max_active_series&lt;/code&gt; or reduce cardinality by adjusting dimensions or filters.&lt;/p&gt;
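&lt;p&gt;To make the overflow behavior concrete, here is a minimal Python sketch of a series registry that redirects samples for new series into a single &lt;code&gt;metric_overflow=&amp;quot;true&amp;quot;&lt;/code&gt; series once the limit is reached, and lets new series split out again after stale series are removed. The class name, the staleness handling, and the choice not to count the overflow series toward the limit are illustrative assumptions, not Tempo&amp;rsquo;s internal registry.&lt;/p&gt;

```python
from typing import Dict, FrozenSet, Tuple

Labels = FrozenSet[Tuple[str, str]]
OVERFLOW: Labels = frozenset({("metric_overflow", "true")})


class LimitedRegistry:
    """Counts samples per series; new series beyond the limit go to OVERFLOW."""

    def __init__(self, max_active_series: int, stale_seconds: float) -> None:
        self.max_active_series = max_active_series
        self.stale_seconds = stale_seconds
        self.counts: Dict[Labels, int] = {}
        self.last_seen: Dict[Labels, float] = {}

    def observe(self, labels: Dict[str, str], now: float) -> Labels:
        key: Labels = frozenset(labels.items())
        # Assumption: the overflow series itself does not count toward the limit.
        active = len(self.counts) - (1 if OVERFLOW in self.counts else 0)
        if key not in self.counts and active >= self.max_active_series:
            key = OVERFLOW  # keep the data, but in the overflow series
        self.counts[key] = self.counts.get(key, 0) + 1
        self.last_seen[key] = now
        return key

    def remove_stale(self, now: float) -> None:
        for key, seen in list(self.last_seen.items()):
            if now - seen > self.stale_seconds:
                del self.counts[key]
                del self.last_seen[key]


reg = LimitedRegistry(max_active_series=2, stale_seconds=300)
reg.observe({"service": "a"}, now=0)
reg.observe({"service": "b"}, now=0)
print(reg.observe({"service": "c"}, now=10) == OVERFLOW)   # True: limit reached
reg.observe({"service": "b"}, now=400)                     # "b" stays active
reg.remove_stale(now=400)                                  # "a" and overflow go stale
print(reg.observe({"service": "c"}, now=410) == OVERFLOW)  # False: "c" split out
```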
&lt;h3 id=&#34;entity-based-limiting&#34;&gt;Entity-based limiting&lt;/h3&gt;
&lt;p&gt;You can configure entity-based limiting as an alternative to series-based limiting.
An entity is a unique label combination (excluding external labels) across multiple metrics.
Entity-based limiting ensures the generator always produces the full set of metrics for a given entity, rather than limiting randomly once the series limit is triggered.&lt;/p&gt;
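&lt;p&gt;The difference from series-based limiting can be sketched in Python as follows. Here an entity is the label set with external labels removed, and admitting an entity admits every metric for it. The external label name, the admission order, and the class itself are assumptions for illustration; the metric names are examples of span-metrics series.&lt;/p&gt;

```python
from typing import Dict, FrozenSet, Set, Tuple

Entity = FrozenSet[Tuple[str, str]]


class EntityLimiter:
    """Admits whole entities, so an admitted entity gets all of its metrics."""

    def __init__(self, max_active_entities: int, external_labels: Set[str]) -> None:
        self.max_active_entities = max_active_entities
        self.external_labels = external_labels
        self.entities: Set[Entity] = set()

    def entity_of(self, labels: Dict[str, str]) -> Entity:
        # External labels are excluded when identifying the entity.
        return frozenset((k, v) for k, v in labels.items() if k not in self.external_labels)

    def admit(self, metric: str, labels: Dict[str, str]) -> bool:
        # The metric name doesn't affect the decision: all metrics of an
        # entity share a single slot in the entity budget.
        entity = self.entity_of(labels)
        if entity in self.entities:
            return True
        if len(self.entities) < self.max_active_entities:
            self.entities.add(entity)
            return True
        return False  # a new entity over the limit is rejected as a whole


limiter = EntityLimiter(max_active_entities=1, external_labels={"cluster"})
a = {"service": "a", "span_name": "GET /", "cluster": "eu"}
b = {"service": "b", "span_name": "GET /", "cluster": "eu"}
metrics = ("traces_spanmetrics_calls_total", "traces_spanmetrics_latency_bucket")
print([limiter.admit(m, a) for m in metrics])  # [True, True]
print(limiter.admit(metrics[0], b))            # False
```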
&lt;p&gt;To enable entity-based limiting, set &lt;code&gt;limiter_type&lt;/code&gt; to &lt;code&gt;entity&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;metrics_generator:
  limiter_type: entity&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Use the following metric to determine if entities are being limited:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_registry_entities_limited_total{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Configure the entity limit with:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
  defaults:
    metrics_generator:
      max_active_entities: 0&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;per-label-cardinality-limiting&#34;&gt;Per-label cardinality limiting&lt;/h3&gt;
&lt;p&gt;The per-label cardinality limiter caps the number of distinct values any single label can have. When a label exceeds the configured threshold, its value is replaced with &lt;code&gt;__cardinality_overflow__&lt;/code&gt; while all other labels that are under the limit are preserved.&lt;/p&gt;
&lt;p&gt;For example, if the &lt;code&gt;url&lt;/code&gt; label exceeds the cardinality limit:&lt;/p&gt;
&lt;p&gt;Before:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;{service=&amp;#34;foo&amp;#34;, method=&amp;#34;GET&amp;#34;, url=&amp;#34;/users/1&amp;#34;}
{service=&amp;#34;foo&amp;#34;, method=&amp;#34;GET&amp;#34;, url=&amp;#34;/users/2&amp;#34;}
{service=&amp;#34;foo&amp;#34;, method=&amp;#34;GET&amp;#34;, url=&amp;#34;/users/3&amp;#34;}
...&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;After:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;{service=&amp;#34;foo&amp;#34;, method=&amp;#34;GET&amp;#34;, url=&amp;#34;__cardinality_overflow__&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Once the limiter kicks in, new &lt;code&gt;url&lt;/code&gt; values are replaced with &lt;code&gt;__cardinality_overflow__&lt;/code&gt;. Labels that remain under the limit, like &lt;code&gt;method&lt;/code&gt;, are unaffected.&lt;/p&gt;
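&lt;p&gt;The replacement behavior can be sketched in Python. The sketch assumes that once a label&amp;rsquo;s distinct-value count exceeds the limit, every value of that label is replaced, which is consistent with the Before and After example above. It counts values exactly with one set per label instead of using HyperLogLog, doesn&amp;rsquo;t model recovery, and uses hypothetical class and method names.&lt;/p&gt;

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set

OVERFLOW_VALUE = "__cardinality_overflow__"


class PerLabelLimiter:
    """Replaces a label's value once that label has too many distinct values."""

    def __init__(self, max_cardinality_per_label: int) -> None:
        self.limit = max_cardinality_per_label
        # Exact sets for clarity; Tempo estimates these counts with HyperLogLog.
        self.seen: DefaultDict[str, Set[str]] = defaultdict(set)

    def apply(self, labels: Dict[str, str]) -> Dict[str, str]:
        if self.limit == 0:
            return dict(labels)  # 0 disables the limit
        out: Dict[str, str] = {}
        for name, value in labels.items():
            values = self.seen[name]
            values.add(value)  # track demand, including values that get replaced
            # Only a label over the limit is replaced; the others are kept.
            out[name] = value if len(values) <= self.limit else OVERFLOW_VALUE
        return out


limiter = PerLabelLimiter(max_cardinality_per_label=2)
for user in (1, 2, 3):
    print(limiter.apply({"service": "foo", "method": "GET", "url": f"/users/{user}"}))
# {'service': 'foo', 'method': 'GET', 'url': '/users/1'}
# {'service': 'foo', 'method': 'GET', 'url': '/users/2'}
# {'service': 'foo', 'method': 'GET', 'url': '__cardinality_overflow__'}
```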
&lt;p&gt;To detect if per-label cardinality limiting is active:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;sum by (tenant, label_name) (rate(tempo_metrics_generator_registry_label_values_limited_total{}[5m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;To view the estimated cardinality demand per label:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;tempo_metrics_generator_registry_label_cardinality_demand_estimate{}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Use this metric to identify which labels have high cardinality and how far they exceed the configured limit, and to choose an appropriate
&lt;code&gt;max_cardinality_per_label&lt;/code&gt; value. To observe actual demand before enforcing a limit, first deploy with a high &lt;code&gt;max_cardinality_per_label&lt;/code&gt; value.&lt;/p&gt;
&lt;h4 id=&#34;understand-the-label_name-values-in-this-metric&#34;&gt;Understand the &lt;code&gt;label_name&lt;/code&gt; values in this metric&lt;/h4&gt;
&lt;p&gt;The &lt;code&gt;label_name&lt;/code&gt; label values represent every label tracked by the per-label cardinality limiter.
These include all labels that flow through the metrics-generator registry, not just user-configured dimensions.&lt;/p&gt;
&lt;p&gt;Built-in labels:&lt;/p&gt;
&lt;section class=&#34;expand-table-wrapper&#34;&gt;&lt;div class=&#34;responsive-table-wrapper&#34;&gt;
    &lt;table&gt;
      &lt;thead&gt;
          &lt;tr&gt;
              &lt;th&gt;Label&lt;/th&gt;
              &lt;th&gt;Processor&lt;/th&gt;
              &lt;th&gt;When added&lt;/th&gt;
              &lt;th&gt;Description&lt;/th&gt;
          &lt;/tr&gt;
      &lt;/thead&gt;
      &lt;tbody&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;service&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;span-metrics&lt;/td&gt;
              &lt;td&gt;Always&lt;/td&gt;
              &lt;td&gt;The service name&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;span_name&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;span-metrics&lt;/td&gt;
              &lt;td&gt;Always&lt;/td&gt;
              &lt;td&gt;The operation or span name&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;span_kind&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;span-metrics&lt;/td&gt;
              &lt;td&gt;Always&lt;/td&gt;
              &lt;td&gt;The span kind (SERVER, CLIENT, etc.)&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;status_code&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;span-metrics&lt;/td&gt;
              &lt;td&gt;Always&lt;/td&gt;
              &lt;td&gt;The span status code&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;job&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;span-metrics&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;enable_target_info&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;The job name, derived from resource attributes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;instance&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;span-metrics&lt;/td&gt;
              &lt;td&gt;&lt;code&gt;enable_target_info&lt;/code&gt; and &lt;code&gt;enable_instance_label&lt;/code&gt; are both &lt;code&gt;true&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;The instance ID, derived from resource attributes&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;client&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;service-graphs&lt;/td&gt;
              &lt;td&gt;Always&lt;/td&gt;
              &lt;td&gt;The client service name&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;server&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;service-graphs&lt;/td&gt;
              &lt;td&gt;Always&lt;/td&gt;
              &lt;td&gt;The server service name&lt;/td&gt;
          &lt;/tr&gt;
          &lt;tr&gt;
              &lt;td&gt;&lt;code&gt;connection_type&lt;/code&gt;&lt;/td&gt;
              &lt;td&gt;service-graphs&lt;/td&gt;
              &lt;td&gt;Always&lt;/td&gt;
              &lt;td&gt;The connection type (virtual, database, messaging_system)&lt;/td&gt;
          &lt;/tr&gt;
      &lt;/tbody&gt;
    &lt;/table&gt;
  &lt;/div&gt;
&lt;/section&gt;&lt;p&gt;Configured labels include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Span-metrics dimensions are added without a prefix, with characters such as dots converted to underscores.
For example, &lt;code&gt;deployment.environment&lt;/code&gt; becomes &lt;code&gt;deployment_environment&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Service-graphs dimensions are prefixed with &lt;code&gt;client_&lt;/code&gt; and &lt;code&gt;server_&lt;/code&gt; when &lt;code&gt;enable_client_server_prefix&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;.
For example, &lt;code&gt;deployment.environment&lt;/code&gt; becomes &lt;code&gt;client_deployment_environment&lt;/code&gt; and &lt;code&gt;server_deployment_environment&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;A configured dimension only appears if the corresponding attribute exists on incoming spans.&lt;/li&gt;
&lt;/ul&gt;
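&lt;p&gt;These naming rules can be sketched in Python to predict which &lt;code&gt;label_name&lt;/code&gt; values to expect. The sketch assumes that every character outside &lt;code&gt;[a-zA-Z0-9_]&lt;/code&gt; is replaced with an underscore, the usual way of turning an attribute name into a valid Prometheus label name; Tempo&amp;rsquo;s exact rules may differ in edge cases. Only the prefixed service-graphs case described above is modeled.&lt;/p&gt;

```python
import re
from typing import List

# Assumption: characters that aren't valid in Prometheus label names become "_".
INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def span_metrics_label(dimension: str) -> str:
    """Label name for a span-metrics dimension: no prefix, sanitized."""
    return INVALID_LABEL_CHARS.sub("_", dimension)


def prefixed_service_graph_labels(dimension: str) -> List[str]:
    """Label names for a service-graphs dimension when enable_client_server_prefix is true."""
    name = INVALID_LABEL_CHARS.sub("_", dimension)
    return ["client_" + name, "server_" + name]


print(span_metrics_label("deployment.environment"))
# deployment_environment
print(prefixed_service_graph_labels("deployment.environment"))
# ['client_deployment_environment', 'server_deployment_environment']
```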
&lt;p&gt;Configure the per-label cardinality limit:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
  defaults:
    metrics_generator:
      max_cardinality_per_label: 0&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A value of &lt;code&gt;0&lt;/code&gt; (default) disables the limit.&lt;/p&gt;
&lt;p&gt;This setting works alongside both active series limiting (&lt;code&gt;max_active_series&lt;/code&gt;) and entity-based limiting (&lt;code&gt;max_active_entities&lt;/code&gt;).
The per-label limiter runs during label construction, preventing any single high-cardinality label from consuming the entire active series or entity budget.&lt;/p&gt;
&lt;p&gt;The per-label limiter uses HyperLogLog sketches to estimate cardinality, so the limit is approximate, with a standard error of about 3.25%. Estimates are
re-evaluated every few seconds, so there may be a brief delay between a label crossing the threshold and the limiter taking effect.&lt;/p&gt;
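&lt;p&gt;For context, a HyperLogLog sketch with m registers has a relative standard error of approximately 1.04/sqrt(m), and 1.04/sqrt(1024) = 3.25%. That match suggests 1024 registers (precision 10), but this page doesn&amp;rsquo;t state which precision Tempo uses, so treat it as an inference. The Python snippet below shows the arithmetic and what one standard error means for a hypothetical limit.&lt;/p&gt;

```python
import math


def hll_relative_std_error(precision: int) -> float:
    """Approximate relative standard error of a HyperLogLog with 2**precision registers."""
    return 1.04 / math.sqrt(2 ** precision)


error = hll_relative_std_error(10)  # 1024 registers
print(f"{error:.2%}")  # 3.25%

# One standard error around a hypothetical max_cardinality_per_label of 10000.
limit = 10_000
print(round(limit * (1 - error)), round(limit * (1 + error)))  # 9675 10325
```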
&lt;p&gt;If a high-cardinality label&amp;rsquo;s cardinality is later reduced (for example, by fixing instrumentation), the limiter automatically recovers
and allows label values through again. No configuration changes are needed.&lt;/p&gt;
&lt;p&gt;Recovery isn&amp;rsquo;t immediate. The limiter tracks cardinality over a sliding window based on the registry&amp;rsquo;s &lt;code&gt;stale_duration&lt;/code&gt;. Existing high-cardinality values take at least that
long to age out before the label values are allowed through again.&lt;/p&gt;
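&lt;p&gt;The recovery behavior can be illustrated in Python with a windowed distinct count: values not seen within &lt;code&gt;stale_duration&lt;/code&gt; stop counting, so a label falls back under the limit once its old values age out. Exact per-value timestamps stand in for Tempo&amp;rsquo;s HyperLogLog-based sliding window, and the class name, the limit of 100, and the 15-minute window are hypothetical.&lt;/p&gt;

```python
from typing import Dict


class WindowedCardinality:
    """Distinct values of one label seen within the last stale_duration seconds."""

    def __init__(self, stale_duration: float) -> None:
        self.stale_duration = stale_duration
        self.last_seen: Dict[str, float] = {}

    def observe(self, value: str, now: float) -> None:
        self.last_seen[value] = now

    def cardinality(self, now: float) -> int:
        # Forget values outside the window; they no longer count toward the limit.
        self.last_seen = {
            v: t for v, t in self.last_seen.items() if now - t <= self.stale_duration
        }
        return len(self.last_seen)


limit = 100
url = WindowedCardinality(stale_duration=900)  # hypothetical 15-minute window
for i in range(500):                           # buggy instrumentation: unique URLs
    url.observe(f"/users/{i}", now=0)
print(url.cardinality(now=60) > limit)         # True: still over the limit
url.observe("/users/:id", now=1000)            # fixed instrumentation: one route
print(url.cardinality(now=1000) > limit)       # False: old values have aged out
```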
&lt;h3 id=&#34;estimate-active-series-demand&#34;&gt;Estimate active series demand&lt;/h3&gt;
&lt;p&gt;When the active series limit is reached, the &lt;code&gt;tempo_metrics_generator_registry_active_series&lt;/code&gt; metric no longer reflects the true demand. Use the &lt;code&gt;tempo_metrics_generator_registry_active_series_demand_estimate&lt;/code&gt; metric to estimate what the active series count would be without the limit:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;tempo_metrics_generator_registry_active_series_demand_estimate{}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This metric uses HyperLogLog estimation and has approximately 3% deviation from the actual cardinality. Use this to determine if you need to increase limits or reduce cardinality.&lt;/p&gt;
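&lt;p&gt;Because the estimate can be off by roughly 3%, a demand estimate close to the limit isn&amp;rsquo;t conclusive on its own. The Python sketch below uses hypothetical numbers and a 3% tolerance taken from the figure above, and it treats anything within that margin of &lt;code&gt;max_active_series&lt;/code&gt; as near the limit. It assumes both numbers refer to the same generator, since the limit applies per generator.&lt;/p&gt;

```python
def assess_demand(demand_estimate: float, max_active_series: int,
                  relative_error: float = 0.03) -> str:
    """Compare a demand estimate to the series limit, allowing for estimation error."""
    low = demand_estimate * (1 - relative_error)
    high = demand_estimate * (1 + relative_error)
    if low > max_active_series:
        return "over"   # raise max_active_series or reduce cardinality
    if high < max_active_series:
        return "under"  # demand fits comfortably within the limit
    return "near"       # within estimation error of the limit


print(assess_demand(150_000, 100_000))  # over
print(assess_demand(50_000, 100_000))   # under
print(assess_demand(101_000, 100_000))  # near
```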
&lt;h3 id=&#34;span-name-sanitization&#34;&gt;Span name sanitization&lt;/h3&gt;
&lt;p&gt;If &lt;code&gt;span_name&lt;/code&gt; is one of the highest-cardinality labels in your setup, the &lt;code&gt;span_name_sanitization&lt;/code&gt; option can reduce it by grouping similar span names and replacing variable segments. For example, &lt;code&gt;GET /users/123&lt;/code&gt; and &lt;code&gt;GET /users/456&lt;/code&gt; are both mapped to &lt;code&gt;GET /users/&amp;lt;_&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To evaluate the potential impact without modifying metrics, set &lt;code&gt;span_name_sanitization&lt;/code&gt; to &lt;code&gt;dry_run&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
  defaults:
    metrics_generator:
      span_name_sanitization: &amp;#34;dry_run&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;After a few minutes, compare the post-sanitization demand estimate against the current active series count:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;tempo_metrics_generator_registry_post_sanitization_demand_estimate{}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If this value is significantly lower than &lt;code&gt;tempo_metrics_generator_registry_active_series&lt;/code&gt;, set &lt;code&gt;span_name_sanitization&lt;/code&gt; to &lt;code&gt;enabled&lt;/code&gt; to apply the reduction.&lt;/p&gt;
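&lt;p&gt;For example, reusing the overrides structure from the dry-run configuration, turning the option on changes only the value:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
  defaults:
    metrics_generator:
      # Apply the span name reduction instead of only estimating its impact
      span_name_sanitization: &amp;#34;enabled&amp;#34;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;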
&lt;p&gt;After you enable the option, use the following metric to confirm spans are being sanitized:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;rate(tempo_metrics_generator_registry_spans_sanitized_total{}[5m])&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If this rate stays at zero after you enable the option, the DRAIN model has not yet found patterns to replace. This is expected for workloads whose span names are already consistent. The model trains continuously and adapts as new span names arrive.&lt;/p&gt;
&lt;p&gt;For more details on configuration and usage, refer to 
    &lt;a href=&#34;/docs/tempo/v3.0.x/metrics-from-traces/metrics-generator/reduce-cardinality/&#34;&gt;Reduce cardinality with span name sanitization&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&#34;remote-write-failures&#34;&gt;Remote write failures&lt;/h3&gt;
&lt;p&gt;The metrics-generator can fail to write to the remote write target for a variety of reasons. Use the following metrics to
determine whether samples or exemplars are failing or being dropped:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(prometheus_remote_storage_samples_failed_total{}[1m]))
sum(rate(prometheus_remote_storage_samples_dropped_total{}[1m]))
sum(rate(prometheus_remote_storage_exemplars_failed_total{}[1m]))
sum(rate(prometheus_remote_storage_exemplars_dropped_total{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;service-graph-metrics&#34;&gt;Service graph metrics&lt;/h2&gt;
&lt;p&gt;Service graphs have additional configuration options that can affect the quality of the output metrics.&lt;/p&gt;
&lt;h3 id=&#34;expired-edges&#34;&gt;Expired edges&lt;/h3&gt;
&lt;p&gt;Use the following metrics to determine how many edges fail to find a match.
The expired edges metric counts only edges that expired without the matching information needed to generate a service graph edge.&lt;/p&gt;
&lt;p&gt;Rate of edges that have expired without a match:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_processor_service_graphs_expired_edges{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Rate of all edges:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_processor_service_graphs_edges{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If a large number of edges expire without a match, consider increasing the &lt;code&gt;wait&lt;/code&gt; setting. This
controls how long the metrics-generator waits to find a match before it gives up.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;metrics_generator:
  processor:
    service_graphs:
      wait: 10s&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;service-graph-max-items&#34;&gt;Service graph max items&lt;/h3&gt;
&lt;p&gt;The service graph processor has a maximum number of edges it tracks at once to limit the total amount of memory the processor uses.
To determine if edges are being dropped due to this limit, check:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_metrics_generator_processor_service_graphs_dropped_spans{}[1m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Use &lt;code&gt;max_items&lt;/code&gt; to adjust the maximum number of edges tracked:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;metrics_generator:
  processor:
    service_graphs:
      max_items: 10000&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="troubleshoot-metrics-generator">Troubleshoot metrics-generator&lt;/h1>
&lt;p>If you&amp;rsquo;re concerned with data quality issues in the metrics-generator, consider:&lt;/p>
&lt;ul>
&lt;li>Reviewing your telemetry pipeline to determine the number of dropped spans. You are only looking for major issues here.&lt;/li>
&lt;li>Reviewing the
&lt;a href="/docs/tempo/v3.0.x/metrics-generator/service_graphs/">service graph documentation&lt;/a> to understand how they are built.&lt;/li>
&lt;/ul>
&lt;p>If everything seems acceptable from these two perspectives, consider the following topics to help resolve general issues with all metrics and span metrics specifically.&lt;/p></description></item><item><title>Troubleshoot out-of-memory errors</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/out-of-memory-errors/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/out-of-memory-errors/</guid><content><![CDATA[&lt;h1 id=&#34;troubleshoot-out-of-memory-errors&#34;&gt;Troubleshoot out-of-memory errors&lt;/h1&gt;
&lt;p&gt;Learn about out-of-memory (OOM) issues and how to troubleshoot them.&lt;/p&gt;
&lt;h2 id=&#34;set-the-max-attribute-size-to-help-control-out-of-memory-errors&#34;&gt;Set the max attribute size to help control out of memory errors&lt;/h2&gt;
&lt;p&gt;Tempo queriers can run out of memory when fetching traces that have spans with very large attributes.
This issue has been observed when trying to fetch a single trace using the &lt;a href=&#34;/docs/tempo/latest/api_docs/#query&#34;&gt;&lt;code&gt;tracebyID&lt;/code&gt; endpoint&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;To avoid these out-of-memory crashes, use &lt;code&gt;max_attribute_bytes&lt;/code&gt; to limit the size of any individual attribute.
Any key or value that exceeds the configured limit is truncated before it is stored.&lt;/p&gt;
&lt;p&gt;Use the &lt;code&gt;tempo_distributor_attributes_truncated_total&lt;/code&gt; metric to track how many attributes are truncated.
This metric includes &lt;code&gt;tenant&lt;/code&gt; and &lt;code&gt;scope&lt;/code&gt; labels, where &lt;code&gt;scope&lt;/code&gt; is one of &lt;code&gt;resource&lt;/code&gt;, &lt;code&gt;scope&lt;/code&gt;, &lt;code&gt;span&lt;/code&gt;, &lt;code&gt;event&lt;/code&gt;, or &lt;code&gt;link&lt;/code&gt;.
Use the &lt;code&gt;scope&lt;/code&gt; label to identify which part of your trace data produces the most oversized attributes.&lt;/p&gt;
&lt;p&gt;When truncation occurs, the distributor also emits a rate-limited log line (at most one per second) with an example of the truncated attribute, including its scope, name, whether the key or value was truncated, and the original size in bytes.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Optional
# Configures the max size an attribute can be. Any key or value that exceeds this limit is truncated before storing.
# Setting this parameter to &amp;#39;0&amp;#39; disables this check against attribute size.
[max_attribute_bytes: &amp;lt;int&amp;gt; | default = &amp;#39;2048&amp;#39;]&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
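&lt;p&gt;As a concrete sketch, this fragment assumes the option is set in the &lt;code&gt;distributor&lt;/code&gt; block, as the distributor configuration documentation describes. The &lt;code&gt;1024&lt;/code&gt; value is illustrative and truncates more aggressively than the default of 2048 bytes.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;distributor:
  # Illustrative value: truncate any attribute key or value longer than 1024 bytes
  max_attribute_bytes: 1024&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;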
&lt;p&gt;Refer to the 
    &lt;a href=&#34;/docs/tempo/v3.0.x/configuration/#set-max-attribute-size-to-help-control-out-of-memory-errors&#34;&gt;configuration for distributors&lt;/a&gt; documentation for more information.&lt;/p&gt;
&lt;h2 id=&#34;max-trace-size&#34;&gt;Max trace size&lt;/h2&gt;
&lt;p&gt;Long-running traces (minutes or hours) and large traces (100K to 1M spans) spike the memory usage of each component that encounters them.
Tempo treats traces as single units, and keeps all data for a trace together to enable features like structural queries and analysis.&lt;/p&gt;
&lt;p&gt;Reading a large trace can spike the memory usage of the read components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;query-frontend&lt;/li&gt;
&lt;li&gt;querier&lt;/li&gt;
&lt;li&gt;live-store&lt;/li&gt;
&lt;li&gt;metrics-generator&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Writing a large trace can spike the memory usage of the write components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;live-store&lt;/li&gt;
&lt;li&gt;block-builder&lt;/li&gt;
&lt;li&gt;metrics-generator&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Start with a trace size limit of 15MB and increase it as needed.
With an average span size of 300 bytes, this allows approximately 50K spans per trace (15,000,000 bytes / 300 bytes = 50,000 spans).&lt;/p&gt;
&lt;p&gt;Verify that you&amp;rsquo;ve configured a limit in &lt;code&gt;max_bytes_per_trace&lt;/code&gt;.
The largest recommended limit is 60MB.&lt;/p&gt;
&lt;p&gt;Configure the limit in the per-tenant overrides:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;overrides:
  &amp;#39;tenant123&amp;#39;:
    max_bytes_per_trace: 1.5e&amp;#43;07&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Refer to the 
    &lt;a href=&#34;/docs/tempo/v3.0.x/configuration/#standard-overrides&#34;&gt;Standard overrides&lt;/a&gt; documentation for more information.&lt;/p&gt;
&lt;p&gt;If you have long-running batch job traces, consider using span links to break them apart.&lt;/p&gt;
&lt;h2 id=&#34;large-attributes&#34;&gt;Large attributes&lt;/h2&gt;
&lt;p&gt;Very large attributes (10KB or longer) can spike the memory usage of each component that encounters them.
Tempo&amp;rsquo;s Parquet format uses dictionary-encoded columns, which work well for repeated values.
However, dictionary-encoding very large, high-cardinality attributes can require a large amount of memory.&lt;/p&gt;
&lt;p&gt;A common source of large attributes is auto-instrumentation in these areas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;HTTP
&lt;ul&gt;
&lt;li&gt;Request or response bodies&lt;/li&gt;
&lt;li&gt;Large headers
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://opentelemetry.io/docs/specs/semconv/attributes-registry/http/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;http.request.header.&amp;lt;key&amp;gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Large URLs
&lt;ul&gt;
&lt;li&gt;http.url&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://opentelemetry.io/docs/specs/semconv/attributes-registry/url/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;url.full&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Databases
&lt;ul&gt;
&lt;li&gt;Full query statements
&lt;ul&gt;
&lt;li&gt;db.statement&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://opentelemetry.io/docs/specs/semconv/attributes-registry/db/&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;db.query.text&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Queues
&lt;ul&gt;
&lt;li&gt;Message bodies&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When reading these attributes, they can spike the memory usage of the read components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;query-frontend&lt;/li&gt;
&lt;li&gt;querier&lt;/li&gt;
&lt;li&gt;live-store&lt;/li&gt;
&lt;li&gt;metrics-generator&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When writing these attributes, they can spike the memory usage of the write components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;live-store&lt;/li&gt;
&lt;li&gt;block-builder&lt;/li&gt;
&lt;li&gt;metrics-generator&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can &lt;a href=&#34;https://github.com/grafana/tempo/pull/4335&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;automatically limit attribute sizes&lt;/a&gt; using 
    &lt;a href=&#34;/docs/tempo/v3.0.x/configuration/#set-max-attribute-size-to-help-control-out-of-memory-errors&#34;&gt;&lt;code&gt;max_attribute_bytes&lt;/code&gt;&lt;/a&gt;.
You can also use these options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Manually update application instrumentation to remove or limit these attributes&lt;/li&gt;
&lt;li&gt;Drop the attributes in the tracing pipeline using the OpenTelemetry Collector &lt;a href=&#34;https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/attributesprocessor&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;attributes processor&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
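&lt;p&gt;As a sketch of the pipeline option, an OpenTelemetry Collector attributes processor can delete known large attributes before spans reach Tempo. The processor name &lt;code&gt;attributes/drop-large&lt;/code&gt; is hypothetical, the attribute keys are examples taken from the database entries in the list of common sources, and the fragment assumes the &lt;code&gt;otlp&lt;/code&gt; receiver and exporter are defined elsewhere in the Collector configuration.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;processors:
  # Hypothetical processor name; deletes full query statements from spans
  attributes/drop-large:
    actions:
      - key: db.statement
        action: delete
      - key: db.query.text
        action: delete

service:
  pipelines:
    traces:
      # Assumes the otlp receiver and exporter are defined elsewhere
      receivers: [otlp]
      processors: [attributes/drop-large]
      exporters: [otlp]&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;Deleting an attribute removes it entirely. If you still need part of the value, truncating it with &lt;code&gt;max_attribute_bytes&lt;/code&gt; keeps a prefix instead.&lt;/p&gt;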
]]></content><description>&lt;h1 id="troubleshoot-out-of-memory-errors">Troubleshoot out-of-memory errors&lt;/h1>
&lt;p>Learn about out-of-memory (OOM) issues and how to troubleshoot them.&lt;/p>
&lt;h2 id="set-the-max-attribute-size-to-help-control-out-of-memory-errors">Set the max attribute size to help control out of memory errors&lt;/h2>
&lt;p>Tempo queriers can run out of memory when fetching traces that have spans with very large attributes.
This issue has been observed when trying to fetch a single trace using the &lt;a href="/docs/tempo/latest/api_docs/#query">&lt;code>tracebyID&lt;/code> endpoint&lt;/a>.&lt;/p></description></item></channel></rss>