<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Issues with querying on Grafana Labs</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/</link><description>Recent content in Issues with querying on Grafana Labs</description><generator>Hugo -- gohugo.io</generator><language>en</language><atom:link href="/docs/tempo/v3.0.x/troubleshooting/querying/index.xml" rel="self" type="application/rss+xml"/><item><title>Unable to find traces</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/unable-to-see-trace/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/unable-to-see-trace/</guid><content><![CDATA[&lt;h1 id=&#34;unable-to-find-traces&#34;&gt;Unable to find traces&lt;/h1&gt;
&lt;p&gt;The two main causes of missing traces are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Issues in ingestion of the data into Tempo. Spans are either not sent correctly to Tempo or they aren&amp;rsquo;t getting sampled.&lt;/li&gt;
&lt;li&gt;Issues querying for traces that have been received by Tempo.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;section-1-diagnose-and-fix-ingestion-issues&#34;&gt;Section 1: Diagnose and fix ingestion issues&lt;/h2&gt;
&lt;p&gt;The first step is to check whether the application spans are actually reaching Tempo.&lt;/p&gt;
&lt;p&gt;Add the following flag to the distributor container - &lt;a href=&#34;https://github.com/grafana/tempo/blob/57da4f3fd5d2966e13a39d27dbed4342af6a857a/modules/distributor/config.go#L55&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;&lt;code&gt;distributor.log_received_spans.enabled&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This flag enables debug logging of all the traces received by the distributor. These logs can help check if Tempo is receiving any traces at all.&lt;/p&gt;
&lt;p&gt;You can also check the following metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tempo_distributor_spans_received_total&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tempo_live_store_traces_created_total&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The value of both metrics should be greater than &lt;code&gt;0&lt;/code&gt; within a few minutes of the application spinning up.
You can check both metrics using either:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The metrics page exposed from Tempo at &lt;code&gt;http://&amp;lt;tempo-address&amp;gt;:&amp;lt;tempo-http-port&amp;gt;/metrics&lt;/code&gt; or&lt;/li&gt;
&lt;li&gt;In Prometheus, if it&amp;rsquo;s used to scrape metrics.&lt;/li&gt;
&lt;/ul&gt;
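&lt;p&gt;As a minimal sketch of the first option, the following shell function sums every sample of one metric from the text served by the metrics page. The function name &lt;code&gt;metric_sum&lt;/code&gt; and the &lt;code&gt;TEMPO_URL&lt;/code&gt; address are assumptions for illustration, not part of Tempo.&lt;/p&gt;

```shell
# metric_sum NAME: sum every sample of metric NAME from Prometheus text-format
# output read on standard input. Hypothetical helper, not part of Tempo.
# Assumes samples carry no trailing timestamp, so the value is the last field.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                  # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)     # drop the label set, if any
      if (metric == name) total += $NF
    }
    END { printf "%d\n", total }
  '
}

# Usage; TEMPO_URL is an assumed address, adjust it to your deployment:
#   TEMPO_URL="http://localhost:3200"
#   curl -s "$TEMPO_URL/metrics" | metric_sum tempo_distributor_spans_received_total
#   curl -s "$TEMPO_URL/metrics" | metric_sum tempo_live_store_traces_created_total
```

&lt;p&gt;A result of &lt;code&gt;0&lt;/code&gt; for either metric points to the matching case in the following sections. In a deployment that runs components separately, each metric is typically exposed by the component that owns it, so query the distributor and the live-store individually.&lt;/p&gt;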
&lt;h3 id=&#34;case-1---tempo_distributor_spans_received_total-is-0&#34;&gt;Case 1 - &lt;code&gt;tempo_distributor_spans_received_total&lt;/code&gt; is 0&lt;/h3&gt;
&lt;p&gt;If the value of &lt;code&gt;tempo_distributor_spans_received_total&lt;/code&gt; is 0, possible reasons are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use of incorrect protocol/port combination while initializing the tracer in the application.&lt;/li&gt;
&lt;li&gt;Tracing records not getting picked up to send to Tempo by the internal sampler.&lt;/li&gt;
&lt;li&gt;Application is running inside Docker and sending traces to an incorrect endpoint.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Receiver-specific traffic information is also available from &lt;code&gt;tempo_receiver_accepted_spans&lt;/code&gt;, which has a label for the receiver (the protocol used for ingestion, for example &lt;code&gt;jaeger-thrift&lt;/code&gt;).&lt;/p&gt;
&lt;h3 id=&#34;solutions&#34;&gt;Solutions&lt;/h3&gt;
&lt;p&gt;The solution depends on which of the three causes applies: protocol or port problems, sampling issues, or incorrect endpoints.&lt;/p&gt;
&lt;p&gt;To fix protocol or port problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Find out which communication protocol is being used by the application to emit traces. This is unique to every client SDK. For instance: Jaeger Golang Client uses &lt;code&gt;Thrift Compact over UDP&lt;/code&gt; by default.&lt;/li&gt;
&lt;li&gt;Check the list of supported protocols and their ports and ensure that the correct combination is being used.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To fix sampling issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;These issues can be tricky to determine because most SDKs use a probabilistic sampler by default. This may lead to only one in 1,000 records being picked up.&lt;/li&gt;
&lt;li&gt;Check the sampling configuration of the tracer being initialized in the application and make sure it has a high sampling rate.&lt;/li&gt;
&lt;li&gt;Some clients also provide metrics on the number of spans reported from the application, for example &lt;code&gt;jaeger_tracer_reporter_spans_total&lt;/code&gt;. Check the value of that metric if available and make sure it&amp;rsquo;s greater than zero.&lt;/li&gt;
&lt;li&gt;Another way to diagnose this problem is to generate a large number of traces and check whether any records make their way to Tempo.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To fix an incorrect endpoint issue:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the application is also running inside Docker, make sure the application is sending traces to the correct endpoint (&lt;code&gt;tempo:&amp;lt;receiver-port&amp;gt;&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;case-2---tempo_live_store_traces_created_total-is-0&#34;&gt;Case 2 - &lt;code&gt;tempo_live_store_traces_created_total&lt;/code&gt; is 0&lt;/h2&gt;
&lt;p&gt;If the value of &lt;code&gt;tempo_live_store_traces_created_total&lt;/code&gt; is 0, this can indicate issues between the distributors and Kafka, or between Kafka and the live-stores.&lt;/p&gt;
&lt;h3 id=&#34;solution&#34;&gt;Solution&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Check distributor logs for Kafka write errors such as &lt;code&gt;msg=&amp;quot;failed to write to kafka&amp;quot;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Verify that Kafka is healthy and that the distributors can reach it.&lt;/li&gt;
&lt;li&gt;Check live-store logs to ensure they are consuming from Kafka successfully. Look for consumer lag metrics to confirm data is flowing.&lt;/li&gt;
&lt;/ul&gt;
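&lt;p&gt;The first check can be sketched as a small shell function that counts log lines containing the error text quoted above. The function name &lt;code&gt;count_kafka_write_errors&lt;/code&gt; and the Kubernetes deployment name in the usage comment are assumptions for illustration.&lt;/p&gt;

```shell
# count_kafka_write_errors: count log lines on standard input that contain
# the Kafka write error text. Hypothetical helper, not part of Tempo.
count_kafka_write_errors() {
  # grep -c prints 0 but exits non-zero when nothing matches, so mask that status.
  grep -c 'failed to write to kafka' || true
}

# Usage, assuming a Kubernetes deployment named "distributor" (an assumption):
#   kubectl logs deployment/distributor --since=1h | count_kafka_write_errors
```

&lt;p&gt;A count greater than &lt;code&gt;0&lt;/code&gt; suggests that the distributors are failing to write to Kafka, so check Kafka health and connectivity next.&lt;/p&gt;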
&lt;h2 id=&#34;case-3---live-store-kafka-lag&#34;&gt;Case 3 - Live-store Kafka lag&lt;/h2&gt;
&lt;p&gt;If the live-store is lagging behind its Kafka partition, queries for recent data may return incomplete results.&lt;/p&gt;
&lt;p&gt;To check whether lag is affecting queries, run the following PromQL query in Grafana or Prometheus:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;rate(tempo_live_store_lagged_requests_total[5m])&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;A non-zero rate means that query time ranges are overlapping with the live-store&amp;rsquo;s Kafka lag, and some recently ingested traces may be missing from results. The metric is labeled by &lt;code&gt;route&lt;/code&gt;, so you can see which query type is affected (&lt;code&gt;/tempopb.Querier/SearchRecent&lt;/code&gt; for search queries or &lt;code&gt;/tempopb.Metrics/QueryRange&lt;/code&gt; for TraceQL metrics queries).&lt;/p&gt;
&lt;h3 id=&#34;solution-1&#34;&gt;Solution&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Check the raw consumer lag per partition using your live-store consumer group label:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;promql&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-promql&#34;&gt;tempo_ingest_group_partition_lag{group=&amp;#34;&amp;lt;CONSUMER_GROUP&amp;gt;&amp;#34;}&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;group&lt;/code&gt; label is derived from the live-store ring instance ID. For example, in a zone-aware deployment the group might be &lt;code&gt;live-store-zone-a&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If lag is persistent, the live-store may need more resources or partitions may need to be redistributed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To make incomplete results explicit, set &lt;code&gt;fail_on_high_lag: true&lt;/code&gt; in the 
    &lt;a href=&#34;/docs/tempo/v3.0.x/configuration/#live-store&#34;&gt;live-store configuration&lt;/a&gt;. When enabled, the live-store returns an error instead of silently incomplete results.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;case-4---trace-is-not-recent&#34;&gt;Case 4 - Trace is not recent&lt;/h2&gt;
&lt;p&gt;Live-stores only serve recent data. Older traces are stored in blocks built by the block-builder. If a trace was ingested but can&amp;rsquo;t be found, the block-builder may not be flushing blocks to the backend correctly.&lt;/p&gt;
&lt;h3 id=&#34;solution-2&#34;&gt;Solution&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Check block-builder logs for errors during block creation or flushing to object storage.&lt;/li&gt;
&lt;li&gt;Verify the block-builder is consuming from Kafka by checking consumer lag metrics.&lt;/li&gt;
&lt;li&gt;Check the &lt;code&gt;tempo_block_builder_flushed_blocks&lt;/code&gt; metric to confirm blocks are being written to the backend.&lt;/li&gt;
&lt;li&gt;Check the &lt;code&gt;tempo_block_builder_fetch_errors_total&lt;/code&gt; metric for Kafka fetch issues.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;diagnose-and-fix-sampling-and-limits-issues&#34;&gt;Section 2: Diagnose and fix sampling and limits issues&lt;/h2&gt;
&lt;p&gt;This section applies if you can query some traces in Tempo but not others.&lt;/p&gt;
&lt;p&gt;This can happen for a number of reasons, some of which are detailed in the blog post
&lt;a href=&#34;/blog/2020/07/09/where-did-all-my-spans-go-a-guide-to-diagnosing-dropped-spans-in-jaeger-distributed-tracing/&#34;&gt;Where did all my spans go? A guide to diagnosing dropped spans in Jaeger distributed tracing&lt;/a&gt;.
The blog post is useful if you are using the Jaeger Agent.&lt;/p&gt;
&lt;p&gt;If you are using Grafana Alloy, continue reading the following section for metrics to monitor.&lt;/p&gt;
&lt;h3 id=&#34;diagnose-the-issue&#34;&gt;Diagnose the issue&lt;/h3&gt;
&lt;p&gt;Check if the pipeline is dropping spans. The following metrics on Grafana Alloy help determine this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;exporter_send_failed_spans_ratio_total&lt;/code&gt;. The value of this metric should be &lt;code&gt;0&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;receiver_refused_spans_ratio_total&lt;/code&gt;. The value of this metric should be &lt;code&gt;0&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the pipeline isn&amp;rsquo;t reporting any dropped spans, check whether application spans are being dropped by Tempo. The following metrics help determine this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;tempo_receiver_refused_spans&lt;/code&gt;. The value of &lt;code&gt;tempo_receiver_refused_spans&lt;/code&gt; should be &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If the value of &lt;code&gt;tempo_receiver_refused_spans&lt;/code&gt; is greater than 0, a possible reason is that application spans are being dropped due to rate limiting.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;solutions-1&#34;&gt;Solutions&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;If the pipeline (Grafana Alloy) drops spans, the deployment may need to be scaled up.&lt;/li&gt;
&lt;li&gt;There might also be connectivity issues with the Tempo backend. Check the Alloy logs and make sure the Tempo endpoint and credentials are correctly configured.&lt;/li&gt;
&lt;li&gt;If Tempo drops spans, this may be due to rate limiting.
Rate limiting may be appropriate and therefore not an issue. The metric simply explains the cause of the missing spans.&lt;/li&gt;
&lt;li&gt;If you require a higher ingest volume, increase the configuration for the rate limiting by adjusting the &lt;code&gt;max_traces_per_user&lt;/code&gt; property in the 
    &lt;a href=&#34;/docs/tempo/v3.0.x/configuration/#standard-overrides&#34;&gt;configured override limits&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;div class=&#34;admonition admonition-note&#34;&gt;&lt;blockquote&gt;&lt;p class=&#34;title text-uppercase&#34;&gt;Note&lt;/p&gt;&lt;p&gt;Check the 
    &lt;a href=&#34;/docs/tempo/v3.0.x/configuration/#overrides&#34;&gt;ingestion limits page&lt;/a&gt; for further information on limits.&lt;/p&gt;&lt;/blockquote&gt;&lt;/div&gt;

&lt;h2 id=&#34;section-3-diagnose-and-fix-issues-with-querying-traces&#34;&gt;Section 3: Diagnose and fix issues with querying traces&lt;/h2&gt;
&lt;p&gt;If Tempo is correctly ingesting trace spans, then it&amp;rsquo;s time to investigate possible issues with querying the data.&lt;/p&gt;
&lt;p&gt;Check the logs of the query-frontend. The query-frontend pod runs with two containers, &lt;code&gt;query-frontend&lt;/code&gt; and &lt;code&gt;query&lt;/code&gt;.
Use the following command to view query-frontend logs:&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;Bash&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-bash&#34;&gt;kubectl logs -f pod/query-frontend-xxxxx -c query-frontend&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The presence of the following errors in the log may explain issues with querying traces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;level=info ts=XXXXXXX caller=frontend.go:63 method=GET traceID=XXXXXXXXX url=/api/traces/XXXXXXXXX duration=5m41.729449877s status=500&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;no org id&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;could not dial 10.X.X.X:3200 connection refused&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tenant-id not found&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Possible reasons for these errors are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The querier isn&amp;rsquo;t connected to the query-frontend. Check the value of the metric &lt;code&gt;cortex_query_frontend_connected_clients&lt;/code&gt; exposed by the query-frontend.
It should be &amp;gt; &lt;code&gt;0&lt;/code&gt;, indicating querier connections with the query-frontend.&lt;/li&gt;
&lt;li&gt;Grafana Tempo data source isn&amp;rsquo;t configured to pass &lt;code&gt;tenant-id&lt;/code&gt; in the &lt;code&gt;Authorization&lt;/code&gt; header (multi-tenant deployments only).&lt;/li&gt;
&lt;li&gt;The connection to the Tempo querier isn&amp;rsquo;t configured correctly.&lt;/li&gt;
&lt;li&gt;Insufficient permissions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&#34;solutions-2&#34;&gt;Solutions&lt;/h3&gt;
&lt;p&gt;To fix connection issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If the queriers aren&amp;rsquo;t connected to the query-frontend, check the following section in the querier configuration and verify the query-frontend address.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;querier:
  frontend_worker:
    frontend_address: query-frontend-discovery.default.svc.cluster.local:9095&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Validate the Grafana data source configuration and debug network issues between Grafana and Tempo.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To fix an insufficient permissions issue:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Verify that the querier has the &lt;code&gt;LIST&lt;/code&gt; and &lt;code&gt;GET&lt;/code&gt; permissions on the bucket.&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="unable-to-find-traces">Unable to find traces&lt;/h1>
&lt;p>The two main causes of missing traces are:&lt;/p>
&lt;ul>
&lt;li>Issues in ingestion of the data into Tempo. Spans are either not sent correctly to Tempo or they aren&amp;rsquo;t getting sampled.&lt;/li>
&lt;li>Issues querying for traces that have been received by Tempo.&lt;/li>
&lt;/ul>
&lt;h2 id="section-1-diagnose-and-fix-ingestion-issues">Section 1: Diagnose and fix ingestion issues&lt;/h2>
&lt;p>The first step is to check whether the application spans are actually reaching Tempo.&lt;/p></description></item><item><title>Too many jobs in the queue</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/too-many-jobs-in-queue/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/too-many-jobs-in-queue/</guid><content><![CDATA[&lt;h1 id=&#34;too-many-jobs-in-the-queue&#34;&gt;Too many jobs in the queue&lt;/h1&gt;
&lt;p&gt;The error message might also be one of the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;queue doesn&#39;t have room for 100 jobs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;failed to add a job to work queue&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You may see this error if the scheduler or worker isn&amp;rsquo;t running and the blocklist size has exploded.&lt;/p&gt;
&lt;p&gt;Possible reasons why the scheduler or worker may not be running are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Insufficient permissions.&lt;/li&gt;
&lt;li&gt;Worker sitting idle because no block is hashing to it.&lt;/li&gt;
&lt;li&gt;Incorrect configuration settings.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;diagnose-the-issue&#34;&gt;Diagnose the issue&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Check the metric &lt;code&gt;tempodb_compaction_bytes_written_total&lt;/code&gt;.
If it&amp;rsquo;s greater than zero (0), the worker is running and writing to the backend.&lt;/li&gt;
&lt;li&gt;Check the metric &lt;code&gt;tempodb_compaction_errors_total&lt;/code&gt;.
If it&amp;rsquo;s greater than zero (0), check the logs of the worker for an error message.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;solutions&#34;&gt;Solutions&lt;/h2&gt;
&lt;ul&gt;
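&lt;li&gt;Verify that the worker has the &lt;code&gt;LIST&lt;/code&gt;, &lt;code&gt;GET&lt;/code&gt;, &lt;code&gt;PUT&lt;/code&gt;, and &lt;code&gt;DELETE&lt;/code&gt; permissions on the bucket objects.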
&lt;ul&gt;
&lt;li&gt;If these permissions are missing, assign them to the worker container.&lt;/li&gt;
&lt;li&gt;For detailed information, refer to the 
    &lt;a href=&#34;/docs/tempo/v3.0.x/configuration/hosted-storage/s3/&#34;&gt;Amazon S3 permissions&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;If there&amp;rsquo;s a worker sitting idle while others are running, check the scheduler logs and worker metrics to diagnose the issue.&lt;/li&gt;
&lt;li&gt;Check the following configuration parameters to ensure that there are correct settings:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;max_block_bytes&lt;/code&gt; to determine the maximum size of a block. A good number is anywhere from 100MB to 2GB depending on the workload.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_compaction_objects&lt;/code&gt; to determine the maximum number of objects in a compacted block. This should be relatively high, generally in the millions.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;retention_duration&lt;/code&gt; for how long traces should be retained in the backend.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Check the storage section of the configuration and increase &lt;code&gt;queue_depth&lt;/code&gt;. Keep in mind that a deeper queue could mean longer
waiting times for query responses. Adjust &lt;code&gt;max_workers&lt;/code&gt; accordingly. This setting configures the number of parallel workers
that query backend blocks.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;storage:
  trace:
    pool:
      max_workers: 100 # worker pool determines the number of parallel requests to the object store backend
      queue_depth: 10000&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
]]></content><description>&lt;h1 id="too-many-jobs-in-the-queue">Too many jobs in the queue&lt;/h1>
&lt;p>The error message might also be one of the following:&lt;/p>
&lt;ul>
&lt;li>&lt;code>queue doesn't have room for 100 jobs&lt;/code>&lt;/li>
&lt;li>&lt;code>failed to add a job to work queue&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>You may see this error if the scheduler or worker isn&amp;rsquo;t running and the blocklist size has exploded.&lt;/p></description></item><item><title>Bad blocks</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/bad-blocks/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/bad-blocks/</guid><content><![CDATA[&lt;h1 id=&#34;bad-blocks&#34;&gt;Bad blocks&lt;/h1&gt;
&lt;p&gt;Queries fail with an error message containing:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;error querying store in Querier.FindTraceByID: error using pageFinder (1, 5927cbfb-aabe-48b2-9df5-f4c3302d915f): ...&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This might indicate that there is a bad (corrupted) block in the backend.&lt;/p&gt;
&lt;h2 id=&#34;how-blocks-can-get-corrupted&#34;&gt;How blocks can get corrupted&lt;/h2&gt;
&lt;p&gt;Blocks are created by the block-builder, which consumes data from Kafka and flushes blocks to object storage. The block-builder is designed to be recoverable at every stage.
The block-builder rewinds to the last Kafka commit on each cycle, clears its scratch disk, and uses deterministic block IDs so that partial flushes can be safely overwritten.&lt;/p&gt;
&lt;p&gt;A block becomes live only once its &lt;code&gt;meta.json&lt;/code&gt; is written to object storage. Before that point, any crash is fully recoverable.
In rare cases, corruption can still occur, for example if object storage acknowledges a write that is not fully persisted, or if the data files are corrupted during upload.&lt;/p&gt;
&lt;h2 id=&#34;removing-bad-blocks&#34;&gt;Removing bad blocks&lt;/h2&gt;
&lt;p&gt;If you encounter corrupted blocks, delete the affected blocks, which may result in some loss of data.
The block-builder will replay from Kafka and rebuild any data that hasn&amp;rsquo;t been committed yet. Alternatively, you can restore the blocks from a backup, if available.&lt;/p&gt;
&lt;p&gt;The mechanism to remove a block from the backend is backend-specific, but the block to remove will be at:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;&amp;lt;tenant ID&amp;gt;/&amp;lt;block ID&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
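&lt;p&gt;As a minimal sketch, assuming an Amazon S3 backend with no additional storage prefix configured, the following shell function builds the location of one block from the layout above. The function name &lt;code&gt;block_prefix&lt;/code&gt; and the bucket, tenant, and block values in the usage comment are placeholders, not values from your deployment.&lt;/p&gt;

```shell
# block_prefix BUCKET TENANT_ID BLOCK_ID: print the object-storage prefix that
# holds one block, following the tenant/block layout described above.
# Hypothetical helper; the s3:// scheme assumes an Amazon S3 backend.
block_prefix() {
  printf 's3://%s/%s/%s/\n' "$1" "$2" "$3"
}

# Usage with the AWS CLI; --dryrun lists what would be deleted without deleting it:
#   aws s3 rm --recursive --dryrun "$(block_prefix my-tempo-bucket my-tenant BLOCK_ID)"
```

&lt;p&gt;Review the dry-run output before repeating the command without &lt;code&gt;--dryrun&lt;/code&gt;, because deleting a block can result in loss of data.&lt;/p&gt;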
]]></content><description>&lt;h1 id="bad-blocks">Bad blocks&lt;/h1>
&lt;p>Queries fail with an error message containing:&lt;/p>
&lt;div class="code-snippet code-snippet__mini">&lt;div class="lang-toolbar__mini">
&lt;span class="code-clipboard">
&lt;button x-data="app_code_snippet()" x-init="init()" @click="copy()">
&lt;img class="code-clipboard__icon" src="/media/images/icons/icon-copy-small-2.svg" alt="Copy code to clipboard" width="14" height="13">
&lt;span>Copy&lt;/span>
&lt;/button>
&lt;/span>
&lt;/div>&lt;div class="code-snippet code-snippet__border">
&lt;pre data-expanded="false">&lt;code class="language-none">error querying store in Querier.FindTraceByID: error using pageFinder (1, 5927cbfb-aabe-48b2-9df5-f4c3302d915f): ...&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>This might indicate that there is a bad (corrupted) block in the backend.&lt;/p></description></item><item><title>Tag search</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/search-tag/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/search-tag/</guid><content><![CDATA[&lt;h1 id=&#34;tag-search&#34;&gt;Tag search&lt;/h1&gt;
&lt;p&gt;An issue occurs while searching for traces in Grafana Explore. The &lt;strong&gt;Service Name&lt;/strong&gt; and &lt;strong&gt;Span Name&lt;/strong&gt; drop-down lists are empty, and there is a &lt;code&gt;No options found&lt;/code&gt; message.&lt;/p&gt;
&lt;p&gt;HTTP requests to the Tempo query-frontend endpoint at &lt;code&gt;/api/search/tag/service.name/values&lt;/code&gt; return an empty set.&lt;/p&gt;
&lt;h2 id=&#34;root-cause&#34;&gt;Root cause&lt;/h2&gt;
&lt;p&gt;A cap on the size of tag query responses causes this issue.&lt;/p&gt;
&lt;p&gt;When a tag values query exceeds the value configured in &lt;code&gt;max_bytes_per_tag_values_query&lt;/code&gt;,
Tempo returns an empty result.&lt;/p&gt;
&lt;h2 id=&#34;solutions&#34;&gt;Solutions&lt;/h2&gt;
&lt;p&gt;There are two main solutions to this issue:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduce the cardinality of tags pushed to Tempo. Reducing the number of unique tag values will reduce the size returned by a tag search query.&lt;/li&gt;
&lt;li&gt;Increase the &lt;code&gt;max_bytes_per_tag_values_query&lt;/code&gt; parameter in the 
    &lt;a href=&#34;/docs/tempo/v3.0.x/configuration/#overrides&#34;&gt;overrides&lt;/a&gt; block of your Tempo configuration to a value as high as 50MB.&lt;/li&gt;
&lt;/ul&gt;
]]></content><description>&lt;h1 id="tag-search">Tag search&lt;/h1>
&lt;p>An issue occurs while searching for traces in Grafana Explore. The &lt;strong>Service Name&lt;/strong> and &lt;strong>Span Name&lt;/strong> drop down lists are empty, and there is a &lt;code>No options found&lt;/code> message.&lt;/p></description></item><item><title>Response larger than the max</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/response-too-large/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/response-too-large/</guid><content><![CDATA[&lt;h1 id=&#34;response-larger-than-the-max&#34;&gt;Response larger than the max&lt;/h1&gt;
&lt;p&gt;The error message is similar to the following:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;500 Internal Server Error Body: response larger than the max (&amp;lt;size&amp;gt; vs &amp;lt;limit&amp;gt;)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;This error indicates that the response received or sent is too large.
This can happen in multiple places, but it&amp;rsquo;s most commonly seen in the query path,
with messages between the querier and the query-frontend.&lt;/p&gt;
&lt;h2 id=&#34;solutions&#34;&gt;Solutions&lt;/h2&gt;
&lt;h3 id=&#34;tempo-server-general&#34;&gt;Tempo server (general)&lt;/h3&gt;
&lt;p&gt;Tempo components communicate with each other via gRPC requests.
To increase the maximum message size, you can increase the gRPC message size limit in the server block.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;server:
  grpc_server_max_recv_msg_size: &amp;lt;size&amp;gt;
  grpc_server_max_send_msg_size: &amp;lt;size&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The server configuration block isn&amp;rsquo;t synchronized across components,
so you most likely need to increase the message size limit on every component that handles the oversized message.&lt;/p&gt;
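&lt;p&gt;As a minimal sketch, the server block with concrete values might look like the following. The 32 MiB figure is an illustrative assumption rather than a recommendation, and the sketch assumes these limits are expressed in bytes.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;server:
  # Illustrative value: 32 MiB = 32 * 1024 * 1024 bytes (assumed unit: bytes)
  grpc_server_max_recv_msg_size: 33554432
  grpc_server_max_send_msg_size: 33554432&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Choose a value slightly above the &lt;code&gt;&amp;lt;size&amp;gt;&lt;/code&gt; reported in the error message, and apply it to each affected component.&lt;/p&gt;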
&lt;h3 id=&#34;querier&#34;&gt;Querier&lt;/h3&gt;
&lt;p&gt;Additionally, querier workers can be configured to use a larger message size limit.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;querier:
    frontend_worker:
        grpc_client_config:
            max_send_msg_size: &amp;lt;size&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&#34;ingestion&#34;&gt;Ingestion&lt;/h3&gt;
&lt;p&gt;Lastly, message size is also limited in ingestion and can be modified in the distributor block.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          max_recv_msg_size_mib: &amp;lt;size&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
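&lt;p&gt;For illustration, the distributor setting with a concrete value might look like the following. The value of 16 is a hypothetical example, and the &lt;code&gt;_mib&lt;/code&gt; suffix suggests that this option, unlike the server block limits, is expressed in mebibytes.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          # Illustrative value in MiB, not a recommendation
          max_recv_msg_size_mib: 16&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;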
]]></content><description>&lt;h1 id="response-larger-than-the-max">Response larger than the max&lt;/h1>
&lt;p>The error message is similar to the following:&lt;/p>
&lt;div class="code-snippet code-snippet__mini">&lt;div class="lang-toolbar__mini">
&lt;/div>&lt;div class="code-snippet code-snippet__border">
&lt;pre data-expanded="false">&lt;code class="language-none">500 Internal Server Error Body: response larger than the max (&amp;lt;size&amp;gt; vs &amp;lt;limit&amp;gt;)&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;p>This error indicates that the response received or sent is too large.
This can happen in multiple places, but it&amp;rsquo;s most commonly seen in the query path,
with messages between the querier and the query frontend.&lt;/p></description></item><item><title>Long-running traces</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/long-running-traces/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/long-running-traces/</guid><content><![CDATA[&lt;h1 id=&#34;long-running-traces&#34;&gt;Long-running traces&lt;/h1&gt;
&lt;p&gt;Long-running traces are created when Tempo receives spans for a trace,
followed by a delay, and then Tempo receives additional spans for the same
trace. If the delay between spans is great enough, the spans end up in
different blocks, which can lead to inconsistency in a few ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;When using TraceQL search, the duration information only pertains to a
subset of the blocks that contain a trace. This happens because Tempo
consults only enough blocks to know the TraceID of the matching spans. When
performing a TraceID lookup, Tempo searches for all parts of a trace in all
matching blocks, which yields greater accuracy when combined.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When using 
    &lt;a href=&#34;/docs/tempo/v3.0.x/traceql/construct-traceql-queries/#combine-spansets&#34;&gt;&lt;code&gt;spanset&lt;/code&gt;
operators&lt;/a&gt;,
Tempo only evaluates the portion of the trace contained in the current block. This means
the conditions may evaluate to false for a single block, even though they
would evaluate to true if all parts of the trace from all blocks were considered.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In Tempo 3.0, two components handle trace data independently:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Live-stores&lt;/strong&gt; serve recent data. They hold traces in memory and can keep spans for the same trace together as long as the trace remains active. You can tune the &lt;code&gt;live_store.max_trace_idle&lt;/code&gt; configuration to control when a trace is considered idle. Extending this beyond the default &lt;code&gt;5s&lt;/code&gt; can allow for long-running traces to be co-located, but take into account other considerations around memory consumption on the live-stores.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Block-builders&lt;/strong&gt; consume from Kafka and build blocks for long-term storage. They do a hard cut at a certain record on each consumption cycle. All spans consumed in a cycle are flushed into blocks regardless of whether the trace is complete. This means a trace&amp;rsquo;s spans can be split across block-builder cycles with no way to keep them together.&lt;/li&gt;
&lt;/ul&gt;
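&lt;p&gt;As a sketch of the live-store tuning described above, the idle timeout could be raised as follows. This assumes the dotted name &lt;code&gt;live_store.max_trace_idle&lt;/code&gt; maps to a nested YAML key, and the &lt;code&gt;30s&lt;/code&gt; value is purely illustrative. Longer idle times keep more traces in memory, so monitor live-store memory consumption after changing it.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;live_store:
  # Default is 5s; 30s is an illustrative value, not a recommendation
  max_trace_idle: 30s&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;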
&lt;h3 id=&#34;data-quality-metrics&#34;&gt;Data quality metrics&lt;/h3&gt;
&lt;p&gt;Tempo publishes a &lt;code&gt;tempo_warnings_total&lt;/code&gt; metric from the live-store, which
can aid in understanding when this situation arises.&lt;/p&gt;
&lt;p&gt;When a trace is flushed to the WAL in the live-store, it&amp;rsquo;s marshalled in the Parquet format, which makes it available for TraceQL metrics and search.
The more complete a trace is at this moment, the more accurate complex queries are.
The &lt;code&gt;disconnected_trace_flushed_to_wal&lt;/code&gt; and &lt;code&gt;rootless_trace_flushed_to_wal&lt;/code&gt; reasons reported on this metric help operators measure how reliable their trace data pipeline is.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;disconnected_trace_flushed_to_wal&lt;/code&gt;: Incremented when a flushed trace contains a span whose parent ID can&amp;rsquo;t be found.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;rootless_trace_flushed_to_wal&lt;/code&gt;: Incremented when a flushed trace doesn&amp;rsquo;t have a root span. A root span is a span whose parent ID is all zeros.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You might see these data quality metrics if you use a Prometheus query like this to explore Tempo warnings:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_warnings_total{}[5m])) by (reason)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;These warnings can help you optimize your instrumentation and tracing pipeline and understand the impact it has on Tempo data quality.&lt;/p&gt;
&lt;p&gt;In particular, the following query estimates the percentage of traces flushed to the WAL that are connected:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;1 - sum(rate(tempo_warnings_total{reason=&amp;#34;disconnected_trace_flushed_to_wal&amp;#34;}[5m])) / sum(rate(tempo_live_store_traces_created_total{}[5m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;If you have long-running traces, you may also be interested in the
&lt;code&gt;rootless_trace_flushed_to_wal&lt;/code&gt; reason to know when a trace is flushed to the
WAL without a root span.&lt;/p&gt;
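&lt;p&gt;To track both ratios over time, the queries could be captured as Prometheus recording rules. The following is a sketch only: the group and record names are hypothetical, and using &lt;code&gt;tempo_live_store_traces_created_total&lt;/code&gt; as the denominator for the rootless ratio is an assumption that mirrors the connected-traces query.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;# Hypothetical Prometheus rules file; group and record names are illustrative.
groups:
  - name: tempo_data_quality
    rules:
      # Fraction of traces flushed to the WAL that are connected.
      - record: tempo:connected_traces_flushed:ratio_rate5m
        expr: |
          1 - sum(rate(tempo_warnings_total{reason=&amp;#34;disconnected_trace_flushed_to_wal&amp;#34;}[5m]))
            / sum(rate(tempo_live_store_traces_created_total{}[5m]))
      # Fraction of traces flushed to the WAL that have a root span
      # (assumes the same denominator applies).
      - record: tempo:rooted_traces_flushed:ratio_rate5m
        expr: |
          1 - sum(rate(tempo_warnings_total{reason=&amp;#34;rootless_trace_flushed_to_wal&amp;#34;}[5m]))
            / sum(rate(tempo_live_store_traces_created_total{}[5m]))&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;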
&lt;p&gt;To discover which &lt;code&gt;reason&lt;/code&gt; values Tempo reports, use this query:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;sum(rate(tempo_warnings_total{}[5m])) by (reason)&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;In general, Tempo functions at its peak when all parts of a trace are stored
within as few blocks as possible. There is a wide variety of tracing patterns
in the wild, which makes it impossible to optimize for all of them.&lt;/p&gt;
&lt;p&gt;While the preceding information can help determine what Tempo is doing, it may
be worth modifying the usage pattern slightly. For example, you may want to use
&lt;a href=&#34;https://opentelemetry.io/docs/concepts/signals/traces/#span-links&#34; target=&#34;_blank&#34; rel=&#34;noopener noreferrer&#34;&gt;span
links&lt;/a&gt;, so
that traces are split up, allowing one trace to complete, while pointing to the
next trace in the causal chain. This allows both traces to finish in a
shorter duration and increases the chances that each trace&amp;rsquo;s spans end up in the same block.&lt;/p&gt;
]]></content><description>&lt;h1 id="long-running-traces">Long-running traces&lt;/h1>
&lt;p>Long-running traces are created when Tempo receives spans for a trace,
followed by a delay, and then Tempo receives additional spans for the same
trace. If the delay between spans is great enough, the spans end up in
different blocks, which can lead to inconsistency in a few ways:&lt;/p></description></item><item><title>Too many requests error</title><link>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/too-many-requests-error/</link><pubDate>Thu, 28 May 2026 17:50:33 +0100</pubDate><guid>https://grafana.com/docs/tempo/v3.0.x/troubleshooting/querying/too-many-requests-error/</guid><content><![CDATA[&lt;h1 id=&#34;too-many-requests-429-error-code&#34;&gt;Too many requests (429 error code)&lt;/h1&gt;
&lt;p&gt;If an issue occurs during a Tempo query, the error response may look like:&lt;/p&gt;

&lt;div class=&#34;code-snippet code-snippet__mini&#34;&gt;&lt;div class=&#34;lang-toolbar__mini&#34;&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet code-snippet__border&#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-none&#34;&gt;429 failed to execute TraceQL query: {resource.service.name != nil} | rate() by(resource.service.name) Status: 429 Too Many Requests Body: job queue full&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;h2 id=&#34;root-cause&#34;&gt;Root cause&lt;/h2&gt;
&lt;p&gt;Tempo parallelizes work by breaking a single query into multiple requests (jobs) that are distributed to the queriers.
Increasing the time range results in more jobs being created.
To ensure fair resource usage and to prevent the &amp;ldquo;noisy neighbor&amp;rdquo; problem in multi-tenant environments, Tempo limits the number of jobs a tenant can run concurrently. The maximum number of jobs per tenant is controlled by the query-frontend setting &lt;code&gt;max_outstanding_per_tenant&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;solutions&#34;&gt;Solutions&lt;/h2&gt;
&lt;p&gt;There are two main solutions to this issue:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduce the time range of the query.&lt;/li&gt;
&lt;li&gt;Increase the &lt;code&gt;max_outstanding_per_tenant&lt;/code&gt; parameter in the query-frontend configuration from the default of 2000 jobs.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;query_frontend:
  max_outstanding_per_tenant: &amp;lt;max number of jobs&amp;gt;&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;
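&lt;p&gt;For illustration, doubling the limit from the default of 2000 jobs might look like the following. The value 4000 is a hypothetical example, and the sketch assumes the setting lives under the &lt;code&gt;query_frontend&lt;/code&gt; configuration block. A larger queue lets a single tenant place more load on the queriers, so raise it gradually.&lt;/p&gt;

&lt;div class=&#34;code-snippet &#34;&gt;&lt;div class=&#34;lang-toolbar&#34;&gt;
    &lt;span class=&#34;lang-toolbar__item lang-toolbar__item-active&#34;&gt;YAML&lt;/span&gt;
    &lt;div class=&#34;lang-toolbar__border&#34;&gt;&lt;/div&gt;
  &lt;/div&gt;&lt;div class=&#34;code-snippet &#34;&gt;
    &lt;pre data-expanded=&#34;false&#34;&gt;&lt;code class=&#34;language-yaml&#34;&gt;query_frontend:
  # Default is 2000; 4000 is an illustrative value, not a recommendation
  max_outstanding_per_tenant: 4000&lt;/code&gt;&lt;/pre&gt;
  &lt;/div&gt;
&lt;/div&gt;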
]]></content><description>&lt;h1 id="too-many-requests-429-error-code">Too many requests (429 error code)&lt;/h1>
&lt;p>If an issue occurs during a Tempo query, the error response may look like:&lt;/p>
&lt;div class="code-snippet code-snippet__mini">&lt;div class="lang-toolbar__mini">
&lt;/div>&lt;div class="code-snippet code-snippet__border">
&lt;pre data-expanded="false">&lt;code class="language-none">429 failed to execute TraceQL query: {resource.service.name != nil} | rate() by(resource.service.name) Status: 429 Too Many Requests Body: job queue full&lt;/code>&lt;/pre>
&lt;/div>
&lt;/div>
&lt;h2 id="root-cause">Root cause&lt;/h2>
&lt;p>Tempo parallelizes work by breaking a single query into multiple requests (jobs) that are distributed to the queriers.
Increasing the time range results in more jobs being created.
To ensure fair resource usage and to prevent the &amp;ldquo;noisy neighbor&amp;rdquo; problem in multi-tenant environments, Tempo limits the number of jobs a tenant can run concurrently. The maximum number of jobs per tenant is controlled by the query-frontend setting &lt;code>max_outstanding_per_tenant&lt;/code>.&lt;/p></description></item></channel></rss>