Use polling to monitor Tempo’s backend status
Tempo maintains knowledge of the state of the backend by polling it at regular intervals. Only two components currently need this knowledge and, consequently, only two poll the backend: compactors and queriers.
To reduce calls to the backend, only a small subset of compactors actually list all blocks and build
what’s called a tenant index.
The tenant index is a gzip’ed JSON file located at
an entry for every block and compacted block for that tenant.
This is done once every
All other compactors and all queriers then rely on downloading this file, unzipping it and using the contained list.
Again, this is done once every
Due to this behavior, a given compactor or querier will often have an out-of-date blocklist.
During normal operation, it will be stale by at most twice the configured
Note: For details about configuring polling, see polling configuration.
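The "at most twice" bound can be checked with a small model. This is a sketch under simplifying assumptions, not Tempo's implementation: index builders and readers share one interval (named `poll_interval` here purely for illustration), builds happen at multiples of that interval, and building and downloading the index are instantaneous. The hypothetical `reader_offset` is how long after a build the reader polls.

```python
def max_staleness(poll_interval: float, reader_offset: float, cycles: int = 10) -> float:
    """Return the largest age, in seconds, of a reader's view of the backend.

    Assumptions (illustrative only): the tenant index is rebuilt at times
    0, P, 2P, ... and the reader polls at reader_offset + k * P.
    """
    worst = 0.0
    for k in range(1, cycles):
        reader_poll = reader_offset + k * poll_interval
        # Most recent index build at or before this reader poll.
        index_built = (reader_poll // poll_interval) * poll_interval
        # The reader keeps this list until its next poll, one interval later.
        age_at_next_poll = (reader_poll + poll_interval) - index_built
        worst = max(worst, age_at_next_poll)
    return worst


# A reader that polls right after a build is at most one interval behind;
# a reader that polls just before the next build approaches two intervals.
print(max_staleness(300, 0))
print(max_staleness(300, 299))
```

In this model the index can be almost one interval old when a reader downloads it, and the reader then holds that copy for one more interval, which is where the factor of two comes from.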
Monitor polling with dashboards and alerts
If you are building your own dashboards or alerts, here are a few relevant metrics:
tempodb_blocklist_poll_errors_total: A holistic metric that increments for any error with polling the blocklist. Any increase in this metric should be reviewed.
tempodb_blocklist_poll_duration_seconds: Histogram recording the length of time in seconds to poll the entire blocklist.
tempodb_blocklist_length: Total blocks as seen by this component.
tempodb_blocklist_tenant_index_errors_total: A holistic metric that increments for any error building the tenant index. Any increase in this metric should be reviewed.
tempodb_blocklist_tenant_index_builder: A gauge that has the value 1 if this compactor is attempting to build the tenant index and 0 if it is not. At least one compactor must have this value set to 1 for the system to be working.
tempodb_blocklist_tenant_index_age_seconds: The age of the last loaded tenant index. now() minus this value indicates how stale this component's view of the blocklist is.
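Two of the rules above translate directly into checks: any increase in an error counter deserves review, and at least one compactor must report itself as an index builder. The following sketch applies those rules to two consecutive scrapes. The function name `polling_problems` and the input shapes (plain dictionaries of metric name to value, plus a list of builder gauge values, one per compactor) are assumptions for illustration. In practice these checks would more likely live in alerting rules in your monitoring system, and counter resets after a restart are not handled here.

```python
ERROR_COUNTERS = (
    "tempodb_blocklist_poll_errors_total",
    "tempodb_blocklist_tenant_index_errors_total",
)


def polling_problems(previous: dict, current: dict, builder_gauges: list) -> list:
    """Return human-readable findings from two scrapes of the polling metrics.

    previous/current map metric names to values from consecutive scrapes.
    builder_gauges holds tempodb_blocklist_tenant_index_builder for each compactor.
    """
    problems = []
    # Any increase in an error counter should be reviewed.
    for name in ERROR_COUNTERS:
        if current.get(name, 0) > previous.get(name, 0):
            problems.append(f"{name} increased")
    # At least one compactor must be building the tenant index.
    if not any(value == 1 for value in builder_gauges):
        problems.append("no compactor is building the tenant index")
    return problems


before = {"tempodb_blocklist_poll_errors_total": 3,
          "tempodb_blocklist_tenant_index_errors_total": 0}
after = {"tempodb_blocklist_poll_errors_total": 4,
         "tempodb_blocklist_tenant_index_errors_total": 0}
for finding in polling_problems(before, after, builder_gauges=[0, 0]):
    print(finding)
```

The staleness metric is left out of this sketch on purpose, because a sensible threshold depends on the polling interval you have configured.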