Retry on RESOURCE_EXHAUSTED failure
Grafana Cloud Traces returns RetryInfo to correctly indicate retryable errors. This change aligns with the OpenTelemetry specification.
As per the OTel specification, “Retryable errors indicate that telemetry data processing failed, and the client SHOULD record the error and may retry exporting the same data. For example, this can happen when the server is temporarily unable to process the data.” If an error is retryable, the collector keeps the data and attempts to send it again after the interval returned by the server.
Impact
When configured to retry, the OTel Collector, Grafana Alloy, and Grafana Agent telemetry collectors retry sending data after a retryable error.
Incorrectly configured collectors might hold too much data in memory while waiting to retry, run out of memory, and crash.
The amount of data a collector holds in memory to retry can be controlled using the sending_queue and retry_on_failure configuration options.
For Grafana Alloy, refer to sending_queue and retry_on_failure in the Grafana Alloy documentation.
For OpenTelemetry Collector, refer to sending_queue and retry_on_failure in the Configuration section of the README.
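As a rough illustration of how these two options bound retries and memory use, the following is a minimal sketch of an OpenTelemetry Collector exporter configuration. It assumes the otlp exporter and a hypothetical environment variable named OTLP_ENDPOINT that holds your endpoint. The values shown are illustrative starting points, not recommendations, so check them against the documentation for your collector version before using them.

```yaml
exporters:
  otlp:
    # Hypothetical environment variable; replace with your own endpoint.
    endpoint: ${env:OTLP_ENDPOINT}
    retry_on_failure:
      enabled: true
      # Wait time after the first failure before retrying.
      initial_interval: 5s
      # Upper bound on the backoff between consecutive retries.
      max_interval: 30s
      # Total time spent retrying a batch before it is dropped.
      max_elapsed_time: 300s
    sending_queue:
      enabled: true
      # Number of consumers that read from the queue and send data.
      num_consumers: 10
      # Maximum number of batches held in memory while waiting to be sent.
      queue_size: 1000
```

Lowering queue_size or max_elapsed_time reduces how much data the collector can hold while the server is asking it to back off, at the cost of dropping data sooner during a long outage.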