Retry on RESOURCE_EXHAUSTED failure
Grafana Cloud

Grafana Cloud Traces returns RetryInfo to correctly indicate retryable errors. This change aligns with the OpenTelemetry specification. According to the OTel specification, “Retryable errors indicate that telemetry data processing failed, and the client SHOULD record the error and may retry exporting the same data. For example, this can happen when the server is temporarily unable to process the data.” Under the OTLP specification, a RESOURCE_EXHAUSTED status is retryable only when the server signals that it can recover, which it does by including RetryInfo. If an error is retryable, the collector keeps the data and attempts to send it again after the interval returned by the server.

Impact

When retries are enabled, the OpenTelemetry Collector, Grafana Alloy, and Grafana Agent correctly retry exports that fail with a retryable error. A misconfigured collector can buffer too much data in memory while it waits to retry, run out of memory, and crash. Use the sending_queue and retry_on_failure configuration options to control how much data a collector holds in memory for retries.

For Grafana Alloy, refer to sending_queue and retry_on_failure in Grafana Alloy.

For OpenTelemetry Collector, refer to sending_queue and retry_on_failure in the Configuration section of the README.