
Retry on RESOURCE_EXHAUSTED failure

Grafana Cloud Traces returns RetryInfo to indicate which errors are retryable. This behavior aligns with the OpenTelemetry specification, which states: “Retryable errors indicate that telemetry data processing failed, and the client SHOULD record the error and may retry exporting the same data. For example, this can happen when the server is temporarily unable to process the data.” If an error is retryable, the collector keeps the data and attempts to send it again after the interval returned by the server.

Currently, Grafana Cloud Traces returns RESOURCE_EXHAUSTED as a non-retryable error.

Starting on July 1, 2024, RESOURCE_EXHAUSTED will be returned as a retryable error.

Note

This behavior change will take effect on July 1, 2024.

Impact

When configured to retry, telemetry collectors (OTel Collector, Grafana Alloy, Grafana Agent) resend data that failed with a retryable error. A collector that queues more data for retry than its memory allows can run out of memory and crash. Use the sending_queue and retry_on_failure configuration options to control how much data a collector holds in memory for retries.

For Grafana Alloy, refer to sending_queue and retry_on_failure in Grafana Alloy.

For OpenTelemetry Collector, refer to sending_queue and retry_on_failure in the Configuration section of the README.
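
As a rough illustration, the two options might look like this on an OpenTelemetry Collector OTLP exporter. This is a minimal sketch, not a recommended configuration: the endpoint is a placeholder, the values are illustrative, and the units and defaults of each field can differ between collector versions, so check the README for the version you run.

```yaml
exporters:
  otlp:
    # Placeholder endpoint (assumption); replace with your own OTLP endpoint.
    endpoint: tempo.example.com:443
    retry_on_failure:
      # Retry sends that fail with a retryable error.
      enabled: true
      # Wait before the first retry, and the upper bound on the backoff.
      initial_interval: 5s
      max_interval: 30s
      # Stop retrying and drop the data after this much total time.
      max_elapsed_time: 300s
    sending_queue:
      # Buffer data in memory while sends are failing or being retried.
      enabled: true
      num_consumers: 10
      # Upper bound on queued items; lower it to reduce memory use.
      queue_size: 1000
```

Lowering queue_size and max_elapsed_time reduces how much data the collector holds while the server returns RESOURCE_EXHAUSTED, at the cost of dropping data sooner.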