
Troubleshoot Cloud Metrics write issues

Unfortunately, metrics are occasionally rejected and write failures occur. This can happen for a number of reasons, and determining the cause of a write failure can be challenging.

Because Mimir has detailed information about write failures and can emit it through logs, Cloud Metrics users can see write errors in a self-service fashion. However, there are limitations to what errors Mimir can log. The following are examples of errors that cannot be diagnosed with this feature:

  • A push request that is too big (for example, 1 GB). Grafana Cloud rejects the request before it reaches Mimir, so the error is never logged.
  • Prometheus internal issues, for example, Prometheus running out of memory. These issues aren’t surfaced because the request never reaches Grafana Cloud to be logged by Mimir.

To troubleshoot metrics write issues:

  1. Log in to your Grafana Cloud instance.
  2. Click the menu icon, then click the Explore (compass) icon.
  3. From the menu in the top left of the Explore page, select the grafanacloud-[instanceName]-usage-insights data source.
  4. Query the data source for write issues.
  • Use the Query Builder on the Builder tab.
  • Or, click the Code tab and enter your query in the field. Example queries:
 {instance_type="metrics"} |= "path=write"
 {instance_type="metrics"} |= "push request failed"

The query returns log messages that include information to help you determine why a metric write failed: which series had metrics discarded, the reason they were discarded, and the values that are valid.
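
To narrow the results to a single class of failure, you can chain an additional line filter containing a fragment of the error message you are investigating (the messages are listed in the sections that follow; the fragment shown here is a placeholder):
 {instance_type="metrics"} |= "push request failed" |= "<error message fragment>"
Depending on how a particular failure is logged, the message may not appear on the same line as “push request failed”; in that case, filter on the error text alone.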

Note

You can also troubleshoot write issues with Dashboards.

The following sections describe the issues that you can discover using this feature and the messages displayed in the logs.

Sample too old

Sample too old:

  • “the sample has been rejected because its timestamp is too old” (Out-of-order ingestion disabled)
  • “the sample has been rejected because another sample with a more recent timestamp has already been ingested and this sample is beyond the out-of-order time window” (Out-of-order ingestion enabled)

This validation error is returned when a sample is submitted with a timestamp that is too far in the past. You can solve this problem by ensuring that metric delivery is configured correctly and that system clocks are synchronized. The out_of_order_time_window setting determines how far back in time samples can be accepted.

In Grafana Cloud, out-of-order writes can only be accepted up to a certain time window behind the most recent timestamp sent. For example, if the series {__name__="cpu_usage", instance="server1"} has one sample at 8:00, Mimir accepts data for that series as far back in time as the configured out-of-order window allows. If another sample is written at 10:00, the acceptable time window moves forward accordingly.
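
To find series affected by either variant of this error, you can filter the usage-insights logs on the message text, for example:
 {instance_type="metrics"} |~ "timestamp is too old|beyond the out-of-order time window"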

Tip

If you want to load older metrics into Mimir, you need to do so in roughly chronological order. You can backfill metrics from years ago as long as you start with the oldest samples and work forward to the most recent samples. For example, if today is 2025-05-25 and you have appropriate retention settings, you can send a sample with the timestamp 2018-01-01T13:00:00Z. The same rules still apply: if that is the most recent sample for the series, you can only send older samples that fall within the out-of-order time window.

Make sure your retention policies are configured to handle the time range of the metrics you want to send.

Sample too far in the future

Sample too far in the future: “received a sample whose timestamp is too far in the future”

To solve this error, investigate why this particular series has a timestamp too far into the future. The series in question is returned in the body of the HTTP response. This often indicates clock synchronization issues.
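
For example, the following usage-insights query surfaces these rejections:
 {instance_type="metrics"} |= "timestamp is too far in the future"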

Duplicate sample

Duplicate sample: “duplicate sample for timestamp”

This error means that a sample with the same timestamp already exists for the given series. Mimir does not allow duplicate samples with the same timestamp for a series.
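
To find the affected series, filter the usage-insights logs on the message text:
 {instance_type="metrics"} |= "duplicate sample for timestamp"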

Invalid metric name

Invalid metric name: “received a series with invalid metric name”

Check that your metric names are valid. Metric names may contain ASCII letters, digits, underscores, and colons, and must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*. Metric names beginning and ending with double underscores (__) are reserved for internal use.
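
For example, the following query returns rejections caused by invalid metric names:
 {instance_type="metrics"} |= "received a series with invalid metric name"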

Invalid labels

Invalid labels: “received a series with an invalid label”

Check that your labels are valid. Label names may contain ASCII letters, digits, and underscores, and must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*. Label names beginning and ending with double underscores (__) are reserved for internal use.
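
The following usage-insights query surfaces these rejections:
 {instance_type="metrics"} |= "received a series with an invalid label"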

Label name too long

Label name too long: “received a series whose label name length exceeds the limit”

The default value for max_label_name_length is 1024 characters. You can use promtool to analyze your metrics and their labels.
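
To locate the offending series in the usage-insights logs, filter on the message text:
 {instance_type="metrics"} |= "label name length exceeds the limit"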

Label value too long

By default, Grafana Cloud Metrics truncates label values, including metric names, that exceed the limit and appends a hash suffix so that the resulting length matches the limit. When this occurs, Grafana Cloud Metrics logs the message, “received a series whose label value length exceeds the limit; label value was truncated and appended its hash value”.

If you disable the label value truncation strategy, Grafana Cloud Metrics instead rejects time series with too long label values and logs the error message, “received a series whose label value length of %d exceeds the limit of %d”.

The default value for max_label_value_length is 2048 characters. This setting also applies to the metric name. To avoid exceeding the limit, check the lengths of your label values, or remove one or more labels from the metric series.
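
Both the truncation message and the rejection message contain the phrase “label value length”, so a single filter covers either configuration:
 {instance_type="metrics"} |= "whose label value length"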

Max labels per series

Max labels per series: “received a series whose number of labels exceeds the limit”

The maximum number of labels per series varies by configuration. To solve this error, remove one or more labels until the series is below the limit.
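
For example, the following query returns series rejected for having too many labels:
 {instance_type="metrics"} |= "number of labels exceeds the limit"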

Missing metric name

Missing metric name: “received series has no metric name”

Add a valid metric name to your time series. Every series must have a __name__ label.
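
To surface these rejections in the usage-insights logs, filter on the message text:
 {instance_type="metrics"} |= "received series has no metric name"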

Ingestion rate limited

Ingestion rate limited: “the request has been rejected because the tenant exceeded the ingestion rate limit”

Contact Grafana Support to see if the limit can be increased.
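
To confirm that your tenant is being rate limited, filter the usage-insights logs on the message text:
 {instance_type="metrics"} |= "exceeded the ingestion rate limit"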

Active series limit

Active series limit: “the request has been rejected because the tenant exceeded the active series limit”

Contact Grafana Support to see if the limit can be increased.
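
Similarly, the following query shows rejections caused by the active series limit:
 {instance_type="metrics"} |= "exceeded the active series limit"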

Note

For Grafana Cloud Metrics customers, the “Mimir administrator” is Grafana Support. If you would like to inquire about increasing a rate limit, file a support ticket.