Get help

Having trouble with telemetry signals? This page consolidates the most common troubleshooting steps by signal and shows you where to escalate problems across metrics, logs, traces, and profiles.

Metrics troubleshooting

If no data appears:

  • Use the grafanacloud-[instance]-usage-insights data source described in Troubleshoot Cloud Metrics write issues and run {instance_type="metrics"} |= "path=write" to surface recent write failures.
  • Confirm your collectors can reach the Prometheus endpoint and that credentials match the ones you generated in Connections; a minimal remote_write sketch follows this list.
  • Inspect errors such as “sample too old” or “sample too far in the future” to check for clock skew and out-of-order replay windows.
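
For reference, here is a minimal Alloy sketch of the metrics write path, assuming Alloy as the collector; the remote_write URL, instance ID, and token are placeholders for the values shown on your stack's Connections page:

    // Scrape a local target and forward samples to Grafana Cloud Metrics.
    prometheus.scrape "default" {
      targets    = [{"__address__" = "localhost:9100"}]
      forward_to = [prometheus.remote_write.grafana_cloud.receiver]
    }

    prometheus.remote_write "grafana_cloud" {
      endpoint {
        // Placeholder URL: copy the real push URL from Connections.
        url = "https://prometheus-prod-XX-prod-us-central-0.grafana.net/api/prom/push"

        basic_auth {
          username = "<metrics instance ID>"
          password = "<access policy token with metrics:write scope>"
        }
      }
    }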

If you receive high-cardinality warnings:

  • Label handling caps label names at 1024 characters, values at 2048 characters, and enforces a label-count limit. Truncated values include a hash suffix; remove unnecessary labels to avoid hitting the cap.
  • Errors such as err-mimir-label-value-too-long or received a series whose number of labels exceeds the limit indicate which series needs to be relabeled; see the relabeling sketch after this list.
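
If one label is driving the cardinality, one option is to drop it with a prometheus.relabel stage before remote write. This is a sketch that forwards to the remote_write component from the earlier sketch; the label name pod_ip is hypothetical:

    // Drop a high-cardinality label before samples reach remote write.
    prometheus.relabel "drop_high_cardinality" {
      forward_to = [prometheus.remote_write.grafana_cloud.receiver]

      rule {
        action = "labeldrop"
        regex  = "pod_ip"  // hypothetical high-cardinality label
      }
    }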

If you receive out-of-order or duplicate sample errors:

  • Duplicated timestamps generate “duplicate sample for timestamp” errors; ensure scrapers are not sending the same sample twice.
  • Backfilling must stay within the two-hour out_of_order_time_window. Replay older data chronologically so the latest timestamp always moves forward.

Logs troubleshooting

If no logs appear:

  • Follow the steps in Troubleshoot Cloud Logs write issues: query {instance_type="logs"} |= "push request failed" to see error messages from Loki.
  • Validate paths in loki.source.file (or whichever source you use) and confirm that the Grafana Cloud user/password pair matches the ones listed under Connections → Logs; a minimal log shipping sketch follows this list.
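
For reference, a minimal Alloy sketch of the logs write path; the file path, push URL, instance ID, and token are placeholders for your own values:

    // Find log files and tail them.
    local.file_match "app" {
      path_targets = [{"__path__" = "/var/log/app/*.log"}]
    }

    loki.source.file "app" {
      targets    = local.file_match.app.targets
      forward_to = [loki.write.grafana_cloud.receiver]
    }

    loki.write "grafana_cloud" {
      endpoint {
        // Placeholder URL: copy the real push URL from Connections → Logs.
        url = "https://logs-prod-XXX.grafana.net/loki/api/v1/push"

        basic_auth {
          username = "<logs instance ID>"
          password = "<access policy token with logs:write scope>"
        }
      }
    }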

If you have label issues:

  • Loki enforces the same 1024/2048 byte label length caps plus a maximum of 15 labels per stream. Errors such as duplicate label name, invalid labels, or entry ... has N label names point to the offending stream.
  • Promote only low-cardinality labels; high-cardinality labels should stay in structured metadata via otlp_config if you enable it through the self-serve limits API. See the structured metadata sketch after this list.
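
If you ship logs through Alloy rather than OTLP, a loki.process stage can keep a high-cardinality field out of the stream labels by attaching it as structured metadata instead. This is a sketch that forwards to the loki.write component from the earlier sketch; the JSON field user_id is hypothetical:

    // Extract a field from JSON log lines and attach it as structured
    // metadata rather than promoting it to a stream label.
    loki.process "keep_cardinality_down" {
      forward_to = [loki.write.grafana_cloud.receiver]

      stage.json {
        expressions = { user_id = "" }
      }

      stage.structured_metadata {
        values = { user_id = "" }
      }
    }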

If entries are rejected as out of order or too old:

  • entry too far behind errors mean data is arriving out of order (>1 hour behind the stream head). entry too old signals that data exceeded reject_old_samples_max_age (default one week).
  • Lines larger than 256 KB are rejected unless you set max_line_size_truncate via the configuration API.

Traces troubleshooting

If no traces appear:

  • Troubleshoot Grafana Cloud Traces recommends validating credentials (instance ID as username, token with traces:write scope) and ensuring you hit the correct OTLP endpoint (https://<stack>.grafana.net/tempo for HTTP, <stack>.grafana.net:443 for gRPC).
  • Use the Alloy UI (http://localhost:12345) to confirm receivers, processors, and exporters are healthy, and alloy fmt to check configuration syntax; a minimal OTLP pipeline sketch follows this list.
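
For reference, a minimal Alloy OTLP pipeline that authenticates to Grafana Cloud Traces over gRPC; the endpoint, instance ID, and token below are placeholders for the values on your stack's Connections page:

    otelcol.receiver.otlp "default" {
      grpc {}
      http {}

      output {
        traces = [otelcol.processor.batch.default.input]
      }
    }

    otelcol.processor.batch "default" {
      output {
        traces = [otelcol.exporter.otlp.grafana_cloud.input]
      }
    }

    otelcol.auth.basic "grafana_cloud" {
      username = "<traces instance ID>"
      password = "<access policy token with traces:write scope>"
    }

    otelcol.exporter.otlp "grafana_cloud" {
      client {
        // gRPC endpoint in host:port form.
        endpoint = "tempo-prod-us-central-0.grafana.net:443"
        auth     = otelcol.auth.basic.grafana_cloud.handler
      }
    }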

If you have missing spans:

  • Metrics-generator only creates metrics for SERVER/CONSUMER span kinds by default. If spans show up in TraceQL but not RED metrics, file a Support ticket to enable additional span kinds, or review the ingestion-time slack described in the metrics-generator constraints.
  • Tail sampling delays export by the decision wait and can drop spans that arrive after the sampling decision is made; tune the wait time and caches accordingly, as in the sketch after this list. Refer to Sampling for more information about sampling, policies, and examples.
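
A sketch of a tail sampling stage in Alloy, sitting between the OTLP receiver and the exporter from the earlier sketch; the decision wait and the two policies are illustrative, not recommendations:

    otelcol.processor.tail_sampling "default" {
      // Spans are buffered this long before a sampling decision is made;
      // spans arriving after the decision can be lost.
      decision_wait = "15s"

      policy {
        name = "keep-errors"
        type = "status_code"

        status_code {
          status_codes = ["ERROR"]
        }
      }

      policy {
        name = "baseline"
        type = "probabilistic"

        probabilistic {
          sampling_percentage = 10
        }
      }

      output {
        traces = [otelcol.exporter.otlp.grafana_cloud.input]
      }
    }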

If you find sampling issues or errors:

  • Review the sampling strategy guide: combine probabilistic sampling for baseline coverage with status/latency policies for critical traces. For additional context, refer to Tail sampling policies and strategies in the Tempo documentation.
  • RESOURCE_EXHAUSTED errors are retryable; configure sending_queue and retry_on_failure blocks so exporters back off instead of dropping data, as in the sketch after this list.
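
A sketch of the exporter from the earlier pipeline with queueing and retries enabled so transient RESOURCE_EXHAUSTED responses are retried instead of dropped; the queue size and intervals are illustrative:

    otelcol.exporter.otlp "grafana_cloud" {
      client {
        endpoint = "tempo-prod-us-central-0.grafana.net:443"
        auth     = otelcol.auth.basic.grafana_cloud.handler
      }

      sending_queue {
        enabled    = true
        queue_size = 5000
      }

      retry_on_failure {
        enabled          = true
        initial_interval = "5s"
        max_elapsed_time = "1m"
      }
    }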

Profiles troubleshooting

If profile uploads fail:

  • The Profiles ingestion control API rejects traffic with HTTP 422 once the daily megabyte cap is hit. Check grafanacloud_profiles_instance_ingest_limit_megabytes and grafanacloud_profiles_instance_discarded_bytes_per_second{reason="ingest_limit_reached"} to confirm.
  • Only Grafana Admins can adjust the limit; include the latest metadata.generation in your update to avoid conflicts.

If there are symbolization errors:

  • Follow the checklist in Pyroscope Symbolization: download the profile, inspect mappings with go tool pprof -raw, verify build IDs, and make sure a debuginfod server can supply the debug info.
  • Only system libraries are symbolized; customer code without build IDs will continue to show raw addresses.

Cross-signal troubleshooting

This section provides troubleshooting help for problems that appear when you use your telemetry signals together.

Correlation not working

  • Use the same service, environment, cluster, and region labels and attributes in every pipeline. The instrumentation guide recommends adding shared labels (for metrics, logs, profiles) and attributes (for traces) with consistent service names so metrics, logs, traces, and profiles can be stitched together; a shared-label sketch follows this list.
  • Ensure trace context propagates through your services so log lines and metrics exemplars can link back to traces.
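
One way to keep the taxonomy consistent in Alloy is to set the same external labels on the metrics and logs write components (and matching resource attributes on traces). This sketch omits the basic_auth blocks shown earlier; the cluster and environment values are placeholders:

    prometheus.remote_write "grafana_cloud" {
      external_labels = {
        cluster     = "prod-us-east-1",
        environment = "production"
      }

      endpoint {
        url = "https://prometheus-prod-XX-prod-us-central-0.grafana.net/api/prom/push"
      }
    }

    loki.write "grafana_cloud" {
      external_labels = {
        cluster     = "prod-us-east-1",
        environment = "production"
      }

      endpoint {
        url = "https://logs-prod-XXX.grafana.net/loki/api/v1/push"
      }
    }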

Authentication errors

  • Re-download credentials from Connections if tokens expire or scopes are missing. Logs and metrics access-policy tokens differ from traces/profile tokens, so double-check which one each collector uses.
  • Verify that Alloy/OpenTelemetry exporters reference the correct regional endpoints (for example, tempo-prod-us-central-0.grafana.net:443). Pointing a gRPC exporter at an HTTP URL returns 404/415 errors.

Diagnostic queries

The following sections provide queries that you can run to help you diagnose issues.

Check data is flowing

  • Metrics: sum by (id)(grafanacloud_instance_active_series) and sum by (id)(grafanacloud_instance_samples_per_second) show live usage and match the metrics troubleshooting workflow.
  • Logs: {instance_type="logs"} |= "push request failed" or {instance_type="logs"} |= "rate limit" highlight recent ingestion errors in the Usage Insights Loki data source.
  • Traces: Use a known trace ID to run { trace:id = "0123456789abcdef" } or start from the TraceQL Search builder as documented in the traces troubleshooting guide.

Verify label matching

  • Metrics: sum by (cluster, environment) (up) confirms that the labels you expect are applied to every target. Missing labels usually mean relabeling rules are dropping them.
  • Logs: count_over_time({cluster!=""} |~ "ERROR"[5m]) checks that every stream carries a cluster label and reports errors consistently.
  • Profiles: Filter flame graphs by service_name, region, or other labels coming from your Pyroscope scrapers to confirm that the same taxonomy exists across signals.

Support channels

If you still need help, here is how to get in touch with the Grafana community and Grafana Support.

Community support

Ask questions and search existing answers in the Grafana community forum at community.grafana.com.

Enterprise support

If you have a paid Grafana Cloud or Grafana Enterprise plan, open a support ticket from the Grafana Cloud portal to work directly with Grafana Support.