
Troubleshoot Prometheus data source issues

This document provides troubleshooting information for common errors you may encounter when using the Prometheus data source in Grafana.

Connection errors

The following errors occur when Grafana cannot establish or maintain a connection to Prometheus.

Failed to connect to Prometheus

Error message: “There was an error returned querying the Prometheus API”

Cause: Grafana cannot establish a network connection to the Prometheus server.

Solution:

  1. Verify that the Prometheus server URL is correct in the data source configuration.
  2. Check that Prometheus is running and accessible from the Grafana server.
  3. Ensure the URL includes the protocol (http:// or https://).
  4. Verify the port is correct (the Prometheus default port is 9090).
  5. Ensure there are no firewall rules blocking the connection.
  6. If Grafana and Prometheus are running in separate containers, use the container IP address or hostname instead of localhost.
  7. For Grafana Cloud, ensure you have configured Private data source connect if your Prometheus instance is not publicly accessible.
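
The reachability checks above can be sketched as a small script run from the Grafana host. This is a minimal illustration, not part of Grafana: the function names are hypothetical, and it assumes the Prometheus management endpoint /-/healthy is exposed at the same base URL you entered in the data source.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the URL of the Prometheus health endpoint for a data source URL."""
    return base_url.rstrip("/") + "/-/healthy"


def check_prometheus(base_url: str, timeout: float = 5.0) -> str:
    """Return a short diagnosis of whether base_url is reachable from this host."""
    url = health_url(base_url)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"reachable: HTTP {resp.status} from {url}"
    except urllib.error.HTTPError as err:
        # The server answered, so the network path works; look at auth or proxy settings.
        return f"reachable, but returned HTTP {err.code} from {url}"
    except ValueError as err:
        # Raised when the URL has no protocol, for example "localhost:9090".
        return f"invalid URL: {err}"
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, firewall drop, or timeout.
        return f"unreachable: {err}"
```

Calling check_prometheus("http://localhost:9090") from inside the Grafana container shows quickly whether localhost points at Prometheus or at the container itself.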

Request timed out

Error message: “context deadline exceeded” or “request timeout”

Cause: The connection to Prometheus timed out before receiving a response.

Solution:

  1. Check the network latency between Grafana and Prometheus.
  2. Verify that Prometheus is not overloaded or experiencing performance issues.
  3. Increase the Query timeout setting in the data source configuration under Interval behavior.
  4. Check the Grafana server configuration for server-level timeouts, such as the timeout setting in the [dataproxy] section.
  5. Reduce the time range or complexity of your query.
  6. Check if any network devices (load balancers, proxies) are timing out the connection.

Failed to parse data source URL

Error message: “Failed to parse data source URL”

Cause: The URL entered in the data source configuration is not valid.

Solution:

  1. Verify the URL format is correct (for example, http://localhost:9090 or https://prometheus.example.com:9090).
  2. Ensure the URL includes the protocol (http:// or https://).
  3. Remove any trailing slashes or invalid characters from the URL.
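
As a rough illustration of the checks above, the following sketch validates a URL before you paste it into the data source configuration. The function name is hypothetical and the rules only mirror the list above; Grafana's own validation may differ.

```python
from urllib.parse import urlsplit


def validate_prometheus_url(url: str) -> list:
    """Return a list of problems found in url; an empty list means it looks valid."""
    problems = []
    if any(ch.isspace() for ch in url):
        problems.append("contains whitespace")
    try:
        parts = urlsplit(url.strip())
        parts.port  # accessing .port raises ValueError for a non-numeric port
    except ValueError as err:
        return problems + [f"cannot be parsed: {err}"]
    if parts.scheme not in ("http", "https"):
        problems.append("missing or unsupported protocol (expected http:// or https://)")
    if not parts.hostname:
        problems.append("missing host name")
    if url.strip().endswith("/"):
        problems.append("has a trailing slash")
    return problems
```

For example, validate_prometheus_url("http://localhost:9090") returns an empty list, while "localhost:9090" is flagged for a missing protocol.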

Authentication errors

The following errors occur when there are issues with authentication credentials or permissions.

Unauthorized (401)

Error message: “401 Unauthorized” or “Authorization failed”

Cause: The authentication credentials are invalid or missing.

Solution:

  1. Verify that the username and password are correct if using basic authentication.
  2. Check that the authentication method selected matches your Prometheus configuration.
  3. If using a reverse proxy with authentication, verify the credentials are correct.
  4. For AWS SigV4 authentication, verify the IAM credentials and permissions. Alternatively, consider using the Amazon Managed Service for Prometheus data source for simplified AWS authentication.
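
To test basic authentication credentials outside Grafana, a sketch along these lines can help. The function names are hypothetical, and it assumes Prometheus (or the proxy in front of it) accepts HTTP basic authentication on the standard /api/v1/query endpoint.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Encode credentials the way HTTP basic authentication expects."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def auth_status(base_url: str, username: str, password: str, timeout: float = 5.0) -> int:
    """Return the HTTP status that these credentials receive (requires network access)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/query?query=up",
        headers={"Authorization": basic_auth_header(username, password)},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 401 points to bad or missing credentials, 403 to missing permissions.
        return err.code
```

A 200 from auth_status with the same credentials that fail in Grafana suggests the problem is in the data source settings rather than in Prometheus.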

Forbidden (403)

Error message: “403 Forbidden” or “Access denied”

Cause: The authenticated user does not have permission to access the requested resource.

Solution:

  1. Verify the user has read access to the Prometheus API.
  2. Check Prometheus security settings and access control configuration.
  3. If using a reverse proxy, verify the proxy is not blocking the request.
  4. For Amazon Managed Service for Prometheus, verify the IAM policy grants the required permissions. Alternatively, consider using the dedicated Amazon Managed Service for Prometheus data source for simplified AWS authentication.

Query errors

The following errors occur when there are issues with PromQL syntax or query execution.

Query syntax error

Error message: “parse error: unexpected character” or “bad_data: 1:X: parse error”

Cause: The PromQL query contains invalid syntax.

Alternative cause: A proxy between Grafana and Prometheus requires authentication. When proxy authentication fails, the proxy redirects the request to an HTML authentication page. Grafana cannot parse the HTML response, which results in a parse error. This appears to be a query issue but is actually a proxy authentication issue.

Solution:

  1. Check your query syntax for typos or invalid characters.
  2. Verify that metric names and label names are valid identifiers.
  3. Ensure string values in label matchers are enclosed in quotes.
  4. Use the Prometheus expression browser to test your query directly.
  5. Refer to the Prometheus querying documentation for syntax guidance.
  6. If you have a proxy between Grafana and Prometheus, verify that proxy authentication is correctly configured. Check your proxy logs for authentication failures or redirects.
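
To tell a genuine PromQL parse error from a proxy login page, you can inspect the raw response that Grafana receives. The sketch below is illustrative only: classify_response is a hypothetical helper, and it assumes you have captured the Content-Type header and body of the failing request, for example with curl or the browser developer tools.

```python
import json


def classify_response(content_type: str, body: str) -> str:
    """Guess whether a response came from Prometheus or from something in front of it."""
    start = body.lstrip().lower()
    if "text/html" in content_type.lower() or start.startswith(("<!doctype html", "<html")):
        return "html: probably a proxy login or error page, not Prometheus"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "unknown: body is neither HTML nor JSON"
    if isinstance(payload, dict) and payload.get("status") == "error":
        # Prometheus reports genuine PromQL problems with errorType "bad_data".
        return f"prometheus error: {payload.get('errorType')}: {payload.get('error')}"
    return "ok: JSON response from the Prometheus API"
```

An "html" result means the query never reached Prometheus, so fixing the PromQL will not help.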

Query returns no data for a metric

Symptom: The query returns no data and the visualization is empty.

Cause: The specified metric does not exist in Prometheus, or there is no data for the selected time range.

Solution:

  1. Verify the metric name is spelled correctly.
  2. Check that the metric is being scraped by Prometheus.
  3. Use the Prometheus API to browse available metrics at /api/v1/label/__name__/values.
  4. Use the target metadata API to verify which metrics a target exposes.
  5. Verify the time range includes data for the metric.
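
The metric-name check can be scripted against the label values endpoint mentioned above. This is a sketch under assumptions: the helper names are hypothetical, and the fuzzy matching uses Python's difflib rather than anything built into Prometheus or Grafana.

```python
import difflib
import json
import urllib.request


def metric_names(payload: dict) -> list:
    """Extract metric names from a /api/v1/label/__name__/values response."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus returned an error: {payload.get('error')}")
    return sorted(payload.get("data", []))


def suggest_metric_names(names: list, wanted: str, limit: int = 3) -> list:
    """Return existing names that closely resemble `wanted`, best match first."""
    return difflib.get_close_matches(wanted, names, n=limit, cutoff=0.6)


def fetch_metric_names(base_url: str, timeout: float = 10.0) -> list:
    """Fetch all metric names known to Prometheus (requires network access)."""
    url = base_url.rstrip("/") + "/api/v1/label/__name__/values"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return metric_names(json.load(resp))
```

If the name you queried is missing from the list but a close match appears, the query most likely contains a typo.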

Query timeout limit exceeded

Error message: “query timed out in expression evaluation” or “query processing would load too many samples”

Cause: The query took longer than the configured timeout, or it would load more samples into memory than Prometheus allows.

Solution:

  1. Reduce the time range of your query.
  2. Add more specific label filters to limit the data scanned.
  3. Increase the Query timeout setting in the data source configuration.
  4. Use aggregation operators like sum() or avg() to reduce the number of time series returned.
  5. Increase the --query.timeout or --query.max-samples flags in Prometheus if you have admin access.

Too many time series

Error message: “exceeded maximum resolution of 11,000 points per timeseries” or “maximum number of series limit exceeded”

Cause: The query is returning more time series or data points than the configured limits allow.

Solution:

  1. Reduce the time range of your query.
  2. Add label filters to limit the number of time series returned.
  3. Increase the Min interval or lower the Resolution in the query options to reduce the number of data points.
  4. Use aggregation functions to combine time series.
  5. Adjust the Series limit setting in the data source configuration under Other settings.
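
The arithmetic behind the 11,000-point limit can be sketched as follows. The helper names are hypothetical, and the point count is an approximation of what a range query returns per series (time range divided by step, plus one).

```python
import math

MAX_POINTS = 11_000  # per-series limit quoted in the error message above


def points_per_series(range_seconds: int, step_seconds: int) -> int:
    """Approximate number of points a range query returns for each series."""
    return range_seconds // step_seconds + 1


def min_step_seconds(range_seconds: int, max_points: int = MAX_POINTS) -> int:
    """Smallest whole-second step that keeps a range query within max_points."""
    return max(1, math.ceil(range_seconds / max_points))


thirty_days = 30 * 24 * 3600
print(points_per_series(thirty_days, 15))  # far above the limit at a 15-second step
print(min_step_seconds(thirty_days))       # a Min interval at or above this avoids the error
```

For a 30-day range, a 15-second step asks for about 172,800 points per series, while a step of roughly 4 minutes stays within the limit.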

Invalid function or aggregation

Error message: “unknown function” or “parse error: unexpected aggregation”

Cause: The query uses an invalid or unsupported PromQL function.

Solution:

  1. Verify the function name is spelled correctly and is a valid PromQL function.
  2. Check that you are using the correct syntax for the function.
  3. Ensure your Prometheus version supports the function you are using.
  4. Refer to the PromQL functions documentation for available functions.

Configuration errors

The following errors occur when the data source is not configured correctly.

Invalid Prometheus type

Symptom: Unexpected behavior when querying metrics or labels.

Cause: The Prometheus type setting does not match your actual Prometheus-compatible database.

Solution:

  1. Open the data source configuration in Grafana.
  2. Under Performance, select the correct Prometheus type (Prometheus, Cortex, Mimir, or Thanos).
  3. Confirm the selection matches your backend. Different database types support different APIs, so an incorrect setting may cause unexpected behavior.

Scrape interval mismatch

Symptom: Data appears sparse, or rate() queries return no data or incomplete results.

Cause: The Scrape interval setting in Grafana does not match the actual scrape interval in Prometheus. This especially affects rate() queries, which require at least two data points within the specified time window. For example, if your actual scrape interval is 5 minutes but Grafana uses the default (15 seconds for OSS, 1 minute for Grafana Cloud), a query like rate(http_requests_total[1m]) returns no data because a 1-minute window can contain at most one data point.

Solution:

  1. Check your Prometheus configuration file for the scrape_interval setting.
  2. Update the Scrape interval in the Grafana data source configuration under Interval behavior to match.
  3. Use $__rate_interval instead of hardcoded time windows in rate() queries. This variable automatically adjusts based on your scrape interval.
  4. For more information, refer to the article “$__rate_interval for Prometheus rate queries that just work”.
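
To see why the Scrape interval setting matters, the following sketch reproduces the documented $__rate_interval calculation, max($__interval + scrape interval, 4 × scrape interval), in plain Python. The function names are hypothetical; Grafana performs this calculation internally.

```python
def rate_interval_seconds(interval_seconds: int, scrape_interval_seconds: int) -> int:
    """Compute $__rate_interval as max($__interval + scrape, 4 * scrape)."""
    return max(interval_seconds + scrape_interval_seconds, 4 * scrape_interval_seconds)


def guaranteed_samples(window_seconds: int, scrape_interval_seconds: int) -> int:
    """Minimum number of samples a range window is certain to contain."""
    return window_seconds // scrape_interval_seconds


# A hardcoded [1m] window with a 5-minute scrape interval cannot hold two samples.
print(guaranteed_samples(60, 300))
# With the Scrape interval set to 5 minutes, $__rate_interval is at least 20 minutes.
print(rate_interval_seconds(15, 300))
```

If the data source still uses the 15-second default while Prometheus scrapes every 5 minutes, the same formula yields a window far too short for rate() to find two samples.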

TLS and certificate errors

The following errors occur when there are issues with TLS configuration.

Certificate verification failed

Error message: “x509: certificate signed by unknown authority” or “certificate verify failed”

Cause: Grafana cannot verify the TLS certificate presented by Prometheus.

Solution:

  1. If using a self-signed certificate, enable Add self-signed certificate in the TLS settings and add your CA certificate.
  2. Verify the certificate chain is complete and valid.
  3. Ensure the certificate has not expired.
  4. As a temporary workaround for testing, enable Skip TLS verify (not recommended for production).
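
To reproduce a certificate error outside Grafana, you can mirror the TLS options with Python's ssl module. This is a sketch only: tls_context and fetch_with_tls are hypothetical helpers, and ca_file stands for the path to the CA certificate you would otherwise paste into the data source settings.

```python
import ssl
import urllib.request


def tls_context(ca_file=None, skip_verify=False):
    """Build an SSL context that mirrors the data source TLS options."""
    # With ca_file=None the system trust store is used, as for a publicly signed certificate.
    context = ssl.create_default_context(cafile=ca_file)
    if skip_verify:
        # Comparable to Skip TLS verify: for testing only.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def fetch_with_tls(url, ca_file=None, skip_verify=False, timeout=5.0):
    """Return the HTTP status (requires network access).

    An untrusted certificate surfaces as urllib.error.URLError, with the
    verification failure described in its reason attribute.
    """
    context = tls_context(ca_file, skip_verify)
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.status
```

If the request succeeds only with ca_file set, adding that CA certificate to the data source is the appropriate fix; if it succeeds only with skip_verify, the certificate chain or host name needs attention.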

TLS handshake error

Error message: “TLS: handshake failure” or “connection reset”

Cause: The TLS handshake between Grafana and Prometheus failed.

Solution:

  1. Verify that Prometheus is configured to use TLS.
  2. Check that the TLS version and cipher suites are compatible.
  3. If using client certificates, ensure they are correctly configured in the TLS client authentication section.
  4. Verify the server name matches the certificate’s Common Name or Subject Alternative Name.

Other common issues

The following issues don’t produce specific error messages but are commonly encountered.

Empty query results

Symptom: The query runs without an error but returns no data.

Solution:

  1. Verify the time range includes data in Prometheus.
  2. Check that the metric and label names are correct.
  3. Test the query directly in the Prometheus expression browser.
  4. Ensure label filters are not excluding all data.
  5. For rate() or increase(), ensure the range window (the duration in square brackets) covers at least two scrape intervals, because both functions need at least two data points.

Slow query performance

Symptom: Queries take a long time to execute.

Solution:

  1. Reduce the time range of your query.
  2. Add more specific label filters to limit the data scanned.
  3. Increase the Min interval in the query options.
  4. Check Prometheus server performance and resource utilization.
  5. Turn on the Disable metrics lookup option in the data source configuration for large Prometheus instances.
  6. Enable Incremental querying (beta) to cache query results.
  7. Consider using recording rules to pre-aggregate frequently queried data.

Data appears delayed or missing recent points

Symptom: The visualization doesn’t show the most recent data.

Solution:

  1. Check the dashboard time range and refresh settings.
  2. Verify the Scrape interval is configured correctly.
  3. Ensure Prometheus has finished scraping the target.
  4. Check for clock synchronization issues between Grafana and Prometheus.
  5. For rate() and similar functions, remember that they need at least two data points to calculate a result, so the newest point can lag behind the latest scrape.
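
One way to check for clock drift is to ask Prometheus for its current time with the PromQL time() function and compare it with the local clock. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the measured skew also includes network latency, so only differences of several seconds or more are meaningful.

```python
import json
import time
import urllib.parse
import urllib.request


def server_time_from_response(payload: dict) -> float:
    """Read the evaluation timestamp from an instant-query response for time()."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus returned an error: {payload.get('error')}")
    data = payload["data"]
    if data.get("resultType") != "scalar":
        raise ValueError("expected a scalar result")
    timestamp, _value = data["result"]
    return float(timestamp)


def clock_skew_seconds(payload: dict, local_time: float) -> float:
    """Positive values mean the local clock is ahead of Prometheus."""
    return local_time - server_time_from_response(payload)


def fetch_clock_skew(base_url: str, timeout: float = 5.0) -> float:
    """Query time() on Prometheus and compare with this host (requires network access)."""
    query = urllib.parse.urlencode({"query": "time()"})
    url = base_url.rstrip("/") + "/api/v1/query?" + query
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return clock_skew_seconds(payload, time.time())
```

Run fetch_clock_skew from the Grafana host, and consider the browser's clock as well, because dashboard time ranges are computed from it.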

Exemplars not showing

Symptom: Exemplar data is not appearing in visualizations.

Solution:

  1. Verify that exemplars are enabled in the data source configuration under Exemplars.
  2. Check that your Prometheus version supports exemplars (2.26+) and that exemplar storage is enabled with the --enable-feature=exemplar-storage flag.
  3. Ensure your instrumented application is sending exemplar data.
  4. Verify the tracing data source is correctly configured for the exemplar link.
  5. Enable the Exemplars toggle in the query editor.
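
To confirm that Prometheus is storing exemplars at all, you can query its exemplars endpoint directly. The following is a minimal sketch with hypothetical helper names; it assumes the /api/v1/query_exemplars endpoint is available on your Prometheus version.

```python
import json
import urllib.parse
import urllib.request


def exemplars_url(base_url: str, query: str, start: float, end: float) -> str:
    """Build a query_exemplars URL for a PromQL expression and a time range (Unix seconds)."""
    params = urllib.parse.urlencode({"query": query, "start": start, "end": end})
    return base_url.rstrip("/") + "/api/v1/query_exemplars?" + params


def count_exemplars(payload: dict) -> int:
    """Count exemplars across all series in a query_exemplars response."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus returned an error: {payload.get('error')}")
    return sum(len(series.get("exemplars", [])) for series in payload.get("data", []))


def fetch_exemplar_count(base_url: str, query: str, start: float, end: float) -> int:
    """Fetch and count exemplars (requires network access)."""
    with urllib.request.urlopen(exemplars_url(base_url, query, start, end), timeout=10) as resp:
        return count_exemplars(json.load(resp))
```

A count of zero for a metric you expect to carry exemplars points at the application or Prometheus configuration rather than at Grafana.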

Alerting rules not visible

Symptom: Prometheus alerting rules are not appearing in the Grafana Alerting UI.

Solution:

  1. Verify that Manage alerts via Alerting UI is enabled in the data source configuration.
  2. Check that Prometheus has alerting rules configured.
  3. Ensure Grafana can access the Prometheus rules API endpoint.
  4. Note that for Prometheus (unlike Mimir), the Alerting UI only supports viewing existing rules, not creating new ones.
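
To verify what the rules API returns, a sketch like the following lists the alerting rules Prometheus reports. The helper names are hypothetical, and it assumes the standard /api/v1/rules endpoint is reachable from the Grafana host.

```python
import json
import urllib.request


def alerting_rule_names(payload: dict) -> list:
    """Return "group/rule" names for every alerting rule in a /api/v1/rules response."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus returned an error: {payload.get('error')}")
    names = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # Recording rules share the same endpoint, so filter on the rule type.
            if rule.get("type") == "alerting":
                names.append(f"{group.get('name')}/{rule.get('name')}")
    return names


def fetch_alerting_rule_names(base_url: str, timeout: float = 10.0) -> list:
    """Fetch the rules from Prometheus (requires network access)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/v1/rules", timeout=timeout) as resp:
        return alerting_rule_names(json.load(resp))
```

An empty list means Prometheus itself has no alerting rules loaded, so nothing can appear in the Alerting UI.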

Get additional help

If you continue to experience issues after following this troubleshooting guide:

  1. Check the Prometheus documentation for API and PromQL guidance.
  2. Review the Grafana community forums for similar issues.
  3. Contact Grafana Support if you are a Cloud Pro, Cloud Contracted, or Enterprise user.
  4. When reporting issues, include:
    • Grafana version
    • Prometheus version and type (Prometheus, Mimir, Cortex, Thanos)
    • Error messages (redact sensitive information)
    • Steps to reproduce
    • Relevant configuration such as data source settings, query timeout, and TLS settings (redact tokens, passwords, and other credentials)