Troubleshoot AWS Application Signals data source issues
This document covers common failure modes in the AWS Application Signals data source and how to resolve them. Use the page navigation to jump to the symptom you’re seeing.
Sections are grouped by where the problem appears:
- Authentication errors — what happens when Grafana can’t get AWS credentials.
- Connection errors — DNS, networking, regions, and endpoints.
- Query editor errors — empty panels, “no data”, and query-type-specific failures.
- Alerting errors — issues specific to building and evaluating alert rules on Trace Statistics queries.
- Template variable errors — variables that return nothing or load slowly.
- Performance issues — throttling, slow queries, and quota limits.
- Enable debug logging — how to capture plugin and SDK diagnostics.
- Get additional help — community and support channels.
Authentication errors
These errors occur when credentials are invalid, missing, or don’t have the required permissions.
“Access denied” or “not authorized” on Save & test
Symptoms:
- Clicking Save & test returns an access-denied error.
- Queries return “not authorized to perform” messages.
- Service, account, or region drop-downs are empty.
Possible causes and solutions:
“Could not load credentials” or “no valid providers in chain”
Symptoms:
- Save & test fails with credential-chain errors.
- Grafana log shows “NoCredentialProviders” or “could not find credentials”.
Solutions:
- For Credentials file auth, confirm ~/.aws/credentials exists for the user running grafana-server and is readable (permissions 0644 or stricter).
- For AWS SDK Default, confirm the environment exposes credentials (environment variables, EC2 role, or container role).
- For EC2 IAM role, confirm a role is attached to the EC2 instance and the instance metadata service is reachable.
- Restart grafana-server after changing environment variables or credentials files.
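If Save & test still reports a credential-chain error with Credentials file auth, a quick local check can rule out the file itself. The following Python sketch is a hypothetical helper, not part of Grafana or the plugin. It assumes the standard INI layout of ~/.aws/credentials with aws_access_key_id and aws_secret_access_key keys, and that the data source reads the default profile unless you configured another profile name.

```python
import configparser
import os
import stat
from pathlib import Path


def check_credentials_file(path: Path, profile: str = "default") -> list[str]:
    """Return problems found in an AWS shared credentials file (empty list if none)."""
    if not path.is_file():
        return [f"{path} does not exist or is not a regular file"]

    problems: list[str] = []
    if not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable by the current user")
        return problems  # no point parsing a file we can't open

    # "0644 or stricter" means no permission bits beyond rw-r--r--.
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & ~0o644:
        problems.append(f"{path} has mode {oct(mode)}; expected 0644 or stricter")

    parser = configparser.ConfigParser()
    try:
        parser.read(path)
    except configparser.Error as exc:
        problems.append(f"{path} could not be parsed: {exc}")
        return problems

    if not parser.has_section(profile):
        problems.append(f"profile [{profile}] not found in {path}")
        return problems
    # Both keys must be present and non-empty in the chosen profile.
    for key in ("aws_access_key_id", "aws_secret_access_key"):
        if not parser.get(profile, key, fallback=""):
            problems.append(f"profile [{profile}] is missing {key}")
    return problems
```

Run it as the same user that runs grafana-server so that both the home directory and the readability check reflect that user, for example check_credentials_file(Path.home() / ".aws" / "credentials"). An empty list means the file looks usable; it doesn't confirm the keys are valid in AWS.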
“ExpiredToken” or “The security token included in the request is expired”
Symptoms:
- Queries that previously worked start returning token-expired errors.
- The error only happens after a long period of idleness.
Solutions:
- For static session tokens: regenerate the temporary credentials and update the data source. To avoid manual rotation, switch to Grafana Assume Role authentication, which refreshes session tokens automatically.
- For Grafana Assume Role: confirm the target role’s maximum session duration is long enough for your usage pattern.
- For Credentials file auth: regenerate the credentials and update ~/.aws/credentials.
Connection errors
These errors occur when Grafana can’t reach the AWS X-Ray or Application Signals endpoints.
Connection timeouts or “dial tcp: connection refused”
Symptoms:
- Save & test hangs and then times out.
- Queries intermittently fail with network errors.
- Errors mention xray.<region>.amazonaws.com or application-signals.<region>.amazonaws.com.
Solutions:
- Verify outbound HTTPS (port 443) is allowed from the Grafana host to the AWS X-Ray and Application Signals endpoints.
- Confirm the Default region is set to a region where your resources exist.
- If you use a custom Endpoint, confirm the URL is correct and reachable from Grafana.
- For private AWS networks or VPC endpoints in Grafana Cloud, configure Private Data source Connect and select a PDC network in the data source settings.
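To separate DNS problems from blocked outbound traffic, you can test the endpoints from the Grafana host itself. This Python sketch assumes the default public hostname patterns that appear in the error messages described in this section. It only checks name resolution and a TCP connection on port 443; it doesn't perform a TLS handshake, go through any HTTP proxy Grafana is configured to use, or test credentials. The helper names are hypothetical.

```python
import socket


def endpoint_hosts(region: str) -> list[str]:
    """Build the default X-Ray and Application Signals hostnames for a region."""
    return [
        f"xray.{region}.amazonaws.com",
        f"application-signals.{region}.amazonaws.com",
    ]


def check_https_reachable(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Resolve the host, then open a TCP connection; return 'ok' or a short error."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"DNS lookup failed: {exc}"
    try:
        # The connection is closed as soon as it succeeds; no data is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return f"TCP connect failed: {exc}"
```

For example, loop over endpoint_hosts("us-east-1") and print check_https_reachable(host) for each host. A DNS failure points at resolver or VPC DNS settings; a TCP failure points at firewalls, security groups, or a required proxy.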
Endpoint or region mismatch
Symptoms:
- Queries return ResourceNotFound or empty results even though data exists in AWS.
- The service map or services list is unexpectedly empty.
Solutions:
- Confirm the query’s Region matches where your traces and services exist.
- If you set a custom Endpoint, make sure it corresponds to the same region as Default region.
- Clear any custom endpoint you no longer need. A stale endpoint silently routes requests to the wrong service.
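When a custom Endpoint is set, one quick way to catch a region mismatch is to check whether the endpoint's hostname contains the Default region as one of its dot-separated labels. The Python function below is a hypothetical heuristic: standard regional endpoints such as xray.us-east-1.amazonaws.com embed the region, but custom proxies or local test endpoints may not, so a True result means "review manually" rather than "definitely wrong".

```python
from urllib.parse import urlparse


def endpoint_region_mismatch(endpoint: str, default_region: str) -> bool:
    """Heuristic: True if no label of the endpoint's hostname equals the default region."""
    # Accept bare hostnames as well as full URLs.
    url = endpoint if "://" in endpoint else f"https://{endpoint}"
    host = urlparse(url).hostname or ""  # hostname is returned lowercased
    return default_region.lower() not in host.split(".")
```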
Query editor errors
These errors surface in the query editor when executing queries against the data source. Each symptom below maps to a specific query type or field, so match what you see in your panel before jumping to a solution.
“No data” or empty results
Symptoms:
- The panel shows No data even though you expect traces or services to exist.
Possible causes and solutions:
“Trace not found”
Symptoms:
- Pasting a trace ID into the query field returns a “trace not found” error.
Solutions:
- Verify the trace ID is correct and hasn’t been truncated.
- Confirm the trace exists in the Default region of the data source, or select the correct region in the query header.
- X-Ray retains traces for 30 days by default. Older traces are no longer retrievable.
- If the trace uses the W3C format, the plugin automatically converts it. If it still can't be found, retry with the X-Ray trace ID format (1-xxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxx).
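If you need to convert a W3C trace ID by hand, the X-Ray format splits the same 32 hex digits into an 8-digit segment and a 24-digit segment behind a version prefix of 1. The Python sketch below assumes that mapping. It also decodes the 8-digit segment as epoch seconds, which X-Ray uses for the trace start time. That decoding is only meaningful when the ID was produced by an X-Ray-compatible ID generator, because a randomly generated W3C ID carries no timestamp. The function names are hypothetical.

```python
import re
from datetime import datetime, timezone

_W3C_TRACE_ID = re.compile(r"[0-9a-f]{32}")
_XRAY_TRACE_ID = re.compile(r"1-([0-9a-f]{8})-([0-9a-f]{24})")


def w3c_to_xray_trace_id(trace_id: str) -> str:
    """Convert a 32-hex-digit W3C trace ID to the 1-<8 hex>-<24 hex> X-Ray form."""
    tid = trace_id.strip().lower()
    if not _W3C_TRACE_ID.fullmatch(tid):
        raise ValueError(f"not a 32-hex-digit W3C trace ID: {trace_id!r}")
    # Version "1", then the first 8 hex digits, then the remaining 24.
    return f"1-{tid[:8]}-{tid[8:]}"


def xray_trace_start(xray_id: str) -> datetime:
    """Decode the 8-hex-digit segment of an X-Ray trace ID as epoch seconds (UTC)."""
    match = _XRAY_TRACE_ID.fullmatch(xray_id.strip().lower())
    if match is None:
        raise ValueError(f"not an X-Ray trace ID: {xray_id!r}")
    return datetime.fromtimestamp(int(match.group(1), 16), tz=timezone.utc)
```

With an X-Ray-generated ID, a decoded start time more than 30 days in the past explains a "trace not found" error under the default retention.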
“Service not set on query”
Symptoms:
- A Services mode query fails with “Service not set on query”.
Solutions:
- Select a Service in the query editor. List service operations, List service dependencies, and List SLOs all require a selected service.
- If you use a template variable for the service, confirm it's populated before the dependent query runs. For example, if the service variable depends on an AccountId variable, confirm AccountId has a value first.
“Unknown query type” or “unknown service query type”
Symptoms:
- Queries fail with “unknown query type” or “unknown service query type”.
Solutions:
- The query was likely saved by an older or newer plugin version. Open the query in the editor and reselect the query type.
- Update the plugin to the latest version. Navigate to Plugins and data > Plugins.
Trace list returns only 1000 results
Symptoms:
- Trace list results appear capped.
- You expect more than 1000 traces in the time range but only 1000 are returned.
Solutions:
- Narrow the time range to reduce the number of traces returned.
- Add a filter expression or select a group to make the result set more specific.
- Split the query across multiple panels or time ranges if you need a full audit.
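Splitting the query means covering the same range with several smaller, contiguous windows so that each window stays under the cap. The Python sketch below is a hypothetical helper that computes equal-length windows you can use as per-panel time ranges or as successive dashboard time ranges.

```python
from datetime import datetime, timedelta, timezone


def split_time_range(start: datetime, end: datetime, windows: int) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into `windows` contiguous, equal-length sub-ranges."""
    if windows < 1 or end <= start:
        raise ValueError("need windows >= 1 and end > start")
    step = (end - start) / windows  # timedelta / int -> timedelta
    edges = [start + step * i for i in range(windows)] + [end]
    # Pair each edge with the next one so the windows share boundaries.
    return list(zip(edges[:-1], edges[1:]))
```

If any single window still returns exactly 1000 traces, it was probably truncated; split that window again.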
Insights queries return no data
Symptoms:
- The Insights query type returns an empty table even though active insights exist.
Solutions:
- Make a selection in the Group drop-down. The drop-down includes a synthetic All option that iterates every group — pick it if you don’t want to scope to a specific group.
- Set the State filter to All to see both active and closed insights.
- Confirm the IAM identity has xray:GetInsightSummaries and xray:GetInsight.
- Insights are only generated for groups that contain enough trace volume to detect anomalies. Verify in the AWS console that the selected group has active insights.
Service map is empty
Symptoms:
- The Service Map query returns no nodes even though traces exist in the selected time range.
Solutions:
- Confirm the dashboard time range covers a window in which traces were captured.
- If cross-account observability is configured, check the AccountId multi-select. Leaving it empty returns only the data source’s own account; selecting linked accounts widens the map.
- Use a Node graph panel — the service map data isn’t designed for table or time-series visualizations.
- Confirm the IAM identity has xray:GetServiceGraph.
Service variable produces broken filter expressions
Symptoms:
- A Trace list or Trace statistics query that references $service returns a filter-expression parse error.
- The query's effective filter expression contains JSON like service("{"Type":"Service","Name":"checkout-api"...}").
Cause:
The Service template variable’s value is a JSON blob (that’s what the Application Signals APIs consume), not a service name.
Solution:
Use the :text format modifier to emit the display name:
service("${service:text}")
For details, refer to the Template variables page, section "The Service variable's value is a JSON blob".
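The difference comes from plain string substitution. The Python sketch below mimics what ends up in the filter expression for each format; the JSON value is hypothetical and trimmed to two fields, while the real blob carries more.

```python
import json

# Hypothetical Service variable value, trimmed to Type and Name for illustration.
SERVICE_VALUE = json.dumps({"Type": "Service", "Name": "checkout-api"}, separators=(",", ":"))
SERVICE_TEXT = "checkout-api"


def service_filter(value: str) -> str:
    """Substitute a value into the filter template service("<value>")."""
    return 'service("' + value + '")'


broken = service_filter(SERVICE_VALUE)  # like $service: the JSON's quotes end the string early
fixed = service_filter(SERVICE_TEXT)    # like ${service:text}
```

The JSON's own double quotes close the service("...") string after the opening brace, which is why the filter expression fails to parse.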
Alerting errors
These issues surface when building or evaluating Grafana alert rules on AWS Application Signals or X-Ray data.
Alert rule can’t be created on a query
Symptoms:
- The Set alert rule button is disabled on a panel.
- An alert rule saves but always evaluates as No Data.
Cause:
Grafana Alerting only evaluates queries that return numeric time series. Query types such as Trace list, Service Map, Insights, and most Trace Analytics sub-types return tables or graphs and can’t be reduced to a single number.
Solution:
Rewrite the query as a Trace Statistics query with the same filter expression. Refer to the alerting-compatible query types table.
Alert evaluates as “No Data”
Symptoms:
- The alert rule shows No Data in the rule list or firing history.
- The same query in a panel returns data.
Possible causes and solutions:
Fault-rate alert fires when there’s no traffic
Symptoms:
- An alert like FaultCount / TotalCount > 0.01 fires during quiet hours.
- The rule evaluates to NaN or +Inf.
Cause:
Dividing by a zero Total Count produces NaN or infinity, which can either fire the rule or send it into Error state depending on how you reduce the result.
Solution:
Add a guard expression before the threshold. Grafana Math expressions don't support if/else or the ternary operator (?:), so use boolean arithmetic instead. Relational operators return 1 for true and 0 for false, so adding ($TotalCount == 0) to the denominator turns a zero denominator into 1 in quiet windows:
$FaultCount / ($TotalCount + ($TotalCount == 0))
When $TotalCount is 0, the divisor becomes 1 and the result is $FaultCount / 1. Because $FaultCount can't exceed $TotalCount, it is also 0, so the result is 0. When $TotalCount is non-zero, the comparison adds 0 and the expression reduces to $FaultCount / $TotalCount as expected.
Or require a minimum traffic volume before the ratio alert can fire — for example, alert only when TotalCount > 100 and FaultCount / TotalCount > 0.01.
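Both approaches can be checked with plain arithmetic before you build the rule. The Python functions below are hypothetical equivalents, not Grafana code. guarded_fault_rate mirrors the guard expression (a Python comparison converts to 1.0 or 0.0 through float(), just as Grafana's relational operators return 1 or 0), and should_fire mirrors the minimum-volume condition using the example thresholds of 100 and 0.01. Unlike Grafana, Python raises ZeroDivisionError on division by zero instead of returning NaN or infinity, so the guard is needed here too.

```python
def guarded_fault_rate(fault_count: float, total_count: float) -> float:
    """Python equivalent of $FaultCount / ($TotalCount + ($TotalCount == 0))."""
    # float(True) == 1.0 and float(False) == 0.0, matching Grafana's 1/0 comparisons.
    return fault_count / (total_count + float(total_count == 0))


def should_fire(fault_count: float, total_count: float,
                min_total: float = 100, threshold: float = 0.01) -> bool:
    """Fire only when traffic exceeds min_total and the fault ratio exceeds threshold."""
    # `and` short-circuits, so the ratio is never computed for low or zero traffic.
    return total_count > min_total and fault_count / total_count > threshold
```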
Alert rule loses its series between evaluations
Symptoms:
- The alert transitions rapidly between Firing, No Data, and Normal.
- Firing history shows inconsistent label sets.
Cause:
Trace Statistics queries return one series per non-empty column and per group dimension. When a column has no data in a given bucket, its series disappears, changing the series set.
Solution:
- In the Reduce expression, set the Mode to Drop non-numeric values or Replace non-numeric values with zero depending on intent.
- Use a single column per alert rule (for example, only Fault Count) so the series set is deterministic.
- Set Resolution to 300s for low-traffic services so each bucket has enough data to produce a stable series.
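To see how the Reduce mode changes the outcome when the newest bucket is empty, the Python sketch below approximates a Last reducer in three modes named after Grafana's Reduce options (Strict, Drop non-numeric values, Replace non-numeric values). It is a hypothetical illustration: it treats NaN and infinity as non-numeric and assumes an empty series reduces to NaN, which may not match Grafana's exact behavior.

```python
import math


def _non_numeric(v: float) -> bool:
    """Treat NaN and +/-infinity as non-numeric (an assumption of this sketch)."""
    return math.isnan(v) or math.isinf(v)


def reduce_last(values: list[float], mode: str = "strict", replacement: float = 0.0) -> float:
    """Approximate a Reduce "Last" under strict, drop, and replace modes."""
    if mode == "drop":
        values = [v for v in values if not _non_numeric(v)]
    elif mode == "replace":
        values = [replacement if _non_numeric(v) else v for v in values]
    elif mode != "strict":
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: an empty series reduces to NaN.
    return values[-1] if values else math.nan
```

For a Fault Count alert, drop keeps the last real value, while replace with 0 reports "no faults" for an empty bucket. Pick the mode that matches what an empty bucket means for your service.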
Alert throttled by the X-Ray API
Symptoms:
- The alert occasionally transitions to Error with messages such as ThrottlingException or Rate exceeded.
Solution:
Refer to X-Ray API throttling or “Rate exceeded”. For alert-specific mitigation:
- Increase Resolution from 60s to 300s to reduce GetTimeSeriesServiceStatistics call volume.
- Consolidate multiple per-service alerts into one alert with a broader filter expression and label-based routing.
- Request a quota increase from AWS if you run many concurrent alert rules.
Alerting on Application Signals SLOs doesn’t match AWS SLO state
Symptoms:
- A Grafana alert built on a List Service Level Objectives (SLO) query fires or clears out of sync with AWS’s own SLO status.
Cause:
The plugin’s SLO query returns the current SLO snapshot at query time, not the AWS-calculated SLO burn rate.
Solution:
For production SLO alerting, prefer native CloudWatch alarms on the SLO metrics Application Signals publishes. Refer to Alerting on Application Signals SLOs.
Template variable errors
These errors occur when using template variables with the data source.
Variables return no values
Possible causes and solutions:
Variables are slow to load
Solutions:
- Set variable Refresh to On dashboard load instead of On time range change so the query only runs when the dashboard opens.
- Narrow the scope of variable queries (for example, restrict Services to a single account).
- Enable query caching in Grafana Cloud or Grafana Enterprise.
Performance issues
These issues relate to slow queries or AWS API limits.
X-Ray API throttling or “Rate exceeded”
Symptoms:
- Queries fail intermittently with "Rate exceeded" or ThrottlingException.
- Dashboards with many panels fail when all panels load at the same time.
Solutions:
- Reduce the dashboard refresh interval.
- Increase Resolution on Trace statistics queries (for example, from 60s to 300s) to reduce the number of API calls.
- Combine multiple narrow panels into a single broader query where possible.
- Enable query caching in Grafana Cloud or Grafana Enterprise.
- Request a quota increase from AWS if you have a high-traffic monitoring account.
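Resolution sets the width of each Trace statistics bucket, so the number of data points per series scales with the time range divided by Resolution. The Python helper below is hypothetical and does that arithmetic with ceiling division. It counts buckets only; how many API requests the plugin issues for a given number of buckets isn't modeled here.

```python
from datetime import timedelta


def bucket_count(time_range: timedelta, resolution: timedelta) -> int:
    """Number of Resolution-wide buckets needed to cover a time range, rounded up."""
    # timedelta // timedelta yields an int; negating twice turns floor into ceiling.
    return -(-time_range // resolution)
```

For a 24-hour range, moving from 60s to 300s cuts each series from 1440 to 288 buckets, a fivefold reduction.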
Slow service map or trace list queries
Solutions:
- Narrow the time range. Service map and Trace list performance degrade as the number of traces in the range grows.
- Add a filter expression to reduce the trace population.
- Use an X-Ray Group with a pre-defined filter expression for commonly inspected slices.
Enable debug logging
To capture detailed error information for troubleshooting:
1. Set the Grafana log level to debug in the Grafana configuration file:
   [log]
   level = debug
2. Restart grafana-server.
3. Reproduce the issue and review the logs in /var/log/grafana/grafana.log (or your configured log location).
4. Look for entries that include request and response details for X-Ray or Application Signals.
5. Reset the log level to info after troubleshooting to avoid excessive log volume.
Note
On Grafana Cloud, contact Grafana Support to enable debug logging. You can't change grafana.ini directly on managed instances.
Get additional help
If you’ve tried the solutions above and still encounter issues:
- Search the Grafana community forums for similar issues.
- Review or open an issue on the AWS Application Signals plugin GitHub repository.
- Consult the AWS X-Ray documentation and AWS Application Signals documentation for service-specific guidance.
- Contact Grafana Support if you’re a Grafana Cloud or Grafana Enterprise customer.
- When reporting issues, include:
  - Grafana version.
  - Plugin version (visible in Plugins and data > Plugins).
  - Error messages, with sensitive information redacted.
  - Steps to reproduce.
  - The relevant data source configuration, with credentials redacted.


