What are traces?
A user on your website enters their email address into a form to sign up for your mailing list. They press Enter.
The user’s email address is data that flows through your system. In a cloud computing world, that one click can cause the data to touch multiple nodes across your cluster of microservices.
The email address may be sent to a verification algorithm sitting in a microservice that exists solely for that purpose. If it passes the check, the information is stored in a database.
Along the way, an anonymization node strips personally identifying data from the address and sends the collected metadata to a marketing qualification algorithm, which determines whether the request was sent from a targeted part of the internet.
Services respond and data flows back from each, sometimes triggering new events across the system. Along the way, various nodes write logs with a timestamp showing when the information passed through.
Finally, the request and response activity ends, and a record of that request’s path through your services is sent to Grafana Cloud. That record is a trace.
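To make this concrete, here is a minimal Python sketch of how the sign-up request above might look as a trace: a set of spans that share one trace ID, each pointing at the span that called it. The service names, span IDs, operations, and durations are hypothetical, and real spans (for example, OpenTelemetry spans) carry many more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str              # unique within the trace
    parent_id: Optional[str]  # None for the root span
    service: str
    operation: str
    duration_ms: float


# Hypothetical spans for the sign-up request described above.
TRACE_ID = "4bf92f3577b34da6a3ce929d0e0e4736"
spans = [
    Span(TRACE_ID, "a1", None, "web-frontend", "POST /signup", 182.0),
    Span(TRACE_ID, "b2", "a1", "email-verifier", "verify_email", 41.0),
    Span(TRACE_ID, "c3", "a1", "anonymizer", "strip_pii", 12.5),
    Span(TRACE_ID, "d4", "c3", "marketing-qualifier", "score_request", 30.0),
    Span(TRACE_ID, "e5", "a1", "subscriber-db", "insert_subscriber", 8.0),
]


def render_tree(spans, parent_id=None, depth=0):
    """Return one indented line per span, with children listed under their parent."""
    lines = []
    for span in spans:
        if span.parent_id == parent_id:
            lines.append(f"{'  ' * depth}{span.service}: {span.operation} ({span.duration_ms} ms)")
            lines.extend(render_tree(spans, span.span_id, depth + 1))
    return lines


print("\n".join(render_tree(spans)))
```

Following the parent links from the root span is what lets a trace view show which service called which, and how long each step took.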
Grafana Cloud Traces
Grafana Cloud Traces is based on Tempo, an open-source, easy-to-use, and high-scale distributed tracing backend. Tempo is cost-efficient, requiring only object storage to operate, and is deeply integrated with Grafana, Prometheus, and Loki. Tempo can be used with common open-source tracing protocols, including Jaeger, Zipkin, and OpenTelemetry.
Grafana Cloud Traces lets you search for traces, generate metrics from spans, and link your tracing data with logs and metrics.
A deeper introduction to Tempo
Grafana Tempo is a high-volume distributed tracing backend that can retrieve a trace when queried by its trace ID. It builds an index on the high-cardinality trace ID field and uses an object store as its backend, which allows queries to run with a high degree of parallelism. Read more about this in the architecture section of the docs.
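The following Python sketch is a simplified model of that idea, not Tempo's implementation. Each block in object storage carries an index of the trace IDs it holds (Tempo uses structures such as bloom filters; a plain set stands in here), so a lookup can rule out blocks cheaply and search the rest in parallel. Because the spans of one trace can end up in more than one block, the partial results are merged. The BLOCKS data and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical blocks in object storage. "index" stands in for the per-block
# trace ID lookup structure; "spans" holds the span data for each trace ID.
BLOCKS = [
    {"index": {"aaa1", "bbb2"}, "spans": {"aaa1": ["frontend"], "bbb2": ["frontend", "db"]}},
    {"index": {"aaa1"}, "spans": {"aaa1": ["email-verifier"]}},
    {"index": {"ccc3"}, "spans": {"ccc3": ["frontend"]}},
]


def search_block(block, trace_id):
    """Return this block's spans for trace_id, or [] if the index rules the block out."""
    if trace_id not in block["index"]:
        return []  # skip reading span data for blocks that cannot contain the trace
    return block["spans"][trace_id]


def find_trace(trace_id, blocks=BLOCKS):
    """Search all blocks in parallel and merge the partial results into one trace."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(lambda block: search_block(block, trace_id), blocks))
    return [span for partial in partials for span in partial]


print(find_trace("aaa1"))  # ['frontend', 'email-verifier']
```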
Tempo has strong integrations with a number of existing open source tools, including:
- Grafana. Grafana ships with native support for Tempo using the built-in Tempo data source.
- Grafana Loki. Loki's query language, LogQL v2, lets you filter down to the requests you care about and jump to their traces using the derived fields support in Grafana.
- Prometheus exemplars. Exemplars let you jump from Prometheus metrics to Tempo traces by clicking on recorded exemplars. Read more about this integration in this blog post.
Search for traces
Search for traces using common dimensions such as time range, duration, span tags, service names, etc. Use the trace view to quickly diagnose errors and high latency events in your system.
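If you query Tempo directly rather than through Grafana, a search by tags and duration can be expressed as a request to Tempo's HTTP search endpoint, /api/search. The Python sketch below only builds the URL: BASE_URL is a placeholder, Grafana Cloud authentication is omitted, and the parameter names (tags, minDuration, start, end, limit) should be checked against the Tempo API documentation for your version.

```python
from urllib.parse import urlencode

BASE_URL = "https://tempo.example.com"  # placeholder; replace with your query endpoint


def search_url(tags, min_duration=None, start=None, end=None, limit=20):
    """Build a Tempo /api/search URL filtering on span tags, duration, and time range."""
    params = {
        # Tags are sent as space-separated logfmt key=value pairs (values without spaces assumed).
        "tags": " ".join(f"{key}={value}" for key, value in tags.items()),
        "limit": limit,
    }
    if min_duration:
        params["minDuration"] = min_duration
    if start is not None and end is not None:
        params["start"], params["end"] = start, end  # Unix epoch seconds
    return f"{BASE_URL}/api/search?{urlencode(params)}"


print(search_url({"service.name": "email-verifier"}, min_duration="500ms"))
# https://tempo.example.com/api/search?tags=service.name%3Demail-verifier&limit=20&minDuration=500ms
```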
Refine your search using TraceQL
Inspired by PromQL and LogQL, TraceQL is a query language designed for selecting traces.
The default traces search reviews the whole trace. TraceQL provides a method for formulating precise queries so you can zoom in to the data you need. Query results are returned faster because the queries limit what is searched.
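As a sketch, assuming the same placeholder endpoint as a direct Tempo search, a TraceQL query can be passed in the q parameter. The query below selects spans from a hypothetical email-verifier service that took longer than 500 ms; check the TraceQL documentation for the attribute names your spans actually use.

```python
from urllib.parse import urlencode

BASE_URL = "https://tempo.example.com"  # placeholder; replace with your query endpoint


def traceql_query(service, min_duration):
    """Return a TraceQL span selector for slow spans from one service."""
    # Doubled braces produce the literal { } around the TraceQL span selector.
    return f'{{ resource.service.name = "{service}" && duration > {min_duration} }}'


def traceql_search_url(service, min_duration):
    """Build a Tempo /api/search URL that carries the TraceQL query in the q parameter."""
    return f"{BASE_URL}/api/search?{urlencode({'q': traceql_query(service, min_duration)})}"


print(traceql_query("email-verifier", "500ms"))
print(traceql_search_url("email-verifier", "500ms"))
```

Narrowing the selector this way is what lets the backend skip spans that cannot match, rather than reviewing every whole trace.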
The traceqlEditor feature flag must be enabled to access the TraceQL editor in Grafana Cloud. Contact Grafana Support and open a ticket to enable this feature.
For details about how queries are constructed, read the TraceQL documentation.
Metrics from spans
RED metrics can be used to drive service graphs and other ready-to-go visualizations of your span data. RED metrics represent:
- Rate, the number of requests per second
- Errors, the number of those requests that are failing
- Duration, the amount of time those requests take
For more information about the RED method, refer to The RED Method: How to instrument your services.
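The following Python sketch shows how the three RED signals could be derived per service from a batch of spans observed over a fixed time window. It is an illustration only: the SpanRecord type is hypothetical, and it reports an average duration, whereas generated span metrics typically use histograms so that percentiles can be computed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SpanRecord:
    service: str
    duration_ms: float
    is_error: bool


def red_metrics(spans, window_seconds):
    """Rate, Errors, and Duration per service for spans observed in one time window."""
    by_service = defaultdict(list)
    for span in spans:
        by_service[span.service].append(span)

    results = {}
    for service, group in by_service.items():
        results[service] = {
            "rate_per_s": len(group) / window_seconds,                          # Rate
            "errors": sum(1 for s in group if s.is_error),                       # Errors
            "avg_duration_ms": sum(s.duration_ms for s in group) / len(group),  # Duration
        }
    return results


spans = [
    SpanRecord("email-verifier", 100.0, False),
    SpanRecord("email-verifier", 200.0, True),
    SpanRecord("email-verifier", 300.0, False),
    SpanRecord("email-verifier", 400.0, False),
]
print(red_metrics(spans, window_seconds=2))
# {'email-verifier': {'rate_per_s': 2.0, 'errors': 1, 'avg_duration_ms': 250.0}}
```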
Metrics generation is disabled by default. Contact Grafana Support to enable metrics generation for your organization.
These metrics are stored in your Hosted Metrics instance, and you can also use them to build custom dashboards.
The generated metrics also include exemplars, which let you link directly from metrics to traces. Exemplars are generally available (GA) in Grafana Cloud, so you can also push your own.
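An exemplar is a metric sample that carries a reference to a specific trace. The Python sketch below, built around a hypothetical HistogramWithExemplars class, keeps a latency histogram and remembers the most recent trace ID observed in each bucket; that stored trace ID is the link that lets you click from a metric to a trace. Unlike Prometheus histograms, its bucket counts are not cumulative.

```python
import bisect


class HistogramWithExemplars:
    """Latency histogram that remembers the latest trace ID seen in each bucket."""

    def __init__(self, upper_bounds_ms):
        self.bounds = sorted(upper_bounds_ms)
        # One slot per bound plus a final +Inf bucket; counts are per bucket, not cumulative.
        self.counts = [0] * (len(self.bounds) + 1)
        self.exemplars = [None] * (len(self.bounds) + 1)

    def observe(self, duration_ms, trace_id):
        # Index of the first bucket whose upper bound is >= duration_ms ("le" semantics).
        i = bisect.bisect_left(self.bounds, duration_ms)
        self.counts[i] += 1
        self.exemplars[i] = {"trace_id": trace_id, "value_ms": duration_ms}


hist = HistogramWithExemplars([100, 500, 1000])
hist.observe(42, "trace-a")
hist.observe(730, "trace-b")
hist.observe(2500, "trace-c")
print(hist.counts)        # [1, 0, 1, 1]
print(hist.exemplars[2])  # {'trace_id': 'trace-b', 'value_ms': 730}
```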
Service graph view
The service graph view displays a table of request rate, error rate, and duration (RED) metrics calculated from your incoming spans. It also includes a node graph view built from your spans. To use the service graph view, you need to enable service graphs and span metrics. Once enabled, this pre-configured view is immediately available in Explore > Service Graphs.
See the service graph view documentation for further explanation of this view and how to enable it.
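A service graph can be built by looking for spans whose parent belongs to a different service: each such pair represents a request from one service to another. The Python sketch below counts requests per edge under that simplified rule. Tempo's own processor is more involved (for example, it pairs client and server spans), and the span dictionaries here are hypothetical.

```python
from collections import Counter


def service_graph_edges(spans):
    """Count calls per (caller, callee) pair from parent/child spans in different services."""
    by_id = {span["span_id"]: span for span in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get(span["parent_id"])  # None for the root span
        if parent is not None and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return edges


spans = [
    {"span_id": "a1", "parent_id": None, "service": "web-frontend"},
    {"span_id": "b2", "parent_id": "a1", "service": "email-verifier"},
    {"span_id": "c3", "parent_id": "a1", "service": "anonymizer"},
    {"span_id": "d4", "parent_id": "c3", "service": "marketing-qualifier"},
    {"span_id": "e5", "parent_id": "c3", "service": "anonymizer"},  # internal span, no edge
]
for (caller, callee), calls in service_graph_edges(spans).items():
    print(f"{caller} -> {callee}: {calls}")
```

Each edge's count over a time window gives its request rate, which is the kind of value the service graph view shows alongside error rate and duration.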
Linking traces and logs
If you already log requests and responses with their trace IDs, Grafana can extract those IDs from your logs so you can jump directly to the corresponding traces.
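Grafana's derived fields do this with a regular expression that captures the trace ID from each log line. The Python sketch below applies the same kind of pattern, assuming your logs contain the ID as traceID=&lt;id&gt;; adjust the pattern to match your own log format.

```python
import re

# Assumed log format: the trace ID appears as traceID=<id>.
TRACE_ID_PATTERN = re.compile(r"traceID=(\w+)")


def extract_trace_id(log_line):
    """Return the trace ID captured from a log line, or None if the line has none."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


line = 'level=info msg="subscriber stored" traceID=4bf92f3577b34da6a3ce929d0e0e4736'
print(extract_trace_id(line))  # 4bf92f3577b34da6a3ce929d0e0e4736
```

In Grafana, the captured value is then used to build a link to that trace in your Tempo data source.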
In the other direction, you can configure Grafana Cloud to create a link from an individual span to your Loki logs. If you see a long-running or errored span, you can immediately jump to the logs of the process causing the error.
Refer to Set up and use tracing to get started.
Note: Cloud Traces only supports custom tags added by Grafana Support. Cloud Traces supports these default tags: cluster, hostname, namespace, and pod. Contact Support to add a custom tag.
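When you jump from a span to its logs, Grafana builds the Loki query for you. The Python sketch below approximates that kind of query, assuming the span carries some of the default tags listed above and that your Loki streams use the same label names: it turns those tags into label matchers and adds a line filter for the trace ID. The function name and example values are hypothetical.

```python
DEFAULT_TAGS = ("cluster", "hostname", "namespace", "pod")


def logs_query_for_span(span_tags, trace_id):
    """Build a LogQL query selecting the span's log streams and filtering on its trace ID."""
    # Use only the default tags as Loki label matchers, in a fixed order.
    matchers = ", ".join(f'{tag}="{span_tags[tag]}"' for tag in DEFAULT_TAGS if tag in span_tags)
    # The |= line filter keeps only log lines that contain the trace ID.
    return f'{{{matchers}}} |= "{trace_id}"'


span_tags = {"cluster": "prod-eu", "pod": "email-verifier-7d9f", "http.method": "POST"}
print(logs_query_for_span(span_tags, "4bf92f3577b34da6a3ce929d0e0e4736"))
# {cluster="prod-eu", pod="email-verifier-7d9f"} |= "4bf92f3577b34da6a3ce929d0e0e4736"
```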
Linking traces and metrics
Grafana can correlate different signals by linking traces and metrics. The trace to metrics feature, a beta feature in Grafana 9.1, lets you quickly see trends or aggregated data related to each span.
You can try it out by enabling the traceToMetrics feature toggle in your Grafana configuration file.
For example, you can use the $__tags keyword to convert span attributes to metric labels.
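To illustrate what $__tags does, the Python sketch below imitates the substitution: selected span attributes are mapped to metric label names and inserted into a PromQL query template. It is not Grafana's implementation; the attribute-to-label mapping and the metric name traces_spanmetrics_calls_total are assumptions for the example.

```python
def interpolate_tags(query_template, span_attributes, attribute_to_label):
    """Replace $__tags with label matchers built from the selected span attributes."""
    matchers = ", ".join(
        f'{label}="{span_attributes[attr]}"'
        for attr, label in attribute_to_label.items()
        if attr in span_attributes
    )
    return query_template.replace("$__tags", matchers)


template = "sum(rate(traces_spanmetrics_calls_total{$__tags}[5m]))"
attributes = {"service.name": "email-verifier", "http.method": "POST"}
print(interpolate_tags(template, attributes, {"service.name": "service"}))
# sum(rate(traces_spanmetrics_calls_total{service="email-verifier"}[5m]))
```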
For more information, refer to the trace to metric configuration documentation.