Moving to a scalable, distributed microservice architecture poses many challenges for any organization. It gets harder to understand the system and pinpoint where errors originate. Logs get much messier, and stitching together a coherent picture of a particular request can be time-consuming or downright impossible.
Distributed tracing can help with all of that. Similar to flame graphs, distributed tracing helps you understand what is happening in your application, whether it is a monolith or a distributed application with hundreds of microservices.
Adding support for traces to Grafana was part of our long-term goal to make Grafana a full observability platform. We started last year when we introduced Explore and the first logging integrations. This year, with the just-released Grafana 7.0, we went one step further by adding a trace viewer and data sources for the popular Jaeger and Zipkin distributed tracing systems.
When we started working on the tracing integration, one question was how to approach the implementation of the trace view visual component. We did not want to reinvent the wheel; we wanted to focus on functionality and usability. The Jaeger team did a terrific job providing an amazing UI and a highly functional trace view component using the same technologies we already use. As they were open to embedding their UI in Grafana, reusing it made more sense than any other approach. So here’s a shout-out to them for creating a great tool and open sourcing it so others can benefit.
This means that lots of users will find Grafana’s tracing UI familiar.
Here’s a look:
In addition, we benefit from lots of performance optimizations and a feature- and information-dense UI. The trace viewer allows you to zoom into a particular region of the trace using the handy minimap for a better view of shorter spans. Each span can be expanded to access span details such as its tags or logs.
Both Jaeger and Zipkin data sources come with basic querying functionality where you can search by trace ID or select a trace from a selector based on service name, operation name, and trace duration.
Having a trace view directly in Grafana is only a small part of the observability experience. The main challenge is figuring out what you should actually look at. As a first step to having a fully integrated experience, we also introduced internal linking with Loki derived fields.
Derived fields allow you to dynamically create new fields by parsing the log message. In addition, they allow you to add links using the field value. Before Grafana 7.0, you could already use this feature to create fields and add any external link. Now you can also configure links pointing to supported data sources in Grafana and use the field value as part of the target data source query. For example, you can use a regex pattern like traceID=(\w+) to capture the trace ID from your logs and then use it as the value of a Jaeger query. This link will then open Explore with the Jaeger data source selected and the trace ID filled in.
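To make the regex step concrete, here is a minimal Python sketch of what the traceID=(\w+) pattern captures from a log line. It only illustrates the matching behavior and is not Grafana's implementation; the function name and the sample log line are made up for this example.

```python
import re
from typing import Optional

# Same pattern as in the text: capture the word characters after "traceID=".
TRACE_ID_PATTERN = re.compile(r"traceID=(\w+)")


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the first trace ID found in the log line, or None if absent."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None


# Hypothetical log line, for illustration only.
line = 'level=info msg="request done" traceID=3fa414edcef6ad90 duration=120ms'
print(extract_trace_id(line))
```

The captured group is the value a derived field would hand to the linked data source as its query. A log line without a traceID= token yields no match, so no link would be produced for it.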
This is one of the first ways you can leverage various data sources and correlate data between them. Going forward, we are going to provide support for more data sources and more integrations. Finding traces from your metrics, going from a trace to its logs, and integrating with the most popular cloud providers are just a few of the things that are in the works.
Ultimately, we are aiming for a fully integrated experience across various facets of observability, allowing you to see the behavior of your application and solve problems faster and more easily. At the same time, we want to keep our big tent philosophy and allow you to mix and match any data sources you choose.
Upgrade to Grafana 7.0
To try out the trace view and Jaeger and Zipkin integrations, get 7.0 here.
And check out these other new features in Grafana 7.0: