Today, we are launching a new Grafana Labs product, Grafana Enterprise Traces. Powered by Grafana Tempo, our open source distributed tracing backend, and built by the maintainers of the project, this offering is an exciting addition to our growing self-managed observability stack tailored for enterprises.
The Grafana Enterprise Stack now comprises:
- Grafana Enterprise Traces (GET), the newest addition to the Grafana Enterprise Stack, announced today: a scalable, secure, self-managed tracing service.
- Grafana Enterprise, an enhanced version of Grafana that includes enterprise features, support, and data source plugins for commercial tools such as Splunk, New Relic, MongoDB, ServiceNow, Oracle, and Snowflake.
- Grafana Enterprise Metrics, a horizontally scalable Prometheus- and Graphite-compatible metrics system designed for large organizations and built to be simple to use and maintain.
- Grafana Enterprise Logs, a unique approach to log indexing, storage, and administration control that runs securely at scale with expert support from Grafana Labs.
The release of Grafana Enterprise Traces is especially meaningful for us because it completes a journey that we started as a company in 2018 with the release of Grafana Enterprise.
Note: We also provide a fully hosted and managed observability stack in Grafana Cloud, which has a free tier with 10K series of Prometheus metrics, 50GB of logs, and 50GB of traces included.
Let’s talk about distributed tracing
Although the adoption of distributed tracing today is limited relative to logs and metrics, it is rapidly growing in popularity.
The rise of distributed tracing is tightly coupled with the rise of cloud native and microservices-based software architectures. In these systems, a single request may touch tens of microservices, which may be running in different containers, environments, and/or cloud providers. Distributed tracing lets users walk through the entire request flow, so developers can identify which step in the chain is the source of a bug or performance bottleneck.
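To make "walking through the request flow" concrete, here is a minimal sketch that models a trace as a tree of spans (each with a parent span ID and a duration) and picks the step that contributes the most latency of its own. The `Span` class, its fields, and the service names are hypothetical and for illustration only; real tracing backends such as Tempo store much richer span data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    duration_ms: float


def self_times(spans: list[Span]) -> dict[str, float]:
    """Time spent in each span excluding its direct children.

    Simplification: assumes children run sequentially, not in parallel.
    """
    child_total: dict[str, float] = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = child_total.get(span.parent_id, 0.0) + span.duration_ms
    return {s.span_id: s.duration_ms - child_total.get(s.span_id, 0.0) for s in spans}


def bottleneck(spans: list[Span]) -> Span:
    """Return the span with the largest self time, i.e. the likeliest bottleneck."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])


# One request: gateway -> auth, and gateway -> orders -> database.
trace = [
    Span("a", None, "api-gateway", 120.0),
    Span("b", "a", "auth", 10.0),
    Span("c", "a", "orders", 100.0),
    Span("d", "c", "postgres", 85.0),
]
print(bottleneck(trace).service)  # postgres
```

Looking only at the root span would tell you the request took 120 ms; following the parent links shows that most of that time is spent in the database call two hops down.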
At Grafana Labs we rely on distributed tracing to deliver on the rigorous SLOs we set for our hosted observability service, Grafana Cloud. However, as we started to increase our trace volume, we found ourselves hitting scaling and cost limitations with existing tracing backends.
We built Tempo to solve these challenges — architecting a system that could scale to trace 100% of requests on our read path without breaking our budget.
Tempo doesn’t index traces, which makes it possible to store orders of magnitude more data for the same cost. Instead of an index, it relies on deep integrations with Grafana, Prometheus, and Loki for trace discovery, so you can pivot seamlessly between metrics, logs, and traces. Long-term trace storage lives entirely in object storage, which makes Tempo extremely cost-efficient to operate.
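The sketch below is a toy model of index-free lookup: traces are found only by trace ID (which, in practice, comes from an exemplar or a log line), and each stored block carries a small Bloom filter so that blocks which cannot contain the ID are skipped without being read. The `Block` class, the filter sizes, and the in-memory dict standing in for object storage are all assumptions for illustration, not Tempo's actual storage format.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class Block:
    """Hypothetical object-storage block: trace data plus its Bloom filter."""

    def __init__(self, traces: dict[str, list[str]]) -> None:
        self.traces = traces  # trace ID -> spans (plain strings for brevity)
        self.bloom = BloomFilter()
        for trace_id in traces:
            self.bloom.add(trace_id)


def find_trace(blocks: list[Block], trace_id: str):
    """Look up a trace by ID; there is no index over span contents."""
    for block in blocks:
        # Cheap filter check first; only "read" blocks that might hold the ID.
        if block.bloom.might_contain(trace_id) and trace_id in block.traces:
            return block.traces[trace_id]
    return None


blocks = [
    Block({"4bf92f3577b34da6": ["GET /api/orders", "SELECT orders"]}),
    Block({"a3ce929d0e0e4736": ["POST /api/login"]}),
]
print(find_trace(blocks, "a3ce929d0e0e4736"))  # ['POST /api/login']
print(find_trace(blocks, "ffffffffffffffff"))  # None
```

The trade-off is deliberate: without an index you cannot search arbitrary span attributes in this model, but storage stays cheap because only the trace data and a compact filter are kept.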
Tempo also accepts data in the major open source tracing protocols, including Jaeger, Zipkin, and OpenTelemetry, which makes it easy to switch among these technologies.
But enterprises often need additional features, such as access controls, indemnification, and support guarantees, before they can roll tracing out more broadly. Many companies also can’t store traces in the cloud and need a solution they can host themselves.
And that’s where Grafana Enterprise Traces comes in.
Introducing Grafana Enterprise Traces
Grafana Enterprise Traces (GET) builds on Tempo’s index-free approach to trace storage and adds the administration controls companies need to run it securely at scale. Everyone in an organization can access all of their relevant trace data, while companies with specific security policies or in regulated industries can use the built-in Grafana interface to manage permissions and settings and grant individuals access to exactly the resources they need, without compromising on cost.
Like Tempo, Grafana Enterprise Traces has a natively multi-tenant architecture. This is critical for centralized observability teams that offer tracing-as-a-service to their internal customers while keeping their own management burden under control. With multi-tenancy, an observability team can run a single GET cluster and still give each customer team a logically isolated partition for its data. GET makes a multi-tenant setup even easier to manage with an administrative API and plugin for creating, deleting, and editing tenants, and for setting per-tenant read and write load limits.
This API and plugin follow the same design patterns used in our Enterprise Metrics and Enterprise Logs products, making it easy for an operator of one to get started with the others.
GET makes it easy to create new tenants, set limits for each of them so they don’t interfere with one another, and give each team the experience of having its own dedicated tracing backend.
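As an illustration of how per-tenant limits keep tenants from interfering with one another, here is a minimal sketch of a tenant registry that supports create, edit, and delete operations and enforces a write limit per tenant with a token bucket. The `TenantRegistry` and `TenantLimits` names, the token-bucket policy, and the explicit `now` timestamps are assumptions made for this example; they are not GET’s admin API.

```python
from dataclasses import dataclass


@dataclass
class TenantLimits:
    write_bytes_per_sec: float  # sustained ingestion rate
    write_burst_bytes: float    # how far a tenant may briefly exceed that rate


@dataclass
class _Bucket:
    tokens: float
    last: float  # timestamp of the last refill, in seconds


class TenantRegistry:
    """Hypothetical admin-side registry with a token bucket per tenant."""

    def __init__(self) -> None:
        self._limits: dict[str, TenantLimits] = {}
        self._buckets: dict[str, _Bucket] = {}

    def create(self, tenant: str, limits: TenantLimits, now: float) -> None:
        if tenant in self._limits:
            raise ValueError(f"tenant {tenant!r} already exists")
        self._limits[tenant] = limits
        self._buckets[tenant] = _Bucket(tokens=limits.write_burst_bytes, last=now)

    def update(self, tenant: str, limits: TenantLimits) -> None:
        if tenant not in self._limits:
            raise KeyError(tenant)
        self._limits[tenant] = limits
        bucket = self._buckets[tenant]
        bucket.tokens = min(bucket.tokens, limits.write_burst_bytes)

    def delete(self, tenant: str) -> None:
        del self._limits[tenant]
        del self._buckets[tenant]

    def allow_write(self, tenant: str, size_bytes: float, now: float) -> bool:
        """Admit a write only if this tenant has budget; other tenants are unaffected."""
        limits = self._limits[tenant]  # unknown tenants raise KeyError
        bucket = self._buckets[tenant]
        # Refill for the elapsed time, capped at the burst size.
        elapsed = now - bucket.last
        bucket.tokens = min(limits.write_burst_bytes,
                            bucket.tokens + elapsed * limits.write_bytes_per_sec)
        bucket.last = now
        if size_bytes <= bucket.tokens:
            bucket.tokens -= size_bytes
            return True
        return False


reg = TenantRegistry()
reg.create("team-a", TenantLimits(write_bytes_per_sec=1000, write_burst_bytes=2000), now=0.0)
reg.create("team-b", TenantLimits(write_bytes_per_sec=1000, write_burst_bytes=2000), now=0.0)
print(reg.allow_write("team-a", 2000, now=0.0))  # True: uses the whole burst
print(reg.allow_write("team-a", 500, now=0.0))   # False: team-a is throttled
print(reg.allow_write("team-b", 500, now=0.0))   # True: team-b is unaffected
print(reg.allow_write("team-a", 500, now=1.0))   # True: one second refilled 1000 bytes
```

Keeping one bucket per tenant is what makes the isolation work in this sketch: a noisy tenant drains only its own budget.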
Grafana Enterprise Traces includes the security features enterprises need to scale traces for large, distributed teams. Robust data access policies enable administrators to secure and govern data in order to control where their traces live and who gets to use them.
- Admins can use the built-in interface or a simple API to create access policies and generate tokens that grant or restrict individual access to resources.
- Access policies can be defined with realms and scopes so the administrator can specify which tenants a user has access to, as well as the type of access: read, write, or delete.
- GET’s authentication layer supports the OpenID Connect standard, making it easy for observability teams to integrate with existing token providers at their organization.
GET ships with native authentication support and the ability to define fine-grained access policies.
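The realm-and-scope model described above can be sketched as a small data structure: a policy names the tenants it covers (its realms) and the actions it allows (its scopes), and each issued token maps to one policy. Everything here (`AccessPolicy`, `PolicyStore`, the token format) is a hypothetical illustration of the concept, not GET’s actual API, and it does not model the OpenID Connect integration.

```python
import secrets
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"


@dataclass(frozen=True)
class AccessPolicy:
    name: str
    realms: frozenset[str]    # tenants this policy applies to
    scopes: frozenset[Scope]  # actions allowed on those tenants


class PolicyStore:
    """Hypothetical store mapping opaque tokens to access policies."""

    def __init__(self) -> None:
        self._by_token: dict[str, AccessPolicy] = {}

    def issue_token(self, policy: AccessPolicy) -> str:
        token = secrets.token_urlsafe(32)
        self._by_token[token] = policy
        return token

    def revoke(self, token: str) -> None:
        self._by_token.pop(token, None)

    def authorize(self, token: str, tenant: str, scope: Scope) -> bool:
        policy = self._by_token.get(token)
        # Deny by default: unknown tokens, other tenants, or ungranted scopes.
        return policy is not None and tenant in policy.realms and scope in policy.scopes


store = PolicyStore()
readers = AccessPolicy("team-a-readers", frozenset({"team-a"}), frozenset({Scope.READ}))
token = store.issue_token(readers)
print(store.authorize(token, "team-a", Scope.READ))    # True
print(store.authorize(token, "team-a", Scope.DELETE))  # False: scope not granted
print(store.authorize(token, "team-b", Scope.READ))    # False: tenant not in realm
```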
A global view of trace data
Large enterprises may find themselves running multiple GET clusters for a variety of reasons. They may have operations in multiple geographic regions and want to have a cluster per region. They may want multiple GET clusters, each receiving the same trace data, to provide high availability and redundancy in case a single cluster goes down. Or they may want to separate traces from different environments (production and development, for example) so they can manage two smaller clusters instead of one massive one and limit the blast radius if one goes down.
In these setups, users may sometimes need to look for a trace across all GET clusters simultaneously. To meet this need, GET includes the ability to do query federation — fanning a query out to multiple clusters and then combining the results before returning them to the user. This provides large organizations a global view across all of their tracing backends.
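Query federation as described here boils down to "fan out, then merge." The sketch below sends a trace-by-ID query to several clusters concurrently, de-duplicates spans by span ID (clusters that receive the same data for redundancy return identical spans), and skips a cluster that fails. The query functions, the span dict layout, and the skip-on-failure policy are assumptions for illustration; GET’s federation layer is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Span = dict  # here, a span is a plain dict with "span_id" and "start" keys
QueryFn = Callable[[str], list]  # trace ID -> spans found in one cluster


def federated_find_trace(clusters: dict[str, QueryFn], trace_id: str) -> list[Span]:
    """Fan a trace-by-ID query out to every cluster and merge the results."""
    with ThreadPoolExecutor(max_workers=len(clusters) or 1) as pool:
        futures = {name: pool.submit(query, trace_id) for name, query in clusters.items()}
    # Leaving the with-block waits for every query to finish.

    merged: dict[str, Span] = {}
    for name, future in futures.items():
        try:
            spans = future.result()
        except Exception:
            continue  # one unreachable cluster should not fail the whole query
        for span in spans:
            merged.setdefault(span["span_id"], span)  # drop duplicate copies
    return sorted(merged.values(), key=lambda s: s["start"])


def us_east(trace_id):
    return [{"span_id": "1", "start": 0, "name": "GET /checkout"},
            {"span_id": "2", "start": 5, "name": "charge card"}]


def us_east_replica(trace_id):
    return us_east(trace_id)  # HA copy returns identical spans


def eu_west(trace_id):
    return [{"span_id": "3", "start": 12, "name": "ship order"}]


def broken(trace_id):
    raise ConnectionError("cluster unreachable")


spans = federated_find_trace(
    {"us-east": us_east, "us-east-replica": us_east_replica, "eu-west": eu_west, "dev": broken},
    "4bf92f3577b34da6",
)
print([s["name"] for s in spans])  # ['GET /checkout', 'charge card', 'ship order']
```

Returning partial results when a cluster is down is a design choice in this sketch; a production system would also report which clusters did not answer.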
Correlation between metrics, logs, and traces
Understanding all of your telemetry data, and how the different signals relate to one another, is important. Grafana Enterprise Traces lets you jump from Prometheus metrics to the relevant traces via exemplars. It also supports moving between logs and traces: starting from Grafana Enterprise Logs or open source Loki, users can filter down to the log lines they care about and then open the relevant trace with a single click, using the derived fields support in Grafana.
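Derived fields work by matching a regular expression against each log line and turning the captured value into a link. The sketch below mirrors that extraction step in plain Python. The `traceID=<id>` log format and the URL template are hypothetical; in Grafana itself the regex and link target are configured on the Loki data source rather than written as code.

```python
import re
from typing import Optional

# Hypothetical log format: the application writes "traceID=<id>" into each line.
TRACE_ID_PATTERN = re.compile(r"traceID=(\w+)")


def trace_link(log_line: str, url_template: str) -> Optional[str]:
    """Return a link to the trace referenced by a log line, or None if there is none."""
    match = TRACE_ID_PATTERN.search(log_line)
    if match is None:
        return None
    return url_template.format(trace_id=match.group(1))


logs = [
    'level=info msg="order created" traceID=9a1c7e2b44d05f13',
    'level=error msg="payment failed" traceID=4bf92f3577b34da6 duration=1.2s',
]
# Filter down to the lines we care about, then follow them to their traces.
template = "https://grafana.example.com/explore/traces/{trace_id}"
print([trace_link(line, template) for line in logs if "level=error" in line])
# ['https://grafana.example.com/explore/traces/4bf92f3577b34da6']
```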
Because metrics, logs, and traces are designed to work together, teams can move seamlessly between all three, even at truly cloud native scale.
With Grafana Enterprise Traces, teams get support, training, and consulting provided by the Grafana Labs team, including the creators and maintainers of Tempo. We’ll help with anything organizations need to implement Tempo and Grafana Enterprise Traces.
Commitment to open source
Grafana Enterprise Traces is 100% compatible with the feature set that open source Tempo already provides. It builds on what is available in Tempo, adding features tailored specifically for enterprises that complement, and in no way detract from, the open source project. The Grafana Labs team is committed to improving and adding new features to upstream Tempo and will continue to lead development of the open source project.