Grafana OpenTelemetry distributions: prioritizing simplicity, sticking to OSS values

2024-09-19 4 min

The OpenTelemetry (OTel) project offers numerous components and instrumentations that support different languages and telemetry signals.

However, this flexibility can be overwhelming, and new users often struggle to choose the right components and configure them properly for their specific use cases. To address this, OpenTelemetry defines the concept of a distribution: a customized version of OpenTelemetry components.

Here at Grafana Labs, we are committed to OpenTelemetry and to making observability more accessible. That’s why we are thrilled to announce the general availability of the Grafana OpenTelemetry distributions for Java and .NET.

If you just want to get started, you can find the instructions on the Grafana Cloud Application Observability quickstart.

In this post, we’ll share the story behind the Grafana distributions, offering a glimpse into the thought process and technical decisions made. We hope this transparency helps you understand the benefits of our distributions and demonstrates our commitment to being active contributors to the OpenTelemetry project.

The motivation behind Grafana distributions

Our decision to create Grafana OTel distributions was driven by several key factors:

  • Quick bug fixes: We wanted to be able to quickly fix bugs, which is only possible if you control the release cycle.
  • Ease of use: We recognized the challenges in configuring the upstream instrumentation solutions to work seamlessly with Grafana Cloud Application Observability. For example, service.instance.id is important when distinguishing telemetry from individual instances, but it was missing upstream.
  • Cost efficiency: By allowing users to drop unused metrics, we aimed to provide a more cost-effective solution.

How we overcame various development challenges

We faced an unexpected paradox when building these distributions: By simplifying them and making them more user-friendly, we inadvertently made it harder for users to switch to the upstream versions. This effectively locked them into our distributions, which goes against our open source principles — one of our core values is “OSS is in our DNA.”

As a result, we had to make several decisions to ensure users wouldn’t get locked in. One of the key changes was to the OpenTelemetry specification itself. We identified a gap in the specification when it came to generating service instance IDs, which are used to distinguish individual instances in horizontally scaled services. By contributing to the specification (a discussion that lasted several months) and implementing a generator in Java to auto-fill this value, we not only made our distributions easier to use but also benefited the broader OpenTelemetry community. Without this upstream change, users would have had to populate the service instance ID manually to identify telemetry from individual instances.
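To make the idea of auto-filling the instance ID concrete, here is a minimal sketch. It is not the upstream generator: the class and method names are hypothetical, and the choice of a random UUID as the generated value is an assumption made for illustration. The sketch only fills in service.instance.id when the user has not already set it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration; not the upstream OpenTelemetry implementation.
public class ServiceInstanceIdSketch {
    static final String KEY = "service.instance.id";

    // Returns a copy of the resource attributes with service.instance.id
    // filled in when it is absent. A user-supplied value is never overwritten.
    public static Map<String, String> withInstanceId(Map<String, String> attributes) {
        Map<String, String> result = new LinkedHashMap<>(attributes);
        // Assumption: a random UUID is an acceptable per-instance identifier.
        result.computeIfAbsent(KEY, k -> UUID.randomUUID().toString());
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("service.name", "checkout");
        // Each process start yields a different generated ID.
        System.out.println(withInstanceId(attributes));
    }
}
```

Because the value is generated once per process, two replicas of the same service end up with different IDs without any manual configuration.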

Another area we focused on was authentication. We simplified the authentication process by having our UI generate standard upstream headers instead of requiring Grafana-specific environment variables, such as GRAFANA_CLOUD_INSTANCE_ID. This change made it easier for users to switch between our distributions and the upstream versions, as they no longer had to deal with Grafana-specific configurations. We even found a gap in the OTel specification: it was ambiguous whether OTEL_EXPORTER_OTLP_HEADERS values needed to be URL encoded. This discovery resulted in PRs to align the behavior of the .NET, Java, PHP, Rust, and C++ OTel SDKs.
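The URL-encoding question is easier to see with an example. The sketch below builds and parses a header string in the key=value,key=value shape used by OTEL_EXPORTER_OTLP_HEADERS, percent-encoding the values. The class and method names are hypothetical, the Basic-auth style Authorization value is an assumption for illustration, and this is not the code of any OTel SDK. It needs Java 10 or later for the Charset overloads of URLEncoder and URLDecoder.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of percent-encoded header values.
public class OtlpHeadersSketch {
    // URLEncoder targets HTML forms and writes spaces as '+',
    // so those are rewritten to "%20" to get plain percent-encoding.
    public static String encodeValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // URLDecoder would turn a literal '+' into a space, which corrupts
    // Base64 text, so '+' is protected before decoding.
    public static String decodeValue(String value) {
        return URLDecoder.decode(value.replace("+", "%2B"), StandardCharsets.UTF_8);
    }

    // Assumption: credentials are sent as "Basic base64(user:password)".
    public static String basicAuthHeader(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        String encoded = Base64.getEncoder().encodeToString(raw);
        return "Authorization=" + encodeValue("Basic " + encoded);
    }

    // Splits "k1=v1,k2=v2" into a map and percent-decodes each value.
    public static Map<String, String> parse(String rawHeaders) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String pair : rawHeaders.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed entries
            }
            String key = pair.substring(0, eq).trim();
            headers.put(key, decodeValue(pair.substring(eq + 1).trim()));
        }
        return headers;
    }

    public static void main(String[] args) {
        String header = basicAuthHeader("user", "pass");
        System.out.println(header);
        System.out.println(parse(header));
    }
}
```

The space between the scheme and the credentials is the character that makes the ambiguity visible: an SDK that decodes values sees "Basic " followed by the token, while one that does not would send the literal "%20".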

Finally, we tackled the issue of metric overload. Collecting an excessive number of metrics can lead to unnecessary costs and complexity for users. To address this, we made the Micrometer bridge opt-in upstream (it was previously opt-out, but many users were not aware of the setting), cutting the metrics of a typical application by more than half. This change gave users more control over their observability data and helped ensure that only necessary metrics were collected.
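The difference between opt-in and opt-out comes down to the default applied when the user sets nothing. The sketch below shows both behaviors side by side. The class is hypothetical, and the property name otel.instrumentation.micrometer.enabled is an assumption about the upstream Java agent setting, so check the agent documentation before relying on it.

```java
import java.util.Map;

// Hypothetical illustration of opt-in versus opt-out defaults.
public class MicrometerBridgeToggleSketch {
    // Assumption: this is the property that controls the bridge upstream.
    static final String PROPERTY = "otel.instrumentation.micrometer.enabled";

    // Opt-in: the bridge stays off unless the user explicitly enables it.
    public static boolean isEnabledOptIn(Map<String, String> config) {
        return Boolean.parseBoolean(config.getOrDefault(PROPERTY, "false"));
    }

    // Opt-out: the bridge is on unless the user explicitly disables it.
    public static boolean isEnabledOptOut(Map<String, String> config) {
        return Boolean.parseBoolean(config.getOrDefault(PROPERTY, "true"));
    }

    public static void main(String[] args) {
        Map<String, String> untouched = Map.of();
        System.out.println("opt-in default:  " + isEnabledOptIn(untouched));
        System.out.println("opt-out default: " + isEnabledOptOut(untouched));
    }
}
```

With an opt-out default, a user who never looks at the setting pays for every bridged metric; with an opt-in default, those metrics only appear once someone asks for them.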

Through these efforts, we were able to strike a better balance between ease of use and flexibility. Thanks to internal work and upstream contributions, we’re confident that users can easily switch between our distributions and the upstream OpenTelemetry SDKs.

A journey of innovation, learning, and community

By prioritizing simplicity and open source values, we’re committed to creating solutions that benefit the wider observability community.

Creating Grafana distributions for OTel has been a journey of innovation, learning, and community collaboration. By addressing key challenges and focusing on user needs, we’ve taken significant steps toward simplifying observability practices. We also feel we’re better equipped to address bugs for our users by having greater control over our release cycle. As we move forward, we remain committed to enhancing our solutions, always with the open source community and our users’ best interests at heart.

Start using the Java and .NET distributions today to help you better understand how your applications are performing. We can’t wait to see how the Grafana community puts them to use.