OpenTelemetry: Challenges, priorities, adoption patterns, and solutions
OpenTelemetry has emerged as a key player in the cloud native arena, aiming to streamline the increasingly complex field of cloud native observability. The significant challenges enterprises face have led to a broad consensus among software vendors on a unified standard and framework for telemetry data management. Enterprise enthusiasm is evident in rapidly growing interest and adoption metrics among app developers, DevOps engineers, platform engineers, and technologists in general.
Key Facts:
- Rapid growth. OpenTelemetry stands out as one of the CNCF’s fastest-expanding projects, enjoying support from a wide range of vendors in infrastructure, applications, and observability platforms.
- Unified standard. It sets a unified standard for the instrumentation, collection, processing, and export of telemetry data.
- Central value. The platform’s core advantage is its role as a single solution for both application and infrastructure telemetry, streamlining instrumentation and data collection.
- Instrumentation. Critical features of OpenTelemetry are its instrumentation libraries and auto-instrumentation capability, which reduce developer effort and enhance data accuracy.
- Community engagement. The project has seen contributions from 10,000 individuals across 1,200 companies, with around 900 developers from 200 companies actively contributing monthly—an 18% increase in developer participation and a 22% rise in company involvement year over year.
- Developer traction. A 445% year-over-year surge in Python library downloads (reaching 21 million in December 2023) and a 410% increase in developer discussions on Stack Overflow over two years indicate growing developer interest and adoption.
What is OpenTelemetry and why do we need it?
OpenTelemetry is a standard and unified platform for instrumenting, generating, collecting, processing, and exporting telemetry data in today’s world of distributed and dynamic cloud native applications. Standardizing the handling of metrics, traces, logs, events, and context information across all areas of an application stack benefits enterprises because it unifies and simplifies the management of their observability data pipelines.
At the same time, OpenTelemetry enables vendors of observability software platforms to fully focus on their core value proposition of offering actionable insights, without spending significant resources on figuring out how to obtain reliable, comprehensive, and well-contextualized telemetry data. These vendors will benefit from ongoing development efforts of the OpenTelemetry project, like adding performance profiles for closer insights into system behavior and attributes describing the entity producing telemetry data, such as a service, host, container, or VM. All of these data streams combined provide the data model required to demystify the complex interactions of modern, distributed, and loosely coupled microservices applications, including their underlying infrastructure stacks.
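To make "well-contextualized telemetry" concrete, the sketch below models how a span can carry both its own attributes and resource attributes describing the entity that produced it. This is a toy model in plain Python, not the actual OpenTelemetry SDK; the attribute keys follow OpenTelemetry's semantic conventions, but the `Resource` and `Span` classes here are illustrative stand-ins.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Resource:
    """Describes the entity producing telemetry (service, host, container, ...)."""
    attributes: Dict[str, str]

@dataclass
class Span:
    """A single operation within a trace, tied to its producing resource."""
    name: str
    trace_id: str
    resource: Resource
    attributes: Dict[str, str] = field(default_factory=dict)

# Resource attributes give every signal the context of where it came from.
resource = Resource(attributes={
    "service.name": "checkout",   # semantic-convention keys; values are made up
    "host.name": "node-7",
    "container.id": "abc123",
})

span = Span(
    name="HTTP GET /cart",
    trace_id="4bf92f35",
    resource=resource,
    attributes={"http.status_code": "200"},
)

# Any backend (or AI model) reading this span knows which service, host,
# and container produced it, without joining external inventory data.
print(span.resource.attributes["service.name"])  # -> checkout
```

Because the context travels with the signal itself, a downstream consumer never has to guess which workload a data point belongs to.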
Capturing this context is the missing link between simply turning AI models loose on vast amounts of data and giving those models the background they need to focus only on the data points that matter. Compare it to a detective: without detailed background on a case, the detective might miss crucial clues and connections, reaching incorrect or delayed conclusions. OpenTelemetry supplies that context to AI models, enabling them to zero in on relevant data points and reach more accurate, insightful determinations about the complex interactions of modern, distributed applications and their underlying infrastructure stacks. This analysis is key for organizations to continuously prioritize and optimize their technology investments based on forecasted business impact.
“In a nutshell, it [OpenTelemetry] is rapidly becoming the industry standard and lets you use one protocol (OTLP) for metrics, traces, and logs and the collector still supports older protocols like Prometheus and Zipkin.”
— Enterprise Architect, Large Automotive Company
Prometheus, OpenTelemetry, Grafana, and eBPF: Trending observability topics
The bar chart highlights the key GitHub topics related to observability. OpenTelemetry’s high ranking underscores its significance in the cloud native community, with numerous open source projects integrating with it. OpenTelemetry, Prometheus, Grafana, and eBPF top the list of cloud native observability tools and platforms.
Top 4 topics on GitHub most closely related to OpenTelemetry
OpenTelemetry, an open source project, provides a comprehensive toolkit for telemetry data collection and serves as an export target for cloud native health and performance data coming directly from Kubernetes, from Prometheus, from eBPF, or from most other relevant data sources. OpenTelemetry acts as a unified data source for Grafana, enabling real-time visualization, analysis, insights, and alerting on telemetry data. This prominence of OpenTelemetry in the observability space, alongside Prometheus, Grafana, and eBPF, highlights its pivotal role in simplifying and enhancing cloud native observability through integration and comprehensive data handling.
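The "unified data source" role described above amounts to normalizing records that arrive in source-specific shapes into one common schema that a single dashboard query can consume. The sketch below illustrates this idea in plain Python; the field names and the `normalize` function are illustrative, not the actual OTLP schema or any OpenTelemetry API.

```python
# Toy normalization step: records arriving in source-specific shapes are
# mapped onto one common schema, so one downstream query works for all.

def normalize(source: str, record: dict) -> dict:
    if source == "prometheus":   # scrape-style metric sample
        return {
            "name": record["__name__"],
            "value": record["value"],
            "labels": {k: v for k, v in record.items()
                       if k not in ("__name__", "value")},
        }
    if source == "ebpf":         # kernel-level event counter
        return {
            "name": f"kernel.{record['event']}",
            "value": record["count"],
            "labels": {"pid": str(record["pid"])},
        }
    raise ValueError(f"unknown source: {source}")

samples = [
    normalize("prometheus", {"__name__": "container_cpu_usage",
                             "value": 0.42, "pod": "web-1"}),
    normalize("ebpf", {"event": "tcp_retransmits", "count": 3, "pid": 1234}),
]

# A Grafana-style consumer can now treat both records uniformly.
for s in samples:
    print(s["name"], s["value"])
```

The value of the standard is exactly this uniformity: the visualization layer never needs source-specific parsing logic.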
Over 40% YoY increase in OpenTelemetry pull requests on GitHub
Grafana and Prometheus growing in parallel
Prometheus and Grafana are growing at the same pace because they complement one another in the observability stack, with Prometheus providing robust metric collection and Grafana offering advanced visualization capabilities. Their combined use allows for comprehensive monitoring, analysis, and alerting, which is essential for maintaining the performance and health of cloud native systems.
How critical products work together
In a cloud native ecosystem, the flow of telemetry data often begins with Kubernetes, the backbone of containerized application environments. Kubernetes generates operational data about its managed applications and their underlying infrastructure, including metrics on resource usage, performance, and the health of services and workloads. Prometheus, a monitoring tool, then steps in to collect these metrics. eBPF, a Linux kernel technology that runs sandboxed programs inside the kernel, complements them with kernel-level insights, such as network traffic and system calls, that provide deeper visibility into the performance and security of the system. By integrating Prometheus metrics with eBPF data, teams achieve a more comprehensive view of their systems’ health and performance. This integration allows for real-time visualization, analysis, and alerting on telemetry data, enabling more informed decision-making and proactive issue resolution.
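The complementarity of the two data sources can be sketched as a join on a shared key. In the toy example below, application-level metrics (Prometheus-style) and kernel-level signals (eBPF-style) are merged per pod; the pod names, metric names, and values are all made up for illustration.

```python
# Application-level metrics, as Prometheus might scrape them (illustrative).
prometheus_metrics = {
    "web-1": {"cpu_usage": 0.42, "memory_mb": 512},
    "web-2": {"cpu_usage": 0.15, "memory_mb": 256},
}

# Kernel-level signals, as an eBPF-based agent might report them (illustrative).
ebpf_insights = {
    "web-1": {"tcp_retransmits": 7, "syscall_errors": 2},
    "web-2": {"tcp_retransmits": 0, "syscall_errors": 0},
}

# Join both views on the pod name to get one combined record per workload.
combined = {
    pod: {**prometheus_metrics[pod], **ebpf_insights.get(pod, {})}
    for pod in prometheus_metrics
}

# web-1 shows healthy-looking resource usage but elevated retransmits:
# a network problem the application metrics alone would not reveal.
print(combined["web-1"])
```

This is the practical payoff of combining the two layers: symptoms visible in one layer can be explained by causes visible only in the other.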
OpenTelemetry then receives, processes, and aggregates the collected data, acting as a conduit between data collection and observability backends. OpenTelemetry standardizes telemetry data (metrics, logs, traces, and context data), making it easier to handle diverse data types from Prometheus and eBPF; this standardization is key for consistent and efficient data analysis. Grafana then utilizes this aggregated data to provide visualizations and dashboards: it queries Prometheus for direct metrics visualization and uses data processed by OpenTelemetry for more advanced analytics. This step is vital because it translates complex data into actionable insights, allowing users to interpret the data easily and make informed decisions. The flow from Kubernetes to Grafana, via Prometheus, eBPF, and OpenTelemetry, creates a streamlined pipeline that ensures efficient monitoring, quick issue resolution, and effective decision-making in cloud native environments.
Taming the cloud native ecosystem
Kubernetes has emerged as the de facto standard platform for running and managing applications, deeply integrated within a cloud native ecosystem comprising a vast range of products and components. This integration extends Kubernetes’ capabilities with existing corporate technologies and infrastructure. For instance, Kubernetes itself does not natively access storage, but utilizes storage plugins (via the Container Storage Interface, CSI) to connect to file, object, or block storage in data centers or public clouds. Similarly, for corporate network access, Kubernetes employs network plugins (via the Container Network Interface, CNI) that facilitate pod-to-pod communication, network isolation, service networking, load balancing, ingress, and egress control.
The Kubernetes ecosystem is versatile, supporting a wide range of operations such as running databases, creating and managing service meshes, building CI/CD pipelines, sending notifications, and monitoring system performance. Databases in Kubernetes can be deployed as StatefulSets with persistent volumes, ensuring data persistence across pod restarts and node failures. Service meshes like Istio or Linkerd enhance network communication with features such as fine-grained control, security, and observability.
Continuous integration and continuous deployment (CI/CD) workflows are streamlined through integrations with tools like Argo, Flux, Jenkins, GitLab, and Spinnaker, automating deployment processes and integrating new code changes into production environments seamlessly. In this complex cloud native application stack, establishing comprehensive observability is crucial. This involves collecting logs, metrics, traces, events, and context data from all relevant internal Kubernetes components and external components integrated with the Kubernetes cluster. OpenTelemetry offers a comprehensive framework for this purpose, enabling the collection, processing, and exporting of telemetry data within a Kubernetes cluster.
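In practice, this collection, processing, and exporting is typically wired up in an OpenTelemetry Collector configuration. The fragment below is a minimal sketch: the pipeline structure (receivers, processors, exporters) follows the Collector's configuration format, while the scrape target and backend endpoint are placeholders, not real addresses.

```yaml
receivers:
  otlp:                    # apps instrumented with OTel SDKs send data here
    protocols:
      grpc:
      http:
  prometheus:              # scrape existing Prometheus-format endpoints
    config:
      scrape_configs:
        - job_name: kubernetes-pods
          static_configs:
            - targets: ["app.example:9090"]     # placeholder target

processors:
  batch: {}                # batch telemetry before export to reduce load

exporters:
  otlphttp:
    endpoint: https://backend.example:4318      # placeholder backend

service:
  pipelines:
    metrics:
      receivers: [otlp, prometheus]
      processors: [batch]
      exporters: [otlphttp]
```

A single Collector instance (or a DaemonSet of them) can thus absorb both OTLP-native and Prometheus-format data from across the cluster and forward it, standardized, to whichever observability backend the organization uses.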