Prometheus 3.0 and OpenTelemetry: a practical guide to storing and querying OTel data
Over the past year, a lot of work has gone into making Prometheus work better with OpenTelemetry, a move that reflects the growing number of engineers and developers who rely on both open source projects.
Historically, Prometheus users have faced a number of challenges when trying to work with OpenTelemetry (and vice versa). These include:
- Proper handling of resource attributes
- UTF-8 support for metric names and attribute names and values
- Push (OpenTelemetry) vs. pull (Prometheus) models
- Cumulative (Prometheus or OTel) vs. delta (just OTel) temporality
We’re happy to say that with the Prometheus 3.0 release, currently in the release candidate phase, it’s going to be easier than ever to store and query OpenTelemetry data inside Prometheus. In this blog, which is based on our recent PromCon 2024 talk, we’ll walk you through how you can better integrate the two.
Resource attributes
In OpenTelemetry, a resource represents the entity producing telemetry as resource attributes, which are essentially key-value pairs. Maintaining these attributes can be challenging when ingesting OpenTelemetry data into Prometheus.
To get around this, we recommend that you promote OTel resource attributes to metric labels in Prometheus. This mirrors how Prometheus has historically attached information about scrape targets to metrics as labels.
The one catch is that you don’t want to promote too many: a large number of labels per metric leads to performance issues and a lot of UI clutter. In our experience, these are the most common attributes you should consider promoting:
service.instance.id
service.name
service.namespace
cloud.availability_zone
cloud.region
container.name
deployment.environment.name
k8s.cluster.name
k8s.container.name
k8s.cronjob.name
k8s.daemonset.name
k8s.deployment.name
k8s.job.name
k8s.namespace.name
k8s.pod.name
k8s.replicaset.name
k8s.statefulset.name
Alternatively, you can promote labels at query time instead. The resource attributes are already automatically encoded as labels of the target_info metric, so you can include them in your queries through PromQL joins.
The benefit here is you don’t have to decide at ingestion time which resource attributes to include. However, join queries can be tricky, and you can end up with conflicts if one or more resource attributes change.
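A minimal sketch of such a join, written as a small Python helper that only builds the query string. The metric name http_server_request_duration_seconds_count and the helper join_with_target_info are assumptions for illustration; target_info, job, and instance are the names discussed in this section. Because target_info samples have the value 1, multiplying by it leaves the original value unchanged and only copies the requested labels.

```python
def join_with_target_info(expr: str, attributes: list[str]) -> str:
    """Return a PromQL join that copies the given target_info labels onto expr."""
    copied = ", ".join(attributes)
    # job and instance are the labels shared by the metric and target_info.
    return f"{expr} * on (job, instance) group_left ({copied}) target_info"


query = join_with_target_info(
    "rate(http_server_request_duration_seconds_count[2m])",  # hypothetical metric
    ["k8s_cluster_name"],
)
print(query)
```

The printed expression can be pasted into the Prometheus query UI. Every attribute you want has to be listed explicitly in group_left, which is part of what makes these joins tricky to write by hand.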
To address this, we’ve created a new PromQL function called info, which is included as experimental in Prometheus 3.0. info offers a much easier user experience since it’s just a simple function call which doesn’t require you to know the name of the metric to join with (“target_info”) or the shared labels (“job” and “instance”). It also has performance benefits, as it only fetches the target_info time series with the correct job and instance labels.
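As a rough illustration of the simpler call, the sketch below builds an info query as a string. The metric name and the helper with_info are hypothetical. The optional second argument is a label selector that restricts which info labels get added; leaving it out adds all of them. Because the function is experimental, this assumes Prometheus was started with --enable-feature=promql-experimental-functions.

```python
from typing import Optional


def with_info(expr: str, attribute_selector: Optional[str] = None) -> str:
    """Wrap expr in a PromQL info() call, optionally selecting info labels."""
    if attribute_selector is None:
        return f"info({expr})"
    # Triple braces: literal "{", the selector, then literal "}".
    return f"info({expr}, {{{attribute_selector}}})"


query = with_info(
    "rate(http_server_request_duration_seconds_count[2m])",  # hypothetical metric
    'k8s_cluster_name=~".+"',
)
print(query)
```

Note that the query never mentions target_info, job, or instance; the function takes care of those details.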
As a potential long-term solution, we’re also looking at persisting OpenTelemetry attributes as native metadata in Prometheus. We’re currently looking at several proposals for creating a metadata store, so we’d love to hear any community feedback as we continue to work toward this goal.
OTLP support
The OpenTelemetry protocol (OTLP) endpoint is now stable for ingestion. Users can now rely on default OpenTelemetry exporters like the OTLP/HTTP exporter instead of having to use custom exporters that may not be as well maintained. Also, by using the OTLP exporter they can keep their entire telemetry pipeline OTel native. Let’s look briefly at some other ways this will make your life easier as a Prometheus user.
Note: Because Prometheus can work without authentication, the OTLP receiver is disabled by default to avoid accepting unexpected (and unwanted) traffic. To enable it, you’ll need to set the --web.enable-otlp-receiver flag.
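Once the receiver is enabled, an instrumented application only needs to be pointed at it. The sketch below collects the standard OpenTelemetry SDK environment variables for that purpose. The address localhost:9090 and the helper otlp_exporter_env are assumptions for illustration; Prometheus serves OTLP metrics under the /api/v1/otlp/v1/metrics path.

```python
def otlp_exporter_env(prometheus_url: str = "http://localhost:9090") -> dict[str, str]:
    """Environment variables that point an OTel SDK's metrics at Prometheus."""
    base = prometheus_url.rstrip("/")
    return {
        "OTEL_METRICS_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT": f"{base}/api/v1/otlp/v1/metrics",
    }


env = otlp_exporter_env()
for key, value in env.items():
    print(f"export {key}={value}")
```

Setting these through the environment keeps the application code unchanged, which fits the goal of an OTel-native pipeline.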
As we discussed in the previous section, you can promote resource attributes on ingestion. We’ve added a new otlp section to the configuration file to make this process easier:
otlp:
  promote_resource_attributes:
    - service.instance.id
    - deployment.environment.name
    - k8s.cluster.name
    - ...
This is a safe way to list the resource attributes you want on your metrics. By default, nothing is promoted, so you won’t have crazy cardinality or UI/UX issues. We suggest that you review the recommended list in the previous section and include the ones you want in the config file.
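To make the effect of that list concrete, here is a simplified model of promotion. It is not Prometheus's actual translation code: the function promote, the sample resource, and the assumption that dots in attribute names become underscores in label names are all illustrative.

```python
def promote(resource: dict[str, str], promoted: list[str]) -> dict[str, str]:
    """Copy only the listed resource attributes into a metric's label set."""
    labels = {}
    for name in promoted:
        if name in resource:
            # Assumed classic naming: "k8s.cluster.name" -> "k8s_cluster_name".
            labels[name.replace(".", "_")] = resource[name]
    return labels


# Hypothetical resource attached to incoming OTLP metrics.
resource = {
    "service.name": "checkout",
    "service.instance.id": "pod-1",
    "k8s.cluster.name": "prod-eu",
    "host.arch": "amd64",
}

print(promote(resource, []))  # default: nothing is promoted
print(promote(resource, ["service.instance.id", "k8s.cluster.name"]))
```

Attributes left off the list, such as host.arch here, never become labels on the metric, which is what keeps cardinality under control.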
We also strongly recommend enabling out-of-order ingestion. That’s because the OTel Collector encourages the batching of metrics: as soon as you have multiple replicas, they won’t coordinate to send the data, and you will naturally run into out-of-order samples.
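The toy simulation below shows why uncoordinated batching produces out-of-order data. It assumes samples for a single series pass through two Collector replicas that flush on their own schedules; the timestamps and helper names are made up. In Prometheus itself, out-of-order ingestion is controlled by the out_of_order_time_window setting under storage.tsdb in the configuration file.

```python
def arrival_order(batches):
    """Flatten batches into the order their samples reach Prometheus.

    Each batch is (flush_time, [sample timestamps]); earlier flushes arrive first.
    """
    ordered = sorted(batches, key=lambda batch: batch[0])
    return [ts for _, samples in ordered for ts in samples]


def count_out_of_order(timestamps):
    """Count samples that are older than the newest sample already seen."""
    newest = float("-inf")
    late = 0
    for ts in timestamps:
        if ts < newest:
            late += 1
        else:
            newest = ts
    return late


replica_a = (35, [0, 10, 20])  # flushes at t=35s
replica_b = (30, [5, 15, 25])  # flushes at t=30s, so its batch lands first

arrived = arrival_order([replica_a, replica_b])
print(arrived)                      # [5, 15, 25, 0, 10, 20]
print(count_out_of_order(arrived))  # 3
```

Without an out-of-order window, all three samples from replica A would be rejected in this scenario, even though nothing is wrong with them.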
And finally, we want to call out the fact that Prometheus now independently maintains its own OTLP-to-Prometheus translation code. A lot of work by both communities went into this project, which will be a big help for the Prometheus maintainers. Already, we’ve seen the average request translation time drop by 40%, and memory usage has been reduced by 59% on average. So while users won’t see any immediate impact from this change, it will free up maintainers to work on other projects that will ultimately lead to a better overall experience down the road.
Delta temporality
In monitoring, there are two accepted temporalities: cumulative and delta. To understand the differences between the two, picture a counter that increases by 5 every 15 seconds. With cumulative temporality, each sample reports the running total: 5, 10, 15, 20, and so on.
That’s what we get in pull-based Prometheus, where you always expose metrics and get the accumulated value. This type of fixed interval is necessary for functions like rate() and increase(). But with delta, you see the difference from the last observation (5, 5, 5, 5, and so on), so you need a completely different function to go through the data.
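The counter example can be written out in a few lines. This sketch assumes the counter starts at zero and uses four observations; the function name to_deltas is illustrative.

```python
from itertools import accumulate

# A counter that increases by 5 every 15 seconds, observed four times.
deltas = [5, 5, 5, 5]                  # delta: change since the last observation
cumulative = list(accumulate(deltas))  # cumulative: running total


def to_deltas(samples):
    """Recover per-interval changes from cumulative samples (counter starts at 0)."""
    if not samples:
        return []
    return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]


print(cumulative)             # [5, 10, 15, 20]
print(to_deltas(cumulative))  # [5, 5, 5, 5]
```

Both forms carry the same information when no samples are lost, but a query function written for one representation gives wrong answers on the other.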
OpenTelemetry supports both cumulative and delta temporality, but Prometheus only works with cumulative. So, if you’re coming from a cumulative system like Grafana, you’ll be fine. But if you’re working with a delta system like Datadog or Graphite, you can run into issues.
To address this, we recommend using the delta to cumulative processor in the OpenTelemetry Collector, which will take the samples, aggregate them, and produce the accumulated value on a fixed interval.
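Conceptually, the conversion keeps a running total per series. The class below is a minimal sketch of that idea only, not the Collector processor's implementation: it ignores counter resets, stale series, late samples, and emission on a fixed interval, and the series identity used here is a made-up example.

```python
from collections import defaultdict


class DeltaToCumulative:
    """Accumulate delta samples into a running total per series."""

    def __init__(self):
        self._totals = defaultdict(float)

    def add(self, series, delta):
        """Record one delta sample and return the new cumulative value."""
        self._totals[series] += delta
        return self._totals[series]


converter = DeltaToCumulative()
# Hypothetical series identity: metric name plus its label pairs.
series = ("http_requests_total", (("service_name", "checkout"),))

outputs = [converter.add(series, delta) for delta in [5, 5, 5]]
print(outputs)  # [5.0, 10.0, 15.0]
```

The state has to live somewhere for as long as the series exists, which is why this conversion is done in a stateful processor rather than in a query.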
We’ve considered adding delta temporality to Prometheus, and work is currently under way to support delta ingestion.
UTF-8
UTF-8 support is coming in Prometheus 3.0! Here’s why that’s important: The example below represents a common error that OpenTelemetry users (and those coming from other monitoring systems) run into when they start working with Prometheus:
People coming from other platforms like Graphite or Datadog use other separators like dots (.). The OTel spec accepts UTF-8, meaning any character is valid. We want Prometheus to be a good backend for OpenTelemetry, which is why we’re adding UTF-8 support.
This will be enabled by default in Prometheus 3.0, but full interoperability with OpenTelemetry will take a bit more work. For example, people are already working on adding a feature flag so that when you write through the OTLP endpoint, you can decide whether you want to keep any character sent or enforce classic Prometheus naming convention instead.
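The choice between keeping names as sent and enforcing classic naming can be sketched as follows. This is a simplified model, not the translation code Prometheus ships: the function metric_name and its keep_utf8 parameter are hypothetical stand-ins for the option described above, and the sketch leaves out the unit and type suffixes that the real translation can append. Classic metric names are limited to the pattern [a-zA-Z_:][a-zA-Z0-9_:]*.

```python
import re

# Any character outside the classic Prometheus metric-name alphabet.
_INVALID = re.compile(r"[^a-zA-Z0-9_:]")


def metric_name(otel_name: str, keep_utf8: bool) -> str:
    """Return the name to store: unchanged, or rewritten to classic naming."""
    if keep_utf8:
        return otel_name
    name = _INVALID.sub("_", otel_name)
    if name and name[0].isdigit():
        name = "_" + name  # classic names cannot start with a digit
    return name


print(metric_name("http.server.request.duration", keep_utf8=True))
print(metric_name("http.server.request.duration", keep_utf8=False))
```

Keeping the original name means queries match what the OTel SDK emitted, while classic naming keeps existing dashboards and alerts working.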
We’ll have more to come on this topic very soon.
Looking ahead
If you want to learn more about these topics, we’ve also published a guide in the Prometheus docs that covers all the topics addressed in this blog post. It’s a simple starting point for now, but we’re committed to keeping it up to date as more functionality becomes available in the months ahead.
And while we’re excited about all the improvements in Prometheus 3.0, it’s certainly not the end. We are committed to making this work at scale, and it will be a major focus of future development. Ultimately, our goal is to make Prometheus the best OSS store for OpenTelemetry metrics, so please contribute if you want to make that happen! If you have any questions about where to start, you can reach out in the CNCF Slack channel (#prometheus-dev).
And if you’re at KubeCon + CloudNativeCon North America 2024 next week and want to talk in-person, check out Celebrating Prometheus 3.0: A deep dive with the maintainers, which will be run by fellow Grafanistas Richard “RichiH” Hartmann and Josue (Josh) Abreu.