OpenTelemetry with Prometheus: better integration through resource attribute promotion

2025-05-20 7 min

With the 3.0 release, Prometheus firmly established itself as the leading metrics database for OpenTelemetry. A lot of work has gone into integrating the two open source projects, including a major Prometheus enhancement we’re really excited about: resource attribute promotion.

Resource attribute promotion is enabled by default for Grafana Cloud users, and it can already be used in visualizations like Grafana Metrics Drilldown and in community dashboards like the Lightweight APM for OpenTelemetry dashboard.

Of course, new features come with new best practices. In this blog, we’ll explore:

  • The challenges resource attribute promotion addresses
  • How it simplifies dashboard creation and metric exploration
  • How it facilitates seamless correlation between Prometheus metrics, Grafana Loki logs, and Grafana Tempo traces, or any third-party logs and traces

Addressing OpenTelemetry-Prometheus integration challenges

There’s a clear need for interoperability between these two projects, as nearly three-quarters of organizations are using both in some capacity, and most are increasing their investments, according to our third annual Observability Survey. But despite Prometheus’s strong OpenTelemetry integration, some challenges remain. These include:

  • Dashboard filtering struggles with OpenTelemetry semantic conventions: Creating dashboard filters on core OpenTelemetry attributes like deployment.environment.name, service.name, or service.namespace has often required workarounds, as these attributes were not promoted as labels on the metrics. The workarounds typically involved using the non-OpenTelemetry-standard job and instance labels or performing complex joins on the target_info metric.
  • Limited metric exploration: Exploring metrics using slicing and dicing techniques on common dimensions such as service.version or cloud.availability_zone has been cumbersome. Again, this is because these attributes were not promoted as labels on the metrics, requiring complex joins on the target_info metric.
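For illustration, here’s a sketch of what such a join on target_info typically looks like when breaking a request rate down by service version. The job value and the 5-minute window are illustrative assumptions, and the query assumes service.version is exposed as the service_version label on target_info:

```promql
# Without promotion, service_version only exists on target_info,
# so it has to be joined in through the job and instance labels.
sum by (service_version) (
    rate(http_server_request_duration_seconds_count{
        job="webshop/fraud-detection"}[5m])
    * on (job, instance) group_left (service_version)
    target_info
)
```

Because target_info always has the value 1, the multiplication leaves the rate unchanged and only copies the extra label across.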

To address these pain points, Prometheus introduced the ability to promote resource attributes. This feature of the Prometheus OTLP endpoint promotes a predefined list of OpenTelemetry resource attributes as metric labels so they’re right at your fingertips when writing queries and building dashboards. 
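As a minimal sketch of what this looks like in practice, a breakdown by service version can be written directly against the metric once service.version is promoted. The service values and the 5-minute window below are illustrative assumptions:

```promql
# With promotion, service_namespace, service_name, and service_version
# are labels on the metric itself, so no join on target_info is needed.
sum by (service_version) (
    rate(http_server_request_duration_seconds_count{
        service_namespace="webshop",
        service_name="fraud-detection"}[5m])
)
```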

And don’t worry about having to rewrite everything you’ve done in the past. Resource attribute promotion is backward compatible: existing queries, dashboards, and alerts will continue to work without changes. The only caution is for users of metrics aggregation solutions like Grafana Cloud Adaptive Metrics, who should update any aggregation rules on the instance label so that they also cover the resource attributes that identify service instances, such as service_instance_id or k8s_pod_name.

Enabling resource attribute promotion

First, ensure you have adopted Prometheus’s OTLP endpoint to ingest OpenTelemetry metrics. This eliminates the conversion from OpenTelemetry to Prometheus format that otherwise takes place within the OpenTelemetry Collector or Grafana Alloy when using the Prometheus Remote Write exporter.

Then, activate resource attribute promotion on the Prometheus OTLP endpoint. For Grafana Cloud users, resource attribute promotion is already enabled with the following default list of attributes:

  • service.name
  • service.namespace
  • service.instance.id
  • service.version
  • cloud.availability_zone
  • cloud.region
  • container.name
  • deployment.environment
  • deployment.environment.name
  • k8s.cluster.name
  • k8s.container.name
  • k8s.cronjob.name
  • k8s.daemonset.name
  • k8s.deployment.name
  • k8s.job.name
  • k8s.namespace.name
  • k8s.pod.name
  • k8s.replicaset.name
  • k8s.statefulset.name

These attributes can also be customized via support tickets.

If you use Prometheus or Grafana Mimir (OSS or Grafana Enterprise Metrics), you can configure this list using the promote_resource_attributes configuration block:

otlp:
  keep_identifying_resource_attributes: true
  promote_resource_attributes:
    - service.instance.id
    - service.name
    - service.namespace
    - service.version
    - cloud.availability_zone
    - cloud.region
    - container.name
    - deployment.environment
    - deployment.environment.name
    - k8s.cluster.name
    - k8s.container.name
    - k8s.cronjob.name
    - k8s.daemonset.name
    - k8s.deployment.name
    - k8s.job.name
    - k8s.namespace.name
    - k8s.pod.name
    - k8s.replicaset.name
    - k8s.statefulset.name

Regardless of whether you’re using OSS or Grafana Cloud, we recommend enabling keep_identifying_resource_attributes to also capture service_name, service_namespace, and service_instance_id on the target_info metric, which is very handy for dashboard filters (as shown below).

This feature can be enabled following the appropriate steps for your setup:

  • On Grafana Cloud, please open a support ticket to activate keep_identifying_resource_attributes.
  • When using Prometheus, follow these instructions.
  • When using Mimir, follow these instructions.
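Once keep_identifying_resource_attributes is active, target_info can serve as a lightweight list of services for dashboard filters. A sketch, assuming deployment.environment.name is set on your resources (drop that label from the by clause if it isn’t):

```promql
# One result per service, read from target_info rather than
# from any particular application metric.
count by (deployment_environment_name, service_namespace, service_name) (
    target_info
)
```

In a Grafana dashboard variable, a query such as label_values(target_info, service_name) would be one way to populate a service selector from the same metric.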

Richer community dashboards

Once resource attribute promotion and keep_identifying_resource_attributes are enabled, you can immediately get richer OpenTelemetry native dashboards, such as the Lightweight APM for OpenTelemetry dashboard, through the Grafana community.

This dashboard provides a simplified APM experience, including a service selector, RED metrics on inbound HTTP/RPC operations, and RED metrics on outbound HTTP/RPC/database calls, alongside log and trace correlation!

Lightweight APM for OpenTelemetry dashboard, with panels highlighted to illustrate the functionality

Best practices with resource attribute promotion

Resource attribute promotion unlocks new potential for dashboards and PromQL queries. However, it also requires adjustments to your approach. Next, we’ll walk through some best practices to help you get the most from this functionality.

Fully adopt OpenTelemetry semantic conventions and move away from Prometheus naming conventions

Resource attribute promotion eliminates the need to use the job and instance labels inherited from the Prometheus naming conventions in your dashboards and PromQL queries. This allows full adoption of OpenTelemetry semantic conventions and their attributes: service.namespace, service.name, and service.instance.id.

To illustrate the difference, let’s first look at an example that previously had to mix OpenTelemetry semantic conventions with Prometheus’ job and instance labels.

The dashboard filter needs to be placed on the job label:

Screenshot of a job label filter

And the PromQL query is run on the job and instance labels:

http_server_request_duration_seconds_count{
   job="webshop/fraud-detection", 
   instance="inst-123"}

Now, let’s look at that same example with resource attribute promotion enabling a native experience, with OpenTelemetry’s service.namespace, service.name, and service.instance.id attributes replacing Prometheus’ job and instance labels.

The dashboard filter is placed on the OpenTelemetry service.namespace and service.name attributes:

Screenshot of filters on the OpenTelemetry service.namespace and service.name attributes

And the PromQL query is run on the OpenTelemetry service.name, service.namespace, and service.instance.id attributes:

http_server_request_duration_seconds_count{
   service_namespace="webshop", 
   service_name="fraud-detection", 
   service_instance_id="inst-123"}

The Grafana Lightweight APM for OpenTelemetry community dashboard provides more examples of these simplified filters and queries.

Always provide deployment details with deployment.environment.name and service.namespace

Always define values for the optional deployment.environment.name and service.namespace attributes to simplify dashboard management and standardize alerts.

Utilize these attributes as follows:

  • deployment.environment.name: The name of the deployment environment (e.g., “production,” “staging”).
  • service.namespace: Used to differentiate service groups and avoid naming conflicts between services from different teams. Team names or domains (e.g., “webshop”) are typical examples.
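When both attributes are set consistently, a single alert expression can cover every service in an environment. Here’s a sketch of that idea; the “production” value, the 5-minute window, and the 5% threshold are illustrative assumptions rather than recommendations:

```promql
# Share of HTTP 5xx responses per service in production,
# returning only the services above the (hypothetical) 5% threshold.
sum by (deployment_environment_name, service_namespace, service_name) (
    rate(http_server_request_duration_seconds_count{
        deployment_environment_name="production",
        http_response_status_code=~"5.."}[5m])
)
/
sum by (deployment_environment_name, service_namespace, service_name) (
    rate(http_server_request_duration_seconds_count{
        deployment_environment_name="production"}[5m])
)
> 0.05
```

Each resulting series carries the environment, namespace, and service name, which can be used to route or label the alert.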

Enrich promoted attributes

Enhance standard OpenTelemetry resource attributes by incorporating your specific domain attributes, such as organizational details, and utilize them in resource attribute promotion.

Prometheus and Loki alignment

Loki, our open source log aggregation system, also supports resource attribute promotion through the otlp_config/default_resource_attributes_as_index_labels configuration. Maintain consistency between promoted resource attributes in Prometheus metrics and Loki logs for simpler correlations.

Common PromQL queries for OpenTelemetry metrics

The consistency of OpenTelemetry metrics enables standardization and reuse of PromQL queries—a process further simplified by resource attribute promotion. 

This standardization is particularly useful for RED metrics, a set of key metrics used to monitor the health and performance of services. RED stands for:

  • Rate: The number of requests per second
  • Errors: The number of those requests that are failing
  • Duration: The amount of time those requests take (i.e., latency or response time)

Here are some examples of reusable PromQL queries for HTTP server RED metrics that can also be used for HTTP client, RPC, messaging, or database client metrics.

Request rate

This is the number of requests per second. Here’s the query for request rate aggregated across all HTTP operations:

(sum(rate(
    http_server_request_duration_seconds_count{
        deployment_environment_name=~"$deployment_environment_name", 
        service_namespace=~"$service_namespace", 
        service_name="$service_name"}
    [$__rate_interval]
)) by (deployment_environment_name, service_namespace, service_name))

Next, here it is broken down by HTTP operation:

sum by (operation) (
    label_join(
        rate(http_server_request_duration_seconds_count{
            deployment_environment_name=~"$deployment_environment_name", 
            service_namespace=~"$service_namespace",
            service_name="$service_name"}
        [$__rate_interval]),
        "operation",
        " ",
        "http_request_method",
        "http_route"
    )
)
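To reuse this pattern for outbound calls, the main change is the metric name. The sketch below assumes the http.client.request.duration metric is stored as http_client_request_duration_seconds, following the same naming translation as the server metric:

```promql
# Rate of outbound HTTP requests made by the selected service,
# aggregated across all operations.
sum by (deployment_environment_name, service_namespace, service_name) (
    rate(http_client_request_duration_seconds_count{
        deployment_environment_name=~"$deployment_environment_name",
        service_namespace=~"$service_namespace",
        service_name="$service_name"}
    [$__rate_interval])
)
```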

Error rate

This is the share of requests that are failing. Here’s the query for the error rate, expressed as a percentage of all requests and aggregated across all HTTP operations:

(sum by(deployment_environment_name, service_namespace, service_name) 
(rate(http_server_request_duration_seconds_count{
    deployment_environment_name=~"$deployment_environment_name", 
    service_namespace=~"$service_namespace", 
    service_name="$service_name", 
    http_response_status_code=~"5.."}
    [$__rate_interval])) * 100) 
/ 
sum by(deployment_environment_name, service_namespace, service_name) 
(rate(http_server_request_duration_seconds_count{
    deployment_environment_name=~"$deployment_environment_name", 
    service_namespace=~"$service_namespace", 
    service_name="$service_name"}
    [$__rate_interval])
)

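And here it is broken down by HTTP operation. Note that this query returns a ratio between 0 and 1 rather than a percentage, and its trailing or clause reports 0 for operations that have no failing requests: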

(sum by (operation) (
    label_join(
        rate(http_server_request_duration_seconds_count{
            deployment_environment_name=~"$deployment_environment_name", 
            service_namespace=~"$service_namespace", 
            service_name="$service_name", 
            http_response_status_code=~"5.."}
            [$__rate_interval]),
        "operation",
        " ",
        "http_request_method",
        "http_route"
    )
)
/ 
sum by (operation) (
    label_join(
        rate(http_server_request_duration_seconds_count{
            deployment_environment_name=~"$deployment_environment_name", 
            service_namespace=~"$service_namespace", 
            service_name="$service_name"}
            [$__rate_interval]),
        "operation",
        " ",
        "http_request_method",
        "http_route"
    )
)
) or (0 * 
sum by (operation) (
    label_join(
        rate(http_server_request_duration_seconds_count{
            deployment_environment_name=~"$deployment_environment_name", 
            service_namespace=~"$service_namespace", 
            service_name="$service_name"}
            [$__rate_interval]),
        "operation",
        " ",
        "http_request_method",
        "http_route"
    )
)
)

Duration as 95th percentile and average

Duration is the amount of time those requests take (i.e., latency or response time), reported here as the 95th percentile (P95) and as the average. Here are the queries for P95 and average aggregated across all HTTP operations:

# P95
histogram_quantile(
    0.95, 
    sum by(le, deployment_environment_name, service_namespace, service_name)
    (rate(
        http_server_request_duration_seconds_bucket{
            deployment_environment_name=~"$deployment_environment_name",
            service_namespace=~"$service_namespace",
            service_name="$service_name"}
        [$__rate_interval]
    ))
)

# Average
avg by(deployment_environment_name, service_namespace, service_name) (
    rate(http_server_request_duration_seconds_sum{
        deployment_environment_name=~"$deployment_environment_name", 
        service_namespace=~"$service_namespace", 
        service_name="$service_name"}
    [$__rate_interval])) 
/ 
avg by(deployment_environment_name, service_namespace, service_name) (
    rate(http_server_request_duration_seconds_count{
        deployment_environment_name=~"$deployment_environment_name", 
        service_namespace=~"$service_namespace", 
        service_name="$service_name"}
    [$__rate_interval]))

And here it is broken down by HTTP operation:

# P95
histogram_quantile(
    0.95,
    sum by (le, operation) (
        label_join(
        rate(http_server_request_duration_seconds_bucket{
            deployment_environment_name=~"$deployment_environment_name", 
            service_namespace=~"$service_namespace", 
            service_name="$service_name"}[$__rate_interval]),
        "operation",
        " ",
        "http_request_method",
        "http_route"
        )
    )
)
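The average can be broken down by operation in the same way. This is a sketch rather than a query taken from the dashboard; it uses sum on both sides of the division, which yields the total time divided by the total number of requests for each operation:

```promql
# Average
sum by (operation) (
    label_join(
        rate(http_server_request_duration_seconds_sum{
            deployment_environment_name=~"$deployment_environment_name",
            service_namespace=~"$service_namespace",
            service_name="$service_name"}[$__rate_interval]),
        "operation", " ", "http_request_method", "http_route"
    )
)
/
sum by (operation) (
    label_join(
        rate(http_server_request_duration_seconds_count{
            deployment_environment_name=~"$deployment_environment_name",
            service_namespace=~"$service_namespace",
            service_name="$service_name"}[$__rate_interval]),
        "operation", " ", "http_request_method", "http_route"
    )
)
```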

More PromQL queries on popular OpenTelemetry metrics are available on the Grafana Lightweight APM for OpenTelemetry dashboard, including queries on:

  • HTTP metrics (specifications): http.server.request.duration and http.client.request.duration
  • RPC metrics (specifications): rpc.server.duration and rpc.client.duration
  • Messaging metrics (specifications): messaging.client.operation.duration
  • Database metrics (specifications): db.client.operation.duration
