OpenTelemetry with Prometheus: better integration through resource attribute promotion
With the 3.0 release, Prometheus firmly established itself as the leading metrics database for OpenTelemetry. A lot of work has gone into integrating the two open source projects, including a major Prometheus enhancement we’re really excited about: resource attribute promotion.
Resource attribute promotion is enabled by default for Grafana Cloud users, and it can already be leveraged in visualizations like Grafana Metrics Drilldown and in community dashboards like the Lightweight APM for OpenTelemetry dashboard.
Of course, new features come with new best practices. In this blog, we’ll explore:
- The challenges resource attribute promotion addresses
- How it simplifies dashboard creation and metric exploration
- How it facilitates seamless correlation between Prometheus metrics, Grafana Loki logs, and Grafana Tempo traces, or any third-party logs and traces
Addressing OpenTelemetry-Prometheus integration challenges
There’s a clear need for interoperability between these two projects, as nearly three-quarters of organizations are using both in some capacity, and most are increasing their investments, according to our third annual Observability Survey. But despite Prometheus’s strong OpenTelemetry integration, some challenges remain. These include:
- Dashboard filtering struggles with OpenTelemetry semantic conventions: Creating dashboard filters on core OpenTelemetry attributes like deployment.environment.name, service.name, or service.namespace has often required workarounds, as these attributes were not directly promoted as labels on the metrics. This often involved using the non-OpenTelemetry-standard job and instance labels or performing complex joins on the target_info metric.
- Limited metric exploration: Exploring metrics using slicing and dicing techniques on common dimensions such as service.version or cloud.availability_zone has been cumbersome. Again, this is because these attributes were not promoted as labels on the metrics, requiring complex joins on the target_info metric.
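For context, the join workaround described above typically looked something like the following sketch. It assumes the HTTP server metric used later in this post and a 5-minute rate window picked only for illustration; the join copies service_version from target_info onto each series through the shared job and instance labels.

```promql
# Legacy workaround (sketch): break request rate down by service version
# by joining each series with target_info on the job and instance labels.
sum by (service_version) (
    rate(http_server_request_duration_seconds_count[5m])
  * on (job, instance) group_left (service_version)
    target_info
)

# With service.version promoted as a label, the join is no longer needed:
sum by (service_version) (
  rate(http_server_request_duration_seconds_count[5m])
)
```

These are two separate queries shown side by side for comparison; run them one at a time.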
To address these pain points, Prometheus introduced the ability to promote resource attributes. This feature of the Prometheus OTLP endpoint promotes a predefined list of OpenTelemetry resource attributes as metric labels so they’re right at your fingertips when writing queries and building dashboards.
And don’t worry about having to rewrite everything you’ve done in the past. Resource attribute promotion is backward compatible: existing queries, dashboards, and alerts will continue to work without changes. The only caution is for users of metrics aggregation solutions like Grafana Adaptive Metrics, who should update aggregation rules on the instance label and add the resource attributes that identify service instances, like service_instance_id or k8s_pod_name.
Enabling resource attribute promotion
First, make sure you ingest OpenTelemetry metrics through Prometheus’s OTLP endpoint. This removes the need to convert metrics from OpenTelemetry to Prometheus format inside the OpenTelemetry Collector or Grafana Alloy with the Prometheus Remote Write exporter.
Then, activate resource attribute promotion on the Prometheus OTLP endpoint. For Grafana Cloud users, resource attribute promotion is already enabled with the following default list of attributes:
service.name
service.namespace
service.instance.id
service.version
cloud.availability_zone
cloud.region
container.name
deployment.environment
deployment.environment.name
k8s.cluster.name
k8s.container.name
k8s.cronjob.name
k8s.daemonset.name
k8s.deployment.name
k8s.job.name
k8s.namespace.name
k8s.pod.name
k8s.replicaset.name
k8s.statefulset.name
These attributes can also be customized via support tickets.
If you use Prometheus or Grafana Mimir (OSS or Grafana Enterprise Metrics), you can configure this list using the promote_resource_attributes configuration block:
otlp:
  keep_identifying_resource_attributes: true
  promote_resource_attributes:
    - service.instance.id
    - service.name
    - service.namespace
    - service.version
    - cloud.availability_zone
    - cloud.region
    - container.name
    - deployment.environment
    - deployment.environment.name
    - k8s.cluster.name
    - k8s.container.name
    - k8s.cronjob.name
    - k8s.daemonset.name
    - k8s.deployment.name
    - k8s.job.name
    - k8s.namespace.name
    - k8s.pod.name
    - k8s.replicaset.name
    - k8s.statefulset.name
Regardless of whether you’re using OSS or Grafana Cloud, we recommend enabling keep_identifying_resource_attributes to also capture service_name, service_namespace, and service_instance_id on the target_info metric, which is very handy for dashboard filters (as shown below).
This feature can be enabled following the appropriate steps for your setup:
- On Grafana Cloud, please open a support ticket to activate keep_identifying_resource_attributes.
- When using Prometheus, follow these instructions.
- When using Mimir, follow these instructions.
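As a rough illustration of why this setting helps with dashboard filters: once keep_identifying_resource_attributes is active, a single query on target_info can list the services to offer in a selector. This is only a sketch of the idea, not the query used by any particular dashboard.

```promql
# One series per (namespace, service) pair, suitable for feeding a service selector.
# Relies on keep_identifying_resource_attributes so that service_namespace and
# service_name are present as labels on target_info.
group by (service_namespace, service_name) (target_info)
```

Because target_info has one series per reporting instance, this is usually much cheaper than scanning an application metric to discover the same label values.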
Richer community dashboards
Once resource attribute promotion and keep_identifying_resource_attributes are enabled, you can immediately get richer OpenTelemetry-native dashboards, such as the Lightweight APM for OpenTelemetry dashboard, through the Grafana community.
This dashboard provides a simplified APM experience, including a service selector, RED metrics on inbound HTTP/RPC operations, and RED metrics on outbound HTTP/RPC/database calls, alongside log and trace correlation!

Best practices with resource attribute promotion
Resource attribute promotion unlocks new potential for dashboards and PromQL queries. However, it also requires adjustments to your approach. Next, we’ll walk through some best practices to help you get the most from this functionality.
Fully adopt OpenTelemetry semantic conventions and move away from Prometheus naming conventions
Resource attribute promotion eliminates the need to use the job and instance labels inherited from the Prometheus naming conventions in your dashboards and PromQL queries. This allows full adoption of OpenTelemetry semantic conventions and their attributes: service.namespace, service.name, and service.instance.id.
To illustrate the difference, let’s first look at an example that previously had to mix OpenTelemetry semantic conventions with Prometheus’ job and instance labels.
The dashboard filter needs to be placed on the job label:

And the PromQL query is run on the job and instance labels:
http_server_request_duration_seconds_count{
  job="webshop/fraud-detection",
  instance="inst-123"}
Now, let’s look at that same example with resource attribute promotion enabling a native experience, with OpenTelemetry’s service.namespace, service.name, and service.instance.id attributes replacing Prometheus’ job and instance labels.
The dashboard filter is placed on the OpenTelemetry service.namespace and service.name attributes:

And the PromQL query is run on the OpenTelemetry service.name, service.namespace, and service.instance.id attributes, whose dots become underscores in the promoted label names by default:
http_server_request_duration_seconds_count{
  service_namespace="webshop",
  service_name="fraud-detection",
  service_instance_id="inst-123"}
The Grafana Lightweight APM for OpenTelemetry community dashboard provides more examples of these simplified filters and queries.
Always provide deployment details with deployment.environment.name and service.namespace
Always define values for the optional deployment.environment.name and service.namespace attributes to simplify dashboard management and standardize alerts.
Utilize these attributes as follows:
- deployment.environment.name: The name of the deployment environment (e.g., “production,” “staging”).
- service.namespace: Used to differentiate service groups and avoid naming conflicts between services from different teams. Team names or domains (e.g., “webshop”) are typical examples.
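To make the point about standardized alerts concrete, here is a sketch of one alert expression that can cover every production service at once, because all services carry the same environment and namespace labels. The "production" value, the 5-minute window, and the 5% threshold are illustrative assumptions, not recommendations.

```promql
# Sketch of a reusable alert expression: returns any production service whose
# share of 5xx responses exceeds 5% (window and threshold are illustrative).
  sum by (service_namespace, service_name) (
    rate(http_server_request_duration_seconds_count{
      deployment_environment_name="production",
      http_response_status_code=~"5.."}[5m])
  )
/
  sum by (service_namespace, service_name) (
    rate(http_server_request_duration_seconds_count{
      deployment_environment_name="production"}[5m])
  )
> 0.05
```

Grouping by both service_namespace and service_name keeps services with the same name in different teams from being merged into one series.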
Enrich promoted attributes
Go beyond the standard OpenTelemetry resource attributes: add your own domain-specific attributes, such as organizational details, to your resources and include them in the list of promoted resource attributes.
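For example, suppose your services set a custom resource attribute named team.name and you add it to the promoted list. Both the attribute name and the resulting team_name label are hypothetical, chosen only to illustrate the idea; the query below sketches how such a label could then be used like any built-in one.

```promql
# Assumes a hypothetical custom resource attribute team.name that has been
# added to promote_resource_attributes and therefore appears as team_name.
sum by (team_name, service_name) (
  rate(http_server_request_duration_seconds_count{
    deployment_environment_name="production"}[5m])
)
```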
Prometheus and Loki alignment
Loki, our open source log aggregation system, also supports resource attribute promotion through the otlp_config/default_resource_attributes_as_index_labels configuration (here). Maintain consistency between promoted resource attributes in Prometheus metrics and Loki logs for simpler correlations.
Common PromQL queries for OpenTelemetry metrics
The consistency of OpenTelemetry metrics enables standardization and reuse of PromQL queries—a process further simplified by resource attribute promotion.
This standardization is particularly useful for RED metrics, a set of key metrics used to monitor the health and performance of services. RED stands for:
- Rate: The number of requests per second
- Errors: The number of those requests that are failing
- Duration: The amount of time those requests take (i.e., latency or response time)
Here are some examples of reusable PromQL queries for HTTP server RED metrics that can also be adapted for HTTP client, RPC, messaging, or database client metrics.
Request rate
This is the number of requests per second. Here’s the query for request rate aggregated across all HTTP operations:
(sum(rate(
  http_server_request_duration_seconds_count{
    deployment_environment_name=~"$deployment_environment_name",
    service_namespace=~"$service_namespace",
    service_name="$service_name"}
  [$__rate_interval]
)) by (deployment_environment_name, service_namespace, service_name))
Next, here it is broken down by HTTP operation:
sum by (operation) (
  label_join(
    rate(http_server_request_duration_seconds_count{
      deployment_environment_name=~"$deployment_environment_name",
      service_namespace=~"$service_namespace",
      service_name="$service_name"}
    [$__rate_interval]),
    "operation",
    " ",
    "http_request_method",
    "http_route"
  )
)
Error rate
This is the share of requests that are failing. Here’s the query for error rate, expressed as a percentage of all requests and aggregated across all HTTP operations:
(sum by(deployment_environment_name, service_namespace, service_name)
  (rate(http_server_request_duration_seconds_count{
    deployment_environment_name=~"$deployment_environment_name",
    service_namespace=~"$service_namespace",
    service_name="$service_name",
    http_response_status_code=~"5.."}
  [$__rate_interval])) * 100)
/
sum by(deployment_environment_name, service_namespace, service_name)
  (rate(http_server_request_duration_seconds_count{
    deployment_environment_name=~"$deployment_environment_name",
    service_namespace=~"$service_namespace",
    service_name="$service_name"}
  [$__rate_interval])
)
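And here it is broken down by HTTP operation. Note that this version returns a ratio between 0 and 1 rather than a percentage, and its trailing or clause reports 0 for operations that have traffic but no errors: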
(sum by (operation) (
  label_join(
    rate(http_server_request_duration_seconds_count{
      deployment_environment_name=~"$deployment_environment_name",
      service_namespace=~"$service_namespace",
      service_name="$service_name",
      http_response_status_code=~"5.."}
    [$__rate_interval]),
    "operation",
    " ",
    "http_request_method",
    "http_route"
  )
)
/
sum by (operation) (
  label_join(
    rate(http_server_request_duration_seconds_count{
      deployment_environment_name=~"$deployment_environment_name",
      service_namespace=~"$service_namespace",
      service_name="$service_name"}
    [$__rate_interval]),
    "operation",
    " ",
    "http_request_method",
    "http_route"
  )
)
) or (0 *
sum by (operation) (
  label_join(
    rate(http_server_request_duration_seconds_count{
      deployment_environment_name=~"$deployment_environment_name",
      service_namespace=~"$service_namespace",
      service_name="$service_name"}
    [$__rate_interval]),
    "operation",
    " ",
    "http_request_method",
    "http_route"
  )
)
)
Duration as 95th percentile and average
This is the amount of time those requests take (i.e., latency or response time), reported here as the 95th percentile (P95) and as the average. Here’s the query for P95 and average aggregated across all HTTP operations:
# P95
histogram_quantile(
  0.95,
  sum by(le, deployment_environment_name, service_namespace, service_name)
    (rate(
      http_server_request_duration_seconds_bucket{
        deployment_environment_name=~"$deployment_environment_name",
        service_namespace=~"$service_namespace",
        service_name="$service_name"}
      [$__rate_interval]
    ))
)
# Average
avg by(deployment_environment_name, service_namespace, service_name) (
  rate(http_server_request_duration_seconds_sum{
    deployment_environment_name=~"$deployment_environment_name",
    service_namespace=~"$service_namespace",
    service_name="$service_name"}
  [$__rate_interval]))
/
avg by(deployment_environment_name, service_namespace, service_name) (
  rate(http_server_request_duration_seconds_count{
    deployment_environment_name=~"$deployment_environment_name",
    service_namespace=~"$service_namespace",
    service_name="$service_name"}
  [$__rate_interval]))
And here it is broken down by HTTP operation:
# P95
histogram_quantile(
  0.95,
  sum by (le, operation) (
    label_join(
      rate(http_server_request_duration_seconds_bucket{
        deployment_environment_name=~"$deployment_environment_name",
        service_namespace=~"$service_namespace",
        service_name="$service_name"}[$__rate_interval]),
      "operation",
      " ",
      "http_request_method",
      "http_route"
    )
  )
)
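The per-operation breakdown above covers only P95. A matching per-operation average could be sketched as follows, by combining the label_join pattern with the _sum and _count series. This is an assumption about how the two techniques fit together rather than a query taken from the dashboard, and it aggregates with sum where the aggregated average query uses avg.

```promql
# Average (sketch): total time spent divided by number of requests, per operation
sum by (operation) (
  label_join(
    rate(http_server_request_duration_seconds_sum{
      deployment_environment_name=~"$deployment_environment_name",
      service_namespace=~"$service_namespace",
      service_name="$service_name"}[$__rate_interval]),
    "operation", " ", "http_request_method", "http_route"
  )
)
/
sum by (operation) (
  label_join(
    rate(http_server_request_duration_seconds_count{
      deployment_environment_name=~"$deployment_environment_name",
      service_namespace=~"$service_namespace",
      service_name="$service_name"}[$__rate_interval]),
    "operation", " ", "http_request_method", "http_route"
  )
)
```

Operations with no requests in the interval divide zero by zero and show up as NaN, which most panels render as a gap.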
Other popular queries
More PromQL queries on popular OpenTelemetry metrics are available on the Grafana Lightweight APM for OpenTelemetry dashboard, including queries on:
- HTTP metrics (specifications): http.server.request.duration and http.client.request.duration
- RPC metrics (specifications): rpc.server.duration and rpc.client.duration
- Messaging metrics (specifications): messaging.client.operation.duration
- Database metrics (specifications): db.client.operation.duration
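As an illustration of how the HTTP server queries carry over to the other metrics in this list, here is a sketch of the request rate for outbound HTTP calls. It assumes that http.client.request.duration is exposed under the name http_client_request_duration_seconds, mirroring the server metric used throughout this post, and that the server.address attribute is present as the server_address label.

```promql
# Outbound HTTP request rate per destination (sketch).
# Metric and label names follow the same translation as the server-side
# queries above; check them against what your instrumentation emits.
sum by (server_address) (
  rate(http_client_request_duration_seconds_count{
    deployment_environment_name=~"$deployment_environment_name",
    service_namespace=~"$service_namespace",
    service_name="$service_name"}[$__rate_interval])
)
```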