Grafana Cloud

Configure client-side service graph and span metrics

By default, the knowledge graph relies on Tempo's metrics-generator to derive metrics from traces. You can also generate these metrics on the client side with Grafana Alloy.

This page describes how to configure two types of client-side trace-derived metrics optimized for the knowledge graph experience:

  • Service graph metrics: edges between services, including request rate, error rate, and latency.
  • Span metrics: RED (rate, errors, duration) metrics derived from individual spans.

Before you begin

Ensure the following:

  • Your services are instrumented with OpenTelemetry and sending traces to Alloy.
  • You have a Prometheus-compatible metrics backend (for example, Grafana Mimir) configured to receive metrics.
  • Tempo server-side metrics generation is disabled to avoid duplicate metrics (refer to disable server-side metric generation for details).

How it works

Service graph metrics

otelcol.connector.servicegraph processes spans from your traces and emits metrics that represent the edges in a service graph. Each metric represents a request between two services (client and server), and includes:

  • Request rate (traces_service_graph_request_total)
  • Error rate (traces_service_graph_request_failed_total)
  • Request duration from client and server perspectives (traces_service_graph_request_client, traces_service_graph_request_server)

The connector must observe both spans of a client/server pair, so all spans of a trace must be processed by the same Alloy instance. If your traces are spread across multiple Alloy instances, place otelcol.exporter.loadbalancing in front of the instances running otelcol.connector.servicegraph so that span pairs are routed to the same instance. Refer to otelcol.connector.servicegraph for details.
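
A minimal load-balancing tier might look like the following sketch. The hostnames are placeholders for the Alloy instances that run the servicegraph connector:

Alloy
otelcol.exporter.loadbalancing "traces" {
  // Route by trace ID so every span of a trace reaches the same backend.
  routing_key = "traceID"

  resolver {
    static {
      // Placeholder addresses of the downstream Alloy instances.
      hostnames = ["alloy-0.alloy:4317", "alloy-1.alloy:4317"]
    }
  }

  protocol {
    otlp {
      client {
        tls {
          insecure = true
        }
      }
    }
  }
}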

Span metrics

otelcol.connector.spanmetrics aggregates RED (rate, errors, duration) metrics from individual spans. This helps if your system doesn't have dedicated Prometheus metrics but does have distributed tracing: you get RED metrics from your existing tracing pipeline.

The connector generates two metrics per span:

  • A counter tracking request totals.
  • A histogram tracking operation duration.

Default dimensions include service.name, span.name, span.kind, and status.code. Additional dimensions can be added from span or resource attributes.
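
In the Alloy configuration, each extra dimension is declared with a dimension block. The optional default fills in the label value when a span lacks the attribute; the attribute name below is only an example:

Alloy
dimension {
  name    = "http.method"
  // Used when the span has no http.method attribute.
  default = "GET"
}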

Configure using Grafana Alloy

Use this method if you manage Alloy directly, without the k8s-monitoring Helm chart.

Prerequisite

You have Grafana Alloy installed and configured.

Service graph metrics

The following example receives traces over OTLP, generates service graph metrics, and forwards them to Mimir. The original traces are also forwarded to Tempo.

Alloy
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  output {
    traces = [
      otelcol.connector.servicegraph.default.input,
      otelcol.exporter.otlphttp.grafana_cloud_traces.input,
    ]
  }
}

otelcol.connector.servicegraph "default" {
  // Recommended dimensions for Knowledge Graph
  dimensions = [
    "namespace",
    "service.namespace",
    "k8s.cluster.name",
    "k8s.namespace.name",
    "k8s.pod.name",
    "k8s.deployment.name",
    "deployment.environment.name"
  ]
  
  // Span attribute used to identify database peers.
  database_name_attribute = "db.name"

  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "https://<MIMIR_HOST>/api/prom/push"

    basic_auth {
      username = sys.env("<PROMETHEUS_USERNAME>")
      password = sys.env("<GRAFANA_CLOUD_API_KEY>")
    }
  }
}

otelcol.exporter.otlphttp "grafana_cloud_traces" {
  client {
    endpoint = "https://<TEMPO_HOST>/otlp"
    auth     = otelcol.auth.basic.grafana_cloud_traces.handler
  }
}

otelcol.auth.basic "grafana_cloud_traces" {
  username = sys.env("<TEMPO_USERNAME>")
  password = sys.env("<GRAFANA_CLOUD_API_KEY>")
}

To verify, search for the following metric in Metrics Drilldown against your Mimir data source:

promql
traces_service_graph_request_total
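
The connector labels each series with the client and server service names, so a per-edge request rate can be charted with:

promql
sum by (client, server) (rate(traces_service_graph_request_total[5m]))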

Span metrics

The following example generates span metrics while also forwarding the original traces. It adds the recommended attributes as extra dimensions and flushes metrics every 15 seconds.

Alloy
otelcol.receiver.otlp "default" {
  http {}
  grpc {}

  output {
    traces = [
      otelcol.connector.spanmetrics.default.input,
      otelcol.exporter.otlphttp.grafana_cloud_traces.input,
    ]
  }
}

otelcol.connector.spanmetrics "default" {
  // The following are added on top of the default dimensions:
  // service.name, span.name, span.kind, status.code
  dimension { name = "service.namespace" }
  dimension { name = "service.version" }
  dimension { name = "status.message" }
  dimension { name = "k8s.cluster.name" }
  dimension { name = "k8s.pod.name" }
  dimension { name = "k8s.namespace.name" }
  dimension { name = "deployment.environment.name" }
  dimension { name = "db.name" }
  dimension { name = "db.operation" }
  dimension { name = "db.statement" }
  dimension { name = "net.peer.name" }
  dimension { name = "net.peer.port" }
  dimension { name = "http.status.code" }

  histogram {
    unit = "s"
    exponential {}
  }

  metrics_flush_interval = "15s"
  
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}

otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.mimir.receiver]
}

prometheus.remote_write "mimir" {
  endpoint {
    url = "https://<MIMIR_HOST>/api/prom/push"

    basic_auth {
      username = sys.env("<PROMETHEUS_USERNAME>")
      password = sys.env("<GRAFANA_CLOUD_API_KEY>")
    }
  }
}

otelcol.exporter.otlphttp "grafana_cloud_traces" {
  client {
    endpoint = "https://<TEMPO_HOST>/otlp"
    auth     = otelcol.auth.basic.grafana_cloud_traces.handler
  }
}

otelcol.auth.basic "grafana_cloud_traces" {
  username = sys.env("<TEMPO_USERNAME>")
  password = sys.env("<GRAFANA_CLOUD_API_KEY>")
}

To verify, search for the following metric in Metrics Drilldown against your Mimir data source:

promql
traces_spanmetrics_calls_total
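
Assuming the Prometheus exporter's default attribute sanitization (dots become underscores), a per-operation request rate looks like:

promql
sum by (service_name, span_name) (rate(traces_spanmetrics_calls_total[5m]))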

Configure using the Kubernetes Monitoring Helm chart

Use this method if you are using the grafana/k8s-monitoring Helm chart.

Prerequisite

You have the grafana/k8s-monitoring Helm chart deployed with a traces destination configured (for example, Grafana Cloud Tempo).

Service graph metrics

serviceGraphMetrics is configured per destination, nested under the traces destination in destinations. When enabled, the chart automatically deploys a dedicated Alloy StatefulSet with a load balancer in front of it, ensuring span-pair locality.

Add or update the traces destination in your values.yaml:

YAML
destinations:
  - name: grafana-cloud-traces
    type: otlp
    url: <TEMPO_OTLP_URL>
    protocol: grpc
    auth:
      type: basic
      username: "<TEMPO_USERNAME>"
      password: "<GRAFANA_CLOUD_API_KEY>"
    traces:
      enabled: true
    metrics:
      enabled: false
    logs:
      enabled: false
    processors:
      serviceGraphMetrics:
        enabled: true

        # Recommended dimensions for the Knowledge Graph.
        dimensions:
          - namespace
          - service.namespace
          - k8s.cluster.name
          - k8s.namespace.name
          - k8s.pod.name
          - k8s.deployment.name
          - deployment.environment.name

        # Attribute name used to identify the database name from span attributes.
        databaseNameAttribute: "db.name"

To verify, first confirm the StatefulSet is running:

Bash
kubectl get statefulset -n <NAMESPACE> -l app.kubernetes.io/component=service-graph-metrics

Then search for the following metric in Metrics Drilldown:

promql
traces_service_graph_request_total

Span metrics

spanMetrics is configured under applicationObservability.connectors, the section that enables the application observability pipeline (the OTLP receiver and trace processing). By default, it includes service.name, span.name, span.kind, and status.code as dimensions.

Add the following to your values.yaml:

YAML
applicationObservability:
  enabled: true

  receivers:
    otlp:
      grpc:
        enabled: true
        port: 4317
      http:
        enabled: true
        port: 4318

  connectors:
    spanMetrics:
      enabled: true

      # Dimensions added on top of the default set:
      # [service.name, span.name, span.kind, status.code]
      dimensions:
        - name: "service.namespace"
        - name: "service.version"
        - name: "status.message"
        - name: "k8s.cluster.name"
        - name: "k8s.pod.name"
        - name: "k8s.namespace.name"
        - name: "deployment.environment.name"
        - name: "db.name"
        - name: "db.operation"
        - name: "db.statement"
        - name: "net.peer.name"
        - name: "net.peer.port"
        - name: "http.status.code"

      # Dimensions to exclude from the default set.
      excludeDimensions: []

To verify, search for the following metric in Metrics Drilldown against your Mimir data source:

promql
traces_span_metrics_calls_total

Configure using auto-instrumentation with Beyla

If your services are not yet instrumented with OpenTelemetry, you can use Beyla to generate RED metrics and traces without modifying your application code.

When deployed with the grafana/k8s-monitoring Helm chart, Beyla automatically generates metrics in a format compatible with the knowledge graph and Application Observability. You can enable it through the autoInstrumentation section of your values.yaml:

YAML
autoInstrumentation:
  enabled: true

The easiest way to get started is through the Kubernetes Monitoring configuration UI, which generates a complete Helm deployment script with the correct settings for your stack.

For more information, refer to Deploy Beyla with the Kubernetes Monitoring Helm chart.

Troubleshooting

  • Symptom: no service graph or span metrics in Mimir. Likely cause: a misconfigured Alloy pipeline. Resolution: verify the output.metrics chain from otelcol.connector.servicegraph or otelcol.connector.spanmetrics to prometheus.remote_write.
  • Symptom: duplicate service graph or span metrics. Likely cause: Tempo server-side metrics generation is still enabled. Resolution: disable server-side metrics generation in Tempo.
  • Symptom: missing edges in the entity graph. Likely cause: uninstrumented services are not captured. Resolution: configure virtual_node_peer_attributes.
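
For the last case, the following is a sketch of the relevant servicegraph setting in Alloy. The attribute list is illustrative and should match the peer attributes your spans actually carry:

Alloy
otelcol.connector.servicegraph "default" {
  // Span attributes checked, in order, to name an uninstrumented peer
  // as a virtual node in the graph.
  virtual_node_peer_attributes = ["peer.service", "db.name", "net.peer.name"]

  output {
    metrics = [otelcol.exporter.prometheus.default.input]
  }
}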