Linkerd Control Plane

Comprehensive observability dashboard for the Linkerd service mesh control plane. It gives a top-to-bottom view of control plane health: a single-glance health summary at the top, followed by internal gRPC and HTTP traffic, endpoint discovery, certificate lifecycle, Kubernetes informer latency, and Go runtime resource usage, and finally the Buoyant Enterprise-specific features (external workload autoregistration, controller work queues, and the license controller). Use this dashboard as your primary diagnostic tool whenever the control plane behaves unexpectedly: proxies returning errors, endpoint routing going stale, certificates failing to rotate, or a rollout causing unexpected restarts.

Template Variables

  • datasource — Prometheus datasource
  • control_ns — Control plane namespace
  • pod — Pod filter (multi-select)
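
To illustrate how these variables might reach a panel query, here is a rough Python sketch of the substitution. It is not the dashboard's actual query: `up` is Prometheus's built-in target-health series, but the `namespace` and `pod` label names, the pod names, and the `render_query` helper are all assumptions made for the example. Grafana's own interpolation also escapes regex metacharacters in multi-select values, which this sketch leaves out.

```python
def render_query(template: str, variables: dict) -> str:
    """Substitute $name placeholders; multi-select lists become a regex alternation."""
    out = template
    for name, value in variables.items():
        if isinstance(value, (list, tuple)):
            # A multi-select variable is joined so it can be used with PromQL's =~ matcher.
            value = "|".join(value)
        out = out.replace("$" + name, value)
    return out


# Hypothetical panel query; label names are assumptions about the scrape config.
TEMPLATE = 'sum(up{namespace="$control_ns", pod=~"$pod"})'

query = render_query(
    TEMPLATE,
    {
        "control_ns": "linkerd",
        "pod": ["linkerd-destination-abc", "linkerd-identity-xyz"],  # made-up pod names
    },
)
print(query)
```

Because pod is multi-select, a query that filters on it needs the regex matcher (`=~`) rather than plain equality.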

Sections

Control Plane Health — up targets, restarts (24h), oldest uptime, cert expiry days

Mesh: Connected Proxies & Streams — meshed proxy count, active Destination (Get, GetProfile) and Policy (Watch, WatchPort) streams

Control Plane Traffic — request rate and latency (p50/p95/p99) to destination and proxy-injector

Proxy Injector & Destination Cache — admission rate by namespace, cache/subscriber object counts

HTTP Client (Controller → API Server) — request rate and latency by component/method/code

gRPC Server (Controller) — message throughput and handled RPCs by method/code

Identity & Certificates — cert refresh rate, expiry timestamps, issuer TTL

Destination Deep Dive — stream counts over time, update rate, queue overflows, goroutines

Informer Lag — p99 K8s watch latency for EndpointSlice, Pod, Service, ServiceProfile, Server

Identity Deep Dive — Certify RPC rate, active requests, latency

Policy Streams — active WatchPort/Watch streams, gRPC setup latency

Controller Resources — CPU, memory, GC pauses, goroutines, heap objects, alloc rate

Enterprise: External Workloads — registration success/failure, created/reconnected, autoregistration gRPC

Enterprise: Work Queue — depth, add rate, queue/work duration, retries, drops

Enterprise: License Controller — License RPC rate by code, p95 latency

Metrics Sources

  • linkerd-proxy — meshed sidecars
  • linkerd-controller — control plane components
  • kube-state-metrics — restart counts

Note: Linkerd 2.19 renamed control plane admin ports from the shared admin-http to unique per-component names (dest-admin, policy-admin, ident-admin, etc.). If you are running 2.19+, update your Prometheus scrape configs to use the new port names or some panels may not populate. See the 2.19 release notes for details.
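
One way to check whether an existing scrape config is affected is to test its port-name filter against both naming schemes. The sketch below assumes the scrape job keeps targets by matching the container port name against a regex, and relies on the fact that Prometheus relabel regexes are fully anchored. The `ports_kept` helper and the `.+-admin` pattern are illustrative assumptions, and the list of new port names contains only the ones named in the note above, so it is not exhaustive.

```python
import re

OLD_PORT = "admin-http"
# Only the names mentioned in the note; other components have their own names.
NEW_PORTS = ["dest-admin", "policy-admin", "ident-admin"]


def ports_kept(keep_regex: str, port_names: list) -> list:
    """Return the port names a fully anchored 'keep' regex would retain."""
    pattern = re.compile(keep_regex)
    # fullmatch mirrors the anchoring Prometheus applies to relabel regexes.
    return [p for p in port_names if pattern.fullmatch(p)]


# A filter written for the pre-2.19 shared name drops every new port.
print(ports_kept("admin-http", NEW_PORTS))

# A hypothetical widened filter that accepts both naming schemes.
print(ports_kept("admin-http|.+-admin", [OLD_PORT] + NEW_PORTS))
```

If the first call returns an empty list for your own regex, the control plane targets will silently disappear from Prometheus after the upgrade, which is why the affected panels show no data rather than an error.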

Troubleshooting

  • 503s / stale routes → Connected Proxies → Update Queue Overflows
  • Cert/TLS errors → Identity & Certificates → Issuer Cert TTL
  • Slow policy enforcement → Informer Lag → Server Informer Lag P99
  • OOMKilled / high memory → Controller Resources → Controller Memory
  • External workload not joining → External Workloads → Registration Failure Rate
  • Enterprise degraded → License Controller → Handled RPCs