MCP Server Observability

Comprehensive observability for MCP servers: transport, protocol, tool execution, agentic metrics, sessions, resources, and system health. Supports multi-client monitoring.


MCP Server Observability Dashboard

Comprehensive monitoring dashboard for Model Context Protocol (MCP) servers — the open standard enabling AI assistants to invoke tools, access resources, and interact with external systems.

What This Dashboard Shows

Organized into 10 sections covering the full MCP observability stack:

Overview: Total tool calls, error rate gauge, active sessions, in-flight concurrency, handshake success rate
Multi-Client Breakdown: Per-client call rates, active sessions by client
Transport Layer: Connection distribution by transport type (streamable-http, SSE, WebSocket), reconnection events, handshake success/failure
Protocol & Messages: Message rates by JSON-RPC method, request-response latency (p50/p95/p99), JSON-RPC error codes, protocol version distribution, message size
Tool Execution: Invocation rate per tool, execution time percentiles, success vs. error rate, error type classification, concurrent in-flight calls, result payload sizes
Agentic & Token Metrics: Token usage (input/output) per tool, hallucination detection (calls to non-existent tools), prompt template usage, self-correction rate, turns-to-completion, task success rate
Rate Limiting: Throttle events by client/tool, currently rate-limited clients
Resource Access: Top accessed resources, response sizes, error rates, access anomaly detection
Sessions & Connections: Session duration percentiles, disconnect reasons by client
System Resources: CPU and memory usage of the MCP server process

Dashboard variables allow filtering by client_id and tool_name.
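These variables are typically wired into panel queries as regex matchers. A sketch of such a query, assuming an illustrative metric name (mcp_tool_calls_total) rather than the exact names used by this dashboard:

```promql
sum by (tool_name) (
  rate(mcp_tool_calls_total{client_id=~"$client_id", tool_name=~"$tool_name"}[5m])
)
```

Selecting "All" for a variable expands to a match-everything regex, so the same query serves both the filtered and unfiltered views.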

How Metrics Are Collected

Metrics collection: No external collector or agent is required. The MCP server exposes Prometheus metrics directly at /metrics via prometheus_client, and Prometheus scrapes this endpoint every 5 seconds.
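A minimal sketch of this pattern with prometheus_client; the metric names, labels, and handler function here are illustrative, not the exact ones used by the instrumentation library in the repo:

```python
from prometheus_client import Counter, Gauge, Histogram, start_http_server

# Illustrative metric names -- adapt to your own instrumentation.
TOOL_CALLS = Counter(
    "mcp_tool_calls_total",
    "Total MCP tool invocations",
    ["client_id", "tool_name", "status"],
)
IN_FLIGHT = Gauge(
    "mcp_tool_calls_in_flight",
    "Tool calls currently executing",
)
TOOL_LATENCY = Histogram(
    "mcp_tool_duration_seconds",
    "Tool execution time in seconds",
    ["tool_name"],
)

def handle_tool_call(client_id: str, tool_name: str, fn, *args):
    """Wrap a tool invocation so every call updates the metrics above."""
    IN_FLIGHT.inc()
    try:
        with TOOL_LATENCY.labels(tool_name).time():
            result = fn(*args)
        TOOL_CALLS.labels(client_id, tool_name, "success").inc()
        return result
    except Exception:
        TOOL_CALLS.labels(client_id, tool_name, "error").inc()
        raise
    finally:
        IN_FLIGHT.dec()

if __name__ == "__main__":
    # Serve /metrics on :9464 in a background thread for Prometheus to scrape.
    start_http_server(9464)
```

Wrapping the call path (rather than instrumenting each tool individually) keeps the label set consistent, which the per-client and per-tool panels rely on.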

Distributed tracing: The OpenTelemetry SDK, embedded in the application, creates a span for every tool call and exports them via OTLP gRPC to Jaeger (or any OTLP-compatible backend such as Tempo or Grafana Cloud).

Data Source

Type: Prometheus
Optional: Tempo data source for tracing

Requirements

Grafana 9.0+
Prometheus 2.40+

Full source code and instrumentation library: https://github.com/tyagian/mcp_server_observability
