
Monitor Model Context Protocol (MCP) servers with OpenLIT and Grafana Cloud
Note: The world is changing all around us thanks to AI. Today, anyone and everyone can be a developer, using LLMs to create LLM-powered applications, which users can then interact with by using even more LLMs.
Observability practitioners need to adapt and they need the right tools for the job. In this series, we'll show you how to use Grafana Cloud to monitor AI applications, including workloads in production, AI agents, MCP servers (this post), and zero-code LLMs.
Large language models don’t work in a vacuum. They often rely on Model Context Protocol (MCP) servers to fetch additional context from external tools or data sources.
MCP provides a standard way for AI agents to talk to tool servers, but this extra layer introduces complexity. Without visibility, an MCP server becomes a black box: you send a request and hope a tool answers. When something breaks, it’s hard to tell if the agent, the server or the downstream API failed.
In this guide, you'll learn how to instrument MCP servers using OpenLIT and how to analyze those servers in Grafana Cloud.
Why MCP observability matters
In an agentic system, an MCP server may route tool calls to multiple services. Observability helps you answer critical questions about:
- Latency spikes: When a tool is slow to respond, user experience suffers. By examining request throughput and the 95th/99th percentile latency distributions, you can determine whether a downstream API or the MCP layer is responsible.
- Silent failures: For example, a tool returning partial data or timing out often goes unnoticed without structured telemetry. End‑to‑end tracing across the agent, MCP server, and external tools provides the full context needed to diagnose these issues.
- Cross‑service visibility: This is important because MCP calls cross network and language boundaries. For example, OpenTelemetry propagates context, so spans started in a Python client link seamlessly to spans in a Node.js tool server, producing a coherent trace across systems.
- Context window usage: Resource consumption can grow quickly as agents query more tools. By tracking context window usage and memory consumption, you can right‑size your MCP servers and avoid over‑allocating resources.
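To make the latency point concrete, here is a minimal, dependency-free sketch of how p95/p99 values are derived from a set of request durations. The real dashboards compute these from Prometheus histograms; the `percentile` function and the sample data below are purely illustrative:

```python
def percentile(samples: list[float], pct: float) -> float:
    """Return the pct-th percentile of samples (nearest-rank method)."""
    ordered = sorted(samples)
    # Nearest-rank: smallest value that covers pct% of the samples
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Tool-call durations in milliseconds (illustrative data)
durations_ms = [12, 15, 14, 13, 200, 16, 14, 15, 13, 950]

p50 = percentile(durations_ms, 50)
p99 = percentile(durations_ms, 99)
# A p99 far above the median points at an occasional slow downstream
# call rather than a uniformly slow MCP layer.
```
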
AI Observability in Grafana Cloud supports MCP out of the box. The solution includes pre‑built dashboards for tool performance, protocol health, resource usage, and error tracking.
Benefits of MCP observability in Grafana Cloud
Observing your MCP servers unlocks a range of advantages:
- End‑to‑end tracing shows the entire path of a request—from the agent through the MCP server to each tool call—so you can pinpoint bottlenecks and failures.
- Detailed performance metrics, such as `tool_invocation_duration_ms` and invocation counts, help you identify slow or overused tools and adjust resource allocation accordingly.
- Scalability and cost control are enabled through context‑window and memory usage telemetry, so you can right‑size servers and avoid over‑provisioning. Because OpenTelemetry uses an open, vendor‑neutral format, your instrumentation remains portable; you can route data to Grafana Cloud, a self‑hosted OTLP stack, or any other backend without code changes.
- Security and compliance benefit as well: monitoring lets you audit tool interactions and verify that the protocol is used as intended.
How to monitor your MCP server with Grafana Cloud
Next, let's take a high-level look at how you can use Grafana Cloud to observe your MCP server, then we'll walk through the setup process so you can get up and running today.
And if you get stuck anywhere along the way or need help with your own setup, click on the pulsar icon in the top-right corner of the Grafana Cloud UI to open a chat with Grafana Assistant, our purpose-built LLM that can help troubleshoot incidents, manage dashboards, and answer product questions.
Architecture overview
The diagram below illustrates how agent interactions with an MCP server are instrumented and visualized.
The agent or client calls the MCP server to execute tools. OpenLIT instruments both the client and the server, capturing spans for context management, tool selection, and tool execution. These traces and metrics are exported to Grafana Cloud, where pre‑built dashboards provide insight into performance and failures.

The workflow consists of five key components:
- Agent/client: AI agents use the MCP protocol to invoke tools hosted on external servers.
- MCP server: Hosts one or more tools (e.g., search, database query). The server handles context loading, manages tool state, and responds to requests.
- External tools: Actual services (databases, APIs) that do the work. They may be local or remote.
- OpenLIT instrumentation: A single `openlit.init()` call instruments both the client and the server; context interactions and tool executions generate OpenTelemetry spans.
- Grafana Cloud: Collected traces and metrics flow into Grafana Cloud’s fully managed Prometheus and Tempo backends, where specialized MCP dashboards offer visibility into protocol usage.
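To give a feel for what the instrumentation layer records, here is a toy, stdlib-only sketch of a timed span wrapped around a tool call. This is not OpenLIT’s implementation, just an illustration of the attributes (tool name, duration, status) that each span carries when it lands in Tempo:

```python
import time
from contextlib import contextmanager

spans = []  # stand-in for an OpenTelemetry exporter

@contextmanager
def tool_span(tool_name: str):
    """Record a span with the tool name, duration, and error status."""
    span = {"tool": tool_name, "status": "ok"}
    start = time.perf_counter()
    try:
        yield span
    except Exception as exc:
        span["status"] = "error"
        span["exception"] = repr(exc)
        raise
    finally:
        span["duration_ms"] = (time.perf_counter() - start) * 1000
        spans.append(span)

# Wrap a (pretend) tool execution in a span
with tool_span("search_documents"):
    results = ["doc-1", "doc-2"]  # tool work would happen here
```

In the real setup, OpenLIT creates these spans for you and propagates trace context across the client/server boundary, so you never write this plumbing yourself.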
Step 1: Install the AI Observability solution
Start by adding the AI Observability integration to your Grafana Cloud stack. This can be done by clicking on Connections in the left-side menu and following the remaining steps outlined in our documentation.
This provisions pre‑built dashboards, including one for MCP observability, and configures a managed OpenTelemetry Protocol (OTLP) gateway to receive traces and metrics. Once telemetry flows in, the dashboards automatically populate with call rates, latency percentiles, and error counts.
Step 2: Install OpenLIT and the MCP library
OpenLIT provides auto‑instrumentation for MCP alongside LLMs, vector stores, and agent frameworks. Install OpenLIT and the mcp library (which implements the client and server) via pip:
```shell
pip install openlit mcp
```
After installation, a single call to openlit.init() automatically instruments all MCP operations. If you choose to run your own telemetry collector instead of Grafana’s OTLP gateway, OpenLIT can be self‑hosted via Docker Compose or deployed to Kubernetes using the OpenLIT Operator.
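If you prefer to pass exporter settings in code rather than through environment variables, `openlit.init()` accepts configuration arguments. A configuration-only sketch follows; the argument names reflect the OpenLIT docs at the time of writing, and the endpoint and header values are placeholders you must replace with your own:

```python
import openlit

# Configuration fragment: all values below are placeholders.
openlit.init(
    application_name="mcp-server-demo",
    environment="production",
    otlp_endpoint="https://otlp-gateway-<ZONE>.grafana.net/otlp",
    otlp_headers="Authorization=Basic <BASE64>",
)
```
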
Step 3: Instrument your MCP application
Instrumentation requires just two lines of code. Below is a simple example of an MCP server that exposes a search_documents tool. OpenLIT instruments the server, capturing each tool invocation and context interaction:
```python
import openlit
from mcp import Server

openlit.init()  # enable OpenTelemetry tracing and metrics

# Create an MCP server instance
server = Server()

# Define a tool to fetch documents
@server.tool("search_documents")
def search_documents(query: str):
    # Imagine this function calls a search API or database
    results = document_search(query)
    return results

# Run the MCP server on localhost
server.run(host="localhost", port=8080)
```

When a client invokes search_documents, OpenLIT captures:
- Context protocol interactions (e.g., context loading and management)
- Tool usage metrics (latency and success rate)
- Protocol handshake performance
- Resource usage (context window size, memory)
- Errors and exceptions
To instrument an MCP client, use the Client class from the mcp library and call openlit.init() before making requests:
```python
import openlit
from mcp import Client

openlit.init()

client = Client("http://localhost:8080")
tools = client.list_tools()  # Lists available tools
result = client.call_tool(
    "search_documents",
    {"query": "AI observability"},
)  # Invokes the tool
```

All client operations are automatically instrumented.
OpenLIT supports zero‑code instrumentation via a CLI wrapper. To instrument an existing MCP service without code modifications, use:
```shell
openlit-instrument \
  --service-name my-mcp-app \
  python your_mcp_app.py

# With custom settings:
openlit-instrument \
  --otlp-endpoint http://127.0.0.1:4318 \
  --service-name my-mcp-app \
  --environment production \
  python your_mcp_app.py
```
Step 4: Configure the connection to Grafana Cloud
Set the standard OpenTelemetry environment variables so the SDK exports telemetry to your Grafana Cloud OTLP gateway:

```shell
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp-gateway-<ZONE>.grafana.net/otlp"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic <BASE64>"
export OTEL_SERVICE_NAME="mcp-server-demo"
export OTEL_DEPLOYMENT_ENVIRONMENT="production"
```
When you run your client or server with these variables, the OpenLIT SDK automatically sends telemetry data to Grafana Cloud.
Step 5: Explore the MCP observability dashboard
After you start your instrumented MCP client and server, open Grafana Cloud and navigate to AI Observability → MCP Observability. The dashboard provides:
- Tool performance: Call latency histograms, success rates, and invocation counts per tool.
- Protocol health: Session stability and connection metrics to detect handshake issues.
- Resource usage: Context window size, memory, and data access patterns, helping you optimize server resources.
- Error tracking: Lists failed operations with trace IDs and detailed exception information to aid debugging.
You can build custom dashboards by querying Prometheus metrics (e.g., tool invocation duration) and Tempo traces. Because OpenLIT uses OpenTelemetry, you’re not locked into a single backend. You can forward telemetry to any OTLP‑compatible observability stack.
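For example, a custom panel showing per-tool p95 latency could use a PromQL query along these lines. The exact metric name depends on the OpenLIT version you run; `tool_invocation_duration_ms_bucket` below is illustrative, so check Grafana Cloud’s metrics browser for the name your setup actually emits:

```promql
histogram_quantile(
  0.95,
  sum by (le, tool) (rate(tool_invocation_duration_ms_bucket[5m]))
)
```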
Next steps
Ready to learn more? In the final blog in this series, we’ll show how to set this up, step by step, for a zero-code instrumentation approach to AI Observability.
You can also learn more about Grafana Cloud AI Observability in the official docs, including setup instructions and dashboards. These resources will help you move from a basic demo to a production-ready setup for your AI applications.
Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!


