Instrument zero‑code observability for LLMs and agents on Kubernetes

2026-03-20 · 8 min read

Note: The world is changing all around us thanks to AI. Today, anyone and everyone can be a developer, using LLMs to create LLM-powered applications, which users can then interact with by using even more LLMs. 

Observability practitioners need to adapt and they need the right tools for the job. In this series, we'll show you how to use Grafana Cloud to monitor AI applications, including workloads in production, AI agents, MCP servers, and zero-code LLMs (this post).

Building AI services with large language models and agentic frameworks often means running complex microservices on Kubernetes. Observability is vital, but instrumenting every pod in a distributed system can quickly become a maintenance nightmare. 

OpenLIT Operator solves this problem by automatically injecting OpenTelemetry instrumentation into your AI workloads—no code changes or image rebuilds required. When combined with AI Observability in Grafana Cloud, you can monitor costs, latency, token usage, and agent workflows across your entire cluster in minutes.

In this final post in our AI Observability series, we'll show you how to easily get started by combining OpenLIT Operator and Grafana Cloud to enable zero-code observability for your AI workloads.

Why zero‑code instrumentation matters

Traditional observability relies on developers adding instrumentation libraries to their application code. But in the fast‑moving world of generative AI, your stack might include multiple model providers, agent frameworks, vector databases, and custom tools. Keeping instrumentation up to date across all these components is a burden. 

The OpenLIT Operator brings zero‑code AI observability to Kubernetes. It automatically injects and configures OpenTelemetry instrumentation into your pods, producing distributed traces and metrics without any code changes. Because it is built on OpenTelemetry standards, it integrates with existing observability infrastructure and allows you to switch between providers (OpenLIT, OpenInference, OpenLLMetry, custom) without redeploying your applications.

This zero‑code approach is designed specifically for AI workloads. It provides seamless observability for LLMs, vector databases, and agentic frameworks running in Kubernetes. You can track token usage, monitor agent workflows, measure response times, and debug AI framework interactions—all without touching your code.

Benefits of zero‑code observability in Grafana Cloud

There are several reasons to use zero-code observability in Grafana Cloud:

  • Rapid onboarding: Deploy the OpenLIT Operator once and instrument all your AI workloads without modifying a single line of code.
  • Comprehensive coverage: The operator supports major LLM providers, vector databases, and agent frameworks, and can be extended to other providers through its plugin architecture.
  • Vendor neutrality: Built on OpenTelemetry, the operator allows you to send telemetry to Grafana Cloud, a self‑hosted OpenTelemetry collector, or any OTLP‑compatible backend.
  • Cost and performance insights: Distributed traces capture token usage, cost, latency, and agent step sequences, enabling you to optimize model selection and resource allocation.

How to set up zero-code observability for AI applications in Grafana Cloud

Now that we've covered why you should be using Grafana Cloud for zero-code observability, let's look at how you can make that happen, starting with a high-level explanation of the workflow, followed by step-by-step instructions for getting started quickly.

And if you get stuck anywhere along the way or need help with your own setup, click on the pulsar icon in the top-right corner of the Grafana Cloud UI to open a chat with Grafana Assistant, our purpose-built LLM that can help troubleshoot incidents, manage dashboards, and answer product questions.

Architecture overview

AI applications like LLMs and agents run inside pods in your Kubernetes cluster. The OpenLIT Operator continuously monitors these pods and checks them against your instrumentation policies. When it finds a matching pod, it automatically injects an init container that sets up OpenTelemetry instrumentation, enabling observability without requiring manual changes to your application code.

Telemetry is sent to an OpenLIT collector or directly to Grafana Cloud’s OpenTelemetry Protocol (OTLP) gateway. The AI Observability dashboards in Grafana Cloud then visualize latency, cost, and quality metrics.

The workflow consists of four key pieces:

  1. AI workloads: Pods running LLMs, vector DBs, or agent frameworks such as LangChain, CrewAI, or OpenAI Agents. The operator supports a wide range of LLM providers (OpenAI, Anthropic, Google, AWS Bedrock, Mistral) and frameworks (LangChain, LlamaIndex, CrewAI, Haystack, DSPy, and more).
  2. OpenLIT Operator: A Kubernetes operator that injects OpenTelemetry instrumentation into selected pods based on label selectors. The operator is OpenTelemetry‑native and allows you to switch providers without changing your application code.
  3. OpenLIT collector: Collects traces and metrics from instrumented pods. You can run it in‑cluster via Helm or send telemetry directly to Grafana Cloud’s OTLP endpoint.
  4. Grafana Cloud: Stores traces in Tempo and metrics in Prometheus through our fully managed platform. Our AI observability solution provides pre‑built dashboards for GenAI, vector DBs, agents, and Model Context Protocol (MCP), allowing you to explore latency percentiles, token and cost metrics, agent step sequences, and evaluation results.

Step 1: Add the AI Observability integration

Before instrumenting your cluster, add the AI Observability integration to your Grafana Cloud stack. This can be done by clicking on Connections in the left-side menu and following the steps outlined in our documentation.

This provisions dashboards and sets up a managed OTLP gateway for receiving your traces and metrics. Once telemetry arrives, the dashboards populate automatically with request rates, latency distributions, and cost summaries.

Step 2: Prepare your Kubernetes environment

To follow this guide, you’ll need a Kubernetes cluster with cluster‑admin privileges, Helm, and kubectl configured. If you don’t have a cluster, you can create one locally using k3d or minikube. For a quick test drive, create a cluster with:

k3d cluster create openlit-demo
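Once the cluster is up, it's worth confirming that kubectl is pointed at it before continuing. A quick sanity check with standard kubectl commands:

```shell
# Confirm kubectl is talking to the new cluster
kubectl cluster-info

# List nodes; the k3d node(s) should report a Ready status
kubectl get nodes
```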

Step 3: Deploy OpenLIT Operator

First add the OpenLIT Helm repository and update your charts:

helm repo add openlit https://openlit.github.io/helm/
helm repo update

Install the OpenLIT Operator to enable zero‑code instrumentation:

helm install openlit-operator openlit/openlit-operator

Verify that the operator pod is running:

kubectl get pods -n openlit -l app.kubernetes.io/name=openlit-operator

You should see the operator in a Running state.
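If you'd rather block until the operator is ready (for example, in a setup script), `kubectl wait` can poll the pod condition for you. A small convenience sketch, assuming the operator was installed into the `openlit` namespace with the label used above:

```shell
# Wait up to two minutes for the operator pod to become Ready
kubectl wait pod \
  -n openlit \
  -l app.kubernetes.io/name=openlit-operator \
  --for=condition=Ready \
  --timeout=120s
```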

Step 4: Create an AutoInstrumentation resource

The AutoInstrumentation custom resource defines which pods to instrument and how to configure the injected instrumentation. It specifies label selectors to target your AI applications, the instrumentation provider (OpenLIT by default), and the OTLP endpoint to send telemetry.

Here is a minimal example that instruments pods labeled instrumentation=openlit and sends data to Grafana Cloud:

apiVersion: openlit.io/v1alpha1
kind: AutoInstrumentation
metadata:
  name: grafana-observability
  namespace: default
spec:
  selector:
    matchLabels:
      instrumentation: openlit
  python:
    instrumentation:
      enabled: true
  otlp:
    endpoint: "https://otlp-gateway-<REGION>.grafana.net/otlp"  # Grafana OTLP gateway
    headers:
      Authorization: "Basic <BASE64>"  # Replace with base64‑encoded instanceID:token
  resource:
    attributes:
      deployment.environment: "production"
      service.namespace: "ai-services"
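The Authorization header above expects your Grafana Cloud instance ID and API token, joined by a colon and base64-encoded. One way to generate the value (the placeholders are yours to fill in; `printf` avoids the trailing newline that `echo` would add):

```shell
# Encode "instanceID:token" for the OTLP gateway's Basic auth header.
# Replace the placeholders with your Grafana Cloud stack's instance ID and token.
printf '%s' "<INSTANCE_ID>:<TOKEN>" | base64
```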

Apply the manifest:

kubectl apply -f autoinstrumentation.yaml

Already have AI applications running? Restart the pods that match your selector to pick up the injected instrumentation:

kubectl rollout restart deployment your-deployment-name

When the pods restart, the OpenLIT Operator automatically injects an init container that configures Python instrumentation. The pods begin emitting distributed traces with LLM costs, token usage, and agent performance metrics.
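To confirm the injection happened, you can inspect a restarted pod for the injected init container. A quick check, where `<POD_NAME>` is one of your matching pods (the init container's exact name may vary by operator version):

```shell
# List init container names on a restarted pod;
# the operator-injected one should appear here
kubectl get pod <POD_NAME> \
  -o jsonpath='{.spec.initContainers[*].name}'
```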

Step 5: Deploy your AI application (no code changes)

You can now deploy or continue running your AI workloads normally. Whether you’re using OpenAI Agents SDK, CrewAI, LangChain, or a custom Python service, you don’t need to modify your code. The operator recognizes supported frameworks and model providers, and it instruments them transparently. 

For example, a simple deployment of a CrewAI‑based chatbot can be launched via a Kubernetes Deployment; the operator will detect and instrument all LLM and agent calls as soon as the pod starts. The instrumentation captures the sequence of agent steps, tool invocations, and model responses, along with latency and token metrics.
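As an illustration, a Deployment for such a service only needs the selector label from Step 4 on its pod template; everything else is a plain Kubernetes manifest. The image name and port below are hypothetical placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crewai-chatbot
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: crewai-chatbot
  template:
    metadata:
      labels:
        app: crewai-chatbot
        instrumentation: openlit   # matches the AutoInstrumentation selector
    spec:
      containers:
        - name: chatbot
          image: your-registry/crewai-chatbot:latest  # hypothetical image
          ports:
            - containerPort: 8000
```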

Step 6: Visualize metrics and traces in Grafana

With your pods instrumented and telemetry flowing to Grafana Cloud, open the AI Observability dashboards. 

The GenAI observability dashboard shows request rates, p95/p99 latencies, and cost metrics across different providers. The Agent observability dashboard surfaces agent workflows, step durations, and tool success rates. The Vector DB and MCP dashboards provide context on database queries and protocol health. 

Because OpenLIT’s traces include LLM costs and token counts, Grafana can also estimate costs and highlight expensive calls. In the dashboard, you’ll see a service overview, individual traces for HTTP requests and OpenAI API calls, detailed spans with token usage, performance metrics (response times, error rates, throughput), and cost tracking.

Grafana’s alerting engine can trigger notifications when latency spikes, error rates increase, or token usage exceeds budget. Since the telemetry is OpenTelemetry‑native, you can build custom panels and alerts on top of Prometheus metrics and Tempo traces.
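As a sketch, an alert expression on token throughput might look like the following. The metric name `gen_ai_usage_total_tokens` is an assumption here; check the metrics explorer in your stack for the exact names your instrumentation emits:

```promql
# Hypothetical: fire when token usage over the last hour exceeds a budget
sum(increase(gen_ai_usage_total_tokens[1h])) > 1000000
```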

Next steps

You can learn more about Grafana Cloud AI Observability in the official docs, including setup instructions and dashboards. You can also check out the first post in this series for a full demo of monitoring AI workloads in Grafana Cloud, or explore our other AI blog posts, including those about our own LLM-powered tool, Grafana Assistant.

Taken collectively, these resources will help you move from a basic demo to a production-ready setup for your AI applications.

Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!
