Introduction

Grafana Cloud

Introduction to Grafana AI Observability

Grafana AI Observability is an observability platform for teams that run LLM agents in production. It captures every generation your agents make, organizes them into conversations, tracks agent versions, and lets you evaluate quality continuously.

This article explains the core concepts you need to understand before you instrument your agents and deploy AI Observability.

Generations

A generation is the core unit of data in AI Observability. Each time your agent calls an LLM provider, the SDK captures the request and response as a generation. A generation includes:

The model provider and name, for example, openai/gpt-4o.
Input messages (system prompt, user messages, tool results).
Output messages (assistant responses, tool calls).
Token usage (input, output, cache read, cache write, reasoning).
Timing data (request start, first token, completion).
Optional metadata and tags.

Generations can be synchronous or streaming. The SDK handles both modes transparently.

Conversations

AI Observability groups generations by conversation_id into conversations. A conversation represents a full interaction thread between a user and one or more agents.

In the AI Observability plugin, you can browse conversations, filter by time range, search by content, and drill into individual generations. Each conversation shows a timeline view with traces, token usage, cost breakdown, and quality scores.

Framework integrations

AI Observability provides framework integrations for LangChain, LangGraph, OpenAI Agents, LlamaIndex, Google ADK, and Vercel AI SDK. These integrations attach callbacks or hooks that capture generations automatically, so you don’t need to instrument each LLM call manually.

Framework integrations are available for Python, TypeScript, Go, and Java. Refer to Instrument agents with frameworks for setup details.

Agent catalog

AI Observability automatically discovers and catalogs your agents. When you set an agent_name in your SDK calls, AI Observability tracks that agent by name and effective version.

The agent catalog shows all active agents, their versions, associated models, and usage patterns. By default, AI Observability computes versions from the system prompt; SDKs can send an effective version when they need a stable catalog identity across turns.

Multi-agent dependency tracking

For pipelines where multiple agents cooperate, AI Observability supports dependency tracking through the parent_generation_ids field on generations. When one agent’s output feeds into another, you can declare the parent generation IDs. AI Observability builds a dependency DAG and propagates quality signals — if an upstream generation fails evaluation, downstream dependents are flagged.

The conversation detail view includes a Dependencies tab for the generation DAG. The tab is available for every conversation; if no parents are declared, it shows the generations as independent nodes.

Workflow steps

Workflow steps describe non-LLM execution nodes in an agentic workflow, for example, routing, planning, retrieval, or tool orchestration steps. Use workflow steps when you want the conversation detail view to show the workflow structure around one or more linked LLM generations.

Workflow steps are separate from generations. Generations capture LLM calls, token usage, model details, and costs. Workflow steps capture execution state, duration, tags, errors, and parent-child workflow edges. When a conversation includes workflow-step telemetry, the detail view also shows a Workflow tab.

Online evaluation

Online evaluation lets you score live production traffic continuously. You configure rules that match specific generation patterns, then attach evaluators that run automatically.

AI Observability supports four evaluator types:

LLM judge uses a separate LLM to score responses based on criteria you define.
JSON schema validates that responses match an expected structure.
Regex checks responses against patterns.
Heuristic applies rule trees (length checks, content checks, emptiness checks).

Evaluation results appear as scores on conversations and generations in the plugin UI. You can also create Grafana alert rules directly from evaluation rules to be notified when pass rates drop below a threshold.

OpenTelemetry integration

AI Observability is built on OpenTelemetry. The SDKs emit standard gen_ai.* semantic convention spans and metrics alongside generation data. Your existing OTel infrastructure (Alloy, collectors, Tempo, Prometheus) handles these traces and metrics out of the box. Generation data is sent separately to the AI Observability API.

Key metrics emitted include operation duration, token usage, time to first token, and tool calls per operation.

Data flow

Your application sends data to AI Observability through two paths:

Generation export: The SDK sends structured generation data to the AI Observability API over HTTP or gRPC.
OTLP telemetry: The SDK emits OpenTelemetry traces and metrics via OTLP. You can send these directly to the Grafana Cloud OTLP gateway or through a local Alloy/OTel Collector that forwards to Tempo and Prometheus.

Note
Your application must configure OTel TracerProvider and MeterProvider for path 2 to work. Without this setup, SDK-emitted traces and metrics are silently lost. Refer to Set up traces and metrics for configuration details.

AI Observability stores recent generation data in MySQL for fast queries and compacts older data to object storage (S3, GCS, Azure Blob, or MinIO) for long-term retention.

Multi-tenancy

AI Observability enforces tenant isolation using the X-Scope-OrgID header. Each tenant’s data is fully separated. The SDK auth configuration determines how the tenant header is set.

Plugin UI

The AI Observability plugin provides these views:

Analytics: Dashboards for activity, latency, errors, tokens, cost, and cache behavior.
Conversations: Browse and search conversations with full generation drilldown and annotations.
Agents: Agent catalog with version history and tool/prompt footprints.
Evaluation: Configure and monitor online evaluation rules, evaluators, and scores.

AI Observability is also registered in the Grafana command palette, so you can quickly navigate to any AI Observability page by pressing Cmd+K (Mac) or Ctrl+K (Linux/Windows).

Next steps

Was this page helpful?

Email docs@grafana.com

Help and support

Community

Introduction to Grafana AI Observability

Generations

Conversations

Framework integrations

Agent catalog

Multi-agent dependency tracking

Workflow steps

Online evaluation

OpenTelemetry integration

Data flow

Multi-tenancy

Plugin UI

Next steps

Was this page helpful?

Still have questions?

Get every update

Introduction to Grafana AI Observability

Generations

Conversations

Framework integrations

Agent catalog

Multi-agent dependency tracking

Workflow steps

Online evaluation

OpenTelemetry integration

Data flow

Multi-tenancy

Plugin UI

Next steps

Was this page helpful?

Related resources from Grafana Labs