AI observability setup
This guide walks you through setting up the OpenTelemetry-native AI Observability integration to monitor your complete AI stack with distributed tracing, metrics, and logs sent to Grafana Cloud.
Install the AI observability integration
To install the AI Observability integration:
1. In your Grafana Cloud stack, click Connections in the left-side menu.
2. Search for AI Observability.
3. Click the AI Observability card and follow the instructions to instrument your application (a minimal sketch follows the dashboard list below).
4. Click Install dashboards to install all 5 specialized dashboards:
- GenAI Observability - Main LLM monitoring dashboard
- GenAI Evaluations - AI quality and safety evaluation dashboard
- VectorDB Observability - Vector database performance monitoring
- MCP Observability - Model Context Protocol monitoring
- GPU Monitoring - Hardware monitoring for GPU infrastructure
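If your application is in Python, the instrumentation step boils down to initializing the OpenLIT SDK once at startup, before any AI calls are made. The following is a minimal sketch, assuming the OpenLIT Python SDK (`pip install openlit`) and OTLP connection details supplied through environment variables; copy the exact values from the integration instructions for your stack.

```python
# Minimal OpenLIT instrumentation sketch (assumes `pip install openlit`).
# The OTLP endpoint and credentials are read from the standard OpenTelemetry
# environment variables shown in the integration instructions, for example
# OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS.
import openlit

# Initialize once at startup, before any LLM, vector database, or GPU work.
openlit.init()

# From here on, supported AI libraries are auto-instrumented and their
# traces and metrics are exported to Grafana Cloud over OTLP.
```

Call the initialization as early as possible in your process so that every subsequent AI operation is captured.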
Deployment options
The AI Observability instructions use the OTLP Gateway by default for the simplest setup, but you can choose the deployment method that best fits your needs:
OTLP Gateway (Default configuration)
- Quick setup - Used by default in AI Observability integration instructions
- Direct connection from your application to Grafana Cloud (a configuration sketch follows this list)
- Minimal infrastructure requirements
- Ideal for development, testing, and smaller production deployments
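As a sketch of what the default gateway setup can look like in practice, the snippet below sets the standard OpenTelemetry environment variables in code before initializing OpenLIT. The endpoint and header values are placeholders; copy the real values for your zone from the integration instructions.

```python
# Sketch: direct OTLP export from the application to the Grafana Cloud
# OTLP Gateway. Endpoint and header values below are placeholders.
import os

import openlit

os.environ.setdefault(
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "https://otlp-gateway-<your-zone>.grafana.net/otlp",  # placeholder zone
)
os.environ.setdefault(
    "OTEL_EXPORTER_OTLP_HEADERS",
    "<authorization header from the integration instructions>",  # placeholder
)

openlit.init()  # telemetry now flows straight to the OTLP Gateway
```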
OpenTelemetry Collector / Grafana Alloy (Alternative)
- Advanced routing and processing capabilities for complex environments
- Better resource management for high-volume applications
- Enhanced security and data transformation capabilities (a configuration sketch follows this list)
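With this option, the application exports to a Collector or Alloy instance you run, and that component forwards the data to Grafana Cloud. A sketch of the application side, assuming a local collector listening on the conventional OTLP/HTTP port 4318:

```python
# Sketch: send telemetry to a local OpenTelemetry Collector or Grafana Alloy
# instance instead of directly to Grafana Cloud. The collector then handles
# batching, processing, and forwarding.
import os

import openlit

os.environ.setdefault("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
openlit.init()
```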
Verify your setup
After completing the setup:
- Run your instrumented AI application (or use the test sketch after this list to generate a few traced calls)
- The OpenLIT SDK automatically starts sending telemetry data to Grafana Cloud
- Navigate to any of the 5 installed dashboards to view your data:
  - GenAI Observability - LLM performance and usage metrics
  - GenAI Evaluations - AI quality and safety assessments
  - VectorDB Observability - Vector database performance
  - MCP Observability - Model Context Protocol monitoring
  - GPU Monitoring - Hardware performance metrics
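For a quick end-to-end check, a single instrumented call is enough to produce a trace and the related GenAI metrics. The sketch below assumes the OpenAI Python SDK with an OPENAI_API_KEY in the environment; substitute whichever provider your application actually uses.

```python
# Sketch: generate a small amount of telemetry to verify the pipeline.
import openlit
from openai import OpenAI

openlit.init()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

# OpenLIT auto-instruments this call, so a trace and GenAI metrics
# (latency, token usage) should show up in the GenAI Observability
# dashboard within a few minutes.
client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Say hello"}],
)
```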
Troubleshooting
Common issues
No data appearing in dashboards
- Verify that the OTEL environment variables are set correctly (the diagnostic sketch after this list prints the values your process sees)
- Check that OpenLIT is initialized before your application makes any AI calls
- Ensure your application is performing AI/ML operations that generate telemetry
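A quick way to rule out configuration problems is to print the OpenTelemetry settings your process actually sees, for example:

```python
# Diagnostic sketch: print the OTLP-related environment variables the SDK
# will use. Empty or misspelled values are a common cause of missing data.
import os

for var in (
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_HEADERS",
    "OTEL_EXPORTER_OTLP_PROTOCOL",
    "OTEL_SERVICE_NAME",
):
    value = os.environ.get(var)
    if var == "OTEL_EXPORTER_OTLP_HEADERS" and value:
        value = value[:15] + "...(truncated)"  # avoid printing credentials
    print(f"{var} = {value!r}")
```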
Connection errors
- Verify your API token has the correct permissions
- Check that the OTEL endpoint URL is correct for your Grafana Cloud zone
- Ensure network connectivity to Grafana Cloud endpoints (a reachability check follows this list)
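To rule out a network problem specifically, you can check whether the configured endpoint is reachable at all. The sketch below sends an empty POST to the traces path; any HTTP response, even an error status, means the host is reachable, while a timeout or DNS failure points to a connectivity issue.

```python
# Reachability check sketch (uses only the Python standard library).
import os
import urllib.error
import urllib.request

endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").rstrip("/")
url = endpoint + "/v1/traces"

try:
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(f"Reachable: HTTP {resp.status}")
except urllib.error.HTTPError as err:
    # The server answered, so the network path works; check credentials next.
    print(f"Reachable, but the server returned HTTP {err.code}")
except OSError as err:
    print(f"Could not reach {url}: {err}")
```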
Performance impact
- OpenLIT is designed for minimal overhead, but monitor your application performance
- Consider using OpenTelemetry Collector for high-volume production workloads
- Adjust sampling rates if needed for very high-traffic applications (see the sketch after this list)
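If you do need to reduce tracing volume, the standard OpenTelemetry sampling environment variables are the usual lever; whether they take effect depends on how the tracer provider is constructed in your setup, so verify the result in the dashboards. A sketch that keeps roughly 10% of traces:

```python
# Sketch: probabilistic sampling via standard OpenTelemetry SDK settings.
# Set these before the SDK is initialized (or export them in the process
# environment) so they are picked up when the tracer provider is created.
import os

os.environ["OTEL_TRACES_SAMPLER"] = "parentbased_traceidratio"
os.environ["OTEL_TRACES_SAMPLER_ARG"] = "0.1"  # keep ~10% of traces
```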