AI observability setup
This guide walks you through setting up the OpenTelemetry-native AI Observability integration to monitor your complete AI stack with distributed tracing, metrics, and logs sent to Grafana Cloud.
Install the AI observability integration
To install the AI Observability integration:
1. In your Grafana Cloud stack, click Connections in the left-side menu.
2. Search for AI Observability.
3. Click the AI Observability card and follow the instructions to instrument your application (a minimal sketch follows the dashboard list below).
4. Click Install dashboards to install all 5 specialized dashboards:
- GenAI Observability - Main LLM monitoring dashboard
- GenAI Evaluations - AI quality and safety evaluation dashboard
- VectorDB Observability - Vector database performance monitoring
- MCP Observability - Model Context Protocol monitoring
- GPU Monitoring - Hardware monitoring for GPU infrastructure
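If your application is in Python, the instrumentation step boils down to initializing the OpenLIT SDK once at startup, before any AI calls are made. The following is a minimal sketch, assuming the OpenLIT Python SDK (`pip install openlit`) and OTLP connection details supplied through environment variables; copy the exact values from the integration instructions for your stack.

```python
# Minimal OpenLIT instrumentation sketch (assumes `pip install openlit`).
# The OTLP endpoint and credentials are read from the standard OpenTelemetry
# environment variables shown in the integration instructions, for example
# OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS.
import openlit

# Initialize once at startup, before any LLM, vector database, or GPU work.
openlit.init()

# From here on, supported AI libraries are auto-instrumented and their
# traces and metrics are exported to Grafana Cloud over OTLP.
```

Call the initialization as early as possible in your process so that every subsequent AI operation is captured.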
Deployment options
The AI Observability instructions use the OTLP Gateway by default for the simplest setup, but you can choose the deployment method that best fits your needs:
OTLP Gateway (Default configuration)
- Quick setup - Used by default in AI Observability integration instructions
- Direct connection from your application to Grafana Cloud (a configuration sketch follows this list)
- Minimal infrastructure requirements
- Ideal for development, testing, and smaller production deployments
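As a sketch of what the default gateway setup can look like in practice, the snippet below sets the standard OpenTelemetry environment variables in code before initializing OpenLIT. The endpoint and header values are placeholders; copy the real values for your zone from the integration instructions.

```python
# Sketch: direct OTLP export from the application to the Grafana Cloud
# OTLP Gateway. Endpoint and header values below are placeholders.
import os

import openlit

os.environ.setdefault(
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "https://otlp-gateway-<your-zone>.grafana.net/otlp",  # placeholder zone
)
os.environ.setdefault(
    "OTEL_EXPORTER_OTLP_HEADERS",
    "<authorization header from the integration instructions>",  # placeholder
)

openlit.init()  # telemetry now flows straight to the OTLP Gateway
```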
OpenTelemetry Collector / Grafana Alloy (Alternative)
- Advanced routing and processing capabilities for complex environments
- Better resource management for high-volume applications
- Enhanced security and data transformation capabilities (a configuration sketch follows this list)
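With this option, the application exports to a Collector or Alloy instance you run, and that component forwards the data to Grafana Cloud. A sketch of the application side, assuming a local collector listening on the conventional OTLP/HTTP port 4318:

```python
# Sketch: send telemetry to a local OpenTelemetry Collector or Grafana Alloy
# instance instead of directly to Grafana Cloud. The collector then handles
# batching, processing, and forwarding.
import os

import openlit

os.environ.setdefault("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
openlit.init()
```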
Verify your setup
After completing the setup:
- Run your instrumented AI application (or use the test sketch after this list to generate a few traced calls)
- The OpenLIT SDK automatically starts sending telemetry data to Grafana Cloud
- Navigate to any of the 5 installed dashboards to view your data:
  - GenAI Observability - LLM performance and usage metrics
  - GenAI Evaluations - AI quality and safety assessments
  - VectorDB Observability - Vector database performance
  - MCP Observability - Model Context Protocol monitoring
  - GPU Monitoring - Hardware performance metrics
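For a quick end-to-end check, a single instrumented call is enough to produce a trace and the related GenAI metrics. The sketch below assumes the OpenAI Python SDK with an OPENAI_API_KEY in the environment; substitute whichever provider your application actually uses.

```python
# Sketch: generate a small amount of telemetry to verify the pipeline.
import openlit
from openai import OpenAI

openlit.init()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

# OpenLIT auto-instruments this call, so a trace and GenAI metrics
# (latency, token usage) should show up in the GenAI Observability
# dashboard within a few minutes.
client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Say hello"}],
)
```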
Troubleshooting
Common issues
No data appearing in dashboards
- Verify that the OTEL environment variables are set correctly (the diagnostic sketch after this list prints the values your process sees)
- Check that OpenLIT is initialized before your application makes any AI calls
- Ensure your application is performing AI/ML operations that generate telemetry
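A quick way to rule out configuration problems is to print the OpenTelemetry settings your process actually sees, for example:

```python
# Diagnostic sketch: print the OTLP-related environment variables the SDK
# will use. Empty or misspelled values are a common cause of missing data.
import os

for var in (
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_HEADERS",
    "OTEL_EXPORTER_OTLP_PROTOCOL",
    "OTEL_SERVICE_NAME",
):
    value = os.environ.get(var)
    if var == "OTEL_EXPORTER_OTLP_HEADERS" and value:
        value = value[:15] + "...(truncated)"  # avoid printing credentials
    print(f"{var} = {value!r}")
```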
Connection errors
- Verify your API token has the correct permissions
- Check that the OTEL endpoint URL is correct for your Grafana Cloud zone
- Ensure network connectivity to Grafana Cloud endpoints (a reachability check follows this list)
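To rule out a network problem specifically, you can check whether the configured endpoint is reachable at all. The sketch below sends an empty POST to the traces path; any HTTP response, even an error status, means the host is reachable, while a timeout or DNS failure points to a connectivity issue.

```python
# Reachability check sketch (uses only the Python standard library).
import os
import urllib.error
import urllib.request

endpoint = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "").rstrip("/")
url = endpoint + "/v1/traces"

try:
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(f"Reachable: HTTP {resp.status}")
except urllib.error.HTTPError as err:
    # The server answered, so the network path works; check credentials next.
    print(f"Reachable, but the server returned HTTP {err.code}")
except OSError as err:
    print(f"Could not reach {url}: {err}")
```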
Performance impact
- OpenLIT is designed for minimal overhead, but monitor your application performance
- Consider using OpenTelemetry Collector for high-volume production workloads
- Adjust sampling rates if needed for very high-traffic applications (see the sketch after this list)
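If you do need to reduce tracing volume, the standard OpenTelemetry sampling environment variables are the usual lever; whether they take effect depends on how the tracer provider is constructed in your setup, so verify the result in the dashboards. A sketch that keeps roughly 10% of traces:

```python
# Sketch: probabilistic sampling via standard OpenTelemetry SDK settings.
# Set these before the SDK is initialized (or export them in the process
# environment) so they are picked up when the tracer provider is created.
import os

os.environ["OTEL_TRACES_SAMPLER"] = "parentbased_traceidratio"
os.environ["OTEL_TRACES_SAMPLER_ARG"] = "0.1"  # keep ~10% of traces
```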