AI observability setup

This guide walks you through setting up the OpenTelemetry-native AI Observability integration to monitor your complete AI stack with distributed tracing, metrics, and logs sent to Grafana Cloud.

Install the AI observability integration

To install the AI Observability integration:

  1. In your Grafana Cloud stack, click Connections in the left-side menu.

  2. Search for AI Observability.

  3. Click the AI Observability card and follow the instructions to instrument your application.

  4. Click Install dashboards to install all 5 specialized dashboards:

    • GenAI Observability - Main LLM monitoring dashboard
    • GenAI Evaluations - AI quality and safety evaluation dashboard
    • VectorDB Observability - Vector database performance monitoring
    • MCP Observability - Model Context Protocol monitoring
    • GPU Monitoring - Hardware monitoring for GPU infrastructure
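The instrumentation step above typically amounts to initializing the OpenLIT SDK with OTLP export pointed at your stack. A minimal sketch follows; the endpoint URL and the header value are placeholders, so copy the real values from the integration's setup instructions:

```python
import os

# Point the OTLP exporter at your Grafana Cloud stack before initializing
# OpenLIT. Both values below are placeholders -- substitute the endpoint
# and credentials shown in the integration's instructions.
os.environ.setdefault(
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "https://otlp-gateway-prod-us-central-0.grafana.net/otlp",
)
os.environ.setdefault(
    "OTEL_EXPORTER_OTLP_HEADERS",
    "Authorization=Basic%20<base64-encoded instanceID:token>",
)

try:
    import openlit

    # One call auto-instruments supported LLM, vector database, and GPU
    # libraries; telemetry is exported via the variables set above.
    openlit.init()
except ImportError:
    # Install the SDK first: pip install openlit
    pass
```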

Deployment options

The AI Observability instructions use the OTLP Gateway by default for the simplest setup, but you can choose the deployment method that best fits your needs:

OTLP Gateway (Default configuration)

  • Quick setup - Used by default in AI Observability integration instructions
  • Direct connection from your application to Grafana Cloud
  • Minimal infrastructure requirements
  • Ideal for development, testing, and smaller production deployments
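When connecting directly to the OTLP Gateway, the application authenticates with a basic-auth header built from your stack's numeric instance ID and a Grafana Cloud token. The helper below is an illustrative sketch with placeholder credentials, not part of the official setup:

```python
import base64


def otlp_auth_header(instance_id: str, token: str) -> str:
    """Build the Authorization value expected by the OTLP Gateway:
    'Basic ' followed by base64("<instanceID>:<token>")."""
    credentials = f"{instance_id}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Placeholder values -- substitute your own stack's instance ID and token.
header = otlp_auth_header("123456", "glc_example_token")

# Spaces in OTEL_EXPORTER_OTLP_HEADERS must be percent-encoded.
print(f"OTEL_EXPORTER_OTLP_HEADERS=Authorization={header.replace(' ', '%20')}")
```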

OpenTelemetry Collector / Grafana Alloy (Alternative)

  • Advanced routing and processing capabilities for complex environments
  • Better resource management for high-volume applications
  • Enhanced security and data transformation capabilities
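If you route through Grafana Alloy instead, the pipeline is: receive OTLP from the application, then forward it to the Grafana Cloud OTLP Gateway. The following is a minimal sketch; the component labels, endpoint URL, and credentials are placeholders:

```alloy
// Receive OTLP from the instrumented application...
otelcol.receiver.otlp "default" {
  grpc {}
  http {}

  output {
    metrics = [otelcol.exporter.otlphttp.grafana_cloud.input]
    logs    = [otelcol.exporter.otlphttp.grafana_cloud.input]
    traces  = [otelcol.exporter.otlphttp.grafana_cloud.input]
  }
}

// ...and forward it to the Grafana Cloud OTLP Gateway.
otelcol.exporter.otlphttp "grafana_cloud" {
  client {
    endpoint = "https://otlp-gateway-prod-us-central-0.grafana.net/otlp"
    auth     = otelcol.auth.basic.grafana_cloud.handler
  }
}

otelcol.auth.basic "grafana_cloud" {
  username = "<instanceID>"
  password = "<token>"
}
```

Running the collector in between lets you add batching, filtering, or data transformation without changing the application's exporter configuration.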

Verify your setup

After completing the setup:

  1. Run your instrumented AI application
  2. Confirm that the OpenLIT SDK is sending telemetry data to Grafana Cloud
  3. Navigate to any of the 5 installed dashboards to view your data:
    • GenAI Observability - LLM performance and usage metrics
    • GenAI Evaluations - AI quality and safety assessments
    • VectorDB Observability - Vector database performance
    • MCP Observability - Model Context Protocol monitoring
    • GPU Monitoring - Hardware performance metrics

Troubleshooting

Common issues

No data appearing in dashboards

  • Verify OTEL environment variables are set correctly
  • Check that OpenLIT is initialized
  • Ensure your application is performing AI/ML operations that generate telemetry
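For the first check, a quick self-test can confirm the exporter variables are present before you dig deeper. The variable names below are the standard OpenTelemetry ones; treating exactly these two as required is an assumption that fits the default OTLP Gateway setup:

```python
import os

# Variables the default OTLP export relies on (the exact required set
# depends on your deployment; these are the usual suspects).
REQUIRED = (
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_HEADERS",
)


def missing_otel_vars(environ=os.environ):
    """Return the names of expected OTEL variables that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_otel_vars()
    if missing:
        print("Missing OTEL configuration:", ", ".join(missing))
    else:
        print("OTEL exporter variables are set.")
```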

Connection errors

  • Verify your API token has the correct permissions
  • Check that the OTEL endpoint URL is correct for your Grafana Cloud zone
  • Ensure network connectivity to Grafana Cloud endpoints

Performance impact

  • OpenLIT is designed for minimal overhead, but monitor your application performance
  • Consider using OpenTelemetry Collector for high-volume production workloads
  • Adjust sampling rates if needed for very high-traffic applications
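Sampling can be tuned through the standard OpenTelemetry SDK environment variables without code changes. The 10% ratio below is only an example; adjust it to your traffic:

```shell
# Keep roughly 10% of traces, honoring the parent span's sampling decision.
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1
```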