Introduction

Grafana AI Observability is an observability integration for the complete AI stack, built on OpenLIT, the OpenTelemetry-native auto-instrumentation library.

Grafana AI Observability helps AI engineers improve the performance of their AI applications. It provides:

  • User interactions: Gain deep insights into user interactions with LLMs, capturing prompts, and completions to thoroughly understand user intent and model performance.
  • Token usage: Track and visualize token usage for every interaction, providing actionable data to optimize resource allocation and maintain cost efficiency.
  • Cost monitoring: Monitor and analyze cost utilization associated with LLMs in real time, enabling effective budget management, forecasting, and cost-saving strategies.
  • Metadata capture: Capture and dissect comprehensive metadata for each LLM request, including request parameters, response times, model versions, and other details to enhance overall system understanding.
  • Request latency: Track the latency of each request to ensure optimal performance, identify bottlenecks, and enable prompt issue resolution.
  • Vector database performance: Monitor query response times and throughput for your vector database to ensure efficient processing and retrieval of vector data queries.
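Conceptually, each of these signals travels as attributes on an OpenTelemetry span. The following Python sketch is illustrative only, not OpenLIT's implementation; the attribute names loosely follow the OpenTelemetry GenAI semantic conventions, and the `LLMSpan` class and `record_completion` helper are invented for this example:

```python
from dataclasses import dataclass, field

@dataclass
class LLMSpan:
    """Illustrative container for the signals listed above."""
    name: str
    attributes: dict = field(default_factory=dict)

def record_completion(model: str, prompt: str, completion: str,
                      input_tokens: int, output_tokens: int,
                      cost_usd: float, latency_ms: float) -> LLMSpan:
    # One span per LLM request, carrying prompt/completion, token
    # usage, cost, and latency as queryable attributes.
    return LLMSpan(
        name=f"chat {model}",
        attributes={
            "gen_ai.request.model": model,               # metadata capture
            "gen_ai.prompt": prompt,                     # user interactions
            "gen_ai.completion": completion,
            "gen_ai.usage.input_tokens": input_tokens,   # token usage
            "gen_ai.usage.output_tokens": output_tokens,
            "gen_ai.usage.cost": cost_usd,               # cost monitoring
            "latency_ms": latency_ms,                    # request latency
        },
    )
```

Because each request becomes one span with these attributes, dashboards can aggregate them into the token, cost, and latency views described above.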

Get started

To get started with AI Observability, follow these steps:

  1. Instrument your application
  2. Create a free Grafana Cloud account
  3. Configure your telemetry data destination:
    1. OTLP gateway: for a quick local development and testing setup
    2. OpenTelemetry Collector: for a robust and scalable production setup
  4. Observe your AI Stack using a pre-built Grafana dashboard
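As a sketch of steps 1 and 3 on the OTLP gateway path, the exporter configuration can be built in Python before initializing OpenLIT. The zone, instance ID, and token below are placeholders to replace with your own stack's values; the endpoint and header shapes follow Grafana Cloud's OTLP gateway conventions:

```python
import base64
import os

def grafana_otlp_env(zone: str, instance_id: str, token: str) -> dict:
    """Build the standard OpenTelemetry exporter env vars for the
    Grafana Cloud OTLP gateway (basic auth, base64 of instanceID:token)."""
    creds = base64.b64encode(f"{instance_id}:{token}".encode()).decode()
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"https://otlp-gateway-{zone}.grafana.net/otlp",
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic%20{creds}",
    }

# Placeholder credentials; take real values from your Grafana Cloud stack.
os.environ.update(grafana_otlp_env("prod-us-central-0", "123456", "<token>"))

try:
    import openlit  # pip install openlit
    openlit.init()  # auto-instruments supported LLM and vector DB clients
except ImportError:
    pass  # SDK not installed in this environment
```

Once `openlit.init()` runs with these variables set, traces and metrics from supported libraries flow to the gateway without further code changes.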

How does it work?

The Grafana Cloud AI Observability Integration uses the OpenTelemetry-native SDK, OpenLIT, to automatically generate traces and metrics from your AI application. Here is a step-by-step breakdown of how it works:

  • Integration of OpenLIT SDK:

    You integrate the OpenLIT SDK into your AI application. The SDK supports more than 28 generative AI tools, including OpenAI, LangChain, and Pinecone.

  • Automatic Generation of Traces and Metrics:

    The OpenLIT SDK generates traces and metrics automatically as your application runs, providing a granular view of your application's internal processes and performance.

  • Forwarding Data to Grafana Cloud:

    The traces and metrics can be forwarded to Grafana Cloud either directly from your application or through an intermediate OpenTelemetry-compatible collector, such as the OpenTelemetry Collector or Grafana Alloy. This flexibility lets you choose the method that best fits your infrastructure and scale.

  • Visualization in Grafana Cloud:

    You can install the pre-built GenAI Observability dashboard, designed to provide comprehensive insights into your generative AI stack. The dashboard offers visualizations and analytics that help you monitor performance, identify issues, and understand the overall behavior of your AI application.
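The two forwarding paths in the steps above differ only in where the application's OTLP endpoint points. A minimal sketch (the URLs are placeholders: substitute your stack's zone, or your collector's actual address and port):

```python
def otlp_endpoint(use_collector: bool, zone: str = "prod-us-central-0") -> str:
    """Pick where the application ships its OTLP data.

    Sending directly to the Grafana Cloud gateway is convenient for local
    development; routing through an OpenTelemetry Collector or Grafana
    Alloy instance is the usual choice at scale, since the collector can
    batch, retry, and enrich telemetry before forwarding it.
    """
    if use_collector:
        # Default OTLP/HTTP port of a locally running Collector or Alloy.
        return "http://localhost:4318"
    return f"https://otlp-gateway-{zone}.grafana.net/otlp"
```

Either value can be exported as `OTEL_EXPORTER_OTLP_ENDPOINT`; the rest of the instrumentation is unchanged.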

What can be monitored using the AI observability integration?

Grafana AI Observability allows you to monitor the following components:

  • Large Language Models (LLMs):

    Get insights into the performance and behavior of various Large Language Models. Monitor metrics such as response times, error rates, and throughput to ensure optimal performance and reliability of your LLM applications.

  • Vector databases:

    Track the operational metrics of vector databases, which are crucial for applications involving similarity searches and other LLM-driven queries. Monitor query performance, resource usage, and other critical metrics.
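To illustrate what vector database monitoring measures, a timing wrapper like the following captures per-query latency. OpenLIT records this automatically for the databases it supports; this hypothetical `timed_query` helper only shows the underlying idea:

```python
import time

def timed_query(query_fn, *args, **kwargs):
    """Run a vector DB query callable and return (result, elapsed_ms)."""
    start = time.perf_counter()
    result = query_fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Usage with any callable standing in for a vector search:
hits, latency_ms = timed_query(lambda q, k: list(range(k)), "query text", 5)
```

Aggregating these per-query latencies over time yields the response-time and throughput views that the dashboard surfaces for vector databases.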