Introduction

Grafana AI Observability is an OpenTelemetry-native solution for monitoring and optimizing your entire AI stack. It uses the OpenLIT SDK to automatically instrument your applications with distributed traces and metrics.

How does it work?

The Grafana Cloud AI Observability integration uses the OpenTelemetry-native OpenLIT SDK to automatically generate distributed traces and metrics from your AI applications. Here is a step-by-step breakdown of how it works:

  • Integration of OpenLIT SDK:

    You integrate the OpenLIT SDK within your AI application. The SDK is designed to support a wide range of AI workflows and tools; a minimal setup sketch follows this list.

  • Automatic generation of distributed traces and metrics:

    The OpenLIT SDK generates OpenTelemetry traces and metrics automatically as your application runs.

    Distributed traces provide end-to-end visibility across your AI pipeline, from user requests through LLM calls to vector database queries and GPU operations. Metrics provide granular insights into application performance, costs, and resource utilization.

  • Forwarding data to Grafana Cloud:

    The traces and metrics can either be forwarded directly to Grafana Cloud from your application or routed through an intermediate OpenTelemetry-compatible backend such as the OpenTelemetry Collector or Grafana Alloy. This flexibility lets you choose the best method for your infrastructure and scale; a forwarding sketch follows this list.

  • Visualization in Grafana Cloud:

    You get five pre-built dashboards specifically designed to provide comprehensive insights into your generative AI (GenAI) stack.

    These dashboards offer visualizations and analytics that help you monitor performance, identify issues, and understand the overall behavior of your AI application across all components of your GenAI stack.
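
As a concrete illustration of the integration step, here is a minimal setup sketch of a Python application instrumented with the OpenLIT SDK. The application name, environment label, and model choice are illustrative assumptions rather than required values, and the exact init parameters should be checked against your OpenLIT version.

    # Minimal sketch: instrument an AI application with the OpenLIT SDK.
    # Once openlit.init() runs, supported libraries (such as the OpenAI
    # client) are auto-instrumented, so the call below emits OpenTelemetry
    # traces and metrics without any manual span handling.
    import openlit
    from openai import OpenAI

    openlit.init(
        application_name="chat-service",  # illustrative service name
        environment="production",         # illustrative environment label
    )

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": "Summarize our Q3 results."}],
    )
    print(response.choices[0].message.content)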
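
For the forwarding step, OpenLIT honors the standard OpenTelemetry OTLP exporter environment variables, so the same application can send data either directly to the Grafana Cloud OTLP gateway or to a local OpenTelemetry Collector or Grafana Alloy instance. The gateway zone, the credentials, and the assumption that openlit.init() picks up these variables should all be verified against your Grafana Cloud stack details and OpenLIT version.

    # Sketch: two common ways to point the OTLP export at a backend.
    import os
    import openlit

    # Option 1: export straight to the Grafana Cloud OTLP gateway.
    # The zone in the URL and the base64-encoded "instanceID:token" value
    # are placeholders taken from your stack details; the space in the
    # header value is URL-encoded as %20.
    os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = (
        "https://otlp-gateway-prod-us-central-0.grafana.net/otlp"
    )
    os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = (
        "Authorization=Basic%20<base64 instanceID:token>"
    )

    # Option 2: send to a local OpenTelemetry Collector or Grafana Alloy
    # OTLP/HTTP receiver instead, and let it forward the data onward:
    # os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"

    openlit.init()  # picks up the exporter settings above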

What can be monitored using the AI observability integration?

Grafana AI Observability allows you to monitor the following components:

  • Large Language Models (LLMs) and AI agents

    Get insights into the performance and behavior of various Large Language Models. Monitor metrics such as response times, error rates, throughput, token usage, costs, and user interactions to ensure optimal performance and reliability of your LLM applications.

  • AI model quality & safety

    Evaluate AI model outputs for quality and safety with hallucination detection, toxicity analysis, bias assessment, and evaluation scoring. Monitor confidence levels and get detailed explanations for identified issues.

  • Vector databases

    Track the operational metrics of vector databases, which are crucial for applications involving similarity searches and other LLM-driven queries. Monitor query performance, response times, success rates, and resource usage across different services and environments.

  • Model Context Protocol (MCP)

    Monitor MCP usage analytics, tool performance, transport types, client distribution, method call patterns, and payload analysis. Track health metrics and failure patterns for robust protocol monitoring.

  • GPU infrastructure

    Monitor hardware performance, including GPU utilization, temperature, memory usage, and fan speeds, across your GPU infrastructure to ensure optimal resource allocation and prevent hardware issues.
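
For GPU infrastructure specifically, OpenLIT can collect GPU metrics alongside traces. The sketch below assumes a collect_gpu_stats flag on openlit.init(); the parameter name and behavior should be confirmed against the OpenLIT SDK version in use.

    # Hedged sketch: enable GPU metric collection in addition to traces.
    # collect_gpu_stats is assumed to periodically sample GPU utilization,
    # temperature, memory usage, and fan speed from the local GPUs.
    import openlit

    openlit.init(
        application_name="inference-worker",  # illustrative service name
        collect_gpu_stats=True,
    )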