Introduction

Grafana AI Observability is an OpenTelemetry-native solution for monitoring and optimizing your entire AI stack. It uses the OpenLIT SDK to automatically instrument your applications with distributed traces and metrics.

How does it work?

The Grafana Cloud AI Observability integration uses the OpenTelemetry-native OpenLIT SDK to automatically generate distributed traces and metrics from your AI applications. Here is a step-by-step breakdown of how it works:

  • Integration of OpenLIT SDK:

    You integrate the OpenLIT SDK within your AI application. The SDK is designed to support a wide range of AI workflows and tools; a minimal setup sketch follows this list.

  • Automatic generation of distributed traces and metrics:

    The OpenLIT SDK generates OpenTelemetry traces and metrics automatically as your application runs.

    Distributed traces provide end-to-end visibility across your AI pipeline, from user requests through LLM calls to vector database queries and GPU operations. Metrics provide granular insights into application performance, costs, and resource utilization.

  • Forwarding data to Grafana Cloud:

    The traces and metrics can either be forwarded directly to Grafana Cloud from your application or routed through an intermediate OpenTelemetry-compatible backend such as the OpenTelemetry Collector or Grafana Alloy. This flexibility lets you choose the best method for your infrastructure and scale; a forwarding sketch follows this list.

  • Visualization in Grafana Cloud:

    You get five pre-built dashboards specifically designed to provide comprehensive insights into your generative AI (GenAI) stack.

    These dashboards offer visualizations and analytics that help you monitor performance, identify issues, and understand the overall behavior of your AI application across all components of your GenAI stack.
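
As a concrete illustration of the integration step, here is a minimal setup sketch of a Python application instrumented with the OpenLIT SDK. The application name, environment label, and model choice are illustrative assumptions rather than required values, and the exact init parameters should be checked against your OpenLIT version.

    # Minimal sketch: instrument an AI application with the OpenLIT SDK.
    # Once openlit.init() runs, supported libraries (such as the OpenAI
    # client) are auto-instrumented, so the call below emits OpenTelemetry
    # traces and metrics without any manual span handling.
    import openlit
    from openai import OpenAI

    openlit.init(
        application_name="chat-service",  # illustrative service name
        environment="production",         # illustrative environment label
    )

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": "Summarize our Q3 results."}],
    )
    print(response.choices[0].message.content)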
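
For the forwarding step, OpenLIT honors the standard OpenTelemetry OTLP exporter environment variables, so the same application can send data either directly to the Grafana Cloud OTLP gateway or to a local OpenTelemetry Collector or Grafana Alloy instance. The gateway zone, the credentials, and the assumption that openlit.init() picks up these variables should all be verified against your Grafana Cloud stack details and OpenLIT version.

    # Sketch: two common ways to point the OTLP export at a backend.
    import os
    import openlit

    # Option 1: export straight to the Grafana Cloud OTLP gateway.
    # The zone in the URL and the base64-encoded "instanceID:token" value
    # are placeholders taken from your stack details; the space in the
    # header value is URL-encoded as %20.
    os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = (
        "https://otlp-gateway-prod-us-central-0.grafana.net/otlp"
    )
    os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = (
        "Authorization=Basic%20<base64 instanceID:token>"
    )

    # Option 2: send to a local OpenTelemetry Collector or Grafana Alloy
    # OTLP/HTTP receiver instead, and let it forward the data onward:
    # os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"

    openlit.init()  # picks up the exporter settings above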

What can be monitored using the AI observability integration?

Grafana AI Observability allows you to monitor the following components:

  • Large Language Models (LLMs) and AI agents

    Get insights into the performance and behavior of various Large Language Models. Monitor metrics such as response times, error rates, throughput, token usage, costs, and user interactions to ensure optimal performance and reliability of your LLM applications.

  • AI model quality & safety

    Evaluate AI model outputs for quality and safety with hallucination detection, toxicity analysis, bias assessment, and evaluation scoring. Monitor confidence levels and get detailed explanations for identified issues.

  • Vector databases

    Track the operational metrics of vector databases, which are crucial for applications involving similarity searches and other LLM-driven queries. Monitor query performance, response times, success rates, and resource usage across different services and environments.

  • Model Context Protocol (MCP)

    Monitor MCP usage analytics, tool performance, transport types, client distribution, method call patterns, and payload analysis. Track health metrics and failure patterns for robust protocol monitoring.

  • GPU infrastructure

    Monitor hardware performance, including GPU utilization, temperature, memory usage, and fan speeds, across your GPU infrastructure to ensure optimal resource allocation and prevent hardware issues.
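
For GPU infrastructure specifically, OpenLIT can collect GPU metrics alongside traces. The sketch below assumes a collect_gpu_stats flag on openlit.init(); the parameter name and behavior should be confirmed against the OpenLIT SDK version in use.

    # Hedged sketch: enable GPU metric collection in addition to traces.
    # collect_gpu_stats is assumed to periodically sample GPU utilization,
    # temperature, memory usage, and fan speed from the local GPUs.
    import openlit

    openlit.init(
        application_name="inference-worker",  # illustrative service name
        collect_gpu_stats=True,
    )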