---
title: "GenAI Evaluations | Grafana Cloud documentation"
description: "Evaluate AI model quality with OpenLIT's built-in hallucination detection, toxicity analysis, and bias assessment"
---

# GenAI evaluations

GenAI Evaluations provides comprehensive monitoring of AI model quality and safety through OpenLIT’s built-in hallucination detection, toxicity analysis, bias assessment, and automated quality scoring.

## Overview

The GenAI Evaluations dashboard tracks AI model quality and safety using OpenLIT’s evaluators, providing:

- **OpenTelemetry-native evaluations** - Built-in metrics collection and monitoring
- **LLM-powered assessments** - AI-driven evaluation using OpenAI or Anthropic models
- **Real-time quality scoring** - Immediate feedback on content quality and safety
- **Comprehensive issue detection** - Detailed categorization and explanations for problems

## Built-in evaluation metrics

### Combined evaluation metric (`openlit.evals.All`)

Comprehensive evaluation that checks for all three risk types in a single call (see the sketch after this list):

- **Hallucination detection** - Identifies factual inaccuracies and false information
- **Bias assessment** - Detects unfair treatment across demographics
- **Toxicity detection** - Flags harmful, offensive, or threatening content
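
A minimal sketch of running the combined evaluator. The class name `openlit.evals.All` comes from this page; the `provider` and `threshold_score` parameters and the `measure()` call follow OpenLIT's evaluation API as commonly documented, but treat the exact signature as an assumption and verify it against the OpenLIT version you run:

```python
import openlit

# One detector covering hallucination, bias, and toxicity.
# Assumes OPENAI_API_KEY is set in the environment.
detector = openlit.evals.All(
    provider="openai",    # or "anthropic"
    threshold_score=0.5,  # assumed semantics: scores at or above this flip the verdict to "yes"
)

result = detector.measure(
    prompt="When did Einstein win the Nobel Prize?",
    contexts=["Einstein won the Nobel Prize in Physics in 1921."],
    text="Einstein won the Nobel Prize in 1922.",
)
```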

### Specific evaluation metrics

- **`openlit.evals.Hallucination`** - Focused hallucination detection with detailed categorization (sketched after this list)
- **`openlit.evals.Bias`** - Specialized bias detection across multiple categories
- **`openlit.evals.Toxicity`** - Targeted toxicity assessment with threat analysis
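
The focused detectors follow the same pattern. A hedged sketch using the hallucination evaluator, where `contexts` is assumed to carry the ground truth the response is graded against:

```python
import openlit

# Focused detector for hallucination only.
detector = openlit.evals.Hallucination(provider="openai")

result = detector.measure(
    prompt="What is the boiling point of water at sea level?",
    contexts=["Water boils at 100 °C (212 °F) at sea level."],
    text="Water boils at 150 °C at sea level.",
)
# Expected outcome: a "yes" verdict classified as `factual_inaccuracy`.
```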

### Evaluation categories

Each flagged result is labeled with one of the categories below; the sketch after these lists shows where the label appears in a result.

**Hallucination types:**

- `factual_inaccuracy` - Incorrect facts or information
- `nonsensical_response` - Irrelevant or unrelated content
- `gibberish` - Nonsensical text output
- `contradiction` - Conflicting information

**Bias types:**

- `gender`, `age`, `ethnicity`, `religion`, `sexual_orientation`
- `disability`, `physical_appearance`, `socioeconomic_status`

**Toxicity types:**

- `threat`, `hate`, `personal_attack`, `dismissive`, `mockery`
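
The result shape below is illustrative only, assembled from the fields this page describes (a yes/no verdict, a 0.0–1.0 score, a classification, and an explanation); exact field names may differ in your OpenLIT version:

```python
# Illustrative evaluation result; not an exact API contract.
result = {
    "verdict": "yes",                        # issue detected
    "evaluation": "Hallucination",           # which evaluator flagged it
    "score": 0.9,                            # confidence, 0.0-1.0
    "classification": "factual_inaccuracy",  # one of the categories above
    "explanation": "The text contradicts the provided context on the year.",
}
```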

## Supported providers

OpenLIT can use either of the following LLM providers to run its evaluations (see the sketch after this list):

- OpenAI
- Anthropic
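
Switching providers is a one-line change. A sketch, assuming each provider reads its matching API key (`OPENAI_API_KEY` / `ANTHROPIC_API_KEY`) from the environment and that an explicit `model` override is accepted; check the OpenLIT docs for defaults:

```python
import openlit

# Judge with OpenAI (reads OPENAI_API_KEY from the environment).
detector = openlit.evals.All(provider="openai")

# Or judge with Anthropic instead (reads ANTHROPIC_API_KEY); the model
# override shown here is an assumption, not a documented default.
detector = openlit.evals.All(
    provider="anthropic",
    model="claude-3-5-sonnet-20241022",
)
```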

## Key features

### OpenTelemetry integration

- **Native metrics collection** - Built-in OpenTelemetry metrics with `collect_metrics=True` (sketched after this list)
- **Grafana Cloud compatibility** - Direct metrics export to dashboards
- **Real-time monitoring** - Live evaluation results and trends
- **Custom resource attributes** - Enhanced context and filtering
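
A sketch wiring evaluation metrics into Grafana Cloud over OpenTelemetry. `collect_metrics=True` comes from this page; `openlit.init()` with an `otlp_endpoint` follows OpenLIT's SDK initialization, and the gateway URL is a placeholder you replace with your own stack's OTLP endpoint and credentials:

```python
import openlit

# Send OpenLIT's OpenTelemetry metrics to the Grafana Cloud OTLP gateway.
# The endpoint is a placeholder; authentication can be supplied via the
# standard OTEL_EXPORTER_OTLP_HEADERS environment variable.
openlit.init(
    otlp_endpoint="https://otlp-gateway-prod-us-central-0.grafana.net/otlp",
)

# collect_metrics=True emits each evaluation result as OpenTelemetry metrics.
detector = openlit.evals.All(provider="openai", collect_metrics=True)
```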

### Configurable thresholds

- **Custom score thresholds** - Adjust sensitivity per use case (see the sketch after this list)
- **Provider flexibility** - Switch between OpenAI and Anthropic models
- **Custom categories** - Add domain-specific evaluation criteria
- **Batch processing** - Efficient evaluation of multiple texts
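
A configuration sketch combining these knobs. `threshold_score`, `base_url`, and `custom_categories` are parameter names assumed to match the features above (with `custom_categories` guessed as a name-to-description mapping); verify them against the OpenLIT configuration reference:

```python
import openlit

detector = openlit.evals.Toxicity(
    provider="anthropic",
    threshold_score=0.7,  # demand higher confidence before a "yes" verdict
    base_url="https://llm-gateway.example.com/v1",  # hypothetical enterprise endpoint
    custom_categories={
        # Hypothetical domain-specific category alongside the built-ins.
        "medical_misinformation": "Unsafe or unsupported medical claims",
    },
)
```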

### Production-ready features

- **Automated verdicts** - “yes/no” determinations based on thresholds (example after this list)
- **Detailed explanations** - Clear reasoning for each evaluation result
- **Score distribution** - Confidence levels from 0.0 to 1.0
- **Custom base URLs** - Support for enterprise API endpoints
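
In production, the verdict can gate whether a response ships. A hedged sketch with a hypothetical `guard()` helper, assuming the result exposes the verdict, classification, and explanation fields described above:

```python
import openlit

detector = openlit.evals.All(provider="openai", threshold_score=0.5)

def guard(prompt: str, contexts: list[str], candidate: str) -> str:
    """Ship the candidate response only if the evaluation passes."""
    result = detector.measure(prompt=prompt, contexts=contexts, text=candidate)
    # The result is treated as a dict here; adjust the access pattern to
    # match what your OpenLIT version actually returns.
    if result["verdict"] == "yes":  # an issue was detected
        return f"Response withheld ({result['classification']}): {result['explanation']}"
    return candidate
```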

## Getting started

- [Setup Guide](./setup) - Set up OpenLIT evaluations for AI model quality and safety monitoring
- [Configuration](./configuration) - Configure evaluation providers, thresholds, and custom parameters
