
GenAI evaluations setup

GenAI Evaluations monitors AI model quality and safety using OpenLIT’s built-in evaluation capabilities, covering hallucination detection, toxicity analysis, bias assessment, and evaluation scoring.

Prerequisites

Before setting up GenAI Evaluations, ensure you have completed the GenAI Observability setup.

Initialize evaluations

OpenLIT provides built-in evaluation capabilities for hallucination, bias, and toxicity detection. Set up your API key for the evaluation provider:

Bash
# For OpenAI-based evaluations
export OPENAI_API_KEY="your-openai-api-key"

# Or for Anthropic-based evaluations  
export ANTHROPIC_API_KEY="your-anthropic-api-key"
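
The evaluators in the sections below read these environment variables by default. If you manage credentials differently, the evaluator constructors also appear to accept the key directly. A minimal sketch, assuming api_key (and an optional model override) are supported constructor parameters in your OpenLIT version:

Python
import openlit

openlit.init()

# Sketch only: api_key and model are assumed constructor parameters; check
# your OpenLIT version for the exact signature. The model name is hypothetical.
evals = openlit.evals.All(
    provider="openai",
    api_key="your-openai-api-key",  # for example, loaded from a secret manager
    model="gpt-4o",
)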

Basic evaluation setup

Use the All evaluator to check for hallucination, bias, and toxicity in a single call:

Python
import openlit

openlit.init()

# Initialize the All evaluator (checks for Hallucination, Bias, and Toxicity)
evals = openlit.evals.All(provider="openai")

# Example evaluation
contexts = ["Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921"]
prompt = "When and why did Einstein win the Nobel Prize?"
text = "Einstein won the Nobel Prize in 1969 for his discovery of the photoelectric effect"

result = evals.measure(prompt=prompt, contexts=contexts, text=text)
print(result)
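
The printed result reports the evaluator’s verdict. The exact structure depends on your OpenLIT version, but it typically indicates whether an issue was found, the category it falls into, a score, and an explanation. A minimal sketch of acting on the result, where the verdict, classification, and score field names are assumptions to verify against your own output:

Python
# Sketch only: continues from the example above. The verdict, classification,
# and score field names are assumptions; inspect print(result) to confirm them.
if getattr(result, "verdict", None) == "yes":
    print(f"Issue detected: {result.classification} (score: {result.score})")
else:
    print("No hallucination, bias, or toxicity detected")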

Specific evaluation metrics

For targeted evaluations, use specific evaluation metrics:

Hallucination detection

Python
import openlit

openlit.init()

# Initialize hallucination detector
hallucination_detector = openlit.evals.Hallucination(provider="openai")

result = hallucination_detector.measure(
    prompt="Discuss Einstein's achievements",
    contexts=["Einstein discovered the photoelectric effect."],
    text="Einstein won the Nobel Prize in 1969 for the theory of relativity."
)
print(result)
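
If the default sensitivity does not fit your use case, the detector also appears to accept a threshold controlling when a verdict is flagged. A minimal sketch, assuming a threshold_score parameter is available in your OpenLIT version:

Python
import openlit

openlit.init()

# Sketch only: threshold_score is an assumed parameter intended to set the
# minimum score at which the evaluator reports a positive verdict.
strict_detector = openlit.evals.Hallucination(
    provider="openai",
    threshold_score=0.75,
)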

Bias detection

Python
import openlit

openlit.init()

# Initialize bias detector
bias_detector = openlit.evals.Bias(provider="openai")

result = bias_detector.measure(
    prompt="Describe a software engineer",
    text="Software engineers are typically young men who work long hours"
)
print(result)
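
To get a sense of how often bias appears across your model’s outputs, you can run the detector over a batch of generated responses and count the flagged ones. A small sketch that reuses the measure call shown above; the verdict field name is an assumption, and the sample responses are hypothetical:

Python
import openlit

openlit.init()

bias_detector = openlit.evals.Bias(provider="openai")

# Hypothetical generated responses to evaluate as a batch
responses = [
    "Software engineers are typically young men who work long hours",
    "Software engineers design, build, and maintain software systems",
]

flagged = 0
for text in responses:
    result = bias_detector.measure(prompt="Describe a software engineer", text=text)
    # Sketch only: the verdict field name is an assumption
    if getattr(result, "verdict", None) == "yes":
        flagged += 1

print(f"{flagged} of {len(responses)} responses flagged for bias")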

Toxicity detection

Python
import openlit

openlit.init()

# Initialize toxicity detector
toxicity_detector = openlit.evals.Toxicity(provider="openai")

result = toxicity_detector.measure(
    prompt="Please provide feedback",
    text="Your response contains concerning language patterns"
)
print(result)
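
To see evaluation results next to the traces and metrics from your GenAI Observability setup, the evaluators can also record their scores as telemetry. A minimal sketch, assuming a collect_metrics flag is available in your OpenLIT version:

Python
import openlit

openlit.init()

# Sketch only: collect_metrics is an assumed flag; when supported, each
# measure() call on this detector also emits the evaluation score as an
# OpenTelemetry metric, in addition to returning the result.
toxicity_detector = openlit.evals.Toxicity(
    provider="openai",
    collect_metrics=True,
)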