GenAI evaluations setup
GenAI Evaluations provides comprehensive monitoring of AI model quality and safety, using OpenLIT’s built-in evaluation capabilities to detect hallucination, toxicity, and bias and to score model outputs.
Prerequisites
Before setting up GenAI Evaluations, ensure you have completed the GenAI Observability setup.
Initialize evaluations
OpenLIT provides built-in evaluation capabilities for hallucination, bias, and toxicity detection. Set up your API key for the evaluation provider:
# For OpenAI-based evaluations
export OPENAI_API_KEY="your-openai-api-key"
# Or for Anthropic-based evaluations
export ANTHROPIC_API_KEY="your-anthropic-api-key"
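If you prefer not to rely on environment variables, the key can also be passed directly to the evaluator constructors introduced in the next sections. This is a minimal sketch; the api_key and model parameter names reflect current OpenLIT releases and should be confirmed against your installed version:

import openlit

# A sketch: pass the provider key (and, optionally, the judge model) directly
# instead of exporting it as an environment variable. The api_key and model
# parameter names are assumptions; confirm them for your OpenLIT version.
evals = openlit.evals.All(
    provider="openai",
    api_key="your-openai-api-key",
    model="gpt-4o",
)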
Basic evaluation setup
Use the “All” evaluator to check for hallucination, bias, and toxicity in one go:
import openlit
openlit.init()
# Initialize the All evaluator (checks for Hallucination, Bias, and Toxicity)
evals = openlit.evals.All(provider="openai")
# Example evaluation
contexts = ["Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921"]
prompt = "When and why did Einstein win the Nobel Prize?"
text = "Einstein won the Nobel Prize in 1969 for his discovery of the photoelectric effect"
result = evals.measure(prompt=prompt, contexts=contexts, text=text)
print(result)
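The evaluators also expose tuning options. The sketch below assumes the threshold_score and custom collect_metrics parameters documented for current OpenLIT releases; with collect_metrics=True, evaluation scores are emitted as metrics through the same OpenTelemetry pipeline configured during the GenAI Observability setup. Verify both parameter names against your installed version:

import openlit

openlit.init()

# Assumed OpenLIT options: flag results whose score is at or above 0.5 and
# emit evaluation scores as metrics alongside the traces from openlit.init().
evals = openlit.evals.All(
    provider="openai",
    threshold_score=0.5,
    collect_metrics=True,
)

# evals.measure(...) is then called exactly as in the example above.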
Specific evaluation metrics
To check a single metric, use the corresponding evaluator:
Hallucination detection
import openlit
openlit.init()
# Initialize hallucination detector
hallucination_detector = openlit.evals.Hallucination(provider="openai")
result = hallucination_detector.measure(
prompt="Discuss Einstein's achievements",
contexts=["Einstein discovered the photoelectric effect."],
text="Einstein won the Nobel Prize in 1969 for the theory of relativity."
)
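To act on a detection, inspect the returned result. The field names used below (verdict and score) follow OpenLIT’s documented output format but may differ between versions, so treat them as assumptions and check print(result) first:

# The evaluator returns a verdict and a score; the field names "verdict" and
# "score" are assumptions based on OpenLIT's documented output format.
if result.verdict == "yes":
    print(f"Possible hallucination (score={result.score}); flag or regenerate the answer")
else:
    print("Answer is consistent with the provided context")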
Bias detection
import openlit
openlit.init()
# Initialize bias detector
bias_detector = openlit.evals.Bias(provider="openai")
result = bias_detector.measure(
prompt="Describe a software engineer",
text="Software engineers are typically young men who work long hours"
)
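The evaluators can also be extended with additional categories. The custom_categories parameter below is taken from OpenLIT’s documented options, and the category name and description are purely illustrative; confirm the parameter for your version:

import openlit

openlit.init()

# custom_categories is an assumed OpenLIT option; the category name and
# description below are illustrative only.
bias_detector = openlit.evals.Bias(
    provider="openai",
    custom_categories={
        "regional_bias": "Stereotypes or generalizations tied to a person's region or country"
    },
)

result = bias_detector.measure(
    prompt="Describe a software engineer",
    text="Software engineers are typically young men who work long hours",
)
print(result)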
Toxicity detection
import openlit
openlit.init()
# Initialize toxicity detector
toxicity_detector = openlit.evals.Toxicity(provider="openai")
result = toxicity_detector.measure(
prompt="Please provide feedback",
text="Your response contains concerning language patterns"
)
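In an application, the same detector instance can screen several outputs in a row. This minimal sketch reuses only the measure() call shown above; the candidate texts are illustrative:

# Reuse the detector to screen a batch of candidate responses before
# returning them to users.
candidate_responses = [
    "Thanks for the update, happy to help with the next steps.",
    "Your response contains concerning language patterns",
]

for text in candidate_responses:
    result = toxicity_detector.measure(prompt="Please provide feedback", text=text)
    print(text, "->", result)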