GenAI evaluations setup
GenAI Evaluations monitors AI model quality and safety using OpenLIT's built-in evaluation capabilities: hallucination detection, bias assessment, toxicity analysis, and evaluation scoring.
Prerequisites
Before setting up GenAI Evaluations, ensure you have completed the GenAI Observability setup.
Initialize evaluations
OpenLIT provides built-in evaluation capabilities for hallucination, bias, and toxicity detection. Set up your API key for the evaluation provider:
# For OpenAI-based evaluations
export OPENAI_API_KEY="your-openai-api-key"
# Or for Anthropic-based evaluations
export ANTHROPIC_API_KEY="your-anthropic-api-key"

Basic evaluation setup
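Before initializing an evaluator, you can fail fast if no provider key is actually exported. This is a minimal stdlib sketch, not part of OpenLIT; the helper name is hypothetical:

```python
import os

# Hypothetical helper (not an OpenLIT API): pick a provider based on
# which API key is present in the environment.
def evaluation_provider():
    if os.environ.get("OPENAI_API_KEY"):
        return "openai"
    if os.environ.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    raise RuntimeError("Set OPENAI_API_KEY or ANTHROPIC_API_KEY first")
```

You can then pass the result straight to the evaluator, for example `openlit.evals.All(provider=evaluation_provider())`.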
Use the All evaluator to check for hallucination, bias, and toxicity in a single call:
import openlit
openlit.init()
# Initialize the All evaluator (checks for Hallucination, Bias, and Toxicity)
evals = openlit.evals.All(provider="openai")
# Example evaluation
contexts = ["Einstein won the Nobel Prize for his discovery of the photoelectric effect in 1921"]
prompt = "When and why did Einstein win the Nobel Prize?"
text = "Einstein won the Nobel Prize in 1969 for his discovery of the photoelectric effect"
result = evals.measure(prompt=prompt, contexts=contexts, text=text)
print(result)

Specific evaluation metrics
For targeted evaluations, use specific evaluation metrics:
Hallucination detection
import openlit
openlit.init()
# Initialize hallucination detector
hallucination_detector = openlit.evals.Hallucination(provider="openai")
result = hallucination_detector.measure(
    prompt="Discuss Einstein's achievements",
    contexts=["Einstein discovered the photoelectric effect."],
    text="Einstein won the Nobel Prize in 1969 for the theory of relativity.",
)

Bias detection
import openlit
openlit.init()
# Initialize bias detector
bias_detector = openlit.evals.Bias(provider="openai")
result = bias_detector.measure(
    prompt="Describe a software engineer",
    text="Software engineers are typically young men who work long hours",
)

Toxicity detection
import openlit
openlit.init()
# Initialize toxicity detector
toxicity_detector = openlit.evals.Toxicity(provider="openai")
result = toxicity_detector.measure(
    prompt="Please provide feedback",
    text="Your response contains concerning language patterns",
)
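Whichever detector you use, the exact shape of the returned result can vary by OpenLIT version. The sketch below is a defensive accessor for gating on an evaluation score; the `score` field name and the threshold are assumptions to verify against your installed version, not confirmed API:

```python
# Hedged sketch: assumes the evaluation result exposes a numeric "score",
# either as a dict key or an object attribute.
def evaluation_score(result):
    if isinstance(result, dict):
        return result.get("score")
    return getattr(result, "score", None)

def is_flagged(result, threshold=0.5):
    # Flag the response when the score meets or exceeds the threshold.
    score = evaluation_score(result)
    return score is not None and score >= threshold
```

A gate like this lets a pipeline reject or rewrite responses, for example `if is_flagged(result): ...`, instead of only logging the evaluation.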