# Use built-in dashboards
AI Observability includes pre-built analytics dashboards that visualize agent activity, performance, cost, and quality. The dashboards use Prometheus metrics and AI Observability query APIs to surface actionable insights.
## Access dashboards
Navigate to Analytics in the AI Observability plugin. The dashboards are organized into these areas:
- Activity: generation counts, conversation counts, and active agents over time.
- Performance: latency distributions, time to first token, and error rates.
- Tokens and cost: token usage by model and provider, cost breakdown, and cache efficiency.
- Tools: tool call frequency, tool execution duration, tool error rates, and usage percentage per tool.
- Quality: evaluation scores, score distributions, and quality trends.
## Identify performance issues
Use the performance dashboard to spot problems:
- High latency: filter by agent or model to find slow generations. Drill into traces for specific conversations to identify bottlenecks.
- Error spikes: the error rate panel shows failures over time. Click through to conversations with errors to inspect the call_error payload.
- Slow time to first token: for streaming agents, the TTFT panel reveals which models or prompts have poor streaming performance.
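The same checks can be run as ad-hoc PromQL against the metrics listed later on this page. A minimal sketch for p95 latency per model, assuming the histogram is exported with a `gen_ai_request_model` label and the standard `_bucket` suffix (verify both against the series in your Prometheus instance):

```promql
# p95 LLM call duration per model over the last 5 minutes.
# The gen_ai_request_model label and the _bucket suffix are assumptions;
# check the exact series names in your Prometheus instance.
histogram_quantile(0.95,
  sum by (le, gen_ai_request_model) (
    rate(gen_ai_client_operation_duration_bucket[5m])
  )
)
```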
## Optimize costs
The tokens and cost dashboard helps you find optimization opportunities:
- Cost by model: compare cost across models and providers. Consider switching expensive calls to cheaper models where quality is acceptable.
- Cache efficiency: the cache read ratio shows how effectively prompt caching reduces token usage. Low cache rates may indicate prompts that change too frequently.
- Token usage trends: spot unexpected increases in token usage that may indicate prompt regression or unnecessary verbosity.
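To chart token usage trends yourself, a query like the following works as a sketch; the `gen_ai_request_model` and `gen_ai_token_type` label names are assumptions based on the OpenTelemetry gen_ai semantic conventions, so adjust them to the labels your deployment actually exports:

```promql
# Tokens consumed per model and token type (input/output) over the last hour.
# Label names are assumptions from OpenTelemetry gen_ai conventions.
sum by (gen_ai_request_model, gen_ai_token_type) (
  increase(gen_ai_client_token_usage_sum[1h])
)
```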
## Track quality
The quality dashboard visualizes evaluation scores alongside operational metrics:
- Score trends: monitor whether quality improves or degrades after agent version changes.
- Score distributions: see whether responses cluster around high or low scores.
- Correlation: compare quality scores with latency and cost to find the right balance.
## Use Prometheus metrics directly
If you need custom dashboards, query the AI Observability OpenTelemetry metrics in Prometheus:
| Metric | Description |
|---|---|
| `gen_ai_client_operation_duration` | LLM call duration histogram. |
| `gen_ai_client_token_usage` | Token consumption histogram. |
| `gen_ai_client_time_to_first_token` | Streaming TTFT histogram. |
| `gen_ai_client_tool_calls_per_operation` | Tool calls per generation. |
| `sigil_build_info` | Build version info with revision and branch labels. |
If you enable evaluation metrics push (`SIGIL_EVAL_METRICS_PUSH_ENDPOINT`), per-tenant evaluation metrics are also available in Prometheus for custom dashboards and alerting.
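For example, a custom panel showing the average number of tool calls per generation can divide the histogram's sum by its count. The `_sum` and `_count` suffixes are the standard Prometheus histogram series; verify the exact metric names in your instance:

```promql
# Average tool calls per generation, averaged over 15 minutes.
sum(rate(gen_ai_client_tool_calls_per_operation_sum[15m]))
/
sum(rate(gen_ai_client_tool_calls_per_operation_count[15m]))
```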
## Set up alerts
Create Grafana alerts on AI Observability metrics to catch issues proactively:
- Alert on error rate exceeding a threshold.
- Alert on p95 latency exceeding SLO targets.
- Alert on cost per day exceeding budget.
- Alert on evaluation score drops below a quality threshold.
Configure alerts in Grafana using the standard alerting workflow with the Prometheus data source.
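As a sketch of the first alert above, an error-rate expression might look like the following. The `error_type` label is an assumption based on the OpenTelemetry gen_ai conventions (errored calls carry a non-empty `error_type`), and `_count` is the standard histogram count series:

```promql
# Fire when more than 5% of LLM calls errored over the last 10 minutes.
# The error_type label is an assumption; adjust to your exported labels.
(
  sum(rate(gen_ai_client_operation_duration_count{error_type!=""}[10m]))
  /
  sum(rate(gen_ai_client_operation_duration_count[10m]))
) > 0.05
```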