Grafana Cloud

Evaluation API

The evaluation control plane API manages evaluators, rules, and judge provider discovery.

Note

AI Observability is internally referred to as “Sigil” in some configuration and naming. For example, environment variables use the SIGIL_EVAL_ prefix, while the API routes documented here use the generic /api/v1/ prefix.

Evaluator endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/evaluators | List all evaluators. |
| POST | /api/v1/evaluators | Create an evaluator. |
| GET | /api/v1/evaluators/{id} | Get an evaluator by ID. |
| PUT | /api/v1/evaluators/{id} | Update an evaluator. |
| DELETE | /api/v1/evaluators/{id} | Delete an evaluator. |
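
As a rough illustration of calling the evaluator endpoints, the following Python sketch builds (but does not send) requests with the standard library. The base URL, token, bearer-style authentication, and the evaluator ID are placeholders and assumptions, not values from this reference. The create body reuses the regex evaluator shape shown under Evaluator types, and a real create call may need additional fields.

```python
import json
import urllib.request

# Hypothetical values: replace with your own stack URL and access token.
BASE_URL = "https://example.invalid"
TOKEN = "REPLACE_ME"


def build_request(method, path, body=None):
    """Build (but do not send) a request against the evaluation API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    # Bearer authentication is an assumption; use the scheme your deployment requires.
    req.add_header("Authorization", "Bearer " + TOKEN)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# List all evaluators.
list_req = build_request("GET", "/api/v1/evaluators")

# Create an evaluator (body shape assumed from the regex evaluator example).
create_req = build_request(
    "POST",
    "/api/v1/evaluators",
    {"kind": "regex", "config": {"pattern": "^\\d+$", "reject": False, "target": "response"}},
)

# Delete an evaluator by ID ("my-evaluator-id" is a made-up ID).
delete_req = build_request("DELETE", "/api/v1/evaluators/my-evaluator-id")

# To actually send a request: urllib.request.urlopen(list_req)
```

The rule and guard rule endpoints follow the same list, create, get, update, and delete pattern, so the same helper should work with their paths.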

Rule endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/eval-rules | List all evaluation rules. |
| POST | /api/v1/eval-rules | Create a rule. |
| GET | /api/v1/eval-rules/{id} | Get a rule by ID. |
| PUT | /api/v1/eval-rules/{id} | Update a rule. |
| DELETE | /api/v1/eval-rules/{id} | Delete a rule. |

Judge provider endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/v1/judge-providers | List discovered judge providers. |

Guard endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /api/v1/hooks:evaluate | Evaluate a request against enabled guards. |
| GET | /api/v1/eval/hook-rules | List all guard rules. |
| POST | /api/v1/eval/hook-rules | Create a guard rule. |
| GET | /api/v1/eval/hook-rules/{id} | Get a guard rule by ID. |
| PUT | /api/v1/eval/hook-rules/{id} | Update a guard rule. |
| DELETE | /api/v1/eval/hook-rules/{id} | Delete a guard rule. |

Hook evaluation request

JSON
{
  "phase": "preflight",
  "context": {
    "agent_name": "my-agent",
    "model": { "provider": "openai", "name": "gpt-4o" },
    "tags": { "env": "production" }
  },
  "input": {
    "messages": [
      { "role": "user", "parts": [{ "kind": "text", "text": "Hello" }] }
    ],
    "system_prompt": "You are a helpful assistant.",
    "tools": []
  }
}

Hook evaluation response

JSON
{
  "action": "allow",
  "transformed_input": null,
  "evaluations": []
}

The action field is either allow or deny. When any guard rule applies a transform, transformed_input contains the sanitized copy of the input. Each entry in evaluations reports a per-rule outcome with rule_id, evaluator_id, passed, and latency_ms.
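
One way a client might act on this response is sketched below in Python. The helper names are hypothetical, and treating any action other than allow as a denial (failing closed) is a design choice of this sketch, not documented behavior.

```python
def resolve_input(original_input, response):
    """Return the input to forward to the model, or None if the guard denies it."""
    # Fail closed: anything other than an explicit "allow" is treated as a denial.
    if response.get("action") != "allow":
        return None
    transformed = response.get("transformed_input")
    # Prefer the sanitized copy when a transform was applied.
    return transformed if transformed is not None else original_input


def failed_rule_ids(response):
    """Collect the rule_id of every evaluation entry that did not pass."""
    return [
        entry["rule_id"]
        for entry in response.get("evaluations", [])
        if not entry.get("passed")
    ]


original = {"messages": [], "system_prompt": "You are a helpful assistant.", "tools": []}

allowed = {"action": "allow", "transformed_input": None, "evaluations": []}
denied = {
    "action": "deny",
    "transformed_input": None,
    "evaluations": [
        {"rule_id": "block_tools", "evaluator_id": "e1", "passed": False, "latency_ms": 3}
    ],
}

forwarded = resolve_input(original, allowed)  # same object as `original`
blocked = resolve_input(original, denied)     # None
```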

Guard rule configuration

JSON
{
  "rule_id": "block_tools",
  "enabled": true,
  "phase": "preflight",
  "priority": 5,
  "selector": "all",
  "match": {},
  "evaluator_ids": [],
  "action_on_fail": "deny",
  "tool_filter": { "blocked_names": ["shell_exec", "danger_*"] },
  "transform": {
    "patterns": [
      { "id": "ssn", "regex": "\\b\\d{3}-\\d{2}-\\d{4}\\b", "replacement": "[REDACTED:ssn]" }
    ]
  }
}

Each guard rule must include at least one of: evaluator_ids, transform.patterns, or tool_filter.blocked_names.
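
The constraint above can be checked client-side before a rule is submitted. The Python sketch below also previews locally what the example rule's tool filter and transform might do. It assumes that blocked_names uses glob-style wildcards and that the service's regex dialect agrees with Python's for this pattern. Neither is confirmed here, and all function names are hypothetical.

```python
import fnmatch
import re

GUARD_RULE = {
    "rule_id": "block_tools",
    "evaluator_ids": [],
    "tool_filter": {"blocked_names": ["shell_exec", "danger_*"]},
    "transform": {
        "patterns": [
            {"id": "ssn", "regex": "\\b\\d{3}-\\d{2}-\\d{4}\\b", "replacement": "[REDACTED:ssn]"}
        ]
    },
}


def has_guard_action(rule):
    """True if the rule sets evaluator_ids, transform.patterns, or tool_filter.blocked_names."""
    return bool(
        rule.get("evaluator_ids")
        or (rule.get("transform") or {}).get("patterns")
        or (rule.get("tool_filter") or {}).get("blocked_names")
    )


def is_blocked(tool_name, rule):
    """Local preview of the tool filter, assuming glob-style matching."""
    names = (rule.get("tool_filter") or {}).get("blocked_names", [])
    return any(fnmatch.fnmatchcase(tool_name, pattern) for pattern in names)


def redact(text, rule):
    """Local preview of the transform patterns using Python's re module."""
    for pattern in (rule.get("transform") or {}).get("patterns", []):
        replacement = pattern["replacement"]
        # A function replacement inserts the string literally (no backslash escapes).
        text = re.sub(pattern["regex"], lambda _match: replacement, text)
    return text
```

For example, redact("SSN 123-45-6789 on file", GUARD_RULE) returns "SSN [REDACTED:ssn] on file", and is_blocked("danger_delete", GUARD_RULE) is true under the glob assumption.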

Score ingest

| Transport | Endpoint |
| --- | --- |
| HTTP | POST /api/v1/scores:export |

The score ingest endpoint accepts externally computed evaluation scores. Score submission is idempotent: re-submitting the same score ID is a no-op.
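
The idempotency guarantee can be pictured with a small in-memory model in Python. This is only an illustration of the semantics, not the service's implementation. The "id" and "value" field names are hypothetical, because the request schema is not shown here.

```python
def ingest(store, score):
    """Record a score once. Return True if stored, False if the ID was already seen."""
    score_id = score["id"]  # hypothetical field name for the score ID
    if score_id in store:
        # Re-submission is a no-op: the first accepted score is kept unchanged.
        return False
    store[score_id] = score
    return True


store = {}
first = ingest(store, {"id": "score-1", "value": 0.9})
retry = ingest(store, {"id": "score-1", "value": 0.9})
```

Under this model, a client can safely retry a submission after a timeout, as long as it reuses the same score ID.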

Evaluator types

LLM judge

JSON
{
  "kind": "llm_judge",
  "config": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "system_prompt": "You are a quality evaluator.",
    "user_prompt": "Rate this response:\n{{assistant_response}}",
    "max_tokens": 100,
    "temperature": 0.0,
    "timeout_ms": 30000
  }
}

JSON schema

JSON
{
  "kind": "json_schema",
  "config": {
    "schema": {
      "type": "object",
      "required": ["answer"],
      "properties": {
        "answer": { "type": "string" }
      }
    },
    "target": "response"
  }
}

The optional target field sets the text to evaluate: response (default), input, or system_prompt.
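
To make the check concrete, here is a minimal Python sketch that covers only the keywords used in the example schema (type, required, and properties). It assumes the target text is parsed as JSON before validation. It is not a full JSON Schema validator and does not describe how the service validates.

```python
import json

SCHEMA = {
    "type": "object",
    "required": ["answer"],
    "properties": {"answer": {"type": "string"}},
}


def check(schema, value):
    """Check a parsed value against a small subset of JSON Schema keywords."""
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        # Validate only the properties that are actually present.
        return all(
            check(sub_schema, value[key])
            for key, sub_schema in schema.get("properties", {}).items()
            if key in value
        )
    if expected == "string":
        return isinstance(value, str)
    return True  # keywords outside this subset are not checked


def conforms(schema, text):
    """Parse the target text as JSON, then check it against the schema."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return False
    return check(schema, value)
```

With the example schema, a response of {"answer": "42"} conforms, while {"answer": 42}, {}, and non-JSON text do not.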

Regex

JSON
{
  "kind": "regex",
  "config": {
    "pattern": "^\\d+$",
    "reject": false,
    "target": "response"
  }
}
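
A plausible reading of this configuration is sketched below in Python. It assumes that reject set to true inverts the result, so a match fails the evaluation. That interpretation and the function name are assumptions, and the service's regex dialect may differ from Python's.

```python
import re


def regex_passes(config, text):
    """Evaluate text against a regex evaluator config (assumed semantics)."""
    matched = re.search(config["pattern"], text) is not None
    # Assumption: reject=True means a match is a failure.
    if config.get("reject", False):
        return not matched
    return matched


digits_only = {"pattern": "^\\d+$", "reject": False, "target": "response"}
no_digits_only = {"pattern": "^\\d+$", "reject": True, "target": "response"}
```

Under these assumptions, regex_passes(digits_only, "12345") passes, and the same text fails against no_digits_only.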

Heuristic

JSON
{
  "kind": "heuristic",
  "config": {
    "version": "v2",
    "target": "response",
    "rules": {
      "and": [
        { "not_empty": "assistant_response" },
        { "min_length": { "field": "assistant_response", "value": 10 } }
      ]
    }
  }
}
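
The rule tree in this example can be read as a small boolean expression. The Python sketch below evaluates only the three operators shown (and, not_empty, and min_length). It assumes that not_empty ignores surrounding whitespace and that min_length counts characters. The actual operator set and semantics may differ.

```python
def evaluate(rule, fields):
    """Recursively evaluate a heuristic rule tree against named text fields."""
    if "and" in rule:
        return all(evaluate(child, fields) for child in rule["and"])
    if "not_empty" in rule:
        # Assumption: whitespace-only text counts as empty.
        return bool(fields.get(rule["not_empty"], "").strip())
    if "min_length" in rule:
        spec = rule["min_length"]
        # Assumption: length is measured in characters.
        return len(fields.get(spec["field"], "")) >= spec["value"]
    raise ValueError("unsupported operator in this sketch: %r" % sorted(rule))


RULES = {
    "and": [
        {"not_empty": "assistant_response"},
        {"min_length": {"field": "assistant_response", "value": 10}},
    ]
}

long_enough = evaluate(RULES, {"assistant_response": "Hello there, world"})
too_short = evaluate(RULES, {"assistant_response": "Hi"})
```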

Rule selectors

| Selector | Description |
| --- | --- |
| user_visible_turn | Assistant text responses without tool calls. |
| all_assistant_generations | Any assistant output. |
| tool_call_steps | Generations containing tool calls. |
| errored_generations | Generations with call_error. |

Rule alert integration

Rules can include an alert_rule_uids field that stores the UIDs of Grafana alert rules created from the evaluation rule. You can create alerts directly from the rule editor in the plugin UI by setting a pass-rate threshold and contact point. Alerts are created as non-provisioned, so you can edit them in the Grafana Alerting UI afterward.