Evaluation API

The evaluation control plane API manages evaluators, rules, and judge provider discovery.

Note
AI Observability is internally referred to as “Sigil” in some configuration and naming. For example, environment variables use the `SIGIL_EVAL_` prefix, while the API routes documented here use the generic `/api/v1/` prefix.

Evaluator endpoints

Method	Path	Description
`GET`	`/api/v1/evaluators`	List all evaluators.
`POST`	`/api/v1/evaluators`	Create an evaluator.
`GET`	`/api/v1/evaluators/{id}`	Get an evaluator by ID.
`PUT`	`/api/v1/evaluators/{id}`	Update an evaluator.
`DELETE`	`/api/v1/evaluators/{id}`	Delete an evaluator.

Rule endpoints

Method	Path	Description
`GET`	`/api/v1/eval-rules`	List all evaluation rules.
`POST`	`/api/v1/eval-rules`	Create a rule.
`GET`	`/api/v1/eval-rules/{id}`	Get a rule by ID.
`PUT`	`/api/v1/eval-rules/{id}`	Update a rule.
`DELETE`	`/api/v1/eval-rules/{id}`	Delete a rule.

Judge provider endpoints

Method	Path	Description
`GET`	`/api/v1/judge-providers`	List discovered judge providers.

Guard endpoints

Method	Path	Description
`POST`	`/api/v1/hooks:evaluate`	Evaluate a request against enabled guards.
`GET`	`/api/v1/eval/hook-rules`	List all guard rules.
`POST`	`/api/v1/eval/hook-rules`	Create a guard rule.
`GET`	`/api/v1/eval/hook-rules/{id}`	Get a guard rule by ID.
`PUT`	`/api/v1/eval/hook-rules/{id}`	Update a guard rule.
`DELETE`	`/api/v1/eval/hook-rules/{id}`	Delete a guard rule.

Hook evaluation request

{
  "phase": "preflight",
  "context": {
    "agent_name": "my-agent",
    "model": { "provider": "openai", "name": "gpt-4o" },
    "tags": { "env": "production" }
  },
  "input": {
    "messages": [
      { "role": "user", "parts": [{ "kind": "text", "text": "Hello" }] }
    ],
    "system_prompt": "You are a helpful assistant.",
    "tools": []
  }
}
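
As a minimal sketch, the request body above could be wrapped in an HTTP request like this. The base URL, the token, and the bearer-token header are assumptions made for illustration and are not taken from this reference; only the path and the payload shape come from the page. The request is built but not sent.

```python
import json
import urllib.request

# Assumptions: placeholder base URL and token, not values from this reference.
BASE_URL = "https://ai-observability.example.com"
API_TOKEN = "replace-me"


def build_hook_request(user_text: str, phase: str = "preflight") -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/hooks:evaluate request."""
    payload = {
        "phase": phase,
        "context": {
            "agent_name": "my-agent",
            "model": {"provider": "openai", "name": "gpt-4o"},
            "tags": {"env": "production"},
        },
        "input": {
            "messages": [
                {"role": "user", "parts": [{"kind": "text", "text": user_text}]}
            ],
            "system_prompt": "You are a helpful assistant.",
            "tools": [],
        },
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/hooks:evaluate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; use whatever scheme your stack requires.
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )


req = build_hook_request("Hello")
print(req.get_method(), req.full_url)
```

To actually send the request, pass `req` to `urllib.request.urlopen` with a timeout and decode the JSON response.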

Hook evaluation response

{
  "action": "allow",
  "transformed_input": null,
  "evaluations": []
}

The `action` field is either `allow` or `deny`. When any guard rule applies a transform, `transformed_input` contains the sanitized copy of the input; otherwise it is `null`. Each entry in `evaluations` reports a per-rule outcome with `rule_id`, `evaluator_id`, `passed`, and `latency_ms`.
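
One way a caller might act on this response is sketched below. `GuardDenied` and `apply_hook_response` are hypothetical names introduced for illustration; the sketch assumes the caller wants to abort on `deny` and otherwise continue with the sanitized input when one is present.

```python
from typing import Any, Dict


class GuardDenied(Exception):
    """Raised when the hook response carries action == "deny"."""


def apply_hook_response(original_input: Dict[str, Any],
                        response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the input to forward to the model, or raise if a guard denied it."""
    # Collect the rule IDs whose per-rule outcome did not pass.
    failed = [
        entry.get("rule_id")
        for entry in response.get("evaluations", [])
        if not entry.get("passed", True)
    ]
    if response.get("action") == "deny":
        raise GuardDenied(f"denied by rules: {failed}")

    # transformed_input is null unless a guard rule applied a transform.
    transformed = response.get("transformed_input")
    return transformed if transformed is not None else original_input
```

Falling back to the original input when `transformed_input` is `null` keeps the call site identical whether or not any transform rule fired.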

Guard rule configuration

{
  "rule_id": "block_tools",
  "enabled": true,
  "phase": "preflight",
  "priority": 5,
  "selector": "all",
  "match": {},
  "evaluator_ids": [],
  "action_on_fail": "deny",
  "tool_filter": { "blocked_names": ["shell_exec", "danger_*"] },
  "transform": {
    "patterns": [
      { "id": "ssn", "regex": "\\b\\d{3}-\\d{2}-\\d{4}\\b", "replacement": "[REDACTED:ssn]" }
    ]
  }
}
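
To see what the `transform.patterns` entry in the rule above would do to a message, the following local sketch parses the JSON and applies the pattern with Python's `re` module. This only illustrates the intent; the service's own regex engine and replacement semantics are not documented here and may differ. `redact` is a hypothetical helper.

```python
import json
import re

# Raw string so the JSON escapes (\\b, \\d) reach the JSON parser unchanged.
RULE_JSON = r'''
{
  "transform": {
    "patterns": [
      { "id": "ssn", "regex": "\\b\\d{3}-\\d{2}-\\d{4}\\b", "replacement": "[REDACTED:ssn]" }
    ]
  }
}
'''


def redact(text: str, rule: dict) -> str:
    """Apply every transform pattern in the rule to the text, in order."""
    for pattern in rule.get("transform", {}).get("patterns", []):
        replacement = pattern["replacement"]
        # A function replacement inserts the string literally (no backreference parsing).
        text = re.sub(pattern["regex"], lambda _match, r=replacement: r, text)
    return text


rule = json.loads(RULE_JSON)
print(redact("My SSN is 123-45-6789.", rule))
```

Note that the doubled backslashes in the rule are JSON escaping: after parsing, the regex is `\b\d{3}-\d{2}-\d{4}\b`.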

Each guard rule must include at least one of: `evaluator_ids`, `transform.patterns`, or `tool_filter.blocked_names`.
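
A client could check this constraint before submitting a rule. The sketch below assumes that an empty list counts as "not included", which this reference does not state explicitly; `guard_rule_has_check` is a hypothetical helper, not part of the API.

```python
from typing import Any, Dict


def guard_rule_has_check(rule: Dict[str, Any]) -> bool:
    """True if the rule sets evaluator_ids, transform.patterns, or tool_filter.blocked_names."""
    evaluator_ids = rule.get("evaluator_ids") or []
    patterns = (rule.get("transform") or {}).get("patterns") or []
    blocked_names = (rule.get("tool_filter") or {}).get("blocked_names") or []
    # Assumption: empty lists do not satisfy the requirement.
    return bool(evaluator_ids or patterns or blocked_names)


example_rule = {
    "rule_id": "block_tools",
    "evaluator_ids": [],
    "tool_filter": {"blocked_names": ["shell_exec", "danger_*"]},
}
print(guard_rule_has_check(example_rule))
```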

Score ingest

Transport	Endpoint
HTTP	`POST /api/v1/scores:export`

The score ingest endpoint accepts externally computed evaluation scores. Ingestion is idempotent: re-submitting a score with the same score ID is a no-op.
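
Idempotent ingest makes retries safe as long as the client reuses the same score ID. The sketch below shows one possible approach: derive the ID deterministically from the thing being scored, and model the no-op behavior with an in-memory store. The namespace, the `generation_id` and `evaluator_id` inputs, and the UUID format are all assumptions; this reference does not specify the score payload or the accepted ID format.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
SCORE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/eval-scores")


def score_id(generation_id: str, evaluator_id: str) -> str:
    """Derive a stable ID so that a retried submission reuses the same score ID."""
    return str(uuid.uuid5(SCORE_NAMESPACE, f"{generation_id}:{evaluator_id}"))


class InMemoryScoreStore:
    """Toy stand-in for the ingest endpoint, used only to illustrate the no-op."""

    def __init__(self) -> None:
        self._scores = {}

    def ingest(self, sid: str, value: float) -> bool:
        """Store the score; return False (and change nothing) if the ID is known."""
        if sid in self._scores:
            return False
        self._scores[sid] = value
        return True

    def get(self, sid: str) -> float:
        return self._scores[sid]


store = InMemoryScoreStore()
sid = score_id("gen-123", "quality-judge")
print(store.ingest(sid, 0.9), store.ingest(sid, 0.1))
```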

Evaluator types

LLM judge

{
  "kind": "llm_judge",
  "config": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "system_prompt": "You are a quality evaluator.",
    "user_prompt": "Rate this response:\n{{assistant_response}}",
    "max_tokens": 100,
    "temperature": 0.0,
    "timeout_ms": 30000
  }
}
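
The `user_prompt` above contains a `{{assistant_response}}` placeholder. The following sketch shows how such a template might be filled in before it is sent to the judge model. The substitution rules (double-brace names, error on unknown names) are assumptions for illustration; this reference does not describe the service's actual templating, and `render_prompt` is a hypothetical helper.

```python
import re
from typing import Dict

# Matches {{name}} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: Dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value from variables."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for placeholder: {name}")
        return variables[name]

    return _PLACEHOLDER.sub(substitute, template)


prompt = render_prompt(
    "Rate this response:\n{{assistant_response}}",
    {"assistant_response": "Paris is the capital of France."},
)
print(prompt)
```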

JSON schema

{
  "kind": "json_schema",
  "config": {
    "schema": {
      "type": "object",
      "required": ["answer"],
      "properties": {
        "answer": { "type": "string" }
      }
    },
    "target": "response"
  }
}

The optional `target` field sets the text to evaluate: `response` (default), `input`, or `system_prompt`.

Regex

{
  "kind": "regex",
  "config": {
    "pattern": "^\\d+$",
    "reject": false,
    "target": "response"
  }
}
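
This reference does not spell out how `reject` changes the outcome. A plausible reading, sketched below purely as an assumption, is that with `reject` set to `false` the target must match the pattern to pass, and with `reject` set to `true` a match causes a failure. The use of `re.search` (match anywhere unless the pattern is anchored) is also an assumption; `regex_passes` is a hypothetical helper.

```python
import re


def regex_passes(text: str, pattern: str, reject: bool = False) -> bool:
    """Assumed semantics: pass on match, or pass on no match when reject is True."""
    matched = re.search(pattern, text) is not None
    return not matched if reject else matched


# The pattern from the example config, after JSON unescaping: ^\d+$
DIGITS_ONLY = r"^\d+$"
print(regex_passes("12345", DIGITS_ONLY))
print(regex_passes("12345", DIGITS_ONLY, reject=True))
```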

Heuristic

{
  "kind": "heuristic",
  "config": {
    "version": "v2",
    "target": "response",
    "rules": {
      "and": [
        { "not_empty": "assistant_response" },
        { "min_length": { "field": "assistant_response", "value": 10 } }
      ]
    }
  }
}
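
To make the shape of the `rules` tree concrete, here is a small interpreter for the three operators that appear in the example (`and`, `not_empty`, `min_length`). It is a sketch under stated assumptions: each node is treated as a single-key object, `not_empty` ignores surrounding whitespace, and `min_length` counts characters. The real evaluator may define these differently and likely supports more operators than shown here.

```python
from typing import Any, Dict


def evaluate_rule(rule: Dict[str, Any], fields: Dict[str, str]) -> bool:
    """Evaluate one single-key rule node against a mapping of field name to text."""
    if len(rule) != 1:
        raise ValueError("each rule node must have exactly one operator")
    (operator, argument), = rule.items()

    if operator == "and":
        return all(evaluate_rule(child, fields) for child in argument)
    if operator == "not_empty":
        # Assumption: whitespace-only text counts as empty.
        return bool(fields.get(argument, "").strip())
    if operator == "min_length":
        # Assumption: length is measured in characters.
        return len(fields.get(argument["field"], "")) >= argument["value"]
    raise ValueError(f"unsupported operator: {operator}")


RULES = {
    "and": [
        {"not_empty": "assistant_response"},
        {"min_length": {"field": "assistant_response", "value": 10}},
    ]
}
print(evaluate_rule(RULES, {"assistant_response": "Hello, world!"}))
```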

Rule selectors

Selector	Description
`user_visible_turn`	Assistant text responses without tool calls.
`all_assistant_generations`	Any assistant output.
`tool_call_steps`	Generations containing tool calls.
`errored_generations`	Generations with `call_error`.
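
The selector descriptions above can be read as predicates over a generation. The sketch below encodes that reading for a generation represented as a plain dictionary. The `text` and `tool_calls` keys are hypothetical field names chosen for illustration; only `call_error` and the selector names come from the table.

```python
from typing import Any, Dict, Set


def matching_selectors(generation: Dict[str, Any]) -> Set[str]:
    """Return the selectors from the table that this assistant generation satisfies."""
    # Any assistant output matches all_assistant_generations.
    selectors = {"all_assistant_generations"}

    if generation.get("tool_calls"):
        selectors.add("tool_call_steps")
    elif generation.get("text"):
        # Assistant text with no tool calls is a user-visible turn.
        selectors.add("user_visible_turn")

    if generation.get("call_error"):
        selectors.add("errored_generations")
    return selectors


print(sorted(matching_selectors({"text": "Here is your answer.", "tool_calls": []})))
```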

Rule alert integration

Rules can include an `alert_rule_uids` field that stores the UIDs of Grafana alert rules created from the evaluation rule. You can create alerts directly from the rule editor in the plugin UI by setting a pass-rate threshold and contact point. Alerts are created as non-provisioned, so you can edit them in the Grafana Alerting UI afterward.
