Set up guards

Guards are synchronous checks that run on the request path before an LLM call. Use them to block harmful prompts, redact sensitive data, or filter dangerous tool calls — all before the model sees the request.

Guards reuse the same evaluator types as online evaluation (LLM judge, regex, JSON schema, heuristic) but execute synchronously and return an allow/deny decision to the SDK.

Before you begin

  • AI Observability is deployed and receiving generation data.
  • At least one judge provider is configured if you plan to use LLM judge guards. Refer to Configure evaluation for provider setup.
  • You have the AI Observability Admin role.

How guards work

  1. Your application SDK calls AI Observability before each LLM invocation.
  2. AI Observability evaluates the request against enabled guard rules in priority order.
  3. Each matching rule can transform the input (regex redaction), check tool calls against a block list, or run evaluators against the content.
  4. The response tells the SDK whether to proceed (allow) or abort (deny), and optionally includes a sanitized copy of the input.

If the guard service is unreachable, the SDK proceeds by default (fail-open). You can change this to fail-closed in your SDK configuration.

Create a guard

  1. Navigate to Evaluation > Guards in the AI Observability plugin.
  2. Click New guard.
  3. Enter a guard ID (for example, pii_redaction or block_dangerous_tools).
  4. Choose a guard type:

Evaluator guard

Runs one or more evaluators on the request content and denies or warns based on the result.

  1. Select Evaluator as the guard type.
  2. Attach one or more existing evaluators from the Evaluators tab.
  3. Set Action on fail to deny (blocks the request) or warn (logs but allows).
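The deny/warn semantics can be illustrated with a small sketch. The function and evaluator names here are hypothetical stand-ins for the plugin's internals; each evaluator is assumed to return `True` on pass.

```python
def run_evaluator_guard(content, evaluators, action_on_fail="deny"):
    """Run all attached evaluators; return (allow, failed_evaluator_names)."""
    failures = [name for name, check in evaluators if not check(content)]
    if failures and action_on_fail == "deny":
        return False, failures   # block the request
    return True, failures        # "warn": log failures but allow

# A guard with two simple heuristic evaluators, set to warn rather than deny:
allow, warnings = run_evaluator_guard(
    "hello",
    [("nonempty", lambda c: bool(c.strip())), ("short", lambda c: len(c) < 10)],
    action_on_fail="warn",
)
```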

Transform guard

Applies regex patterns to redact or replace content before the request reaches the model.

  1. Select Transform as the guard type.
  2. Add one or more patterns:
    • Regex — the pattern to match (for example, \b\d{3}-\d{2}-\d{4}\b for US SSNs).
    • Replacement — the substitution text (for example, [REDACTED:ssn]). Leave empty for a default [REDACTED] placeholder.
Transforms run before evaluators, so you can combine a transform guard with an evaluator guard on the same rule.
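The substitution behavior of a transform pattern, using the SSN example above, looks like this. Applying it client-side in Python is only an illustration; in practice the guard service performs the replacement.

```python
import re

# The SSN pattern and labeled placeholder from the example above.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text, pattern=SSN_PATTERN, replacement="[REDACTED:ssn]"):
    """Replace every match of the pattern with the placeholder."""
    return pattern.sub(replacement, text)

redact("My SSN is 123-45-6789.")  # -> "My SSN is [REDACTED:ssn]."
```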

Tool filter guard

Blocks specific tool calls by name using glob patterns.

  1. Select Tool filter as the guard type.
  2. Add blocked tool name patterns (for example, shell_exec, file_delete, Bash(*rm*)).
  3. Patterns use * and ? wildcards. Patterns containing ( also match against tool call arguments, enabling argument-level filtering.
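The glob matching described above can be sketched with Python's `fnmatch`. The `name(arguments)` rendering used for argument-level matching is an assumption for illustration; the plugin's exact matching rules may differ.

```python
from fnmatch import fnmatchcase

def is_blocked(tool_name, args_text, patterns):
    """Return True if any blocked pattern matches the tool call."""
    for pat in patterns:
        # Patterns containing "(" match against name plus arguments.
        target = f"{tool_name}({args_text})" if "(" in pat else tool_name
        if fnmatchcase(target, pat):
            return True
    return False

is_blocked("shell_exec", "", ["shell_exec"])       # blocked by exact name
is_blocked("Bash", "rm -rf /tmp", ["Bash(*rm*)"])  # blocked by argument match
is_blocked("Bash", "ls -la", ["Bash(*rm*)"])       # allowed: no "rm" in args
```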

Configure priority and matching

  • Priority — guards run in ascending priority order; lower numbers run first.
  • Selector — determines which requests the guard applies to. Default is all.
  • Match filters — narrow the guard to specific agents, models, or tags (for example, agent_name: ["my-agent"] or model.provider: ["openai"]).
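Priority ordering and match filters combine as sketched below. The guard dictionaries are hypothetical; the real rule representation is internal to the plugin.

```python
def matching_guards(guards, request):
    """Return guards whose match filters accept the request, lowest priority first."""
    def matches(guard):
        for key, allowed in guard.get("match", {}).items():
            if request.get(key) not in allowed:
                return False
        return True
    return sorted((g for g in guards if matches(g)), key=lambda g: g["priority"])

guards = [
    {"id": "moderation", "priority": 10, "match": {}},  # applies to all requests
    {"id": "pii_redaction", "priority": 1, "match": {"agent_name": ["my-agent"]}},
]
[g["id"] for g in matching_guards(guards, {"agent_name": "my-agent"})]
# -> ["pii_redaction", "moderation"]
```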

Configure your SDK

Enable hooks in your SDK configuration to start calling guards:

  • hooks.enabled — default: false. Enable guard evaluation.
  • hooks.endpoint — default: AI Observability URL. Override the hooks endpoint.
  • hooks.timeoutMs — default: 5000. Client-side timeout in milliseconds.
  • hooks.failOpen — default: true. Proceed with the LLM call if the guard request fails.
  • hooks.phases — default: ["preflight"]. Which phases to evaluate.
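As an illustration, a fail-closed configuration using the settings above might look like the following. The exact configuration shape depends on your SDK; treat this dict as a sketch, not its real schema.

```python
# Hypothetical hooks configuration: opt in, tighten the timeout, fail closed.
hooks_config = {
    "enabled": True,          # default false: guards are opt-in
    "timeoutMs": 2000,        # tighter than the 5000 ms default
    "failOpen": False,        # abort the LLM call if the guard request fails
    "phases": ["preflight"],  # evaluate before the model call
}
```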

When transformed_input is present in the guard response, the SDK uses the sanitized messages for the LLM call.

Example: Block PII and moderate content

A common setup combines a transform guard for PII redaction with an evaluator guard for content moderation:

  1. Guard 1 (priority 1): Transform guard with regex patterns for SSN, email, and phone number redaction.
  2. Guard 2 (priority 10): Evaluator guard with an LLM judge that checks whether the (now redacted) input is appropriate.

Guard 1 runs first and sanitizes the input. Guard 2 then evaluates the sanitized content. If the content is inappropriate, the request is denied.
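The two-guard pipeline can be sketched end to end. The moderation step here is a stand-in keyword heuristic rather than an LLM judge, and all names are illustrative.

```python
import re

# Guard 1 (priority 1): transform patterns for PII redaction.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED:ssn]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[REDACTED:email]"),
]

def run_guards(text):
    """Apply the transform guard, then evaluate the sanitized content."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    # Guard 2 (priority 10): evaluator on the already-redacted text.
    inappropriate = any(w in text.lower() for w in ("attack", "exploit"))
    return ("deny" if inappropriate else "allow"), text

run_guards("Email me at a@b.com about the exploit")
# -> ("deny", "Email me at [REDACTED:email] about the exploit")
```

Note that the LLM judge in Guard 2 only ever sees the redacted text, so sensitive values never leave the request path even when the request is ultimately allowed.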

Monitor guard activity

Guard execution emits Prometheus metrics:

  • Request rate and latency — track how guards affect your request path latency.
  • Allow/deny rates — monitor how often guards block requests.
  • Per-rule evaluations — see which rules trigger most frequently.

Next steps