Create Playbooks

Playbooks are documents that guide Grafana Assistant agents with instructions, context, and specialized knowledge. By defining playbooks, you can encode how your team troubleshoots services, handles specific alerts, or manages shared infrastructure.

Unlike strict rules, playbooks are flexible guides that agents discover and use when relevant to a user’s request.

What you’ll achieve

Create playbooks to standardize troubleshooting workflows.
Attach dashboards, queries, and other context to agent instructions.
Enable slash commands to run complex playbooks on demand.
Configure playbooks to trigger MCP actions in external systems.
Control visibility to share playbooks with your team or keep them private.

Before you begin

Grafana Cloud: You must have a Grafana Cloud account.
Grafana Assistant: Ensure the Assistant is enabled for your stack.

Create a playbook

Create playbooks to capture your team’s expertise and standardize troubleshooting workflows. For example, create a “Service Troubleshooting” playbook that lists critical services and the canonical steps to diagnose issues.

Open Grafana Assistant.
Click the menu icon (three dots) and select Playbooks.
Click New Playbook.
Enter a title and description, then write the playbook instructions:
- Use headings to structure instructions.
- Write in natural language (e.g., “First check error rates, then look at logs”).
- Include specific metric names, alert names, and error messages for better context.
(Optional) Toggle Visible to agents to allow agents to discover this playbook via semantic search during conversations.
Click Save.

Use templates

Use quick templates to start with a structured outline:

Incident response workflows: Step-by-step procedures for handling incidents
Operational runbooks: Standard operating procedures and maintenance tasks
Architecture documentation: System design and component relationships

Select a template when creating a new playbook to save time on structure.

Add context

Attach specific resources to your playbook to give agents direct access to the right data.

Type @ in the playbook editor to open the resource picker.

Dashboards: Reference dashboards so agents know where to look.
Queries: Include PromQL or LogQL queries for the agent to run.
Labels: Specify relevant labels for filtering.

Check the @checkout-service dashboard and look for spikes in error rates.

To remove a context item, click the remove icon next to it in the editor.

Manage visibility

Control who sees and uses your playbooks to maintain privacy or enable team collaboration.

By default, new playbooks are visible to everyone in your team.

Private (Just me): Visible only to you. Useful for experimenting or personal workflows.
Team (Everybody): Visible to everyone on your team. Use this for shared processes and standard operating procedures.

To change visibility, use the toggle in the playbook settings.

Note
Only the original creator of a playbook can change its visibility scope.

You can view who created the playbook and who last edited it, along with the last updated timestamp, in the playbook sidebar.

Agent search and discovery

The Assistant uses semantic search to automatically find and reference relevant playbooks during conversations.

When you enable Visible to agents, the Assistant searches and references your playbooks when answering questions, even if you don’t mention them explicitly.

When playbooks are searched

The Assistant automatically searches playbooks when your question requires:

Domain-specific knowledge about your systems or processes
System architecture or design information
Troubleshooting guidance
Procedural information

Optimize for search

Structure playbooks to help agents find the right information quickly.

Use descriptive titles: Write clear, specific titles like “Investigate Database Connection Failures” instead of “DB Stuff”.

Include keywords: Mention specific service names, metric names (e.g., CoreDNSErrorsHigh), error codes, and process names.

Use clear structure: Organize content with headings and step-by-step instructions.

Organize by alert: Structure playbooks with sections per alert name. This helps agents find exact troubleshooting steps for specific alerts.

Example structure:

# Infrastructure Troubleshooting Playbook

## CoreDNSErrorsHigh

When this alert fires, check:

- DNS query latency in the CoreDNS dashboard
- Pod restart count in the last 15 minutes
- Network policy changes that might block DNS traffic

## DatabaseConnectionPoolExhausted

When this alert fires, check:

- Active connection count vs pool size
- Long-running queries blocking connections
- Application connection leak patterns

Each section is indexed separately, so agents can retrieve only the relevant alert response steps.
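To see why one section per alert helps, here is a minimal sketch that splits a playbook into sections at each `## ` heading. It only illustrates section-level retrieval. It does not describe how the Assistant actually indexes playbooks, and `split_sections` is a hypothetical helper, not part of any Grafana API.

```python
import re


def split_sections(playbook: str) -> dict[str, str]:
    """Split Markdown text into {heading: body} at each level-2 heading."""
    sections: dict[str, str] = {}
    current = None
    for line in playbook.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            # A new "## Heading" starts a new section.
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            # Lines before the first "## " heading (such as the title) are skipped.
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


PLAYBOOK = """# Infrastructure Troubleshooting Playbook

## CoreDNSErrorsHigh

When this alert fires, check:

- DNS query latency in the CoreDNS dashboard

## DatabaseConnectionPoolExhausted

When this alert fires, check:

- Active connection count vs pool size
"""

sections = split_sections(PLAYBOOK)
print(list(sections))  # ['CoreDNSErrorsHigh', 'DatabaseConnectionPoolExhausted']
```

Because each alert name is its own heading, a lookup for `CoreDNSErrorsHigh` returns only that alert's steps and none of the database guidance.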

Use slash commands

Turn any playbook into a slash command for quick, on-demand execution.

Slash commands are useful for recurring tasks like health checks or specific troubleshooting flows.

In the playbook editor, toggle Enable slash command.
Assign a command name (e.g., /check-cart).
- Must be 1 to 24 characters long.
- Must contain only letters, numbers, hyphens, or underscores.
- Must start with a letter or number, not a hyphen or underscore.
- Must be unique within your tenant.
Save the playbook.

Now, you can type the command in the Assistant chat to trigger the playbook immediately.
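If you want to check candidate names before entering them, the naming rules above can be sketched as a small validator. This is an illustration only: it assumes "letters" means ASCII letters and that uniqueness is an exact match, neither of which is confirmed here. `is_valid_command_name` and the `existing` collection are hypothetical names.

```python
import re
from collections.abc import Collection

# Assumed reading of the rules: 1-24 characters, ASCII letters, digits,
# hyphens, or underscores, starting with a letter or digit.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,23}")


def is_valid_command_name(name: str, existing: Collection[str] = ()) -> bool:
    """Check a candidate name, given with or without the leading slash."""
    name = name.removeprefix("/")
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    # Uniqueness is checked against names you already know about (hypothetical list).
    return name not in existing


print(is_valid_command_name("/check-cart"))   # True
print(is_valid_command_name("-check-cart"))   # False: starts with a hyphen
```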

Discover commands

Type / at the start of a message to see all available slash commands in your tenant.

Combine with additional context

Add extra information when running a command to provide specific context:

/investigate-loki-errors Check errors from the last hour in production.

/deploy-api Deploy version 2.3.1 to staging.

/slo-metrics What are the current SLO metrics for billing?
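Conceptually, each of these messages has two parts: the command name that selects the playbook, and free-form text that narrows the request. The sketch below separates the two. It is an assumption about the message shape for illustration, not the Assistant's actual parser, and `split_command` is a hypothetical helper.

```python
import re


def split_command(message: str) -> tuple[str, str]:
    """Return (command_name, extra_context) for a message that starts with '/'."""
    # The command name is the run of non-space characters after the slash;
    # everything after it, including later lines, is treated as extra context.
    match = re.match(r"/(\S+)\s*(.*)", message, re.DOTALL)
    if match is None:
        raise ValueError("message does not start with a slash command")
    return match.group(1), match.group(2).strip()


print(split_command("/deploy-api Deploy version 2.3.1 to staging."))
# ('deploy-api', 'Deploy version 2.3.1 to staging.')
```

A command with no extra text, such as `/slo-metrics`, yields an empty context string.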

Naming best practices

Short and memorable: deploy-api not deploy-api-service-to-production
Descriptive: investigate-db-errors not db1
Consistent: Use prefixes like investigate-*, deploy-*, or check-*

Take action with MCP

Configure playbooks to trigger MCP actions in external systems like GitHub, Linear, or Slack.

Be explicit about when and how agents should use MCP tools in your playbook content.

Note
MCP actions are available in Slack, the web Assistant, and other backend agents, but not in Investigation mode.

Specify when to make calls:

After completing the troubleshooting:

1. If a bug is found, use the GitHub MCP integration to create an issue
2. Include the alert name, root cause, and remediation steps in the issue
3. Assign the issue to the on-call team

Define conditions:

If a bug is identified:

- Use the Linear MCP tool to update ticket [TICKET-ID]
- Set the status to "In Progress"
- Add the findings as a comment

Provide context:

When the alert severity is critical:

- Use the Slack MCP integration to send a message to #on-call
- Include the alert name, current status, and summary
- Tag the on-call engineer

When an agent proposes an MCP call, you review and approve it before execution. Include all necessary details in your instructions, such as ticket IDs, channel names, and user names, so the agent can construct accurate tool calls.
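One way to catch missing details before you save a playbook is to scan the text for unfilled placeholders such as `[TICKET-ID]`. The sketch below assumes placeholders are written in uppercase inside square brackets, which is a convention chosen for this example. `find_placeholders` is a hypothetical helper, not part of the Assistant.

```python
import re


def find_placeholders(text: str) -> list[str]:
    """Return bracketed uppercase placeholders, such as [TICKET-ID], left in the text."""
    return re.findall(r"\[[A-Z][A-Z0-9_-]*\]", text)


INSTRUCTIONS = """If a bug is identified:

- Use the Linear MCP tool to update ticket [TICKET-ID]
- Set the status to "In Progress"
- Add the findings as a comment
"""

print(find_placeholders(INSTRUCTIONS))  # ['[TICKET-ID]']
```

An empty result means no placeholder of this form remains. It does not guarantee that every detail the agent needs is present.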

Best practices

Be specific: Include metric names, alert names, service names, and error messages to improve searchability.
Structure for discovery: Use clear headings and organize by alert name when possible.