Add skills to Grafana Assistant for faster answers and investigations

2026-02-05 | 7 min

Note: As of March 3, 2026, playbooks in Grafana Assistant are now known as skills. We will add support for the full spec in a future release. This blog post has been updated to reflect the change.

Grafana Assistant is the most general-purpose tool we’ve delivered since dashboards. People use our Grafana Cloud LLM to understand unfamiliar areas of their stacks, generate dashboards and beautiful visualizations out of thin air, build queries, and support investigations.

This has been great for greenfield projects and onboarding new users, but what about organizations that have been running systems at scale for a while—the ones with a complex understanding of how best to operate and debug their unique systems? 

To help those teams, and others, incorporate AI into their existing observability practices, we're excited to introduce a new feature called skills (currently in public preview). Because Assistant can study your telemetry and use that knowledge to level up your engineers, you can now encode the ways you debug your systems directly into Grafana Cloud.

And you're not just automating tasks; you're operationalizing your organization's collective SRE expertise. This will make your agents more effective at providing insights and answers, and it will ensure future incident investigations run faster and more consistently.

In this blog, we'll briefly look at how skills work, dig into some real-world use cases, and review some best practices so you can get the most out of them.

How skills can help your team

Skills are simple documents—a title and a body. Together they are a useful guide or set of instructions for agents to follow. They can mention specific items, like dashboards, metrics, or even other skills.

Flowchart showing Grafana Assistant using a Vector DB Index and skills for searching, reading, and indexing information.

With skills, you can:

  • Write in natural language: Describe instructions in easy-to-understand language; maybe even copy from existing runbooks.
  • Attach context: Easily attach relevant Grafana dashboards, PromQL or LogQL queries, labels, and more directly to the skill steps. You can even reference other skills.
  • Create commands: Turn a complex investigation flow into a single, reusable command that you can run directly from chat, like /check-cart-timeouts.

And you decide the level of guidance, staying in control the whole time. For some tasks you may want to provide helpful tips. For others, perhaps you'll spell out step by step what the agents should do. You can even decide if the skill should be limited to your own workflows or shared with your entire team.

Use case 1: Encode how teams investigate their services

A flowchart for the Checkout team playbook on handling checkout issues, detailing services, triggers, and steps to diagnose and escalate problems.

To help illustrate the following use cases, we'll look at how a checkout team might put a skill to use.

A great starting pattern is for every team to own a primary skill that lists its critical services, related alerts, and the canonical investigation flow.  

Agents automatically apply this skill whenever they encounter checkout-related alerts or issues in a conversation, ensuring a consistent first response.

Use case 2: Alert-specific skills that provide better workflows

A text document lists playbooks to investigate common errors, focusing on error rates, checkout latency, and payment decline issues in a system.

Alert-specific skills are concise guides that push agents towards your organization’s preferred workflow, such as "metrics first, logs later." You can put multiple alerts in the same skill, using Markdown headings to separate the content.

Use case 3: Technology skills for shared infrastructure

Code snippet showing steps to investigate Kafka consumer lag, including commands to confirm lag, check brokers, and analyze request flow.
Flowchart titled "Example MySQL playbook." Outlines steps to check database bottleneck issues, including pressure, slow queries, pool health, and restart.

Shared infrastructure technologies like Kafka and MySQL can benefit from reusable skills that standardize investigations across many teams.

Use case 4: Command-driven skills for recurring investigations or queries

Text document explaining the "/check-cart" command for investigating cart issues, highlighting latency checks, cache patterns, and SQL metrics.

Turn your skills into chat commands to trigger complex, multi-step flows with a single message, making them fast and easy to execute. 

When you type /check-cart, the Assistant follows this detailed skill and returns a structured, actionable answer.

You run this as a single command and have a consistent, immediate report ready for leadership.
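To make the command flow concrete, here is a minimal sketch of how a chat message such as /check-cart could be mapped to a skill. It is an illustration only, not how Grafana Assistant is implemented: the `Skill` class, the `slash_command` field, and `resolve_command` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    """Hypothetical in-memory model of a skill: a title, a body, an optional command."""
    title: str
    body: str
    slash_command: Optional[str] = None  # e.g. "check-cart"; None means no command


def resolve_command(message: str, skills: list[Skill]) -> Optional[Skill]:
    """Return the skill whose command matches a message like '/check-cart', else None."""
    text = message.strip()
    if not text.startswith("/"):
        return None  # ordinary chat message, not a command
    parts = text[1:].split(maxsplit=1)
    if not parts:
        return None  # a lone "/" names no command
    name = parts[0].lower()
    for skill in skills:
        if skill.slash_command == name:
            return skill
    return None


skills = [
    Skill(
        title="Check cart",
        body="1. Check cart latency.\n2. Review cache patterns.\n3. Inspect SQL metrics.",
        slash_command="check-cart",
    ),
    Skill(title="Kafka consumer lag", body="Confirm lag, then check brokers."),
]

matched = resolve_command("/check-cart", skills)
print(matched.title if matched else "no matching skill")
```

The point of the sketch is that the command is only a shortcut: the instructions the agents follow still live in the skill body, so improving the skill improves every later run of the command.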

How to get started with skills

To get started, simply select Skills from the three-dot menu in Grafana Assistant. Here, you can create a new skill, read other skills, and use the "Try it now" section to test your prompts to make sure they cover what you need.

If you activate the "Visible to agents" toggle, the skill will be added to our new semantic search service, which lets agents quickly find content even if the language doesn’t perfectly align. Information in these agent-visible skills can be used by the AI agents whether the skill is directly mentioned by the user or not. This is what improves the experience for everybody as you add skills.
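As a rough intuition for how agents can find a skill that the user never mentioned, the sketch below ranks skills against a prompt by vector similarity. It is a toy stand-in and not the Grafana service: it uses plain word-count vectors, which only match shared words, whereas a real semantic search service typically relies on learned embeddings so that differently worded text can still match. The function names and the sample skills are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_skills(query: str, skills: dict[str, str]) -> list[tuple[float, str]]:
    """Return (score, title) pairs, most similar first."""
    q = tokenize(query)
    scored = [
        (cosine(q, tokenize(title + " " + body)), title)
        for title, body in skills.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical agent-visible skills, keyed by title.
skills = {
    "Kafka consumer lag": "Confirm consumer lag, check brokers, analyze request flow.",
    "MySQL bottlenecks": "Check pressure, slow queries, and pool health.",
}

ranked = rank_skills("consumer lag on kafka brokers", skills)
print(ranked[0][1])
```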

Tip: You can copy and paste pertinent sections from your existing runbooks, provided the instructions are meaningful to the Assistant.

Assistant skills and commands

You can also activate the "Enable slash command" toggle, which lets you run this skill on demand from inside an Assistant chat. This is useful if you have prompts that you would like to reuse, like running a morning health check with your coffee.

Because each skill is just a file of natural-language text, there are many ways you can use skills to influence agents.

Settings panel with options for visibility, agent reference, and a toggle to enable a slash command for "/healthcheck" in an assistant.

Handle visibility

In addition, skills can be just for you, which is perfect for helping with your particular day-to-day tasks and for running experiments as you add new skills. Or, they can be switched to apply to everybody on your team, which will roll out the skill to the whole Grafana Cloud stack.

Screenshot of an Assistant playbook page titled "Grafana Assistant: Runbooks," featuring a sidebar with menu options, text descriptions, and edit settings.

Screenshot of a comprehensive skill for Grafana Assistant

Skills vs. rules

At this point, you may be wondering how Assistant skills differ from Assistant rules, so let's briefly explain the distinction.

Rules always apply. Skills are discovered at runtime based on the user prompt and what the agents find during investigation.

  • Use rules for critical things that never change
  • Use skills to manage agent knowledge and guidelines
  • Skills can also be run on-demand as commands

Rules and skills can be intertwined: Rules could include a high-level overview (e.g., "Use skills to handle this or that"), while skills would provide the exhaustive information.
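One way to picture the distinction is as a context-assembly step: rules are always included, while skills are included only when they look relevant to the prompt. The sketch below is a simplified mental model, not Assistant's actual logic. `build_context` and `shares_keyword` are hypothetical names, and the keyword check is a deliberately naive placeholder for runtime discovery.

```python
import re
from typing import Callable


def shares_keyword(prompt: str, text: str) -> bool:
    """Naive relevance check: the prompt and the text share a word longer than 3 letters."""
    prompt_words = {w for w in re.findall(r"[a-z]+", prompt.lower()) if len(w) > 3}
    text_words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(prompt_words & text_words)


def build_context(
    prompt: str,
    rules: list[str],
    skills: dict[str, str],
    is_relevant: Callable[[str, str], bool] = shares_keyword,
) -> list[str]:
    """Rules always apply; skills are added only if they match the prompt."""
    context = list(rules)
    for title, body in skills.items():
        if is_relevant(prompt, title + " " + body):
            context.append(f"{title}\n{body}")
    return context


# Hypothetical rule and skills for illustration.
rules = ["Never restart production databases without approval."]
skills = {
    "Checkout latency": "Start with metrics, then move to logs.",
    "Kafka consumer lag": "Confirm lag, then check brokers.",
}

for item in build_context("Why is checkout slow?", rules, skills):
    print(item, end="\n\n")
```

Because the rule is present for every prompt, it suits short guidance that never changes. The skills carry the longer, situational detail and stay out of the way until they are needed.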

As with most things, we recommend trying some real examples with your actual data. It's the best way to fine-tune agents to do your bidding.

Skills best practices

Finally, to get the most out of skills, we have compiled a list of best practices and other tips to help you hit the ground running today.

Keep things simple and clear

Write skills in simple, unambiguous language. The instructions should be clear to the agents that are interpreting them. Use "@" to mention context where appropriate.

You should also keep skills small and focused. However, Markdown files are chunked into sections by headings, so it's acceptable to have one skill with many sections—like having a subheading for each alert.
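To see why one skill with several headed sections still works well, here is a small sketch of splitting a Markdown skill into sections at its headings. The exact chunking Assistant performs is not described in this post, so treat this as an assumption-laden illustration: `chunk_by_headings` and the sample skill text are hypothetical, and the parser ignores edge cases such as `#` lines inside fenced code blocks.

```python
import re


def chunk_by_headings(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) sections at ATX headings (# to ######)."""
    sections: list[tuple[str, str]] = []
    heading = ""  # text before the first heading gets an empty heading
    lines: list[str] = []

    def flush() -> None:
        # Keep a section only if it has a heading or some non-blank body text.
        if heading or any(line.strip() for line in lines):
            sections.append((heading, "\n".join(lines).strip()))

    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            flush()
            heading = match.group(1).strip()
            lines = []
        else:
            lines.append(line)
    flush()
    return sections


# Hypothetical alert-specific skill with one subheading per alert.
skill_body = """\
# Checkout alerts

Start with metrics, then move to logs.

## High error rate
Check the error-rate panel first.

## Checkout latency
Compare latency across regions.
"""

for heading, body in chunk_by_headings(skill_body):
    print(f"{heading}: {body}")
```

Each section can then be indexed and retrieved on its own, which is why a subheading per alert keeps a larger skill easy for agents to search.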

Keep improving skills with real investigations

Skills are living documents, not static checklists. The most effective way to keep them current is to review and update them after major incidents or tricky investigations. After an incident, you can paste the incident timeline or chat transcript into your Assistant alongside the current skill and ask: “Which steps did we improvise that should become part of this skill?”

This approach keeps humans in charge of when to change the guidance, but lets you lean on the Assistant to spot gaps, confusing wording, or missing alerts and dashboards.

Experiment

Skills should be carefully authored and tried out before they are shared more widely, to ensure they effectively guide and accelerate workflows. Authors can do this by making the skill visible only to themselves and their agents, then rolling it out to everybody when they're ready.

Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!
