Run investigations
Note
Grafana Assistant Investigations are currently in public preview. Grafana Labs offers support on a best-effort basis, and breaking changes might occur prior to the feature being made generally available.
Grafana Assistant helps you investigate incidents by answering quick questions about your telemetry or launching multi-agent workflows that explore metrics, logs, traces, and profiles in the background.
Investigation solutions in Grafana
The term investigation can refer to several features in Grafana Cloud. This page covers Grafana Assistant Investigations. Here is how they differ from other investigation features:
- Grafana Assistant Investigations: Prompt-driven, multi-agent analysis that queries metrics, logs, traces, profiles, and SQL across your Grafana Cloud data. Lives inside the Assistant and produces a structured report.
- Grafana Sift Investigation: ML-powered, automatic analysis of Kubernetes infrastructure. Runs curated detectors over cluster signals without a prompt. Free in Grafana Cloud. Not part of Grafana Assistant.
Choose the right tool for the job:
- Use Assistant Investigations for cross-signal, service-level analysis guided by your prompt.
- Use Sift Investigation for Kubernetes-only issues where you want quick, automatic triage.
- Use the Investigations app when you need a manual notebook to curate and share evidence.
Before you begin
- Investigations entitlement: Enable Grafana Assistant Investigations in Grafana Cloud.
- Investigation access: Assign the Assistant Investigation User role if your environment uses RBAC.
- Incident context: Summarize the symptom, impact, and affected services before launching an investigation.
Build infrastructure context
To provide accurate answers, the Assistant needs to understand your specific environment. You can scan your Prometheus data sources to create “infrastructure memories” of your service names, namespaces, and dependencies.
- Navigate to Grafana Assistant > Settings > Infrastructure memory.
- Select the Prometheus data sources you want to scan.
- Start the scan.
Once complete, the Assistant can map natural language queries like “How is the checkout service?” to the correct metrics and labels in your system.
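For example, suppose a scan recorded that your checkout workloads expose `http_requests_total` with a `service="checkout-svc"` label. With that memory in place, a question about the checkout service can resolve to a concrete query along these lines. The metric and label names here are assumptions for illustration, not a fixed Grafana Assistant schema:

```promql
# Illustrative sketch only: metric and label names come from the assumed
# infrastructure memory, not from your actual environment.
sum(rate(http_requests_total{namespace="checkout", service="checkout-svc"}[5m]))
```

The value of the memory is the mapping from everyday service names to the labels your metrics actually carry.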
Maintain memory quality
Review the generated memories to ensure service names match your team’s terminology. Re-run scans periodically or after major infrastructure changes to keep the context current. You can also schedule infrastructure memory scans to re-run automatically every week.
Ask quick questions
Use the chat interface to get immediate answers about system health without writing queries manually. Context is key: mention specific resources to get the best results.
Ask about specific signals
Query metrics directly by mentioning the service or data source.
Show the error rate for @checkout-service over the last hour.
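Under the hood, a prompt like this typically becomes a ratio of failed to total requests. The sketch below assumes a conventional `http_requests_total` metric with a `status` label; your instrumentation may differ, and the one-hour window is applied through the time range rather than in the query itself:

```promql
# Assumed metric and labels; error rate = 5xx requests / all requests.
sum(rate(http_requests_total{service="checkout-service", status=~"5.."}[5m]))
/
sum(rate(http_requests_total{service="checkout-service"}[5m]))
```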
Filter logs
Search for specific patterns in your logs within a time range.
Find logs mentioning ‘timeout’ in @loki-prod from 10:00 to 10:15.
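Against a Loki data source, this kind of prompt maps to a simple line filter, with the 10:00 to 10:15 window applied as the query time range. The stream selector below is an assumption; in practice the Assistant picks labels from your environment:

```logql
# Assumed stream selector; adjust labels to match your log streams.
{env="prod"} |= "timeout"
```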
Refine the answer
Iterate on the results to group or sort the data.
Group the results by pod_name.
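Grouping the results turns the line filter into a metric query. A minimal sketch, again with assumed labels:

```logql
# Counts "timeout" lines per pod over the selected range; labels are assumed.
sum by (pod_name) (count_over_time({env="prod"} |= "timeout" [15m]))
```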
Correlate multiple signals
The Assistant can help you verify hypotheses by checking different types of data across the same timeframe.
Establish a baseline
Start with a metric or dashboard panel to set the context.
Look at the CPU usage on this panel.
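If you want to see roughly what such a panel runs, a typical Kubernetes CPU panel is backed by a cAdvisor-style query like the following. The metric and label names are assumptions and depend on how your cluster is scraped:

```promql
# Assumed cAdvisor metric; per-pod CPU usage for one namespace.
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="checkout"}[5m]))
```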
Pivot to other data
Ask the Assistant to find related logs or traces for the same timeframe.
Are there any error logs for the same service during that CPU spike?
Synthesize
Ask for a summary that connects the findings across signals.
Explain how the CPU spike relates to the error logs.
Start an investigation
For complex incidents, use Investigation mode. This launches specialized AI agents that run in the background to analyze multiple data sources, correlate findings, and generate a structured report.
When to launch an investigation
Use Investigation when issues span multiple services, require parallel analysis of metrics and logs, or when you need a structured audit trail for an incident.
You can also automate investigations using IRM webhooks. When configured, the Assistant automatically starts investigations when alert groups are created or incidents are updated.
Investigation lifecycle
- Launch: Provide a detailed prompt that captures the incident summary, timeframe, and focus areas.
- Agent execution: Investigation agents fan out across domains, including Prometheus metrics, Loki logs, Tempo traces, Pyroscope profiles, and SQL data sources, to gather evidence.
- Updates: Chat messages summarize progress. You can steer agents with feedback or additional hints.
- Report: Once agents finish, the investigation workspace compiles a structured summary, timeline, and recommended follow-ups.
Investigations track token usage separately from chat and respect the monthly tenant limits defined in Grafana Cloud.
Run an Assistant investigation
- Switch the chat mode to Investigation.
- Provide a detailed problem statement. For example:
  High latency in the payment service. Investigate the @payment-cluster and check for database locks.
- Monitor agent progress in the investigation workspace.
- Review the final report, which includes a timeline of events, key findings, and recommended next steps.
Work with investigation reports
The Summary section delivers quick status updates and recommended next steps so you can brief stakeholders fast.
Expand the Report to read the full findings along with the queries and evidence each agent collected.
The Timeline records agent tasks in order, which helps during post-incident reviews, while the Activity log provides the raw events you need to reproduce a step.
When you are ready to communicate, ask the Assistant to convert the findings into incident updates, backlog items, or dashboard follow-ups.



