---
title: "Alerting and incident response | Grafana Labs"
description: "Detect problems and respond effectively"
---

## What is it?

Alerting and IRM (Incident Response & Management) is a complete stack for **detecting problems and responding effectively**: unified alerting, SLOs, on-call management, incident coordination, and AI-assisted root cause analysis.

## When you need it

| Scenario                                 | What Alerting and IRM provides                |
|------------------------------------------|-----------------------------------------------|
| You want to know when things break       | Unified alerting across metrics, logs, traces |
| You need to define reliability targets   | SLOs with error budgets                       |
| You need to manage on-call rotations     | Schedules, escalations, integrations          |
| You need to coordinate incident response | War rooms, timelines, post-mortems            |
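
To make "SLOs with error budgets" concrete, here is a minimal sketch of the arithmetic behind an error budget. It is generic illustration, not the Grafana SLO API: the function names, the 30-day window, and the request counts are all assumptions chosen for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given SLO target.

    slo_target is a fraction, e.g. 0.999 for "99.9%".
    The 30-day default window is an assumption for this sketch.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent.

    1.0 means untouched, 0.0 means exhausted, negative means overspent.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% target over 30 days,
    # with 500 failed requests out of 1,000,000.
    print(round(error_budget_minutes(0.999, 30), 1))
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))
```

Under these assumed numbers, a 99.9% target over 30 days allows roughly 43 minutes of unavailability, and 500 failures in a million requests consume about half of the budget. Alerting on how quickly the budget is being spent, rather than on every individual error, is what keeps alerts focused on user impact.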

## Questions answered

| With Alerting and IRM, you can answer…                          |
|-----------------------------------------------------------------|
| How do I get notified when something breaks?                    |
| Are we meeting our reliability targets?                         |
| Who’s on-call right now and how do I reach them?                |
| What happened during this incident and what was the root cause? |

## Problems solved

| Problem                                    | Solution                                                                                        |
|--------------------------------------------|-------------------------------------------------------------------------------------------------|
| “We find out about outages from customers” | Proactive alerting detects issues first.                                                        |
| “Too many alerts, we ignore them”          | SLOs focus alerts on what matters to users.                                                     |
| “Unclear who to call during incidents”     | OnCall manages schedules and escalations.                                                       |
| “Root cause analysis takes hours”          | Sift automates Kubernetes checks; Grafana Assistant Investigations analyzes all signals.        |
