Configure escalation chains

Escalation chains define the sequence of actions taken when an alert is triggered in Grafana IRM. They automate your incident response workflow by executing ordered steps until an alert is acknowledged, resolved, or all steps complete.

An effective escalation chain:

  • Ensures alerts reach the right people at the right time
  • Implements tiered response procedures based on severity
  • Automates notification and escalation processes
  • Prevents alerts from being missed

Create and manage escalation chains

Create a new escalation chain

  1. Navigate to IRM > Escalation Chains in the Grafana Cloud menu
  2. Click New escalation chain
  3. Enter a unique name and optional team assignment
  4. Click Add escalation step to add steps to your chain
  5. Configure steps and arrange them using drag-and-drop
  6. Click Save

Edit or delete an escalation chain

  • To edit: Select a chain and click Edit, then make changes and save
  • To delete: Select a chain, click Delete, and confirm

Note

Before deleting, check the Linked integrations and routes panel. Changes to the chain affect all associated integrations and routes.

Manage escalation chains with Terraform

You can use the Grafana Terraform provider to define and manage your escalation chains as code. This enables version control and easier reuse of your on-call workflows.

Notify a specific user, user group, or schedule

To notify individual responders or teams (such as a person, user group, or on-call schedule), use the corresponding step type in the grafana_oncall_escalation resource. Then define the object to notify using the associated *_to_notify parameters:

  • notify_persons + persons_to_notify: Notifies specific users
  • notify_person_next_each_time + persons_to_notify_next_each_time: Notifies one user from the list each time, in round-robin order
  • notify_on_call_from_schedule + notify_on_call_from_schedule: Notifies the currently on-call person from a schedule (the step type and the parameter share the same name)
  • notify_user_group + group_to_notify: Notifies a defined Slack user group
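
As a rough sketch of the step types above, the following Terraform fragment notifies a named user and then the on-call person from a schedule. It assumes the Grafana provider is already configured for IRM. The user name alex, the schedule name Primary, and the chain name example-chain are hypothetical placeholders.

```hcl
# Hypothetical lookups: replace the names with ones that exist in your stack.
data "grafana_oncall_user" "alex" {
  username = "alex"
}

data "grafana_oncall_schedule" "primary" {
  name = "Primary"
}

resource "grafana_oncall_escalation_chain" "example" {
  name = "example-chain"
}

# Step 0: notify a specific user.
resource "grafana_oncall_escalation" "notify_user" {
  escalation_chain_id = grafana_oncall_escalation_chain.example.id
  type                = "notify_persons"
  persons_to_notify   = [data.grafana_oncall_user.alex.id]
  position            = 0
}

# Step 1: notify whoever is currently on call in the schedule.
# The step type and the parameter share the same name.
resource "grafana_oncall_escalation" "notify_schedule" {
  escalation_chain_id          = grafana_oncall_escalation_chain.example.id
  type                         = "notify_on_call_from_schedule"
  notify_on_call_from_schedule = data.grafana_oncall_schedule.primary.id
  position                     = 1
}
```

Each step is its own grafana_oncall_escalation resource, and position (starting at 0) sets the order within the chain.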

Notify a channel

To post alert groups to a Slack, MS Teams, or Telegram channel, define the channel in the grafana_oncall_route resource. To also notify everyone in that channel when the escalation chain is triggered, add a notify_whole_channel step to the grafana_oncall_escalation resource. This step is available only for Slack channels, and it triggers a notification for every user in the channel, so use it with care.

To notify an entire Slack channel:

  1. Use the slack_channel_id in your grafana_oncall_route resource.
  2. Use the notify_whole_channel step type in your grafana_oncall_escalation resource.
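
The two steps above might look like the following sketch. It assumes the provider is already configured, that the route sets its Slack channel through a slack block with a channel_id argument (check the provider reference for your version), and that an integration already exists. The variable integration_id, the channel name critical-alerts, the routing expression, and the chain name are hypothetical.

```hcl
# Hypothetical input: the ID of an existing IRM integration.
variable "integration_id" {
  type = string
}

# Hypothetical channel name: replace with a channel connected to your workspace.
data "grafana_oncall_slack_channel" "alerts" {
  name = "critical-alerts"
}

resource "grafana_oncall_escalation_chain" "channel_example" {
  name = "channel-example"
}

# The route decides where the alert group is posted.
resource "grafana_oncall_route" "critical" {
  integration_id      = var.integration_id
  escalation_chain_id = grafana_oncall_escalation_chain.channel_example.id
  routing_regex       = "critical" # illustrative match expression
  position            = 0

  slack {
    channel_id = data.grafana_oncall_slack_channel.alerts.slack_id
    enabled    = true
  }
}

# The escalation step decides when the whole channel is notified.
resource "grafana_oncall_escalation" "notify_channel" {
  escalation_chain_id = grafana_oncall_escalation_chain.channel_example.id
  type                = "notify_whole_channel"
  position            = 0
}
```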

Note

The route defines where the alert group is delivered, and the escalation controls when and whether the whole channel is notified. You can’t post an alert group to one channel and notify a different one.

For more detailed setup instructions, refer to Manage Grafana IRM in Grafana Cloud using Terraform.

Types of escalation steps

Notification steps

  • Notify users: Send notifications to specific users or groups
  • Notify from on-call schedule: Alert currently on-call users
  • Notify all team members: Alert everyone in a specified team
  • Notify Slack channel/user group: Send notifications to Slack users
  • Round robin notifications: Rotate through a list of users sequentially

Timing and control steps

  • Wait: Pause for a specified duration before proceeding to the next escalation step
  • Repeat escalation: Loop the escalation steps up to five times
  • Time-based escalation: Continue only during specified time periods
  • Threshold-based escalation: Continue escalation only if more than X alerts occur within Y minutes (requires num_alerts_in_window and num_minutes_in_window parameters)
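
For teams managing chains with Terraform, the timing and control steps could be expressed roughly as follows. This is a sketch, not a recommended chain: the step type names and the UTC time format reflect the Grafana provider as commonly documented and should be checked against your provider version, and the chain name, thresholds, and time window are hypothetical.

```hcl
resource "grafana_oncall_escalation_chain" "timing" {
  name = "timing-example"
}

# Wait 5 minutes before the next step (duration is in seconds).
resource "grafana_oncall_escalation" "wait" {
  escalation_chain_id = grafana_oncall_escalation_chain.timing.id
  type                = "wait"
  duration            = 300
  position            = 0
}

# Continue only if more than 3 alerts arrive within 10 minutes.
resource "grafana_oncall_escalation" "threshold" {
  escalation_chain_id   = grafana_oncall_escalation_chain.timing.id
  type                  = "notify_if_num_alerts_in_window"
  num_alerts_in_window  = 3
  num_minutes_in_window = 10
  position              = 1
}

# Continue only between 09:00 and 17:00 (assumed to be interpreted as UTC).
resource "grafana_oncall_escalation" "time_window" {
  escalation_chain_id = grafana_oncall_escalation_chain.timing.id
  type                = "notify_if_time_from_to"
  notify_if_time_from = "09:00:00Z"
  notify_if_time_to   = "17:00:00Z"
  position            = 2
}

# Loop back to the first step.
resource "grafana_oncall_escalation" "repeat" {
  escalation_chain_id = grafana_oncall_escalation_chain.timing.id
  type                = "repeat_escalation"
  position            = 3
}
```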

Action steps

  • Resolve incident automatically: Mark the alert group as “Resolved automatically” without user intervention
  • Trigger outgoing webhook: Send data to an external system using a configured outgoing webhook
  • Declare incident: Create a new incident with specified severity. Limited to one incident per route at a time; additional alerts are grouped into the active incident
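
In Terraform, the action steps above might be sketched like this. The variable webhook_id stands in for the ID of an outgoing webhook you have already configured, and the severity value critical and the chain name are hypothetical. Confirm the step type names and the severity argument against the provider reference before relying on them.

```hcl
# Hypothetical input: the ID of an existing outgoing webhook.
variable "webhook_id" {
  type = string
}

resource "grafana_oncall_escalation_chain" "actions" {
  name = "actions-example"
}

# Send data to an external system through the configured webhook.
resource "grafana_oncall_escalation" "webhook" {
  escalation_chain_id = grafana_oncall_escalation_chain.actions.id
  type                = "trigger_webhook"
  action_to_trigger   = var.webhook_id
  position            = 0
}

# Declare an incident with the given severity.
resource "grafana_oncall_escalation" "declare" {
  escalation_chain_id = grafana_oncall_escalation_chain.actions.id
  type                = "declare_incident"
  severity            = "critical"
  position            = 1
}

# Mark the alert group as resolved automatically.
resource "grafana_oncall_escalation" "resolve" {
  escalation_chain_id = grafana_oncall_escalation_chain.actions.id
  type                = "resolve"
  position            = 2
}
```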

Notification types

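When configuring escalation steps, you can specify which set of notification rules to use: each user's default rules or their important rules. Steps marked (important) in the examples that follow use the important rules.

Each user can customize their notification rules in the IRM tab of their user profile.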

Example escalation chains

Basic notification chain

  1. Notify primary on-call person (important)
  2. Wait 5 minutes
  3. Notify primary again (important)
  4. Wait 10 minutes
  5. Notify backup on-call person (important)
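
If you manage chains as code, this basic chain could be written roughly as follows. The sketch assumes the primary and backup responders come from two on-call schedules named Primary and Backup, which are hypothetical names, and it sets important = true to select each user's important notification rules.

```hcl
# Hypothetical schedules: replace the names with your own.
data "grafana_oncall_schedule" "primary" {
  name = "Primary"
}

data "grafana_oncall_schedule" "backup" {
  name = "Backup"
}

resource "grafana_oncall_escalation_chain" "basic" {
  name = "basic-notification-chain"
}

# 1. Notify primary on-call person (important).
resource "grafana_oncall_escalation" "notify_primary" {
  escalation_chain_id          = grafana_oncall_escalation_chain.basic.id
  type                         = "notify_on_call_from_schedule"
  notify_on_call_from_schedule = data.grafana_oncall_schedule.primary.id
  important                    = true
  position                     = 0
}

# 2. Wait 5 minutes.
resource "grafana_oncall_escalation" "wait_5m" {
  escalation_chain_id = grafana_oncall_escalation_chain.basic.id
  type                = "wait"
  duration            = 300
  position            = 1
}

# 3. Notify primary again (important).
resource "grafana_oncall_escalation" "notify_primary_again" {
  escalation_chain_id          = grafana_oncall_escalation_chain.basic.id
  type                         = "notify_on_call_from_schedule"
  notify_on_call_from_schedule = data.grafana_oncall_schedule.primary.id
  important                    = true
  position                     = 2
}

# 4. Wait 10 minutes.
resource "grafana_oncall_escalation" "wait_10m" {
  escalation_chain_id = grafana_oncall_escalation_chain.basic.id
  type                = "wait"
  duration            = 600
  position            = 3
}

# 5. Notify backup on-call person (important).
resource "grafana_oncall_escalation" "notify_backup" {
  escalation_chain_id          = grafana_oncall_escalation_chain.basic.id
  type                         = "notify_on_call_from_schedule"
  notify_on_call_from_schedule = data.grafana_oncall_schedule.backup.id
  important                    = true
  position                     = 4
}
```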

Critical system

  1. Notify primary on-call person (important)
  2. Notify Slack channel #critical-alerts
  3. Wait 5 minutes
  4. Notify backup on-call person (important)
  5. Wait 2 minutes
  6. Notify all team members (important)
  7. Wait 2 minutes
  8. Declare incident with severity “critical” if not acknowledged

Configure business hours vs. after-hours escalation

Use on-call schedules to route alerts to different responders based on the time of day.

Before creating this escalation chain, you must create two on-call schedules:

Schedule 1: Business Hours Team

  • Type: Simple schedule or calendar-based schedule
  • Coverage: Monday-Friday, 9 AM - 5 PM
  • Participants: Team members (rotating or all participants)

Schedule 2: After Hours Individual

  • Type: Simple schedule
  • Coverage: Monday-Friday, 5 PM - 9 AM (next day), plus all weekend
  • Participant: A specific individual

For more information about creating schedules, refer to Create on-call schedules.

Escalation chain configuration

Create an escalation chain with the following steps:

  1. Notify from on-call schedule: “Business Hours Team” (important)
  2. Wait 5 minutes (optional but recommended)
  3. Notify from on-call schedule: “After Hours Individual” (important)

How it works

When an escalation step attempts to notify from a schedule that has no one on-call, that step is automatically skipped and the escalation continues to the next step.

  • During business hours (Monday-Friday, 9 AM - 5 PM): Step 1 notifies the person who is on-call in the Business Hours Team schedule. If they don’t acknowledge, step 2 waits 5 minutes, then step 3 attempts to notify the After Hours Individual schedule. However, since no one is on-call in that schedule during business hours, step 3 is skipped.
  • After hours (evenings, nights, and weekends): Step 1 attempts to notify the Business Hours Team schedule, but since no one is on-call, that step is skipped. Step 2 waits 5 minutes, then step 3 notifies the person who is on-call in the After Hours Individual schedule.

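This approach routes each alert to whoever is on call when it fires, without waiting for business hours to begin. Because the wait step still runs when step 1 is skipped, after-hours alerts reach the responder after the 5-minute wait. Remove step 2 if that delay is not acceptable.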

Alternative: Use a single 24/7 schedule

Instead of creating separate schedules, you can create one schedule with different shifts that covers all hours:

  • Business hours shifts assigned to team members
  • After-hours and weekend shifts assigned to a specific individual

This approach is simpler to manage and scales better as your team grows.

Best practices

  • Start simple: Begin with basic notification steps before adding complexity
  • Test thoroughly: Verify chains with non-production alerts first
  • Document your chains: Maintain explanations of each chain’s purpose
  • Include wait steps: Add appropriate delays between notifications
  • Use important notifications sparingly: Reserve for truly critical alerts
  • Consider time zones: Create chains respecting global team distribution