Best practices for escalation chains
Escalation chains define how IRM notifies your team when alerts arrive. Well-designed chains ensure timely response while preventing notification fatigue.
Understand escalation chains
Before building escalation chains, understand their role in the alert flow and how IRM executes them.
What is an escalation chain
An escalation chain is a sequence of steps that IRM executes when an alert group is created. Each step can notify users, wait for a response, or perform actions like declaring an incident.
Escalation chains connect routes to responders:
Alert → Route → Escalation Chain → Schedule → Responder
How chains fit into the alert flow
Routes determine which escalation chain handles an alert group. The chain then executes its steps in order until someone acknowledges or resolves the alert.
For more information about routing, refer to the Alert routing best practices.
Build basic chains
Start with simple escalation patterns before adding complexity.
The following example shows the basic pattern that most escalation chains follow:
1. Notify on-call from schedule
2. Wait 5 minutes
3. Notify on-call from schedule (important)
4. Wait 10 minutes
5. Notify backup schedule
This pattern:
- Starts with a standard notification.
- Waits for acknowledgment.
- Escalates to important notification if no response.
- Eventually reaches a backup.
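To see when each notification in the basic pattern fires, it can help to model the chain as plain data. The following is a minimal sketch, not IRM's API: the `Step` class and `notification_timeline` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """Hypothetical model of one escalation step (not an IRM type)."""
    kind: str          # "notify" or "wait"
    target: str = ""   # label of who is notified, for "notify" steps
    minutes: int = 0   # wait duration, for "wait" steps


def notification_timeline(steps):
    """Return (minutes_since_alert, target) for each notify step."""
    elapsed = 0
    timeline = []
    for step in steps:
        if step.kind == "wait":
            elapsed += step.minutes
        elif step.kind == "notify":
            timeline.append((elapsed, step.target))
        else:
            raise ValueError(f"unknown step kind: {step.kind}")
    return timeline


basic_chain = [
    Step("notify", target="on-call (default)"),
    Step("wait", minutes=5),
    Step("notify", target="on-call (important)"),
    Step("wait", minutes=10),
    Step("notify", target="backup schedule"),
]

for minute, target in notification_timeline(basic_chain):
    print(f"t+{minute:>2} min: notify {target}")
```

In this model the backup schedule is reached 15 minutes after the alert group is created, which is the number to compare against your response-time targets.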
Wait steps
Wait steps space out notifications to prevent fatigue and give responders time to act.
Tip
Start with longer waits and shorten based on actual response times.
Terminal steps
Every chain should end definitively. Without a terminal step, escalation can continue indefinitely.
Options for ending a chain:
- Resolve: Automatically resolve if no response is needed.
- Notify all: Escalate to an entire channel as a last resort.
- Repeat: Restart the chain a limited number of times.
Notification step types
Choose the right notification type for each stage of your escalation. Some common notification steps include:
Schedule-based notifications
Use "Notify users from on-call schedule" when:
- You have a defined on-call rotation.
- You want automatic rotation without updating chains.
- You need follow-the-sun coverage.
IRM evaluates the schedule when the step executes, not when the alert was created. This means schedule changes take effect immediately for pending escalation steps.
User queue notifications
Use "Notify users from queue" when:
- You want round-robin distribution across a fixed set of users.
- Multiple users should share the alert load.
- Your team doesn’t have a formal on-call rotation.
Round-robin behavior: Each escalation notifies the next user in the queue. IRM tracks the position per alert group, cycling through all users.
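The round-robin behavior can be pictured with a small toy model. This is only a sketch under stated assumptions: `UserQueue` is a hypothetical class, and starting each new alert group at the first user in the queue is an assumption rather than documented IRM behavior.

```python
class UserQueue:
    """Toy round-robin queue that tracks a position per alert group."""

    def __init__(self, users):
        if not users:
            raise ValueError("queue needs at least one user")
        self.users = list(users)
        self.positions = {}  # alert_group_id -> index of the next user

    def next_user(self, alert_group_id):
        """Return the next user for this alert group and advance its position."""
        index = self.positions.get(alert_group_id, 0)
        user = self.users[index % len(self.users)]  # wrap around the queue
        self.positions[alert_group_id] = index + 1
        return user


queue = UserQueue(["alice", "bob", "carol"])
picks = [queue.next_user("group-1") for _ in range(4)]
print(picks)
```

After three escalations the fourth wraps back to the first user, so every user in the queue is notified before anyone is notified twice.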
For all available escalation steps, refer to Configure escalation chains.
Default vs. Important notifications
For each notification step, you also specify whether it uses Default or Important notifications.
These correspond to the two sets of notification rules configured in each user’s IRM profile.
To learn more about default and important notifications, refer to Types of notification rules.
When to use important notifications:
- After initial notification attempts fail to get a response.
- For truly critical alerts that need immediate attention.
Pattern:
1. Notify on-call (default)
2. Wait 5 minutes
3. Notify on-call (important) ← Escalate to important
Caution
Overusing important notifications reduces their effectiveness. Reserve them for genuine escalation within a chain.
Advanced patterns
Use these patterns for more sophisticated escalation logic.
Time-based routing
Use "Continue if current UTC time is in range" to route differently by time of day:
1. Check if 9am-6pm UTC
→ Yes: Notify business hours team
→ No: Continue to next step
2. Notify after-hours team
This enables:
- Business hours versus after-hours escalation.
- Weekend-specific routing.
- Holiday coverage.
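The time-range check in the example above is evaluated in UTC, so a team's local business hours have to be converted first. The following sketch illustrates the check itself, including ranges that cross midnight. `in_utc_range` and `pick_team` are hypothetical helpers, not IRM functions, and the team names are placeholders.

```python
from datetime import datetime, time, timedelta, timezone


def in_utc_range(moment, start, end):
    """True if the UTC time of day of `moment` falls in [start, end).

    `moment` must be timezone-aware. Ranges where start > end are
    treated as crossing midnight (for example 22:00 to 06:00).
    """
    now = moment.astimezone(timezone.utc).time()
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def pick_team(moment):
    """Mirror the 9am-6pm UTC example: choose a team by time of day."""
    if in_utc_range(moment, time(9, 0), time(18, 0)):
        return "business-hours-team"
    return "after-hours-team"


# 19:30 in a UTC+2 timezone is 17:30 UTC, which is still inside the range.
local = datetime(2024, 1, 1, 19, 30, tzinfo=timezone(timedelta(hours=2)))
print(pick_team(local))
```

Converting to UTC before comparing is the step that is easiest to get wrong when a team describes its hours in local time.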
Alert volume throttling
Use "Notify if number of alerts in time window" to throttle low-priority escalations:
1. Check if >5 alerts in 30 minutes
→ Yes: Continue escalation
→ No: Pause escalation
This prevents paging for sporadic low-priority alerts while still escalating patterns that indicate a real problem.
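The threshold check behaves like a sliding-window counter. The sketch below is an illustration only: `AlertWindow` is a hypothetical class, it assumes alerts are recorded in chronological order, and it does not claim to match how IRM counts alerts internally.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class AlertWindow:
    """Sliding-window counter for 'more than N alerts in a time window'."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.timestamps = deque()

    def record(self, moment):
        """Record an alert; return True if escalation should continue."""
        self.timestamps.append(moment)
        cutoff = moment - self.window
        # Drop alerts that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold


window = AlertWindow(threshold=5, window=timedelta(minutes=30))
start = datetime(2024, 1, 1, tzinfo=timezone.utc)

# Six alerts one minute apart: only the sixth exceeds the threshold.
burst = [window.record(start + timedelta(minutes=i)) for i in range(6)]
print(burst)

# A lone alert an hour later falls into an empty window again.
later = window.record(start + timedelta(minutes=60))
print(later)
```

A burst crosses the threshold and continues escalating, while an isolated alert later on does not.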
Repeat escalation
Use "Repeat escalation N times" to restart the chain if no one responds:
1. Notify primary on-call
2. Wait 5 minutes
3. Notify secondary on-call
4. Wait 10 minutes
5. Repeat escalation (max 3 times)
Note
IRM allows a maximum of 5 repeats to prevent infinite loops.
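A bounded repeat is what keeps an unanswered chain from running forever. The following sketch shows the idea with a hypothetical `run_with_repeats` function. It assumes that "repeat N times" means N restarts after the first pass, which you should confirm against your own chain's behavior.

```python
MAX_REPEATS = 5  # cap stated in this guide


def run_with_repeats(steps, repeats, acked_after=None):
    """Return the steps executed, in order.

    `repeats` is the number of restarts after the first pass (assumption).
    `acked_after` is the number of executed steps after which someone
    acknowledges; None means nobody ever responds.
    """
    if not 0 <= repeats <= MAX_REPEATS:
        raise ValueError(f"repeats must be between 0 and {MAX_REPEATS}")
    executed = []
    for _ in range(repeats + 1):  # first pass plus `repeats` restarts
        for step in steps:
            executed.append(step)
            if acked_after is not None and len(executed) >= acked_after:
                return executed  # acknowledgment stops the escalation
    return executed


steps = [
    "notify primary on-call",
    "wait 5 minutes",
    "notify secondary on-call",
    "wait 10 minutes",
]

unanswered = run_with_repeats(steps, repeats=3)
answered = run_with_repeats(steps, repeats=3, acked_after=6)
print(len(unanswered), len(answered))
```

Under these assumptions an unanswered chain stops after four passes, and an acknowledgment during the second pass ends it early.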
Declare incident
Use "Declare incident" to automatically create an incident from an alert group:
1. Notify on-call
2. Wait 5 minutes
3. Declare incident (severity: major)
4. Notify incident commander schedule
Note
Incident declaration only works on non-default routes. Configure specific routes for alerts that should trigger automatic incidents.
Organize your escalation chains
Good organization makes chains easier to maintain and debug during incidents.
One chain per escalation path
Create separate chains for different escalation needs:
- payments-critical: Fast escalation for payment issues.
- payments-warning: Slower escalation for warnings.
- platform-business-hours: Business hours only.
- platform-24x7: Round-the-clock coverage.
Naming conventions
Use clear, descriptive names that help responders understand the chain’s purpose.
Include in the name:
- The team or service name.
- The severity or priority level.
- Time-based behavior, if applicable.
For example:
auth-team-critical-24x7
data-pipeline-business-hours
infrastructure-p1-immediate
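If you create chains from scripts or templates, a small helper can keep names consistent. This is a sketch of one possible convention, not an IRM requirement: `chain_name` is a hypothetical function that joins the team, severity, and coverage parts into a lowercase, hyphen-separated name.

```python
import re


def chain_name(team, severity=None, coverage=None):
    """Build a chain name from its parts, skipping any that are missing."""
    parts = [team, severity, coverage]
    joined = "-".join(p.strip().lower() for p in parts if p)
    # Collapse anything that is not a lowercase letter or digit into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", joined).strip("-")


print(chain_name("Auth Team", "critical", "24x7"))
print(chain_name("data-pipeline", coverage="business-hours"))
print(chain_name("infrastructure", "P1", "immediate"))
```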
Reuse chains across routes
Chains can be used by multiple routes. Design reusable chains for common patterns:
- Create generic severity-based chains that multiple teams can use.
- Create team-specific chains shared across services.
- Create standard escalation patterns for common scenarios.
Snapshot behavior
IRM snapshots escalation chains when it creates an alert group. This is important to understand before building chains.
What gets snapshotted:
- Chain configuration and steps.
- User queue positions.
- Schedule references (but not schedule contents).
What this means:
- Changes to a chain don’t affect alert groups already using it.
- To test chain changes, you need to create new alerts.
- Active alert groups continue using the original chain configuration.
Schedules are different: While the chain is snapshotted, schedules are evaluated dynamically. When a step runs, IRM checks who is currently on-call at that moment.
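The difference between snapshotted chains and dynamically evaluated schedules can be illustrated with a toy model. Everything here is hypothetical (the `AlertGroup` class, the dictionary layout, and the user names). The sketch only mirrors the behavior described above and says nothing about how IRM stores its data.

```python
import copy

# Hypothetical data: schedule name -> who is on call right now.
schedules = {"primary": "alice", "backup": "dave"}

chain = {
    "name": "payments-critical",
    "steps": [{"notify_schedule": "primary"}, {"wait_minutes": 5}],
}


class AlertGroup:
    def __init__(self, chain):
        # Snapshot: later edits to `chain` do not reach this alert group.
        self.chain = copy.deepcopy(chain)

    def notified_users(self, schedules):
        # Schedules are referenced by name and resolved when the step runs.
        return [
            schedules[step["notify_schedule"]]
            for step in self.chain["steps"]
            if "notify_schedule" in step
        ]


group = AlertGroup(chain)

chain["steps"].append({"notify_schedule": "backup"})  # chain edited afterwards
schedules["primary"] = "bob"                          # shift change afterwards

print(len(group.chain["steps"]))        # still the original two steps
print(group.notified_users(schedules))  # resolves to the current on-call
print(len(AlertGroup(chain).chain["steps"]))  # a new alert group sees the edit
```

The existing alert group keeps its two original steps but notifies whoever is on call when the step runs, while an alert group created after the edit picks up the third step.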
Testing and tuning
Test chains thoroughly before deploying to production, and tune based on real-world performance.
Testing with non-production alerts
Before deploying chain changes:
- Create a test integration.
- Send test alerts through the chain.
- Verify notifications reach the right people at the right times.
Remember that changes don’t affect existing alert groups due to snapshot behavior. Always test with new alerts.
Metrics to monitor
Track these metrics to understand chain effectiveness:
- Time to first acknowledgment: How quickly do responders engage?
- Escalation depth: How many steps run before someone responds?
- False escalations: How often do alerts escalate that didn’t need human response?
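These metrics can be computed from whatever alert group history you export. The sketch below assumes a hypothetical record shape (`AlertGroupRecord`) and made-up sample values purely for illustration. The field names are not IRM fields, and `needed_human` is a judgment your team makes after the fact.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional


@dataclass
class AlertGroupRecord:
    """Hypothetical export row for one alert group."""
    seconds_to_ack: Optional[float]  # None if never acknowledged
    steps_run: int                   # escalation steps executed before a response
    needed_human: bool               # decided during review


def summarize(records):
    """Compute the three chain-effectiveness metrics for a set of records."""
    if not records:
        raise ValueError("no records to summarize")
    acked = [r.seconds_to_ack for r in records if r.seconds_to_ack is not None]
    return {
        "median_seconds_to_ack": median(acked) if acked else None,
        "mean_escalation_depth": mean(r.steps_run for r in records),
        "false_escalation_rate": sum(not r.needed_human for r in records) / len(records),
    }


# Illustrative sample data, not real measurements.
sample = [
    AlertGroupRecord(seconds_to_ack=120, steps_run=1, needed_human=True),
    AlertGroupRecord(seconds_to_ack=480, steps_run=3, needed_human=True),
    AlertGroupRecord(seconds_to_ack=None, steps_run=5, needed_human=False),
    AlertGroupRecord(seconds_to_ack=300, steps_run=3, needed_human=False),
]

summary = summarize(sample)
print(summary)
```

The median is used for acknowledgment time so that a few very slow responses don't dominate the figure.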
Tuning based on response patterns
Use metrics to improve your chains:
- High escalation depth: Shorten wait times or add more notification channels.
- Frequent false escalations: Review alert quality or add throttling.
- Slow acknowledgment: Consider adding important notification steps earlier.
Best practices summary
- Start simple: Begin with notify → wait → escalate patterns.
- End definitively: Always include a terminal step.
- Space notifications: Use wait steps to prevent fatigue.
- Use important sparingly: Reserve for genuine escalation.
- Remember snapshots: Changes don’t affect active alert groups.
- Name clearly: Descriptive names help during incidents.
- Test with new alerts: Snapshot behavior means existing alerts use old chains.
- Monitor and tune: Adjust based on actual response patterns.
Next steps
- Configure escalation chains in detail
- On-call schedules best practices
- Alert routing best practices



