Best practices for on-call schedules
On-call schedules determine who receives pages when escalation chains execute. Well-designed schedules ensure reliable coverage while respecting team members’ time.
Understand on-call schedules
Before building schedules, understand their role in IRM and how they’re evaluated.
What is a schedule
A schedule defines who is on-call at any given time. Schedules contain shifts that assign users or user groups to specific time periods.
Schedules connect escalation chains to responders:
Alert → Route → Escalation Chain → Schedule → On-call Responder
How schedules connect to escalation chains
Escalation chains reference schedules through notification steps. When a step like “Notify users from on-call schedule” executes, IRM checks who is currently on-call.
Example chain using multiple schedules:
1. Notify on-call from "Primary Schedule"
2. Wait 5 minutes
3. Notify on-call from "Secondary Schedule"
4. Wait 10 minutes
5. Notify on-call from "Management Schedule"
This pattern enables:
- Primary and secondary on-call.
- Follow-the-sun coverage.
- Escalation to management after hours.
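As a rough illustration, the example chain above can be written as data and walked to see when each schedule would be consulted. This is a minimal sketch, not IRM's implementation: the `Notify` and `Wait` classes and the `notification_plan` helper are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Notify:
    schedule: str  # name of the schedule to look up (hypothetical model)


@dataclass
class Wait:
    minutes: int


# The example chain from the text, expressed as data.
CHAIN = [
    Notify("Primary Schedule"),
    Wait(5),
    Notify("Secondary Schedule"),
    Wait(10),
    Notify("Management Schedule"),
]


def notification_plan(chain):
    """Return (minutes_after_start, schedule_name) for every notify step."""
    elapsed = 0
    plan = []
    for step in chain:
        if isinstance(step, Wait):
            elapsed += step.minutes
        else:
            plan.append((elapsed, step.schedule))
    return plan


for offset, schedule in notification_plan(CHAIN):
    print(f"+{offset:>2} min: notify on-call from {schedule!r}")
```

In this model, the management schedule is only consulted 15 minutes after the chain starts, and only if the earlier notifications did not resolve the alert.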
Dynamic evaluation
IRM evaluates schedules at execution time, not when the alert was created.
What this means:
- Schedule changes take effect immediately for pending escalation steps.
- If someone swaps shifts while an alert is escalating, the new on-call person receives the next notification.
- You don’t need to worry about in-flight alerts using outdated schedule data.
This is different from escalation chains, which are snapshotted when an alert group is created.
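The difference between a creation-time lookup and an execution-time lookup can be sketched with a toy in-memory schedule. Everything here is an assumption made for illustration (the tuple layout, the `on_call` helper, the user names); it only mirrors the behavior described above and says nothing about how IRM stores schedules.

```python
from datetime import datetime, timezone


def on_call(schedule, at):
    """Return the user whose shift contains `at`, or None if nobody is on call."""
    for start, end, user in schedule:
        if start <= at < end:
            return user
    return None


def utc(hour):
    # Helper for readable timestamps on a single example day.
    return datetime(2024, 1, 8, hour, tzinfo=timezone.utc)


# Hypothetical schedule: a list of (start, end, user) shifts.
schedule = [(utc(0), utc(12), "alice"), (utc(12), utc(23), "bob")]

alert_created = utc(9)
step_executes = utc(10)

# A lookup taken at alert creation would freeze "alice" (not how schedules behave).
snapshot = on_call(schedule, alert_created)

# A shift swap lands while the alert is still escalating.
schedule[0] = (utc(0), utc(12), "carol")

# A lookup taken when the step executes sees the swap.
dynamic = on_call(schedule, step_executes)

print(snapshot, dynamic)  # alice carol
```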
Choose a schedule type
IRM supports three ways to manage schedules. Choose based on your team’s workflow.
Web schedules
The built-in schedule editor in the IRM UI.
Best for:
- Teams new to on-call management.
- Simple rotation patterns.
- Teams who prefer visual schedule management.
Features:
- Drag-and-drop shift editing.
- Visual rotation preview.
- Override management in UI.
iCal schedules
Import schedules from external calendar systems.
Best for:
- Teams with existing calendar-based schedules.
- Migration from PagerDuty, Opsgenie, or other tools.
- Organizations using shared calendar systems.
Considerations:
- Schedule changes require updating the external calendar.
- IRM periodically syncs from the iCal URL.
- Limited editing capabilities within IRM.
API/Terraform schedules
Manage schedules through the API or Infrastructure as Code.
Best for:
- Teams using Terraform or other IaC tools.
- Automated schedule management.
- Version-controlled schedule configurations.
Features:
- Full API control over shifts and rotations.
- Can enable web-based overrides while managing primary schedule via API.
- Integrates with existing deployment pipelines.
Comparison
Design on-call rotations
Design rotations that provide reliable coverage while distributing work fairly.
Rotation patterns
Choose a pattern based on your team size and alert frequency:
Combine patterns with different shift lengths:
- 12-hour shifts with weekly rotation: Two rotations covering day and night.
- Business hours shifts (9:00-18:00): Aligned with standard work hours.
- Extended shifts (36 hours): Bi-daily rotation for longer coverage periods.
For visual examples, refer to On-call schedule examples.
Set rotation start explicitly
Always set the Rotation start (called rotation_start in the API/Terraform) explicitly.
Why this matters: shift start and rotation start control different things.
- Shift start: When the shift pattern begins each day/week.
- Rotation start: When the rotation through users begins.
These can differ when you want the rotation to align with a specific date, like the start of a sprint, while shifts cover different hours.
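To see why the rotation start is worth pinning down, consider a simplified model in which the on-call user is chosen by counting whole rotation intervals since the rotation start. The `on_call_user` function and its parameters are hypothetical and are not taken from the IRM API; the sketch only shows that moving the rotation start changes who is on call on a given day.

```python
from datetime import date


def on_call_user(day, users, rotation_start, interval_days=7):
    """Pick the user for `day` in a rotation that advances every `interval_days`."""
    if day < rotation_start:
        raise ValueError("day is before the rotation starts")
    turns = (day - rotation_start).days // interval_days
    return users[turns % len(users)]


users = ["alice", "bob", "carol"]

# Same users and weekly interval, but two different rotation starts.
print(on_call_user(date(2024, 1, 17), users, rotation_start=date(2024, 1, 1)))  # carol
print(on_call_user(date(2024, 1, 17), users, rotation_start=date(2024, 1, 8)))  # bob
```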
User group rotation
For teams with varying availability, use user groups instead of individual users:
- Create user groups (arrays of users).
- Each rotation moves to the next group.
- All users in a group are on-call simultaneously.
Use cases:
- Primary and secondary on-call: Two users on-call at once.
- Follow-the-sun: Groups in different time zones.
- Graduated response: Junior and senior pairing.
- Shadow coverage: Onboarding new teammates.
- Backup coverage: Extra support during critical periods.
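One way to picture a user group rotation is as a list of lists, where each inner list is a group and every member of the active group is on call together. The names and the `group_for_turn` helper below are illustrative assumptions, not the IRM data model.

```python
def group_for_turn(groups, turn):
    """Return every user on call for the given rotation turn."""
    return groups[turn % len(groups)]


# Each inner list is one group; all of its members are on call at the same time.
groups = [["alice", "bob"], ["carol"], ["dave", "erin"]]

for turn in range(4):
    print(turn, group_for_turn(groups, turn))
```

After the last group, the rotation wraps around, so turn 3 returns the first group again.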
Handle schedule changes
Schedules need to accommodate vacations, sick days, and unexpected changes.
Shift swaps vs overrides
Shift swaps exchange shifts between two users:
- Maintains rotation continuity.
- The swapped user returns to their normal position in the rotation after the swap ends.
- Easier to track and audit.
Overrides replace the scheduled user with someone else:
- Creates a one-time exception.
- Doesn’t affect the underlying rotation.
- Use for coverage when swapping isn’t possible.
Best practice: Prefer shift swaps for planned changes. Use overrides for last-minute coverage.
Override priority
When shifts overlap, priority determines which takes precedence.
- Higher Priority (called priority_level in the API/Terraform) wins.
- Overrides typically use priority 99 (highest).
- Primary shifts use lower priorities (0-10).
Set priorities intentionally to ensure overrides work as expected.
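The priority rule can be sketched as "among the shifts active right now, keep only those with the highest priority". The `Shift` class and `resolve_on_call` function are hypothetical names for this example; only the `priority_level` field name comes from the text above, and the real resolution logic in IRM may differ in detail.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Shift:
    user: str
    start: datetime
    end: datetime
    priority_level: int


def resolve_on_call(shifts, at):
    """Return the users of the highest-priority shifts active at `at`."""
    active = [s for s in shifts if s.start <= at < s.end]
    if not active:
        return []
    top = max(s.priority_level for s in active)
    return [s.user for s in active if s.priority_level == top]


shifts = [
    # Primary weekly shift with a low priority.
    Shift("alice", datetime(2024, 1, 8), datetime(2024, 1, 15), priority_level=1),
    # Override for Wednesday working hours with the highest priority.
    Shift("bob", datetime(2024, 1, 10, 9), datetime(2024, 1, 10, 17), priority_level=99),
]

print(resolve_on_call(shifts, datetime(2024, 1, 10, 12)))  # ['bob']
print(resolve_on_call(shifts, datetime(2024, 1, 10, 18)))  # ['alice']
```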
Timezone considerations
Configure timezone settings to avoid confusion for distributed teams.
Enable timezone support (called use_tz in the API/Terraform) for web schedules:
- Shifts respect the schedule’s timezone.
- Daylight saving time is handled automatically.
- Schedules are clearer for distributed teams.
Without timezone support (legacy):
- Shifts are stored as UTC.
- Manual adjustment is needed for DST.
- This can cause confusion across time zones.
Test timezone changes carefully:
A rotation starting “Monday 9am” in US/Pacific starts at 16:00 or 17:00 UTC depending on DST, and an evening start such as Monday 6pm falls on Tuesday in UTC. Changing timezones can unexpectedly shift the rotation day.
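Before changing a schedule's timezone, it can help to check how a local start time maps to UTC on both sides of a DST change. The sketch below uses Python's standard `zoneinfo` module with the `America/Los_Angeles` zone (the IANA name behind US/Pacific); the dates are arbitrary Mondays chosen for illustration, and the system needs a timezone database available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def to_utc(local):
    """Convert a timezone-aware local time to UTC."""
    return local.astimezone(timezone.utc)


winter = datetime(2024, 1, 8, 9, 0, tzinfo=PACIFIC)    # Monday, standard time
summer = datetime(2024, 7, 8, 9, 0, tzinfo=PACIFIC)    # Monday, daylight time
evening = datetime(2024, 7, 8, 18, 0, tzinfo=PACIFIC)  # Monday evening

for local in (winter, summer, evening):
    utc_time = to_utc(local)
    print(local.strftime("%a %H:%M %Z"), "->", utc_time.strftime("%a %H:%M"), "UTC")
```

The two morning starts differ by one hour in UTC, and the evening start lands on the following day in UTC.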
Ensure coverage quality
Monitor schedules to identify gaps and ensure fair distribution.
Gap and empty shift reports
Enable schedule quality reports (called enabled_reports in the API/Terraform) to detect issues:
- Gaps: Time periods with no on-call coverage.
- Empty shifts: Shifts with no users assigned.
Both indicate coverage problems that should be addressed before they cause missed alerts.
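The two problems can be illustrated with a small interval sweep. This is a sketch under stated assumptions: shifts are modeled as `(start, end, users)` tuples, and `find_gaps` and `empty_shifts` are hypothetical helpers, not the checks IRM runs internally.

```python
from datetime import datetime


def find_gaps(shifts, window_start, window_end):
    """Return (start, end) periods in the window that no shift covers."""
    gaps = []
    cursor = window_start
    for start, end, _users in sorted(shifts, key=lambda s: s[0]):
        if start > cursor:
            gaps.append((cursor, min(start, window_end)))
        cursor = max(cursor, end)
        if cursor >= window_end:
            break
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


def empty_shifts(shifts):
    """Return shifts that exist but have no users assigned."""
    return [s for s in shifts if not s[2]]


day = datetime(2024, 1, 8)
next_day = datetime(2024, 1, 9)
shifts = [
    (day.replace(hour=0), day.replace(hour=8), ["alice"]),
    (day.replace(hour=10), day.replace(hour=18), ["bob"]),
    (day.replace(hour=18), next_day, []),  # a shift with nobody assigned
]

print(find_gaps(shifts, day, next_day))  # one gap, 08:00 to 10:00
print(empty_shifts(shifts))              # the 18:00 shift
```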
Quality metrics
IRM calculates schedule quality metrics:
- Coverage percentage: Time with on-call coverage versus total time.
- Balance score: How evenly work is distributed among team members.
- Overloaded users: Team members with significantly more on-call time.
Review these metrics regularly to ensure fair and complete coverage.
Fair distribution
On-call work should be distributed fairly across the team:
- Monitor balance scores to identify overloaded team members.
- Adjust rotations if some users consistently carry more load.
- Consider timezone distribution for follow-the-sun schedules.
- Account for holidays and time off when calculating fairness.
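For a quick manual check of distribution, on-call hours can be totaled per user and compared with an even split. The sketch below is not IRM's balance score formula, which this page does not define; the `(user, hours)` input format and the 1.5x threshold for flagging someone as overloaded are assumptions chosen for illustration.

```python
from collections import defaultdict


def on_call_hours(shifts):
    """Sum on-call hours per user from (user, hours) pairs."""
    totals = defaultdict(float)
    for user, hours in shifts:
        totals[user] += hours
    return dict(totals)


def load_share(totals):
    """Return each user's fraction of the total on-call hours."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {user: 0.0 for user in totals}
    return {user: hours / grand_total for user, hours in totals.items()}


def overloaded(shares, factor=1.5):
    """Flag users whose share exceeds `factor` times an even split (assumed threshold)."""
    if not shares:
        return []
    fair = 1 / len(shares)
    return sorted(user for user, share in shares.items() if share > factor * fair)


shifts = [("alice", 24), ("bob", 24), ("alice", 24), ("carol", 12), ("alice", 12)]
totals = on_call_hours(shifts)
shares = load_share(totals)

print(totals)
print(shares)
print(overloaded(shares))  # ['alice']
```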
Best practices summary
- Understand dynamic evaluation: Schedules are checked at execution time, not alert creation.
- Choose the right type: Web for simplicity, iCal for integration, API for automation.
- Set rotation start explicitly: Don’t rely on default behavior.
- Use shift swaps: Prefer swaps over overrides for planned changes.
- Enable timezone support: For new schedules, use timezone-aware shifts.
- Monitor quality: Enable gap and empty shift reports.
- Distribute fairly: Review balance scores regularly.
- Test changes: Verify timezone and rotation changes before production.
Next steps
- Configure on-call schedules in detail
- Explore schedule examples for common rotation patterns
- Manage schedules as code with Terraform
- Escalation chains best practices



