---
title: "Best practices for on-call schedules | Grafana Cloud documentation"
description: "Best practices for designing and managing on-call schedules in Grafana IRM."
---

# Best practices for on-call schedules

On-call schedules determine who receives pages when escalation chains execute. Well-designed schedules ensure reliable coverage while respecting team members’ time.

## Understand on-call schedules

Before building schedules, understand their role in IRM and how they’re evaluated.

### What is a schedule

A schedule defines who is on-call at any given time. Schedules contain shifts that assign users or user groups to specific time periods.

Schedules connect escalation chains to responders:

```text
Alert → Route → Escalation Chain → Schedule → On-call Responder
```

### How schedules connect to escalation chains

Escalation chains reference schedules through notification steps. When a step like “Notify users from on-call schedule” executes, IRM checks who is currently on-call.

Example chain using multiple schedules:

```text
1. Notify on-call from "Primary Schedule"
2. Wait 5 minutes
3. Notify on-call from "Secondary Schedule"
4. Wait 10 minutes
5. Notify on-call from "Management Schedule"
```

This pattern enables:

- Primary and secondary on-call.
- Follow-the-sun coverage.
- Escalation to management after hours.

### Dynamic evaluation

IRM evaluates schedules at **execution time**, not when the alert was created.

**What this means:**

- Schedule changes take effect immediately for pending escalation steps.
- If someone swaps shifts while an alert is escalating, the new on-call person receives the next notification.
- You don’t need to worry about in-flight alerts using outdated schedule data.

This is different from escalation chains, which are snapshotted when an alert group is created.
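
The following is a minimal sketch of execution-time lookup, not IRM's implementation. The `on_call_at` helper, the tuple layout, and the user names are all hypothetical and exist only to illustrate the behavior described above.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc


def on_call_at(shifts, moment):
    """Return the users whose shift covers `moment` (start inclusive, end exclusive)."""
    return [user for start, end, user in shifts if start <= moment < end]


# Hypothetical schedule data: (start, end, user) tuples.
shifts = [
    (datetime(2024, 1, 1, 0, 0, tzinfo=UTC), datetime(2024, 1, 1, 9, 0, tzinfo=UTC), "alice"),
    (datetime(2024, 1, 1, 9, 0, tzinfo=UTC), datetime(2024, 1, 1, 18, 0, tzinfo=UTC), "bob"),
]

alert_created = datetime(2024, 1, 1, 8, 55, tzinfo=UTC)
step_runs_at = alert_created + timedelta(minutes=10)  # the step fires at 09:05

# Who was on-call when the alert was created (not what the step uses).
at_creation = on_call_at(shifts, alert_created)

# The 09:00 shift is reassigned to Carol while the alert is still escalating.
shifts[1] = (shifts[1][0], shifts[1][1], "carol")

# The step looks up the schedule when it runs, so it sees the change.
at_execution = on_call_at(shifts, step_runs_at)

print(at_creation, at_execution)
```

In this sketch the notification goes to Carol, because the lookup happens when the step runs rather than when the alert was created.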

## Choose a schedule type

IRM supports three ways to manage schedules. Choose based on your team’s workflow.

### Web schedules

The built-in schedule editor in the IRM UI.

**Best for:**

- Teams new to on-call management.
- Simple rotation patterns.
- Teams who prefer visual schedule management.

**Features:**

- Drag-and-drop shift editing.
- Visual rotation preview.
- Override management in the UI.

### iCal schedules

Import schedules from external calendar systems.

**Best for:**

- Teams with existing calendar-based schedules.
- Migration from PagerDuty, Opsgenie, or other tools.
- Organizations using shared calendar systems.

**Considerations:**

- Schedule changes require updating the external calendar.
- IRM periodically syncs from the iCal URL.
- Limited editing capabilities within IRM.

### API/Terraform schedules

Manage schedules through the API or Infrastructure as Code.

**Best for:**

- Teams using Terraform or other IaC tools.
- Automated schedule management.
- Version-controlled schedule configurations.

**Features:**

- Full API control over shifts and rotations.
- You can enable web-based overrides while managing the primary schedule through the API.
- Integrates with existing deployment pipelines.

### Comparison

| Type   | Management        | Best for                         | Flexibility |
|--------|-------------------|----------------------------------|-------------|
| Web    | UI                | Simple rotations, visual editing | High        |
| iCal   | External calendar | Migration, existing calendars    | Low         |
| API/TF | Code              | Automation, version control      | High        |

## Design on-call rotations

Design rotations that provide reliable coverage while distributing work fairly.

### Rotation patterns

Choose a pattern based on your team size and alert frequency:

| Pattern   | Duration | Use case                                 |
|-----------|----------|------------------------------------------|
| Daily     | 24 hours | High-frequency alerts, distributed teams |
| Weekly    | 7 days   | Most common, good work-life balance      |
| Bi-weekly | 14 days  | Smaller teams, less frequent alerts      |

Combine patterns with different shift lengths:

- **12-hour shifts with weekly rotation:** Two rotations covering day and night.
- **Business hours shifts (9:00-18:00):** Aligned with standard work hours.
- **Extended shifts (36 hours):** Bi-daily rotation for longer coverage periods.

For visual examples, refer to [On-call schedule examples](/docs/grafana-cloud/alerting-and-irm/irm/on-call-schedules/schedule-examples).

### Set rotation start explicitly

Always set the **Rotation start** (called `rotation_start` in the API/Terraform) explicitly.

**Why this matters:**

- **Shift start:** When the shift pattern begins each day or week.
- **Rotation start:** When the rotation through users begins.

These can differ when you want the rotation to align with a specific date, like the start of a sprint, while shifts cover different hours.
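
As a rough illustration of why the two settings are independent, the sketch below models a weekly rotation anchored to a rotation start, with a separate daily shift window. It is a simplified model under assumed rules, not IRM's scheduling logic. The function names, the naive (timezone-free) datetimes, and the user list are hypothetical.

```python
from datetime import datetime, time, timedelta


def rotation_user(users, rotation_start, rotation_length, moment):
    """Pick whose turn it is by counting whole rotations since rotation_start."""
    rotations_elapsed = (moment - rotation_start) // rotation_length
    return users[rotations_elapsed % len(users)]


def shift_active(moment, shift_start=time(9, 0), shift_end=time(18, 0)):
    """Check whether `moment` falls inside the daily shift window."""
    return shift_start <= moment.time() < shift_end


users = ["alice", "bob", "carol"]
rotation_start = datetime(2024, 1, 1, 0, 0)  # a Monday, for example a sprint start
one_week = timedelta(days=7)

week_one = rotation_user(users, rotation_start, one_week, datetime(2024, 1, 3, 10, 0))
week_two = rotation_user(users, rotation_start, one_week, datetime(2024, 1, 8, 10, 0))
week_four = rotation_user(users, rotation_start, one_week, datetime(2024, 1, 22, 10, 0))

print(week_one, week_two, week_four)
print(shift_active(datetime(2024, 1, 3, 10, 0)), shift_active(datetime(2024, 1, 3, 20, 0)))
```

Moving `rotation_start` changes who is on-call in a given week without touching the hours that each shift covers.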

### User group rotation

For teams with varying availability, use user groups instead of individual users:

1. Create user groups (arrays of users).
2. Each rotation moves to the next group.
3. All users in a group are on-call simultaneously.

**Use cases:**

- **Primary and secondary on-call:** Two users on-call at once.
- **Follow-the-sun:** Groups in different time zones.
- **Graduated response:** Junior and senior pairing.
- **Shadow coverage:** Onboarding new teammates.
- **Backup coverage:** Extra support during critical periods.
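
The group rotation steps above can be sketched as follows. This is an illustrative model only. The `upcoming_rotations` helper, the group contents, and the weekly rotation length are assumptions made for the example.

```python
from datetime import datetime, timedelta


def upcoming_rotations(groups, rotation_start, rotation_length, count):
    """List (start, group) pairs for the first `count` rotations, cycling through groups."""
    return [
        (rotation_start + i * rotation_length, groups[i % len(groups)])
        for i in range(count)
    ]


# Each inner list is one group; everyone in a group is on-call together.
groups = [
    ["alice", "bob"],
    ["carol", "dave"],
    ["erin", "frank"],
]

plan = upcoming_rotations(groups, datetime(2024, 1, 1), timedelta(days=7), 4)
for start, group in plan:
    print(start.date(), ", ".join(group))
```

With three groups and four rotations, the fourth rotation wraps around to the first group.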

## Handle schedule changes

Schedules need to accommodate vacations, sick days, and unexpected changes.

### Shift swaps vs overrides

**Shift swaps** exchange shifts between two users:

- Maintains rotation continuity.
- After the swap, each user returns to their normal position in the rotation.
- Easier to track and audit.

**Overrides** replace the scheduled user with someone else:

- Creates a one-time exception.
- Doesn’t affect the underlying rotation.
- Use for coverage when swapping isn’t possible.

**Best practice:** Prefer shift swaps for planned changes. Use overrides for last-minute coverage.

### Override priority

When shifts overlap, priority determines which takes precedence.

- Higher **Priority** (called `priority_level` in the API/Terraform) wins.
- Overrides typically use priority 99 (highest).
- Primary shifts use lower priorities (0-10).

Set priorities intentionally to ensure overrides work as expected.
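
A minimal sketch of priority resolution, assuming the simple rule that the highest priority among overlapping shifts wins. Only the `priority_level` name comes from the API/Terraform field mentioned above. The dictionary layout and the `resolve_on_call` helper are hypothetical.

```python
from datetime import datetime


def resolve_on_call(shifts, moment):
    """Among shifts covering `moment`, return users at the highest priority level."""
    active = [s for s in shifts if s["start"] <= moment < s["end"]]
    if not active:
        return []
    top = max(s["priority_level"] for s in active)
    return [s["user"] for s in active if s["priority_level"] == top]


shifts = [
    # Primary shift covering the whole day at a low priority.
    {"user": "alice", "priority_level": 1,
     "start": datetime(2024, 1, 1, 0, 0), "end": datetime(2024, 1, 2, 0, 0)},
    # Override for part of the day at the highest priority.
    {"user": "bob", "priority_level": 99,
     "start": datetime(2024, 1, 1, 12, 0), "end": datetime(2024, 1, 1, 15, 0)},
]

before_override = resolve_on_call(shifts, datetime(2024, 1, 1, 10, 0))
during_override = resolve_on_call(shifts, datetime(2024, 1, 1, 13, 0))

print(before_override, during_override)
```

If the override were given a priority lower than the primary shift, this model would keep notifying the primary user, which is why priorities need to be set deliberately.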

### Timezone considerations

Configure timezone settings to avoid confusion for distributed teams.

**Enable timezone support** (called `use_tz` in the API/Terraform) for web schedules:

- Shifts respect the schedule’s timezone.
- Daylight saving time is handled automatically.
- Schedules are clearer for distributed teams.

**Without timezone support (legacy):**

- Shifts are stored as UTC.
- Manual adjustment is needed for DST.
- This can cause confusion across time zones.

**Test timezone changes carefully:**

A rotation starting “Monday 9am” in US/Pacific begins at 16:00 or 17:00 UTC depending on DST, and a start later in the day, such as “Monday 4pm”, falls on Monday or Tuesday in UTC depending on DST. Changing timezones can unexpectedly shift the rotation day.
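
One way to check a planned start time before changing a schedule is to convert it to UTC for both a winter and a summer date. The sketch below uses Python's standard `zoneinfo` module and assumes the system has time zone data installed. `America/Los_Angeles` is the canonical name for the zone also known as US/Pacific, and the dates and the `to_utc` helper are chosen for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

pacific = ZoneInfo("America/Los_Angeles")


def to_utc(local_dt):
    """Convert a timezone-aware local datetime to UTC."""
    return local_dt.astimezone(timezone.utc)


# Monday 9am local time, once in winter (PST) and once in summer (PDT).
winter_morning = to_utc(datetime(2024, 1, 8, 9, 0, tzinfo=pacific))
summer_morning = to_utc(datetime(2024, 7, 8, 9, 0, tzinfo=pacific))

# Monday 4pm local time crosses midnight UTC in winter but not in summer.
winter_evening = to_utc(datetime(2024, 1, 8, 16, 0, tzinfo=pacific))
summer_evening = to_utc(datetime(2024, 7, 8, 16, 0, tzinfo=pacific))

for label, value in [
    ("winter 9am", winter_morning),
    ("summer 9am", summer_morning),
    ("winter 4pm", winter_evening),
    ("summer 4pm", summer_evening),
]:
    print(label, value.isoformat())
```

Comparing the UTC weekday for each case shows whether a timezone change would move the rotation onto a different day.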

## Ensure coverage quality

Monitor schedules to identify gaps and ensure fair distribution.

### Gap and empty shift reports

Enable schedule quality reports (called `enabled_reports` in the API/Terraform) to detect issues:

- **Gaps:** Time periods with no on-call coverage.
- **Empty shifts:** Shifts with no users assigned.

Both indicate coverage problems that should be addressed before they cause missed alerts.
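
To make the two report types concrete, here is a small sketch that finds gaps and empty shifts in a list of shifts. It is not how IRM computes its reports. The data layout and both helper functions are hypothetical, and the sketch assumes that a shift with no users provides no coverage.

```python
from datetime import datetime


def find_empty_shifts(shifts):
    """Return shifts that have no users assigned."""
    return [s for s in shifts if not s["users"]]


def find_gaps(shifts, window_start, window_end):
    """Return (start, end) periods inside the window that no staffed shift covers."""
    covered = sorted((s["start"], s["end"]) for s in shifts if s["users"])
    gaps = []
    cursor = window_start
    for start, end in covered:
        if cursor >= window_end:
            break
        if start > cursor:
            gaps.append((cursor, min(start, window_end)))
        cursor = max(cursor, end)
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


day = datetime(2024, 1, 1)
shifts = [
    {"start": day.replace(hour=0), "end": day.replace(hour=8), "users": ["alice"]},
    {"start": day.replace(hour=8), "end": day.replace(hour=10), "users": []},
    {"start": day.replace(hour=10), "end": day.replace(hour=18), "users": ["bob"]},
]

gaps = find_gaps(shifts, day, datetime(2024, 1, 2))
empty = find_empty_shifts(shifts)

for start, end in gaps:
    print("gap:", start.time(), "to", end.time())
print("empty shifts:", len(empty))
```

In this example the empty 08:00 shift and the unscheduled evening both appear as uncovered periods.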

### Quality metrics

IRM calculates schedule quality metrics:

- **Coverage percentage:** Time with on-call coverage versus total time.
- **Balance score:** How evenly work is distributed among team members.
- **Overloaded users:** Team members with significantly more on-call time than their teammates.

Review these metrics regularly to ensure fair and complete coverage.
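
The exact formulas IRM uses are not described here, so the following sketch only approximates the ideas behind these metrics. It computes a coverage percentage and per-user on-call hours, and flags users above a threshold. The function names, the tuple layout, and the 1.5x-the-mean threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hours_per_user(shifts):
    """Sum on-call hours for each user."""
    totals = defaultdict(float)
    for start, end, user in shifts:
        totals[user] += (end - start) / timedelta(hours=1)
    return dict(totals)


def coverage_percentage(shifts, window_start, window_end):
    """Share of the window covered by at least one shift, without double counting."""
    covered = timedelta()
    cursor = window_start
    for start, end, _ in sorted(shifts):
        start, end = max(start, cursor), min(end, window_end)
        if end > start:
            covered += end - start
            cursor = end
    return 100 * covered / (window_end - window_start)


def overloaded_users(totals, factor=1.5):
    """Users whose hours exceed `factor` times the team mean (assumed threshold)."""
    mean = sum(totals.values()) / len(totals)
    return sorted(user for user, hours in totals.items() if hours > factor * mean)


shifts = [
    (datetime(2024, 1, 1), datetime(2024, 1, 5), "alice"),
    (datetime(2024, 1, 5), datetime(2024, 1, 6), "bob"),
    (datetime(2024, 1, 6), datetime(2024, 1, 7), "carol"),
]

totals = hours_per_user(shifts)
coverage = coverage_percentage(shifts, datetime(2024, 1, 1), datetime(2024, 1, 8))
overloaded = overloaded_users(totals)

print(totals, round(coverage, 1), overloaded)
```

Here one day of the week has no coverage and one user carries most of the hours, which are the kinds of issues a regular review of these metrics is meant to surface.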

### Fair distribution

On-call work should be distributed fairly across the team:

- Monitor balance scores to identify overloaded team members.
- Adjust rotations if some users consistently carry more load.
- Consider timezone distribution for follow-the-sun schedules.
- Account for holidays and time off when calculating fairness.

## Best practices summary

- **Understand dynamic evaluation:** Schedules are checked at execution time, not alert creation.
- **Choose the right type:** Web for simplicity, iCal for integration, API for automation.
- **Set rotation start explicitly:** Don’t rely on default behavior.
- **Use shift swaps:** Prefer swaps over overrides for planned changes.
- **Enable timezone support:** For new schedules, use timezone-aware shifts.
- **Monitor quality:** Enable gap and empty shift reports.
- **Distribute fairly:** Review balance scores regularly.
- **Test changes:** Verify timezone and rotation changes before production.

## Next steps

- [Configure on-call schedules](/docs/grafana-cloud/alerting-and-irm/irm/on-call-schedules) in detail
- [Explore schedule examples](/docs/grafana-cloud/alerting-and-irm/irm/on-call-schedules/schedule-examples) for common rotation patterns
- [Manage schedules as code](/docs/grafana-cloud/alerting-and-irm/irm/on-call-schedules/schedules-as-code) with Terraform
- [Escalation chains](/docs/grafana-cloud/alerting-and-irm/irm/guides/best-practices/escalation-chains) best practices
