Grafana Cloud
Last reviewed: April 1, 2026

Infrastructure as code

Infrastructure as code (IaC) enables you to automate your Grafana IRM setup and configuration using version-controlled, repeatable processes. This approach helps maintain consistency and simplifies the management of your incident response infrastructure.

Using infrastructure as code with IRM provides several key benefits:

  • Consistency: Reduce manual errors and enforce standardized configurations.
  • Version control: Track changes, manage rollbacks, and collaborate effectively.
  • Scalability: Deploy and manage configurations across multiple teams and environments.
  • Automation: Simplify updates and minimize manual intervention.
  • Compliance: Maintain audit trails and enforce security policies through code.

Before you begin

Ensure you have the following:

  • Access to your organization’s Grafana Cloud account.
  • Appropriate permissions to create and manage IRM resources.
  • A Grafana Cloud service account token or OnCall API key with the necessary permissions.
  • Terraform installed on your system (if using Terraform).

Supported tools

IRM supports the following infrastructure as code approaches:

  • Terraform: Use the Grafana Terraform provider to manage on-call schedules, escalation chains, integrations, routes, and more.
  • OnCall API: Build custom workflows and programmatically control your incident response setup. For API reference, refer to OnCall API.

Understand IRM resource management limitations

Not all IRM resources support full IaC management. Understanding these boundaries helps you plan your automation strategy.

Fully manageable via API and Terraform:

  • Integrations (including direct paging)
  • Escalation chains and escalation policies
  • Routes
  • On-call schedules and shifts
  • Shift swaps
  • Outgoing webhooks
  • Personal notification rules
  • Resolution notes

Read-only via API (provisioned through Grafana):

  • Users: Synced from your Grafana instance. Listing users with a Terraform user-agent triggers a sync.
  • Teams: Synced from Grafana teams. You can reference teams by ID, but you can’t create or modify them through the OnCall API.
  • Organizations: Read-only.

Not available via the OnCall API:

  • Incident management resources: Managed through the separate Incident API.
  • ChatOps configurations (Slack, Microsoft Teams, Telegram): Managed through the IRM UI.
  • Admin organization settings: Managed through the IRM UI.

Note

IRM uses its own internal user IDs, which are different from Grafana user IDs. When you manage IRM resources with Terraform, you must map Grafana users to their corresponding IRM user IDs. For an example, refer to the Map user IDs section.

Set up the Terraform provider

To authenticate Terraform requests, you can use either a Grafana Cloud service account token (recommended) or a legacy OnCall API key.

Create an API token

  1. In your Grafana Cloud instance, go to Alerts & IRM > IRM.
  2. Go to Settings and select Admin & API.
  3. In the API Tokens section, click Create New Token.
  4. Provide a name and select appropriate permissions.
  5. Save the token securely. You can’t view it again after creation.

For more information about authentication methods, refer to the OnCall API authentication documentation.

Configure the provider

Add the following configuration to your Terraform files:

hcl
terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = ">= 3.0.0"
    }
  }
}

provider "grafana" {
  alias               = "oncall"
  oncall_access_token = var.oncall_access_token # Store tokens in variables
}
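
The provider block references var.oncall_access_token, which must be declared somewhere in your configuration. The following is a minimal sketch of such a declaration. The description text is illustrative, and the sensitive argument assumes Terraform 0.14 or later:

hcl
# Declared so that provider "grafana" can read var.oncall_access_token.
variable "oncall_access_token" {
  description = "Token used to authenticate against the OnCall API"
  type        = string
  sensitive   = true # Redacts the value in plan and apply output
}

You can then supply the value outside of version control, for example through the TF_VAR_oncall_access_token environment variable.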

Example configurations

Map user IDs

IRM uses internal user IDs that differ from Grafana user IDs. Use the grafana_oncall_user data source to look up IRM user IDs by username (typically the user’s email address):

hcl
// Look up IRM users by username
data "grafana_oncall_user" "all_users" {
  provider = grafana.oncall
  // Build a flat set of every username across all teams
  for_each = toset(flatten(values(local.teams)))
  username = each.key
}

// On-call groups / teams
locals {
  teams = {
    emea = [
      "alfa@grafana.com",
      "bravo@grafana.com",
      "charlie@grafana.com",
      "delta@grafana.com",
      "echo@grafana.com",
      "foxtrot@grafana.com",
      "golf@grafana.com",
    ]
  }
  // The OnCall API operates with resource IDs, so convert emails into IDs
  teams_map_of_user_id = {
    for team_name, username_list in local.teams :
    team_name => [for username in username_list : data.grafana_oncall_user.all_users[username].id]
  }
  // Reverse lookup: find a user by their OnCall ID
  users_map_by_id = {
    for username, oncall_user in data.grafana_oncall_user.all_users :
    oncall_user.id => oncall_user
  }
}
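
To check that the mapping resolves as expected before you use it in other resources, you can expose it as an output. This is a minimal sketch; the output name emea_oncall_user_ids is hypothetical:

hcl
# Prints the IRM user IDs resolved for the "emea" team after terraform apply.
output "emea_oncall_user_ids" {
  description = "IRM user IDs resolved for the emea team"
  value       = local.teams_map_of_user_id["emea"]
}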

Define an on-call schedule

Create a web-based schedule with on-call shifts. Schedules and shifts are separate resources: define the shifts first, then reference them in the schedule.

hcl
resource "grafana_oncall_on_call_shift" "week_shift" {
  provider   = grafana.oncall
  name       = "Weekly Rotation"
  type       = "rolling_users"
  start      = "2024-01-01T08:00:00"
  duration   = 60 * 60 * 24 * 7  # One week in seconds (604800)
  frequency  = "weekly"
  rolling_users = [
    [data.grafana_oncall_user.all_users["alfa@grafana.com"].id],
    [data.grafana_oncall_user.all_users["bravo@grafana.com"].id],
  ]
  time_zone  = "UTC"
}

resource "grafana_oncall_schedule" "primary" {
  provider  = grafana.oncall
  name      = "Primary On-Call Rotation"
  type      = "web"
  time_zone = "UTC"
  shifts    = [grafana_oncall_on_call_shift.week_shift.id]
}

For more schedule examples, refer to Schedules as code.
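
Instead of listing each user by hand, you can build the rotation from the teams_map_of_user_id local defined in the Map user IDs section. The following sketch assumes that local exists in the same module and places each user in a rotation slot of their own. The resource name emea_week_shift is hypothetical:

hcl
resource "grafana_oncall_on_call_shift" "emea_week_shift" {
  provider  = grafana.oncall
  name      = "EMEA Weekly Rotation"
  type      = "rolling_users"
  start     = "2024-01-01T08:00:00"
  duration  = 60 * 60 * 24 * 7 # One week in seconds
  frequency = "weekly"
  # Wrap each user ID in its own list so that users rotate one at a time
  rolling_users = [for user_id in local.teams_map_of_user_id["emea"] : [user_id]]
  time_zone     = "UTC"
}

Adding or removing a username in local.teams then updates the rotation on the next apply.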

Create an escalation chain

Escalation chains and their steps are separate resources. Define the chain first, then add escalation steps:

hcl
resource "grafana_oncall_escalation_chain" "default" {
  provider = grafana.oncall
  name     = "Primary Escalation Chain"
}

resource "grafana_oncall_escalation" "step_notify" {
  provider             = grafana.oncall
  escalation_chain_id  = grafana_oncall_escalation_chain.default.id
  type                 = "notify_persons"
  persons_to_notify    = [data.grafana_oncall_user.all_users["alfa@grafana.com"].id]
  position             = 0
}

resource "grafana_oncall_escalation" "step_wait" {
  provider             = grafana.oncall
  escalation_chain_id  = grafana_oncall_escalation_chain.default.id
  type                 = "wait"
  duration             = 300  # Wait 5 minutes before next step
  position             = 1
}

resource "grafana_oncall_escalation" "step_notify_schedule" {
  provider                     = grafana.oncall
  escalation_chain_id          = grafana_oncall_escalation_chain.default.id
  type                         = "notify_on_call_from_schedule"
  notify_on_call_from_schedule = grafana_oncall_schedule.primary.id
  position                     = 2
}

Configure an integration with labels

Create an integration and assign static labels. First, define labels using the grafana_oncall_label data source:

hcl
data "grafana_oncall_label" "env_label" {
  provider = grafana.oncall
  key      = "environment"
  value    = "production"
}

Then, pass the label into the grafana_oncall_integration resource:

hcl
resource "grafana_oncall_integration" "monitoring" {
  provider = grafana.oncall
  name     = "Production Monitoring"
  type     = "webhook"
  labels   = [data.grafana_oncall_label.env_label]

  default_route {}
}

IRM also supports dynamic labels on integrations. Dynamic labels are extracted from alert payloads at ingestion time rather than being statically assigned. Configure dynamic labels through the IRM UI or the OnCall API.

For more information, refer to Configure labels.
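
To send alerts from the integration to the escalation chain defined earlier, you can add a route. The following is a sketch that assumes the grafana_oncall_route resource with the integration_id, escalation_chain_id, routing_regex, and position arguments. Verify the arguments against the provider version you use. The resource name and the regular expression are hypothetical:

hcl
resource "grafana_oncall_route" "critical" {
  provider            = grafana.oncall
  integration_id      = grafana_oncall_integration.monitoring.id
  escalation_chain_id = grafana_oncall_escalation_chain.default.id
  # Hypothetical pattern: match alert payloads that mention a critical severity
  routing_regex = "\"severity\": \"critical\""
  position      = 0
}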

Configure a direct paging integration

hcl
resource "grafana_oncall_integration" "direct_paging" {
  provider = grafana.oncall
  name     = "Engineering Direct Paging"
  type     = "direct_paging"

  default_route {}
}

Note

To manage direct paging integrations through Terraform, enable the Manually manage direct paging integrations setting in Admin settings. Otherwise, IRM automatically creates and manages direct paging integrations for each team.
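
Teams are read-only in the OnCall API, but you can reference them by ID. The following sketch assigns a direct paging integration to a team. It assumes the grafana_oncall_team data source and the team_id argument of grafana_oncall_integration, and the team name Engineering is hypothetical:

hcl
# Look up an existing team that was synced from Grafana
data "grafana_oncall_team" "engineering" {
  provider = grafana.oncall
  name     = "Engineering"
}

resource "grafana_oncall_integration" "engineering_direct_paging" {
  provider = grafana.oncall
  name     = "Engineering Direct Paging"
  type     = "direct_paging"
  team_id  = data.grafana_oncall_team.engineering.id

  default_route {}
}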

Apply your configuration

Initialize the working directory to download the Grafana provider:

sh
terraform init

Preview the changes:

sh
terraform plan

Apply the configuration:

sh
terraform apply

Service account token limitations

When using Grafana Cloud service account tokens with the OnCall API, be aware of the following limitations:

  • Service accounts can’t retrieve their own user profile (GET /api/v1/users/current/).
  • Service accounts can’t perform state-changing actions on alert groups, such as acknowledge, resolve, or silence. Use a user API token or the IRM UI for these operations.
  • The /info, /make_call, and /send_sms endpoints require a legacy OnCall API key and don’t accept service account tokens.

If you encounter 500 errors when using a service account token, verify that the endpoint supports service account authentication and that the token has the necessary permissions.

Continuous integration

For teams using CI/CD pipelines, consider automating the validation and deployment of your IRM configuration:

  1. Create a separate workspace for each environment (development, staging, production).
  2. Use pull requests to review configuration changes.
  3. Implement automated testing of your Terraform configurations.
  4. Configure CI pipelines to automatically apply changes after approval.
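
If several workspaces deploy to the same Grafana Cloud stack, including the workspace name in resource names can help avoid collisions. The following is a minimal sketch; the local name environment and the resource name per_environment are hypothetical:

hcl
locals {
  # terraform.workspace holds the name of the currently selected workspace
  environment = terraform.workspace
}

resource "grafana_oncall_escalation_chain" "per_environment" {
  provider = grafana.oncall
  name     = "Primary Escalation Chain (${local.environment})"
}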

The following GitHub Actions workflow example validates the Terraform configuration on pushes to main and on pull requests that target main, whenever files under terraform/oncall/ change:

yaml
name: IRM Infrastructure

on:
  push:
    branches: [main]
    paths:
      - 'terraform/oncall/**'
  pull_request:
    branches: [main]
    paths:
      - 'terraform/oncall/**'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Validate Terraform
        run: |
          cd terraform/oncall
          terraform init -backend=false
          terraform validate

Best practices

  • Store tokens and other sensitive values in Terraform variables marked as sensitive or in a secrets manager, and never commit them to version control.
  • Store state files securely using a remote backend.
  • Use workspaces to manage separate environments (development, staging, production).
  • Implement code review for configuration changes.
  • Test configurations in a non-production environment before applying to production.
  • Use consistent naming conventions across resources.
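
As one way to store state remotely, the following sketch configures an S3 backend. It assumes you use AWS; the bucket name, key, and region are hypothetical placeholders, and any other remote backend that Terraform supports works the same way:

hcl
terraform {
  backend "s3" {
    bucket  = "example-terraform-state" # Hypothetical bucket name
    key     = "irm/terraform.tfstate"   # Hypothetical state path
    region  = "us-east-1"               # Hypothetical region
    encrypt = true                      # Encrypt the state file at rest
  }
}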

Next steps