Infrastructure as Code for Grafana IRM

Infrastructure as code (IaC) enables you to automate your Grafana IRM setup and configuration using version-controlled, repeatable processes. This approach helps maintain consistency and simplifies the management of your incident response infrastructure.

Benefits of infrastructure as code

Using infrastructure as code with Grafana IRM provides several key benefits:

  • Consistency: Reduce manual errors and enforce standardized configurations
  • Version control: Track changes, manage rollbacks, and collaborate effectively
  • Scalability: Easily deploy and manage configurations across multiple teams and environments
  • Automation: Simplify updates and minimize manual intervention
  • Compliance: Maintain audit trails and enforce security policies through code

Supported tools and methods

Grafana IRM supports the following infrastructure as code approaches:

Terraform provider

Use the official Grafana Terraform provider to manage IRM resources such as:

  • On-call schedules and rotations
  • Escalation chains
  • Alert routing and integrations
  • Team configurations

API

The Grafana IRM API enables you to build custom workflows and programmatically control your incident response setup. With the API, you can:

  • Create and modify on-call schedules and rotations
  • Configure team structures and permissions
  • Define notification rules and alert routing
  • Set up and manage third-party tool integrations
  • Automate incident management processes

Before you begin

Before configuring Grafana IRM with infrastructure as code, ensure you have:

  • Access to your organization’s Grafana Cloud account
  • Familiarity with infrastructure as code concepts
  • Appropriate permissions to create and manage IRM resources
  • Terraform installed on your system (if using Terraform)
  • An API token with the necessary permissions (you create one in step 1 below)

Set up the Terraform provider

This section shows how to configure the Grafana Terraform provider so that you can manage Grafana IRM resources as code.

To get started with Terraform, follow these steps:

1. Create an API token

  1. Navigate to your Grafana Cloud instance
  2. In the main menu, click on IRM
  3. Go to the Settings tab
  4. Find the API Tokens section and click Create New Token
  5. Provide a name and select appropriate permissions
  6. Save the token securely; you won’t be able to see it again

2. Configure the provider

Add the following configuration to your Terraform files:

hcl
terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = ">= 1.22.0"
    }
  }
}

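provider "grafana" {
  # Because this configuration has an alias, resources must select it
  # with `provider = grafana.oncall`.
  alias               = "oncall"
  oncall_access_token = var.grafana_api_token # Read the token from a variable; never hard-code it
  # Depending on your stack, you may also need to set oncall_url here.
}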
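
The provider block reads the token from `var.grafana_api_token`, so that variable has to be declared somewhere in the configuration. A minimal sketch of the declaration, assuming the variable name used above; marking it `sensitive` keeps the value out of plan and apply output:

```hcl
# Input variable that carries the API token created in step 1.
variable "grafana_api_token" {
  description = "API token used by the Grafana provider for IRM resources"
  type        = string
  sensitive   = true # Redacts the value in plan and apply output
}
```

One way to supply the value without writing it to a file is the `TF_VAR_grafana_api_token` environment variable, which Terraform maps to the variable of the same name.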

Example configurations
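
The examples in this section refer to a team and to individual users by ID. In the Grafana Terraform provider these are typically looked up as data sources rather than created as resources, and are then referenced with the `data.` prefix, for example `data.grafana_oncall_team.engineering.id`. A sketch, assuming a team called Engineering and placeholder usernames that you would replace with your own:

```hcl
# Look up an existing team by name (placeholder value).
data "grafana_oncall_team" "engineering" {
  provider = grafana.oncall
  name     = "Engineering"
}

# Look up existing users by username (placeholder values).
data "grafana_oncall_user" "user1" {
  provider = grafana.oncall
  username = "user1"
}

data "grafana_oncall_user" "user2" {
  provider = grafana.oncall
  username = "user2"
}

data "grafana_oncall_user" "primary" {
  provider = grafana.oncall
  username = "primary-responder"
}
```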

Define an on-call schedule

hcl
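# Shifts are separate resources that the schedule references by ID.
resource "grafana_oncall_on_call_shift" "weekly" {
  provider  = grafana.oncall
  name      = "Weekly Rotation"
  type      = "rolling_users"
  start     = "2024-01-01T08:00:00"
  duration  = 604800 # One week in seconds
  frequency = "weekly"
  # Each inner list is one slot in the rotation.
  rolling_users = [
    [data.grafana_oncall_user.user1.id],
    [data.grafana_oncall_user.user2.id],
  ]
}

resource "grafana_oncall_schedule" "primary" {
  provider  = grafana.oncall
  name      = "Primary OnCall Rotation"
  type      = "calendar"
  team_id   = data.grafana_oncall_team.engineering.id
  time_zone = "UTC"
  shifts    = [grafana_oncall_on_call_shift.weekly.id]
}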

Create an escalation chain

hcl
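# The chain is a container; each step is its own grafana_oncall_escalation
# resource, ordered by position.
resource "grafana_oncall_escalation_chain" "default" {
  provider = grafana.oncall
  name     = "Primary Escalation Chain"
}

resource "grafana_oncall_escalation" "notify_primary" {
  provider            = grafana.oncall
  escalation_chain_id = grafana_oncall_escalation_chain.default.id
  position            = 0
  type                = "notify_persons"
  persons_to_notify   = [data.grafana_oncall_user.primary.id]
}

resource "grafana_oncall_escalation" "wait" {
  provider            = grafana.oncall
  escalation_chain_id = grafana_oncall_escalation_chain.default.id
  position            = 1
  type                = "wait"
  duration            = 300 # Five minutes, in seconds
}

resource "grafana_oncall_escalation" "notify_team" {
  provider               = grafana.oncall
  escalation_chain_id    = grafana_oncall_escalation_chain.default.id
  position               = 2
  type                   = "notify_team_members"
  notify_to_team_members = data.grafana_oncall_team.engineering.id
}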

Configure an integration

hcl
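resource "grafana_oncall_integration" "direct_paging" {
  provider = grafana.oncall
  name     = "Engineering Direct Paging"
  type     = "direct_paging"
  team_id  = data.grafana_oncall_team.engineering.id

  # The provider expects a default_route block; leave it empty for defaults.
  default_route {}
}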
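
An integration becomes useful once its alerts are routed to an escalation chain. The following sketch shows one way to connect the two through the `default_route` block. It assumes the escalation chain named `default` from the previous example, a team data source named `engineering`, and a hypothetical webhook integration that is not part of the original examples:

```hcl
# Hypothetical webhook integration whose default route points at the
# escalation chain defined earlier.
resource "grafana_oncall_integration" "webhook" {
  provider = grafana.oncall
  name     = "Engineering Webhook"
  type     = "webhook"
  team_id  = data.grafana_oncall_team.engineering.id

  default_route {
    escalation_chain_id = grafana_oncall_escalation_chain.default.id
  }
}
```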

Apply your configuration

  1. Initialize Terraform:

    bash
    terraform init

  2. Preview changes:

    bash
    terraform plan

  3. Apply the configuration:

    bash
    terraform apply

Best practices

  • Use variables for reusable values
  • Store state files securely
  • Use workspaces for different environments
  • Implement a review process for changes
  • Test configurations in a non-production environment
  • Document your infrastructure code
  • Use consistent naming conventions
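
Several of these practices can be combined in a few lines. The sketch below assumes a hypothetical `team_name` variable and resource label; it uses Terraform's built-in `terraform.workspace` value to derive an environment-specific prefix, so that names stay consistent across workspaces:

```hcl
# Hypothetical input: a reusable value instead of a hard-coded string.
variable "team_name" {
  description = "Team name used as a prefix in resource names"
  type        = string
  default     = "engineering"
}

locals {
  # terraform.workspace is the name of the currently selected workspace,
  # for example "dev", "staging", or "production".
  environment = terraform.workspace
  name_prefix = "${var.team_name}-${local.environment}"
}

# Example resource that follows the naming convention.
resource "grafana_oncall_escalation_chain" "per_environment" {
  provider = grafana.oncall
  name     = "${local.name_prefix}-escalation"
}
```

With the `dev` workspace selected and the default team name, the chain would be named `engineering-dev-escalation`.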

Continuous integration

For teams using CI/CD pipelines, consider implementing automatic validation and deployment of your IRM configuration:

  1. Create a separate workspace for each environment (dev, staging, production)
  2. Use pull requests to review configuration changes
  3. Implement automated testing of your Terraform configurations
  4. Configure CI pipelines to automatically apply changes after approval

Example CI workflow:

yaml
name: IRM Infrastructure

on:
  push:
    branches: [main]
    paths:
      - 'terraform/oncall/**'
  pull_request:
    branches: [main]
    paths:
      - 'terraform/oncall/**'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
      - name: Validate Terraform
        run: |
          cd terraform/oncall
          terraform init -backend=false
          terraform validate

Additional resources