Infrastructure as Code for Grafana IRM
Infrastructure as code (IaC) enables you to automate your Grafana IRM setup and configuration using version-controlled, repeatable processes. This approach helps maintain consistency and simplifies the management of your incident response infrastructure.
Benefits of infrastructure as code
Using infrastructure as code with Grafana IRM provides several key benefits:
- Consistency: Reduce manual errors and enforce standardized configurations
- Version control: Track changes, manage rollbacks, and collaborate effectively
- Scalability: Easily deploy and manage configurations across multiple teams and environments
- Automation: Simplify updates and minimize manual intervention
- Compliance: Maintain audit trails and enforce security policies through code
Supported tools and methods
Grafana IRM supports the following infrastructure as code approaches:
Terraform provider
Use the official Grafana Terraform provider to manage IRM resources such as:
- On-call schedules and rotations
- Escalation chains
- Alert routing and integrations
- Team configurations
API
The Grafana IRM API enables you to build custom workflows and programmatically control your incident response setup. With the API, you can:
- Create and modify on-call schedules and rotations
- Configure team structures and permissions
- Define notification rules and alert routing
- Set up and manage third-party tool integrations
- Automate incident management processes
Before you begin
Before configuring Grafana IRM with infrastructure as code, ensure you have:
- Access to your organization’s Grafana Cloud account
- Familiarity with infrastructure as code concepts
- Appropriate permissions to create and manage IRM resources
- Terraform installed on your system (if using Terraform)
- A Grafana IRM API token with the necessary permissions (created in the first setup step)
Set up the Terraform provider
Learn how to configure the Grafana Terraform provider to manage your Grafana IRM resources and automate your incident response infrastructure.
To get started with Terraform, follow these steps:
1. Create an API token
- Navigate to your Grafana Cloud instance
- In the main menu, click on IRM
- Go to the Settings tab
- Find the API Tokens section and click Create New Token
- Provide a name and select appropriate permissions
- Save the token securely; you won’t be able to see it again
2. Configure the provider
Add the following configuration to your Terraform files:
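terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = ">= 1.22.0"
    }
  }
}

# Because this configuration is aliased, each resource and data source
# that should use it must set: provider = grafana.oncall
provider "grafana" {
  alias               = "oncall"
  oncall_access_token = var.grafana_api_token # Store tokens in variables
}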
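The provider block reads the token from var.grafana_api_token, and the examples that follow refer to existing users and a team. The sketch below shows one way to declare those inputs. The username and team name are placeholders, not values from your instance, and the argument names should be checked against the provider version you install.

```hcl
# Marked sensitive so Terraform redacts the value in plan and apply output
variable "grafana_api_token" {
  type        = string
  description = "Grafana IRM API token created in step 1"
  sensitive   = true
}

# Users and teams are looked up as data sources rather than created.
# "user1" and "Engineering" are placeholder values.
data "grafana_oncall_user" "user1" {
  provider = grafana.oncall
  username = "user1"
}

data "grafana_oncall_team" "engineering" {
  provider = grafana.oncall
  name     = "Engineering"
}
```

To keep the token out of version control, you can supply it through the TF_VAR_grafana_api_token environment variable instead of a committed .tfvars file.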
Example configurations
Define an on-call schedule
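# Rotations are separate on-call shift resources attached through "shifts".
# user1, user2, and engineering are data sources declared elsewhere.
resource "grafana_oncall_on_call_shift" "weekly" {
  provider      = grafana.oncall
  name          = "Weekly Rotation"
  type          = "rolling_users"
  start         = "2024-01-01T08:00:00"
  duration      = 604800 # One week in seconds
  frequency     = "weekly"
  interval      = 1
  time_zone     = "UTC"
  rolling_users = [[data.grafana_oncall_user.user1.id], [data.grafana_oncall_user.user2.id]]
}

resource "grafana_oncall_schedule" "primary" {
  provider  = grafana.oncall
  name      = "Primary OnCall Rotation"
  type      = "calendar"
  team_id   = data.grafana_oncall_team.engineering.id
  time_zone = "UTC"
  shifts    = [grafana_oncall_on_call_shift.weekly.id]
}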
Create an escalation chain
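resource "grafana_oncall_escalation_chain" "default" {
  provider = grafana.oncall
  name     = "Primary Escalation Chain"
}

resource "grafana_oncall_escalation" "notify_primary" {
  provider            = grafana.oncall
  escalation_chain_id = grafana_oncall_escalation_chain.default.id
  type                = "notify_persons"
  persons_to_notify   = [data.grafana_oncall_user.primary.id]
  position            = 0 # Steps run in order of position
}
resource "grafana_oncall_escalation" "wait" {
  provider            = grafana.oncall
  escalation_chain_id = grafana_oncall_escalation_chain.default.id
  type                = "wait"
  duration            = 300 # Five minutes in seconds
  position            = 1
}
resource "grafana_oncall_escalation" "notify_team" {
  provider               = grafana.oncall
  escalation_chain_id    = grafana_oncall_escalation_chain.default.id
  type                   = "notify_team_members"
  notify_to_team_members = data.grafana_oncall_team.engineering.id
  position               = 2
}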
Configure an integration
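resource "grafana_oncall_integration" "direct_paging" {
  provider = grafana.oncall
  name     = "Engineering Direct Paging"
  type     = "direct_paging"
  team_id  = data.grafana_oncall_team.engineering.id
  default_route {
    escalation_chain_id = grafana_oncall_escalation_chain.default.id
  }
}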
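Alert routing, listed earlier among the resources the provider manages, can be sketched with a grafana_oncall_route resource. The example below is an illustration rather than a recommended setup: the regular expression is a made-up pattern matched against the alert payload, and it reuses the integration and escalation chain names from the examples above.

```hcl
# Sends alerts whose payload matches the pattern to the escalation chain.
# The routing_regex value is a hypothetical pattern; adjust it to your payloads.
resource "grafana_oncall_route" "critical" {
  provider            = grafana.oncall
  integration_id      = grafana_oncall_integration.direct_paging.id
  escalation_chain_id = grafana_oncall_escalation_chain.default.id
  routing_regex       = "\"severity\": \"critical\""
  position            = 0
}
```

Alerts that match no route fall through to the integration's default route.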
Apply your configuration
Initialize Terraform:
terraform init
Preview changes:
terraform plan
Apply the configuration:
terraform apply
Best practices
- Use variables for reusable values
- Store state files securely
- Use workspaces for different environments
- Implement a review process for changes
- Test configurations in a non-production environment
- Document your infrastructure code
- Use consistent naming conventions
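Two of these practices, secure state storage and per-environment workspaces, can be illustrated with a short sketch. It assumes an S3 backend purely as an example; the bucket name, key, and region are hypothetical, and any remote backend with encryption and access control serves the same purpose. The backend setting can live in your existing terraform block.

```hcl
terraform {
  # Remote, encrypted state instead of a local terraform.tfstate file.
  # Bucket, key, and region are placeholder values.
  backend "s3" {
    bucket  = "example-terraform-state"
    key     = "irm/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}

locals {
  # terraform.workspace resolves to the selected workspace name,
  # for example "dev", "staging", or "production"
  name_prefix = "irm-${terraform.workspace}"
}
```

A resource can then set name = "${local.name_prefix}-primary-schedule" so that each environment gets distinct, consistently named objects.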
Continuous integration
For teams using CI/CD pipelines, consider implementing automatic validation and deployment of your IRM configuration:
- Create a separate workspace for each environment (dev, staging, production)
- Use pull requests to review configuration changes
- Implement automated testing of your Terraform configurations
- Configure CI pipelines to automatically apply changes after approval
Example CI workflow:
name: IRM Infrastructure
on:
  push:
    branches: [main]
    paths:
      - 'terraform/oncall/**'
  pull_request:
    branches: [main]
    paths:
      - 'terraform/oncall/**'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
      - name: Validate Terraform
        run: |
          cd terraform/oncall
          terraform init -backend=false
          terraform validate