Best practices for integrations

Integrations connect Grafana IRM to your monitoring tools, communication platforms, and other external systems. This topic organizes integrations by their purpose to help you understand how each type fits into your IRM setup.

Alert source integrations

Alert source integrations bring alerts into IRM from your monitoring tools. These are the entry points for your alert data.

Grafana Alerting

Grafana Alerting is the native alert source for Grafana-managed alerts and offers the tightest coupling with the rest of the Grafana ecosystem.

Best practices:

  • Use Direct Routing for simple setups where alerts go to a single team.
  • Use Notification Policies for complex routing across multiple teams.
  • Keep the default groupKey for grouping, and configure alert grouping in Grafana Alerting itself rather than in IRM.
  • Labels flow automatically from alert rules into IRM.

Alertmanager

Use this integration for Prometheus Alertmanager and compatible tools such as Mimir and Cortex.

Best practices:

  • Configure routing rules in Alertmanager to send alerts to the IRM receiver.
  • Use groupKey for consistent grouping between Alertmanager and IRM.
  • Enable Autoresolve (called allow_source_based_resolving in the API/Terraform) so IRM resolves alerts when Alertmanager sends resolve signals.
  • Labels from Alertmanager flow into alert groups automatically.

For more information, refer to Configure Alertmanager.
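
To see what IRM receives from this integration, the sketch below posts a minimal Alertmanager-compatible webhook payload to an IRM integration URL with Python's requests library. The URL is a placeholder you would copy from the integration's settings; the payload fields (status, groupKey, commonLabels, and so on) follow the standard Alertmanager webhook format that the templates later in this topic reference.

python
import requests

# Placeholder: copy the real URL from the Alertmanager integration's settings in IRM.
IRM_ALERTMANAGER_URL = "https://<your-irm-stack>/integrations/v1/alertmanager/<token>/"

# Minimal Alertmanager-compatible webhook payload. IRM reads fields such as
# status (firing/resolved), groupKey, and commonLabels for resolution,
# grouping, and label extraction.
payload = {
    "version": "4",
    "groupKey": '{}:{alertname="HighLatency"}',
    "status": "firing",
    "receiver": "grafana-irm",
    "groupLabels": {"alertname": "HighLatency"},
    "commonLabels": {
        "alertname": "HighLatency",
        "service_name": "checkout",
        "severity": "critical",
    },
    "commonAnnotations": {"summary": "p99 latency above 2s"},
    "externalURL": "https://alertmanager.example.com",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "HighLatency", "service_name": "checkout"},
            "annotations": {"summary": "p99 latency above 2s"},
            "startsAt": "2024-01-01T00:00:00Z",
        }
    ],
}

requests.post(IRM_ALERTMANAGER_URL, json=payload, timeout=10).raise_for_status()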

Webhook integrations

Generic webhooks accept alerts from any system that can send HTTP requests.

Use webhooks for:

  • Any third-party service without dedicated integrations.
  • Custom integrations or processes.
  • Internal alerting systems.

Configuration tips:

  • Use the Simple Webhook when you control the payload format and want maximum flexibility; a minimal example request follows these tips.
  • Use the Formatted Webhook when your system can send payloads in the structured format IRM expects.
  • Configure templates to control grouping, resolution, and label extraction. Refer to Integration templates for details.
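
As a concrete starting point, the sketch below sends a small JSON alert to a webhook integration with Python's requests library. The URL is a placeholder from the integration's settings, and the field names are illustrative: with the Simple Webhook you choose the payload shape yourself and rely on templates to tell IRM how to group, resolve, and label it.

python
import requests

# Placeholder: copy the real URL from the webhook integration's settings in IRM.
IRM_WEBHOOK_URL = "https://<your-irm-stack>/integrations/v1/webhook/<token>/"

# With the Simple Webhook the payload shape is up to you; these field names are
# illustrative. The templates configured on the integration decide how IRM
# groups, resolves, and labels the resulting alert.
alert = {
    "title": "Disk usage above 90% on db-01",
    "message": "Filesystem /var/lib/postgresql is at 93% capacity",
    "state": "alerting",  # send "ok" later to drive auto-resolution via a template
    "service_name": "postgres",
    "severity": "warning",
    "link": "https://runbooks.example.com/disk-usage",
}

response = requests.post(IRM_WEBHOOK_URL, json=alert, timeout=10)
response.raise_for_status()
print("Delivered with status", response.status_code)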

Communication integrations

Communication integrations connect IRM to the platforms where your team collaborates and receives notifications, such as Slack or Microsoft Teams.

For example, the Slack integration enables chat-based incident response, bringing alerts and incident coordination into your team’s workspace.

Setup recommendations:

  • Install the Slack app via OAuth for full functionality.
  • Configure default channels for alert notifications.
  • Enable slash commands for quick actions like acknowledging or resolving alerts.
  • Set up user group sync for @mentions.

Channel strategy:

  • Use dedicated channels per team or service.
  • Consider severity-based channels to separate critical alerts from warnings.
  • Enable incident channel creation for major incidents that need focused coordination.

External system integrations

External system integrations sync IRM with ticketing, collaboration, and other operational tools. These keep your systems of record in sync with IRM activity.

Bi-directional webhooks

Bi-directional webhooks enable two-way sync between IRM and external systems like Jira, ServiceNow, or GitHub.

Use cases:

  • Create Jira tickets automatically when alert groups are created.
  • Sync incident status with ServiceNow.
  • Update GitHub issues based on IRM actions.

How bi-directional sync works

Outgoing webhooks send data from IRM to external systems:

  1. An IRM event occurs, such as an alert group being created, acknowledged, or resolved.
  2. The webhook sends data to the external system.
  3. IRM can read the response and store external IDs on the alert group or incident.
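
One way to implement the external side of this flow is a small HTTP handler that receives the outgoing webhook, creates the issue, and returns its ID so IRM can store it. The sketch below is a minimal Flask handler; the payload fields it reads, the create_jira_issue helper, and the response shape are assumptions you would adapt to your outgoing webhook's payload template and response parsing configuration.

python
from flask import Flask, jsonify, request

app = Flask(__name__)


def create_jira_issue(summary: str, description: str) -> str:
    """Hypothetical helper: call your tracker's API and return the new issue key."""
    # A real implementation would POST to the tracker's REST API; a fixed key
    # keeps the sketch self-contained.
    return "OPS-1234"


@app.post("/irm/outgoing")
def handle_outgoing_webhook():
    event = request.get_json(force=True)
    # Field names depend on the payload template configured on the outgoing
    # webhook; these are illustrative.
    title = event.get("title", "IRM alert group")
    permalink = event.get("permalink", "")
    issue_key = create_jira_issue(title, f"Opened from IRM: {permalink}")
    # Return the external ID so IRM can store it for later matching.
    return jsonify({"external_id": issue_key})


if __name__ == "__main__":
    app.run(port=8080)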

Incoming webhooks receive updates from external systems:

  1. The external system sends an update to IRM.
  2. IRM matches the update to an existing alert group or incident.
  3. IRM updates the status, labels, or timeline accordingly.
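
On the external system's side, sending such an update can be as simple as posting the stored external ID and a new status to IRM when something changes. The endpoint path and payload fields below are placeholders rather than a documented IRM contract; adapt them to however your incoming webhook is configured.

python
import requests

# Placeholder: the incoming endpoint depends on how the IRM-side incoming
# webhook is configured.
IRM_INCOMING_URL = "https://<your-irm-stack>/incoming/<token>/"


def notify_irm_issue_closed(issue_key: str) -> None:
    """Tell IRM that the external issue closed so it can resolve the matching alert group."""
    update = {
        "external_id": issue_key,  # the ID IRM stored from the outgoing webhook response
        "status": "closed",        # mapped to a resolve action on the IRM side
    }
    requests.post(IRM_INCOMING_URL, json=update, timeout=10).raise_for_status()


notify_irm_issue_closed("OPS-1234")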

Example flows

Alert group flow:

text
1. Alert Group Created
   → Outgoing webhook creates external issue
   → Response stores issue ID on Alert Group

2. Alert Group Acknowledged
   → Outgoing webhook updates external issue status

3. External Issue Closed
   → Incoming webhook resolves Alert Group

Incident flow:

text
1. Incident Declared
   → Outgoing webhook creates external issue
   → Response stores issue ID on Incident

2. Incident Severity Changed
   → Outgoing webhook updates external issue priority

3. External Issue Comment Added
   → Incoming webhook adds activity to Incident timeline

Implementation tips

  • Store external IDs: Use webhook responses to save external system IDs for matching.
  • Use templates: Customize webhook payloads with Jinja2 templates.
  • Handle errors: Configure retry logic for transient failures.
  • Map statuses: Define how external statuses map to IRM states.
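
For the last tip, a small lookup table keeps the status mapping explicit and easy to review. The sketch below maps a few hypothetical Jira-style statuses to IRM states; the names on both sides are illustrative.

python
# Illustrative mapping from external (Jira-style) statuses to IRM states.
STATUS_MAP = {
    "To Do": "firing",
    "In Progress": "acknowledged",
    "Done": "resolved",
    "Won't Fix": "resolved",
}


def to_irm_state(external_status: str) -> str:
    """Map an external status to an IRM state, defaulting to 'firing' for unknown values."""
    return STATUS_MAP.get(external_status, "firing")


assert to_irm_state("Done") == "resolved"
assert to_irm_state("Backlog") == "firing"  # unknown statuses fall back safely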

Integration configuration

These settings apply across integrations and control how IRM processes alerts.

Integration templates

Templates control how IRM groups, resolves, and labels alerts from each integration.

Grouping template:

Configure the Grouping template (called grouping_id_template in the API/Terraform) to control how alerts combine into alert groups:

jinja2
{{ payload.commonLabels.service_name }}:{{ payload.commonLabels.alertname }}

Resolution template:

Configure the Resolve condition template (called resolve_condition_template in the API/Terraform) for automatic resolution:

jinja2
{{ payload.status == "resolved" }}

For webhook integrations with different payload formats:

jinja2
{{ payload.get("state", "").upper() == "OK" }}

Label extraction template:

Use the Alert group labels template (called alert_group_labels_template in the API/Terraform) to extract labels from alert payloads:

jinja2
service_name={{ payload.commonLabels.service_name or "unknown" }}
severity={{ payload.commonLabels.severity or "warning" }}
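
Because these templates are plain Jinja2, you can sanity-check them locally before saving them on an integration. The sketch below renders the grouping, resolve-condition, and label templates above against a sample Alertmanager-style payload using the jinja2 library; the payload is illustrative, and IRM adds its own context and filters, so treat this as a quick check rather than an exact reproduction.

python
from jinja2 import Template

# Sample Alertmanager-style payload matching the fields the templates reference.
payload = {
    "status": "resolved",
    "commonLabels": {"service_name": "checkout", "alertname": "HighLatency"},
}

grouping = Template("{{ payload.commonLabels.service_name }}:{{ payload.commonLabels.alertname }}")
resolve = Template('{{ payload.status == "resolved" }}')
labels = Template('service_name={{ payload.commonLabels.service_name or "unknown" }}')

print(grouping.render(payload=payload))  # checkout:HighLatency -> same key, same alert group
print(resolve.render(payload=payload))   # True -> IRM would auto-resolve the alert group
print(labels.render(payload=payload))    # service_name=checkout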

Heartbeat monitoring

Heartbeat monitoring detects when expected alerts stop arriving, which often indicates a problem with your monitoring pipeline.

Use heartbeat for:

  • Critical integrations that should always send alerts.
  • Monitoring systems that continuously report health.
  • Any integration where silence indicates a problem.

Configuration:

  1. Enable heartbeat on the integration.
  2. Set an appropriate timeout between 1 minute and 24 hours.
  3. Configure your monitoring system to send regular requests to the heartbeat URL.

Behavior:

  • IRM expects regular HTTP requests to the heartbeat endpoint.
  • If no request arrives within the timeout, IRM creates an alert.
  • When heartbeat resumes, the alert auto-resolves.

Tip

Set the timeout longer than your monitoring interval to avoid false alerts.
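
A heartbeat sender can be as small as a scheduled HTTP request. The sketch below pings a placeholder heartbeat URL once a minute, which pairs with a timeout comfortably above that interval, such as 5 minutes, per the tip above. Copy the real URL from the integration's heartbeat settings.

python
import time

import requests

# Placeholder: copy the real URL from the integration's heartbeat settings in IRM.
HEARTBEAT_URL = "https://<your-irm-stack>/integrations/v1/<integration>/<token>/heartbeat/"

# Ping once a minute; pair this with a heartbeat timeout well above the
# interval (for example 5 minutes) so one slow run doesn't raise a false alert.
while True:
    try:
        requests.get(HEARTBEAT_URL, timeout=10).raise_for_status()
    except requests.RequestException as exc:
        # Don't crash the loop on transient failures; the heartbeat timeout
        # covers real outages.
        print(f"Heartbeat ping failed: {exc}")
    time.sleep(60)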

Maintenance mode

Maintenance mode temporarily modifies how IRM handles alerts during planned work or testing.

Mode        | Behavior                                                 | Use case
Maintenance | Collects alerts in a single incident without escalation  | Planned maintenance windows
Debug       | Silences all escalations but still creates alerts        | Testing integrations or templates

Best practices:

  • Always set a duration to prevent indefinite maintenance state.
  • Use Maintenance mode for planned outages.
  • Use Debug mode for testing integrations or templates.
  • Notify your team when entering maintenance mode.

Workflow:

  1. Enable maintenance mode with an appropriate duration.
  2. Perform maintenance work.
  3. Maintenance auto-disables after the duration.
  4. Review accumulated alerts in the maintenance incident.

Direct paging

Direct paging lets you manually create alerts without an external trigger. Use this when you need to engage responders for issues discovered outside your monitoring systems.

When to use direct paging

  • Escalate issues discovered through customer reports or manual investigation.
  • Page specific teams or users for coordination.
  • Create incidents for events not detected by automated alerts.

Best practices

  • Team paging: Page a team to engage whoever is currently on-call.
  • User paging: Page specific individuals when you know who should respond.
  • Include context: Always provide a clear title and message describing the issue.
  • Add labels: Include relevant labels for routing and analytics.
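
If you page teams from scripts or chatops rather than the UI, a direct page is a single API call. In the sketch below, the endpoint path, field names, and auth header are placeholders rather than a documented contract; check the IRM API reference for the exact direct paging endpoint and use a real API token.

python
import requests

# Placeholders: confirm the endpoint, field names, and auth scheme in the IRM API reference.
IRM_API_URL = "https://<your-irm-stack>/api/v1/escalation"
IRM_API_TOKEN = "<api-token>"

page = {
    "team": "payments",  # engage whoever is currently on call for the team
    "title": "Checkout failures reported by support",
    "message": "Multiple customer reports of failed payments; no automated alert has fired yet.",
    # Hypothetical label fields for routing and analytics; adjust to your setup.
    "labels": {"source": "customer-report", "severity": "critical"},
}

response = requests.post(
    IRM_API_URL,
    json=page,
    headers={"Authorization": IRM_API_TOKEN},
    timeout=10,
)
response.raise_for_status()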

Managing direct paging integrations

Each team has a dedicated direct paging integration that IRM auto-creates.

To manage these integrations through Terraform instead, enable Manually manage direct paging integrations in organization settings. In the API/Terraform, this setting is called manually_manage_direct_paging_integrations.

This setting:

  • Prevents auto-creation of direct paging integrations.
  • Lets you manage integrations entirely through infrastructure as code.

Best practices summary

  • Match integration to purpose: Choose alert source, communication, or external system integrations based on your needs.
  • Configure templates: Customize grouping, resolution, and labels per integration.
  • Use heartbeat monitoring: For critical integrations that should always send alerts.
  • Plan for maintenance: Use maintenance mode during planned outages.
  • Enable bi-directional sync: Keep external ticketing systems in sync with IRM.
  • Test thoroughly: Verify integrations with test payloads before production.

Next steps