Alertmanager integration for Grafana OnCall

Important: This documentation is about an older version. It's relevant only to the release noted, many of the features and functions have been updated or replaced. Please view the current version.

Open source

Alertmanager integration for Grafana OnCall

Note

⚠️ Legacy integration ⚠️ Integrations that were created before version 1.3.21 (1 August 2023) were marked as (Legacy) and migrated. These integrations are receiving and escalating alerts, but some manual adjustments may be required.

The Alertmanager integration handles alerts from Prometheus Alertmanager. This integration is the recommended way to send alerts from Prometheus deployed in your infrastructure, to Grafana OnCall.

Pro tip: Create one integration per team, and configure alertmanager labels selector to send alerts only related to that team

Configuring Grafana OnCall to Receive Alerts from Prometheus Alertmanager

  1. In the Integrations tab, click + New integration.
  2. Select Alertmanager Prometheus from the list of available integrations.
  3. Enter a name and description for the integration, click Create
  4. A new page will open with the integration details. Copy the OnCall Integration URL from HTTP Endpoint section. You will need it when configuring Alertmanager.

Configuring Alertmanager to Send Alerts to Grafana OnCall

  1. Add a new Webhook receiver to receivers section of your Alertmanager configuration
  2. Set url to the OnCall Integration URL from previous section
    • Note: The url has a trailing slash that is required for it to work properly.
  3. Set send_resolved to true, so Grafana OnCall can autoresolve alert groups when they are resolved in Alertmanager
  4. It is recommended to set max_alerts to less than 100 to avoid requests that are too large.
  5. Use this receiver in your route configuration

Here is the example of final configuration:

yaml
route:
  receiver: "oncall"
  group_by: [alertname, datacenter, app]

receivers:
  - name: "oncall"
    webhook_configs:
      - url: <integration-url>
        send_resolved: true
        max_alerts: 100

Complete the Integration Configuration

Complete configuration by setting routes, templates, maintenances, etc. Read more in this section

Note about grouping and autoresolution

Grafana OnCall relies on the Alertmanager grouping and autoresolution mechanism to ensure consistency between alert state in OnCall and AlertManager. It’s recommended to configure grouping on the Alertmanager side and use default grouping and autoresolution templates on the OnCall side. Changing this templates might lead to incorrect grouping and autoresolution behavior. This is unlikely to be what you want, unless you have disabled grouping on the AlertManager side.

Configuring OnCall Heartbeats (optional)

An OnCall heartbeat acts as a monitoring for monitoring systems. If your monitoring is down and stop sending alerts, Grafana OnCall will notify you about that.

Configuring Grafana OnCall Heartbeat

  1. Go to Integration Page, click on three dots on top right, click Heartbeat settings
  2. Copy OnCall Heartbeat URL, you will need it when configuring Alertmanager
  3. Set up Heartbeat Interval, time period after which Grafana OnCall will start a new alert group if it doesn’t receive a heartbeat request

Configuring Alertmanager to send heartbeats to Grafana OnCall Heartbeat

You can configure Alertmanager to regularly send alerts to the heartbeat endpoint. Add vector(1) as a heartbeat generator to prometheus.yaml. It will always return true and act like always firing alert, which will be sent to Grafana OnCall once in a given period of time:

yaml
groups:
  - name: meta
    rules:
      - alert: heartbeat
        expr: vector(1)
        labels:
          severity: none
        annotations:
          description: This is a heartbeat alert for Grafana OnCall
          summary: Heartbeat for Grafana OnCall

Add receiver configuration to prometheus.yaml with the OnCall Heartbeat URL:

yaml
  ...
  route:
  ...
      routes:
        - match:
          alertname: heartbeat
        receiver: 'grafana-oncall-heartbeat'
        group_wait: 0s
        group_interval: 1m
        repeat_interval: 50s
  receivers:
    - name: 'grafana-oncall-heartbeat'
    webhook_configs:
      - url: https://oncall-dev-us-central-0.grafana.net/oncall/integrations/v1/alertmanager/1234567890/heartbeat/
        send_resolved: false

Note about legacy integration

Legacy integration was using each alert from AlertManager group as a separate payload:

json
{
  "labels": {
    "severity": "critical",
    "alertname": "InstanceDown"
  },
  "annotations": {
    "title": "Instance localhost:8081 down",
    "description": "Node has been down for more than 1 minute"
  },
  ...
}

This behaviour was leading to mismatch in alert state between OnCall and AlertManager and draining of rate-limits, since each AlertManager alert was counted separately.

We decided to change this behaviour to respect AlertManager grouping by using AlertManager group as one payload.

json
{
  "alerts": [...],
  "groupLabels": {
    "alertname": "InstanceDown"
  },
  "commonLabels": {
    "job": "node", 
    "alertname": "InstanceDown"
  },
  "commonAnnotations": {
    "description": "Node has been down for more than 1 minute"
  },
  "groupKey": "{}:{alertname=\"InstanceDown\"}",
  ...
}

You can read more about AlertManager Data model here.

After-migration checklist

Integration URL will stay the same, so no need to change AlertManager or Grafana Alerting configuration. Integration templates will be reset to suit new payload. It is needed to adjust routes and outgoing webhooks manually to new payload.

  1. Send a new demo alert to the migrated integration.
  2. Adjust routes to the new shape of payload. You can use payload of the demo alert from previous step as an example.
  3. If outgoing webhooks utilized the alerts payload from the migrated integration in the trigger or data template it’s needed to adjust them as well.
Screenshot 2023-12-14 at 1 14 21 PM Screenshot 2023-12-14 at 1 14 32 PM