---
title: "Alert group insights and metrics | Grafana Cloud documentation"
description: "Monitor and analyze alert group handling metrics and logs in Grafana IRM."
---

# Alert groups insights and metrics

Grafana IRM provides detailed metrics and logs to help you monitor alert group handling performance and analyze trends. These insights help you identify bottlenecks, measure response effectiveness, and continuously improve your alerting processes.

## About alert groups metrics

Alert groups metrics in Grafana IRM track key performance indicators related to alert groups handling, including:

- Alert groups volume across integrations
- Response times for alert groups acknowledgment
- Notification patterns
- Team and user metrics

These metrics are exposed in Prometheus format, making them easy to query and visualize in Grafana dashboards.

## Available metrics

Grafana IRM provides the following core metrics:

| Metric                                    | Type      | Description                                                                                          |
|-------------------------------------------|-----------|------------------------------------------------------------------------------------------------------|
| `alert_groups_total`                      | Gauge     | Total count of alert groups for each integration by state (firing, acknowledged, resolved, silenced) |
| `alert_groups_response_time`              | Histogram | Mean time between alert groups start and first action over the last 7 days                           |
| `alert_groups_resolution_time`            | Histogram | Mean time between alert groups start and resolution over the last 7 days                             |
| `user_was_notified_of_alert_groups_total` | Counter   | Total count of alert groups users were notified of                                                   |

### Access metrics

#### For Grafana Cloud customers

Alert groups metrics are automatically collected in the preinstalled **grafanacloud-usage** data source and have the prefix `grafanacloud_oncall_instance`, for example:

- `grafanacloud_oncall_instance_alert_groups_total`
- `grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket`
- `grafanacloud_oncall_instance_alert_groups_resolution_time_seconds_bucket`
- `grafanacloud_oncall_instance_user_was_notified_of_alert_groups_total`

### Metric details and examples

#### Alert groups total

This metric tracks the count of alert groups in different states with the following labels:

| Label          | Description                                                           |
|----------------|-----------------------------------------------------------------------|
| `id`           | ID of Grafana instance (stack)                                        |
| `slug`         | Slug of Grafana instance (stack)                                      |
| `org_id`       | ID of Grafana organization                                            |
| `team`         | Team name                                                             |
| `integration`  | Integration name                                                      |
| `service_name` | Value of alert groups `service_name` label                            |
| `state`        | Alert groups state (`firing`, `acknowledged`, `resolved`, `silenced`) |

**Example query:**

Get the number of alert groups in “firing” state for “Grafana Alerting” integration:

```promql
grafanacloud_oncall_instance_alert_groups_total{integration="Grafana Alerting", state="firing"}
```
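
To compare states at a glance, the same metric can be aggregated across integrations. The following is a minimal sketch that applies no label filters; adding a `slug` or `team` matcher to narrow it to one stack or team is an optional assumption about your setup:

```promql
# Count of alert groups per state, summed over all integrations, teams, and services
sum by (state) (grafanacloud_oncall_instance_alert_groups_total)
```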

#### Alert groups response time

This metric tracks response times with the following labels:

| Label          | Description                                                            |
|----------------|------------------------------------------------------------------------|
| `id`           | ID of Grafana instance (stack)                                         |
| `slug`         | Slug of Grafana instance (stack)                                       |
| `org_id`       | ID of Grafana organization                                             |
| `team`         | Team name                                                              |
| `integration`  | Integration name                                                       |
| `service_name` | Value of alert groups `service_name` label                             |
| `le`           | Histogram bucket value in seconds (`60`, `300`, `600`, `3600`, `+Inf`) |

**Example query:**

Get the number of alert groups with a response time of 10 minutes (600 seconds) or less:

```promql
grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket{integration="Grafana Alerting", le="600"}
```
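
Because Prometheus histogram buckets are cumulative, the `le="+Inf"` bucket counts every observed alert group. Dividing the `le="600"` bucket by it gives the share of alert groups that received a first action within 10 minutes. This is a sketch rather than an official query; summing across teams and services is an assumption about how you want to group the data:

```promql
# Fraction (0 to 1) of alert groups with a first action within 600 seconds
sum(grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket{integration="Grafana Alerting", le="600"})
/
sum(grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket{integration="Grafana Alerting", le="+Inf"})
```

Multiply the result by 100 to display it as a percentage in a dashboard panel.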

#### Alert groups resolution time

This metric tracks resolution times with the following labels:

| Label          | Description                                                            |
|----------------|------------------------------------------------------------------------|
| `id`           | ID of Grafana instance (stack)                                         |
| `slug`         | Slug of Grafana instance (stack)                                       |
| `org_id`       | ID of Grafana organization                                             |
| `team`         | Team name                                                              |
| `integration`  | Integration name                                                       |
| `service_name` | Value of alert groups `service_name` label                             |
| `le`           | Histogram bucket value in seconds (`60`, `300`, `600`, `3600`, `+Inf`) |

**Example query:**

Get the number of alert groups with a resolution time of 10 minutes (600 seconds) or less:

```promql
grafanacloud_oncall_instance_alert_groups_resolution_time_seconds_bucket{integration="Grafana Alerting", le="600"}
```
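
Since each bucket includes everything counted in the smaller buckets, subtracting one bucket from another isolates a time range. The sketch below counts alert groups that took longer than 10 minutes but no more than 1 hour to resolve; the choice of these two boundaries is only an illustration:

```promql
# Alert groups resolved in more than 600 seconds but no more than 3600 seconds
sum(grafanacloud_oncall_instance_alert_groups_resolution_time_seconds_bucket{integration="Grafana Alerting", le="3600"})
-
sum(grafanacloud_oncall_instance_alert_groups_resolution_time_seconds_bucket{integration="Grafana Alerting", le="600"})
```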

#### User notification metrics

This metric tracks how many alert groups each user was notified about, with the following labels:

| Label      | Description                      |
|------------|----------------------------------|
| `id`       | ID of Grafana instance (stack)   |
| `slug`     | Slug of Grafana instance (stack) |
| `org_id`   | ID of Grafana organization       |
| `username` | User’s username                  |

**Example query:**

Get the number of alert groups a specific user was notified of:

```promql
grafanacloud_oncall_instance_user_was_notified_of_alert_groups_total{username="alex"}
```
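
Because this metric is a counter, its raw value only grows over time. To see who was notified most often in a recent window, you can wrap it in `increase()`. The following is a hedged sketch; the 7-day range and the limit of five users are arbitrary choices for illustration:

```promql
# Five users notified of the most alert groups over the past 7 days
topk(5, sum by (username) (increase(grafanacloud_oncall_instance_user_was_notified_of_alert_groups_total[7d])))
```

Note that `increase()` extrapolates between samples, so the results may not be whole numbers.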

## Alert groups metrics dashboard

A pre-built “Alert Groups Insights” dashboard is available to visualize key alert metrics. To access it:

1. Navigate to your dashboards list in the folder `General`
2. Find the dashboard with the tag `irm`
3. Select your Prometheus data source (for Cloud customers, use `grafanacloud-usage`)
4. Filter data by Grafana instances, teams, and integrations

To re-import the dashboard:

1. Go to `Administration` > `Plugins`
2. Find IRM in the plugins list
3. Open the `Dashboards` tab
4. Click “Re-import” next to “Alert Groups Insights”

> Note
> 
> Re-importing the dashboard or updating the plugin resets any customizations. To preserve your changes, save a copy of the dashboard using “Save As” in the dashboard settings.

You can also view insights directly in Grafana IRM by clicking **Insights** in the navigation menu.

## Alert groups insight logs

Alert groups insight logs provide an audit trail of configuration changes and system events in your IRM environment. These logs are automatically configured in Grafana Cloud with the [Usage Insights Loki data source](/docs/grafana-cloud/billing-and-usage/usage-insights/#usage-insights-loki-data-source).

### Access insight logs

To retrieve all logs related to your IRM instance:

```logql
{instance_type="oncall"} | logfmt | __error__=``
```

### Types of insight logs

IRM captures three primary types of insight logs:

#### Resource logs

Track changes to resources (integrations, escalation chains, schedules, etc.):

```logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource`
```

Resource logs include the following key fields:

| Field           | Description                                            |
|-----------------|--------------------------------------------------------|
| `action_name`   | Type of action (`created`, `updated`, `deleted`)       |
| `action_type`   | Always `resource` for resource logs                    |
| `author`        | Username of the user who performed the action          |
| `resource_id`   | ID of the modified resource                            |
| `resource_name` | Name of the modified resource                          |
| `resource_type` | Type of resource (integration, escalation chain, etc.) |
| `team`          | Team the resource belongs to                           |
| `prev_state`    | JSON representation of resource before update          |
| `new_state`     | JSON representation of resource after update           |

#### Maintenance logs

Track when maintenance mode is started or finished:

```logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `maintenance`
```

Maintenance logs include:

| Field              | Description                                                                      |
|--------------------|----------------------------------------------------------------------------------|
| `action_name`      | Maintenance action (`started` or `finished`)                                     |
| `action_type`      | Always `maintenance` for maintenance logs                                        |
| `maintenance_mode` | Type of maintenance (`silence_escalations`, `group_alerts`, or `disable_alerts`) |
| `resource_id`      | ID of the integration under maintenance                                          |
| `resource_name`    | Name of the integration under maintenance                                        |
| `team`             | Team the integration belongs to                                                  |
| `author`           | Username of the user who performed the action                                    |

#### ChatOps logs

Track configuration changes to chat integrations:

```logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `chat_ops`
```

ChatOps logs include:

| Field           | Description                                                        |
|-----------------|--------------------------------------------------------------------|
| `action_name`   | Type of chatops action                                             |
| `action_type`   | Always `chat_ops` for chatops logs                                 |
| `author`        | Username of the user who performed the action                      |
| `chat_ops_type` | Type of integration (`telegram`, `slack`, `msteams`, `mobile_app`) |
| `channel_name`  | Name of the linked channel                                         |
| `linked_user`   | Username linked to the chatops integration                         |

### Example log queries

Here are some practical log queries to analyze your alert handling configuration:

**Actions by specific user:**

```logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource` and author="username"
```

**Changes to schedules:**

```logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource` and (resource_type=`web_schedule` or resource_type=`calendar_schedule` or resource_type=`ical_schedule`)
```

**Changes to escalation policies:**

```logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource` and resource_type=`escalation_policy` and escalation_chain_id=`CHAIN_ID`
```

**Maintenance events for an integration:**

```logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `maintenance` and resource_id=`INTEGRATION_ID`
```

**Slack chatops configuration changes:**

```logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `chat_ops` and chat_ops_type=`slack`
```
