Important: This documentation is about an older version. It is relevant only to the release noted; many of the features and functions have been updated or replaced. Please view the current version.

Insight Logs and Metrics

Metrics

Grafana OnCall metrics expose the following parameters:

  • The total count of alert groups for each integration in each state (firing, acknowledged, resolved, silenced). It is a gauge, and its name has the suffix alert_groups_total.
  • The response time on alert groups for each integration (the mean time between the start and the first action across all alert groups for the last 7 days in the selected period). It is a histogram, and its name has the suffix alert_groups_response_time_seconds, followed by the histogram suffixes _bucket, _sum, and _count.

You can find more information about metric types in the Prometheus documentation.

To retrieve Prometheus metrics, use PromQL. If you are not familiar with PromQL, see the PromQL documentation.

For Grafana Cloud customers

OnCall application metrics are collected in the preinstalled grafanacloud_usage data source and are available for every cloud instance.

Metrics have the prefix grafanacloud_oncall_instance, e.g. grafanacloud_oncall_instance_alert_groups_total and grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket.

For open source customers

To collect OnCall application metrics, you need to set up Prometheus and add it to your Grafana instance as a data source. You can find more information about Prometheus setup in the OSS documentation.

Metrics will have the prefix oncall, e.g. oncall_alert_groups_total and oncall_alert_groups_response_time_seconds_bucket.

Your metrics may also have additional labels, such as pod, instance, container, depending on your Prometheus setup.

Metric Alert groups total

This metric has the following labels:

| Label Name | Description |
|---|---|
| id | ID of Grafana instance (stack) |
| slug | Slug of Grafana instance (stack) |
| org_id | ID of Grafana organization |
| team | Team name |
| integration | OnCall integration name |
| state | Alert group state. One of firing, acknowledged, resolved, or silenced |

Query example:

Get the number of alert groups in “firing” state in integration “Grafana Alerting” in Grafana stack “test_stack”:

promql
grafanacloud_oncall_instance_alert_groups_total{slug="test_stack", integration="Grafana Alerting", state="firing"}

Metric Alert groups response time

This metric has the following labels:

| Label Name | Description |
|---|---|
| id | ID of Grafana instance (stack) |
| slug | Slug of Grafana instance (stack) |
| org_id | ID of Grafana organization |
| team | Team name |
| integration | OnCall integration name |
| le | Histogram bucket upper bound in seconds. One of 60, 300, 600, 3600, or +Inf |

Query example:

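Get the number of alert groups with a response time of at most 10 minutes (600 seconds) in integration “Grafana Alerting” in Grafana stack “test_stack”. Histogram buckets are cumulative, so the le="600" bucket counts every alert group whose response time was 600 seconds or less: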

promql
grafanacloud_oncall_instance_alert_groups_response_time_seconds_bucket{slug="test_stack", integration="Grafana Alerting", le="600"}
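
Because Prometheus histogram buckets are cumulative, the count of alert groups that took longer than a given bound is the +Inf bucket minus the bucket for that bound. The following is a minimal Python sketch of that arithmetic; the bucket values are invented sample data and the function names are hypothetical, not part of Grafana OnCall.

```python
# Hypothetical cumulative bucket counts for one integration,
# keyed by the value of the "le" label (invented sample data).
buckets = {"60": 4, "300": 9, "600": 12, "3600": 14, "+Inf": 15}


def count_slower_than(cumulative: dict, bound: str) -> int:
    """Alert groups whose response time exceeded `bound` seconds."""
    # The +Inf bucket holds the total number of observations.
    return cumulative["+Inf"] - cumulative[bound]


def per_bucket_counts(cumulative: dict) -> dict:
    """Convert cumulative counts into counts per individual interval."""
    result = {}
    previous = 0
    # float("+Inf") parses to infinity, so it sorts last.
    for le in sorted(cumulative, key=float):
        result[le] = cumulative[le] - previous
        previous = cumulative[le]
    return result


print(count_slower_than(buckets, "600"))
print(per_bucket_counts(buckets))
```

In PromQL, the same subtraction can be expressed by subtracting the le="600" series from the le="+Inf" series, provided the remaining labels match.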

Dashboard

To import the OnCall metrics dashboard, go to the Administration -> Plugins page, find OnCall in the plugins list, open the Dashboards tab on the OnCall plugin settings page, and click “Import” next to “OnCall metrics”. The “OnCall metrics” dashboard then appears in your dashboards list. In the data source dropdown, select your Prometheus data source (for Cloud customers, it is grafanacloud_usage). You can filter data by Grafana instance, team, and integration.

To update the dashboard to the newest version, go to the Dashboards tab on the OnCall plugin settings page and click “Re-import”. Be aware that re-importing deletes any changes you have made to the dashboard. To keep your changes, go to the dashboard settings, click “Save as”, and save a copy of the dashboard.

Insight Logs

Note: Grafana OnCall insight logs are available in Grafana Cloud only. We’re in the process of rolling out insight logs to all customers. If you don’t see insight logs in your Grafana Cloud stack, please reach out to support.

Grafana OnCall insight logs record certain activities, such as when:

  • A user creates, updates, or deletes a resource.
  • Maintenance mode is started or finished for an integration.
  • A user configures a ChatOps integration.

In Grafana Cloud, insight logs are already configured for you in the Usage Insights Loki data source. You can use this query to retrieve all logs related to your OnCall instance:

logql
{instance_type="oncall"} | logfmt | __error__=``
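
The `logfmt` stage in the query above extracts key=value pairs from each log line so that later stages can filter on them. As a rough illustration of what that extraction yields, here is a simplified Python sketch. The sample line is invented, `parse_logfmt` is a hypothetical helper, and the real logfmt parser in Loki handles more cases than this.

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Parse a logfmt-style line into a dict (simplified illustration).

    shlex.split keeps quoted values together, so
    name="Grafana Alerting" becomes the single token name=Grafana Alerting.
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


# Invented sample line using field names from this page.
sample = (
    'action_type=resource action_name=updated author=alice '
    'resource_type=integration resource_name="Grafana Alerting"'
)
record = parse_logfmt(sample)
print(record["action_type"], record["resource_name"])
```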

Resource insight logs

Logs are created each time a user modifies any resource in Grafana OnCall.

These logs have the field action_type=resource and can be retrieved with the following query:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource`

Format

Logs contain the following fields, where the fields followed by * are always available, and the others depend on the logged event:

| Field Name | Description |
|---|---|
| action_name* | Type of the resource action, which can be created, updated, or deleted. |
| action_type* | Insight log type. For resource insight logs, it is resource. |
| author* | Username of the user who performed the action. |
| author_id* | ID of the user who performed the action. |
| prev_state | JSON representation of the resource before the update. |
| new_state | JSON representation of the resource after the update. |
| resource_id* | ID of the target resource. |
| resource_name* | Name of the target resource. |
| resource_type* | Type of the target resource (see available types below). |
| team* | Name of the team to which the resource belongs. |
| team_id | ID of the team to which the resource belongs. |
| integration | Name of the integration to which the resource belongs. |
| integration_id | ID of the integration to which the resource belongs. |
| escalation_chain | Name of the escalation chain to which the resource belongs. |
| escalation_chain_id | ID of the escalation chain to which the resource belongs. |
| schedule | Name of the schedule to which the resource belongs. |
| schedule_id | ID of the schedule to which the resource belongs. |

Available resource types: integration_heartbeat, escalation_chain, integration, outgoing_webhook, escalation_policy, public_api_token, schedule_export_token, user_schedule_export_token, oncall_shift, web_schedule, ical_schedule, calendar_schedule, organization, user, webhook.
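
Since prev_state and new_state carry JSON representations of a resource before and after an update, comparing the two shows what an update changed. The sketch below assumes both fields hold flat JSON objects. The keys and values are invented for illustration, and `changed_fields` is a hypothetical helper, not an OnCall API.

```python
import json


def changed_fields(prev_state: str, new_state: str) -> dict:
    """Return {key: (before, after)} for every top-level key that differs."""
    before = json.loads(prev_state)
    after = json.loads(new_state)
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(before.keys() | after.keys())
        if before.get(key) != after.get(key)
    }


# Invented example payloads; real resources may have different keys.
prev_state = '{"name": "Grafana Alerting", "team": "infra", "maintenance": null}'
new_state = '{"name": "Grafana Alerting", "team": "platform", "maintenance": null}'

print(changed_fields(prev_state, new_state))
```

Nested objects are compared as whole values here; a deeper diff would need to recurse into them.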

Maintenance insight logs

Logs are created each time maintenance mode is started or finished for an integration.

These logs have the field action_type=maintenance and can be retrieved with the following query:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `maintenance`

Format

Maintenance insight logs contain the following fields, where the fields followed by * are always available, and the others depend on the logged event:

| Field Name | Description |
|---|---|
| action_name* | Name of the maintenance action, which can be started or finished. |
| action_type* | Insight log type. For maintenance insight logs, it is maintenance. |
| author | Username of the user who performed the action. |
| author_id | Grafana OnCall ID of the user who performed the action. |
| maintenance_mode* | Type of the maintenance, which can be maintenance or debug. |
| resource_id* | ID of the target integration. |
| resource_name* | Name of the target integration. |
| team* | Name of the team to which the integration belongs. |
| team_id | ID of the team to which the integration belongs. |

ChatOps insight logs

Logs are created each time a user modifies ChatOps settings.

These logs have the field action_type=chat_ops and can be retrieved with the following query:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `chat_ops`

Format

ChatOps insight logs contain the following fields, where the fields followed by * are always available, and the others depend on the logged event:

| Field Name | Description |
|---|---|
| action_name* | Name of the ChatOps action (see available names below). |
| action_type* | Insight log type. For ChatOps insight logs, it is always chat_ops. |
| author* | Username of the user who performed the action. |
| author_id* | Grafana OnCall ID of the user who performed the action. |
| chat_ops_type* | Type of ChatOps integration. Can be telegram, slack, msteams, or mobile_app. |
| linked_user | Username of the user linked to the ChatOps integration. |
| linked_user_id | Grafana OnCall ID of the user linked to the ChatOps integration. |
| channel_name | Name of the channel linked to the ChatOps integration. |
| prev_channel | Name of the previous channel. |
| new_channel | Name of the new channel. |

Available ChatOps action names: workspace_connected, workspace_disconnected, channel_connected, channel_disconnected, user_linked, used_unlinked, default_channel_changed.

Examples

Here are some examples of practical queries against Grafana OnCall insight logs. The queries are written in LogQL. If you are not familiar with LogQL, see the LogQL documentation.

Resource IDs are used a lot in insight logs. You can find them in the web UI (example for an integration):

  1. Open Grafana OnCall.
  2. Navigate to the resource.
  3. The URL looks like https://<YOUR_STACK_SLUG>/a/grafana-oncall-app/integrations/C5VXMIFKKP67K.
  4. The integration ID is C5VXMIFKKP67K.

Alternatively, you can find the resource ID using the public API or browser dev tools.
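
In the URL shown in the steps above, the resource ID is the last path segment. If you collect such URLs in a script, a small helper can extract the ID. This is a minimal sketch: the host name is a placeholder, `resource_id_from_url` is a hypothetical helper, and it assumes the ID is always the final segment, as in the integration example.

```python
from urllib.parse import urlparse


def resource_id_from_url(url: str) -> str:
    """Return the last path segment of an OnCall resource URL."""
    path = urlparse(url).path.rstrip("/")  # tolerate a trailing slash
    return path.rsplit("/", 1)[-1]


# Placeholder host; substitute your own stack address.
url = "https://example.grafana.net/a/grafana-oncall-app/integrations/C5VXMIFKKP67K"
print(resource_id_from_url(url))
```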

Actions performed by a user:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource` and author="<username>"

Actions performed with all schedules:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource` and (resource_type=`web_schedule` or resource_type=`calendar_schedule` or resource_type=`ical_schedule`)

Changes to escalation policies for an escalation chain:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `resource` and resource_type=`escalation_policy` and escalation_chain_id=`<ESCALATION_CHAIN_ID>`

Maintenance events for an integration:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `maintenance` and resource_id=`CSA67IQW2NMVL`

Actions performed with the Slack ChatOps integration:

logql
{instance_type="oncall"} | logfmt | __error__=`` | action_type = `chat_ops` and chat_ops_type=`slack`
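
All of the example queries share the same stream selector and parser stages and differ only in the field filters. If you generate such queries programmatically, a small helper can assemble them. The Python sketch below is one possible approach, not an official client: `insight_query` is a hypothetical name, and it supports only equality filters joined with `and`.

```python
def insight_query(action_type: str, **filters: str) -> str:
    """Build a LogQL query for OnCall insight logs (illustrative helper)."""
    base = '{instance_type="oncall"} | logfmt | __error__=``'
    values = [action_type, *filters.values()]
    # Values are wrapped in backticks, so a backtick inside one would break the query.
    if any("`" in value for value in values):
        raise ValueError("filter values must not contain backticks")
    conditions = [f"action_type = `{action_type}`"]
    conditions += [f"{field}=`{value}`" for field, value in filters.items()]
    return f"{base} | " + " and ".join(conditions)


print(insight_query("chat_ops", chat_ops_type="slack"))
print(insight_query("maintenance", resource_id="CSA67IQW2NMVL"))
```

Queries that combine conditions with `or`, such as the schedules example, would need a richer builder than this.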