Synthetic Monitoring alerting

Synthetic Monitoring integrates with Grafana Cloud alerting, which uses Alertmanager, to provide alerts. The Synthetic Monitoring plugin provides a set of default alert rules. These rules evaluate metrics that the probes publish to your Grafana Cloud Prometheus instance. Firing alert rules can be routed to notification receivers configured in Grafana Cloud alerting.

The default alerting rules that we provide are:

  • HighSensitivity: if more than 5% of probe executions fail (success rate below 95%) for 5 minutes, fire an alert through the routing you have set up.
  • MedSensitivity: if more than 10% of probe executions fail (success rate below 90%) for 5 minutes, fire an alert through the routing you have set up.
  • LowSensitivity: if more than 25% of probe executions fail (success rate below 75%) for 5 minutes, fire an alert through the routing you have set up.
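To make these thresholds concrete, here is a simplified worked example. It assumes that the success rate is the share of successful probe executions in the 5-minute window, and that the check runs once a minute on three probes, giving 15 executions in the window; both are assumptions for illustration only. With one failed execution:

```latex
\text{success rate}
  = \frac{\text{successful executions}}{\text{total executions}} \times 100\%
  = \frac{14}{15} \times 100\% \approx 93.3\%
```

93.3% is below the 95% HighSensitivity threshold but above the 90% and 75% thresholds. So if the check is marked high sensitivity, it breaches its threshold; if it is marked medium or low, it does not. With two failed executions (13/15, about 86.7%), a check marked medium would also breach its threshold, while a check marked low would not.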

How to create an alert for a Synthetic Monitoring check

Alerts can be created as part of creating or editing a check. You must be logged in to a Grafana Cloud instance in order to create or edit alerts. Alerting in Synthetic Monitoring happens in two phases: configuring a check to publish an alert sensitivity metric label value, and configuring alert rules.

To configure a check to publish the alert sensitivity metric label value:

  1. Navigate to Observability > Synthetics > Checks.

  2. Click New Check to create a new check, or edit an existing check in the list.

  3. Click the Alerting section to show the alerting fields.

  4. Select a sensitivity level to associate with the check and click Save.

    This sensitivity value is published to the alert_sensitivity label on the sm_check_info metric each time the check runs on a probe. The default alert rules use that label value to determine which checks to fire alerts for.

To configure alert rules:

  1. Navigate to Observability > Synthetics > Alerts.
  2. If you have no default rules set up already for Synthetic Monitoring, click the Populate default alerts button.
  3. Synthetic Monitoring generates a set of default rules. These rules represent sensitivity “buckets” based on probe success percentage. A rule fires for checks that are marked with its sensitivity level and whose success percentage drops below its threshold. Checks with a sensitivity level of “none” never cause any of the default rules to fire.

How the default alert rules work under the hood

Generating the default rules creates four rules: one alerting rule for each of the high, medium, and low sensitivity levels, and one recording rule. The recording rule evaluates the success rate of each check and looks at whether the alert_sensitivity label has a value. If alert_sensitivity is defined, the expression produces a value that is recorded; otherwise the check is ignored.
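The recording rule can be pictured roughly as the following Prometheus-style rule group. This is a hypothetical sketch, not the exact rule that Synthetic Monitoring generates: the input metrics probe_all_success_sum and probe_all_success_count, the recorded series name instance_job_severity:probe_success:mean5m, the 5-minute window, and expressing the result as a percentage are all assumptions. The join with sm_check_info{alert_sensitivity!=""} is what drops checks without a sensitivity value and copies the alert_sensitivity label onto the result.

```yaml
# Hypothetical sketch of a recording rule like the one described above.
# Metric names, the recorded series name, and the window are assumptions.
groups:
  - name: default
    rules:
      - record: instance_job_severity:probe_success:mean5m
        # Success percentage per check, aggregated across probes.
        # Only series that match an sm_check_info series with a non-empty
        # alert_sensitivity label survive the join; the label is copied over.
        expr: |
          (
            sum without (probe, config_version) (
              rate(probe_all_success_sum[5m])
              * on (instance, job, probe) group_left (alert_sensitivity)
              max by (instance, job, probe, alert_sensitivity) (sm_check_info{alert_sensitivity!=""})
            )
            /
            sum without (probe, config_version) (
              rate(probe_all_success_count[5m])
              * on (instance, job, probe) group_left (alert_sensitivity)
              max by (instance, job, probe, alert_sensitivity) (sm_check_info{alert_sensitivity!=""})
            )
          ) * 100
```

Because this sketch records a percentage, alerting rules can compare it directly against thresholds such as 95, 90, and 75.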

The remaining three alerting rules use the value produced by the recording rule, match on the predefined values of the alert_sensitivity label, and map each value to a threshold. The thresholds are editable, but the predefined values are not. For example, if a check has alert_sensitivity=high, its success rate is evaluated and compared to the default high threshold (95%). You can edit the 95% threshold, but the value of alert_sensitivity must be one of none, low, medium, or high for the check to use the default alert rules. You can add other alert_sensitivity values: the recording rule still produces a result for them, but no predefined alert rule exists for those values. This lets you create custom rules, optionally with custom thresholds, and use Alertmanager to route alerts based on their labels.
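As an illustration of a custom rule, the sketch below alerts on a hypothetical alert_sensitivity value of critical with a stricter 99% threshold. The value critical, the group and rule names, the 2-minute pending duration, and the input series instance_job_severity:probe_success:mean5m (assumed to be the recording rule's output, as a percentage) are all assumptions, not part of the default setup. The namespace label is set explicitly on the assumption that your routing matches on it.

```yaml
# Hypothetical custom alerting rule for a non-default sensitivity value.
groups:
  - name: custom
    rules:
      - alert: SyntheticMonitoringCheckFailureAtCriticalSensitivity  # hypothetical name
        # Fires for checks labelled alert_sensitivity="critical" whose
        # recorded success percentage drops below 99.
        expr: instance_job_severity:probe_success:mean5m{alert_sensitivity="critical"} < 99
        for: 2m
        labels:
          namespace: synthetic_monitoring
```

Since the alert_sensitivity label is carried on the recorded series, the resulting alert can be routed in Alertmanager on alert_sensitivity: critical like any other sensitivity level.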

How alerts are evaluated

Once a check has been configured to publish an alert sensitivity metric label value and alert rules have been configured, the alert rules are evaluated each time the check runs on a probe. If the success rate of a check drops below the threshold for its sensitivity level, the alert rule enters the pending state. The pending duration is editable and defaults to 5 minutes. If the success rate stays below the threshold for the configured duration, the rule goes into the firing state. You can read more about alert states in the Grafana Alerting docs.
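For reference, a default-style rule for the high sensitivity level could be written as follows, where the `for` field holds the pending duration. This is a sketch: the rule name and the input series instance_job_severity:probe_success:mean5m are hypothetical, and the threshold of 95 assumes the success rate is recorded as a percentage.

```yaml
# Sketch of a high-sensitivity alerting rule (names are assumptions).
groups:
  - name: default
    rules:
      - alert: SyntheticMonitoringCheckFailureAtHighSensitivity  # hypothetical name
        expr: instance_job_severity:probe_success:mean5m{alert_sensitivity="high"} < 95
        # Pending duration: the condition must hold for 5 minutes before
        # the alert moves from pending to firing.
        for: 5m
        labels:
          namespace: synthetic_monitoring
```

If the success rate recovers before the 5 minutes elapse, the alert returns to normal without ever firing, which is what keeps brief dips from notifying anyone.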

How to edit an alert for a check

You can edit alerts on the Synthetic Monitoring Alerts page or in the Grafana Cloud alerting UI.

Note: Substantially editing an alert rule in the Grafana Cloud alerting UI can make it no longer editable in the Synthetic Monitoring UI. In that case, you can edit the rule only from Grafana Cloud alerting. For example, if you change the value “0.9” to “0.75”, the change propagates back to the Synthetic Monitoring Alerts tab, and the alert fires according to your edit. However, if you change “0.9” to a non-numeric value such as “steve”, the rule becomes invalid and can no longer be edited on the Synthetic Monitoring Alerts tab.

[Figure: Synthetic Monitoring alerts on the Grafana Cloud alerting page]

How to set up routing for default alerts

Default alerts contain only the alert rules. Without routing, these alerts are not sent to any notification receiver, so no one is notified when they fire. You must set up routing in Alertmanager within Grafana Cloud alerting.

You can write your own routing configuration in the text box editor, and route alerts to destinations such as email addresses, Slack, PagerDuty, and OpsGenie. Step-by-step instructions can be found in this blog post. To route the default Synthetic Monitoring alerts to a notification receiver, set up conditions that match on the namespace and alert_sensitivity labels:

  # Add as a child route under the top-level route: block
  routes:
    - receiver: <your notification receiver>
      match:
        namespace: synthetic_monitoring
        alert_sensitivity: high

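A fuller routing tree might send high-sensitivity alerts to one receiver and medium- and low-sensitivity alerts to another. In this sketch the receiver names default-receiver, sm-pager, and sm-email are hypothetical placeholders, and each receiver still needs its integration settings (for example email_configs or pagerduty_configs) before it can notify anyone.

```yaml
# Sketch of an Alertmanager routing tree for Synthetic Monitoring alerts.
# Receiver names are placeholders; add integration settings to each receiver.
route:
  receiver: default-receiver
  routes:
    # High-sensitivity checks go to the paging receiver.
    - receiver: sm-pager
      match:
        namespace: synthetic_monitoring
        alert_sensitivity: high
    # Medium- and low-sensitivity checks go to email.
    - receiver: sm-email
      match:
        namespace: synthetic_monitoring
      match_re:
        alert_sensitivity: medium|low
receivers:
  - name: default-receiver
  - name: sm-pager
  - name: sm-email
```

Child routes are evaluated in order, and without `continue: true` an alert stops at the first route it matches, so a high-sensitivity alert in this sketch goes only to sm-pager.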
If you do not already have an SMTP server available for sending email alerts, see Grafana Alerting for information about how to use one supplied by Grafana Labs.

Where to access Synthetic Monitoring alerts from Grafana Cloud alerting

Alert rules are in the synthetic_monitoring namespace of Grafana Cloud alerting. Default rules are created in a rule group named default.

Recommendation to avoid alert flapping

When you enable alerting for a check, we recommend running that check from multiple locations, preferably three or more. That way, a problem with a single probe or with network connectivity from a single location is less likely to alert you needlessly, because the other locations running the same check continue to report their results alongside the problematic one.

Grafana Alerting

See the Grafana Alerting docs for details.

Next steps

Check out the Top 5 user-requested synthetic monitoring alerts in Grafana Cloud and Best practices for alerting on Synthetic Monitoring metrics in Grafana Cloud blog posts to learn more.