Integrate existing alerts into Asserts

When you use Asserts, it proactively identifies problems and displays the signals related to them. These problems can include resource saturation or a software rollout that causes issues such as increased memory usage and errors. You can use the RCA workbench to investigate these issues.

If you used other Grafana Observability offerings prior to using Asserts, you may have existing alerts that you want to view on the RCA workbench to aid root cause analysis.

In this scenario, we recommend that you duplicate the existing alert rules and then add labels so that the copies display on the RCA workbench. This method preserves any notifications that you may have set up outside of Asserts.

Labels to add to the alert rule

You must add the following labels to a copy of an existing alert for it to display in Asserts.

  1. asserts_entity_type
  2. asserts_alert_category
  3. asserts_severity

Set the value for these labels based on the answers to the following questions:

Question: To what entity type would you like to attribute the problem detected by this alert?
Possible answers: Node, Service, Pod, ServiceInstance, Volume, Namespace, NodeGroup, KubeCluster, DataSource

Question: What is the category of the problem detected by this alert?
Possible answers:
  • saturation: A resource is limited and its usage is approaching the limit. Resources include CPU, memory, disk, number of connections, and so on.
  • error: This category classifies application errors, server errors, client errors, and so on.
  • failure: The system's current state deviates from its desired state. For example, in Kubernetes, the number of running replicas is less than the number of configured replicas. Another example is when Redis running in cluster mode does not have a master.

Question: What is the severity of the problem?
Possible answers: Asserts supports two severity levels: critical and warning.

Question: Does the alert expression return a label that has the entity name?
Possible answers:
  • Node: node and/or instance
  • Service: service or job
  • ServiceInstance: service or job, and instance
  • Pod: pod
  • Volume: volume
  • Namespace: namespace

Question: Does the alert expression return the Asserts scope labels asserts_env, asserts_site, and namespace? If the alert expression has aggregations such as sum/min/max/avg by/without(), do these aggregations retain these labels?

If the answer is no, you must change the alert expression to retain these labels.

For example, change sum by(job, instance)(memory_usage) to sum by(asserts_env, asserts_site, namespace, job, instance)(memory_usage).

The namespace label is mostly relevant for Kubernetes.
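
The following is a minimal sketch of what a rule might look like after such a change. The alert name HighMemoryUsage, the memory_usage metric, and the 0.9 threshold are hypothetical placeholders for illustration, not values supplied by Asserts:

# Hypothetical rule for illustration only: the alert name, metric, and threshold
# are placeholders.
- alert: HighMemoryUsage
  # The by() clause keeps the scope labels asserts_env, asserts_site, and namespace
  # along with the entity labels job and instance required for a ServiceInstance.
  expr: |-
    sum by (asserts_env, asserts_site, namespace, job, instance) (
      memory_usage{asserts_env!=""}
    ) > 0.9
  for: 5m
  labels:
    asserts_severity: warning
    asserts_entity_type: ServiceInstance
    asserts_alert_category: saturation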

Example: Service entity

The following example represents an alert on a Service entity:

# Redis Master Missing
# Note this covers both cluster mode and HA mode, thus we are counting by redis_mode

- alert: RedisMissingMaster
  expr: |-
    count by (job, service, redis_mode, namespace, asserts_env, asserts_site) (
      redis_instance_info{asserts_env!="", role="master"}
    ) == 0
  for: 1m
  labels:
    asserts_severity: critical
    asserts_entity_type: Service
    asserts_alert_category: failure
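
In this example, the count by (...) aggregation retains the scope labels asserts_env, asserts_site, and namespace along with the entity labels job and service, which is what allows Asserts to attach the firing alert to the corresponding Service entity.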

Example: Node entity

The following example represents an alert on a Node entity. In this example, there is no aggregation, so the alert retains all of the labels from the source metric and only static labels are added.

- alert: HostNetworkReceiveErrors
  expr: rate(node_network_receive_errs_total{asserts_env!=""}[5m]) * 300 > 0
  for: 5m
  labels:
    asserts_severity: warning
    asserts_entity_type: Node
    asserts_alert_category: failure
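
Because the expression has no aggregation clause, entity labels such as instance (and node, if present on the source metric) pass through unchanged, so Asserts can attach the alert to the corresponding Node entity.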

Before you begin

Before you begin, ensure that you are familiar with Grafana Alerting and the RCA workbench.

Steps

  1. Sign in to Grafana Cloud and click Alerts & IRM > Alerting > Alert rules.
  2. Find the alert rule that you want to copy.
  3. Click More > Duplicate.
  4. Edit the new alert rule to include the labels as described in the above sections.
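
Optionally, after the duplicated rule starts evaluating, you can check that the new labels are attached by querying the built-in ALERTS series in Explore. This sketch assumes your rule is evaluated by a Prometheus-compatible ruler that records the ALERTS metric; replace the alert name with the name of your copied rule:

# Returns pending or firing alerts from the copied rule that carry the Asserts labels.
# RedisMissingMaster is the example rule name from this page; use your own rule's name.
ALERTS{alertname="RedisMissingMaster", asserts_entity_type!=""}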