Integrate existing alerts into the knowledge graph
When you use the knowledge graph, it proactively identifies problems and displays signals related to them. These problems can include resource saturation or software rollouts that cause issues such as increased memory usage and errors. You can use RCA workbench to investigate these issues.
If you used other Grafana Observability offerings prior to using the knowledge graph, you may have existing alerts that you want to view in RCA workbench to aid root cause analysis.
In this scenario, we recommend that you duplicate the existing alert rules and then add labels to the copies so that they display in RCA workbench. This method preserves any notifications that you may have set up outside of the knowledge graph.
Labels to add to the alert rule
You must add the following labels to a copy of an existing alert for it to display in the knowledge graph:
- `asserts_entity_type`
- `asserts_alert_category`
- `asserts_severity`
Set the value for these labels based on the answers to the following questions:
| Question | Possible answers |
|---|---|
| To what entity type would you like to attribute the problem detected by this alert? | `Node`, `Service`, `Pod`, `ServiceInstance`, `Volume`, `Namespace`, `NodeGroup`, `KubeCluster`, `DataSource` |
| What is the category of the problem detected by this alert? | For example, `failure`, as used in the examples below |
| What is the severity of the problem? | The knowledge graph supports two severity levels: `critical` and `warning` |
| Does the alert expression return a label that has the entity name? | |
| Is the alert expression returning the knowledge graph scope labels `asserts_env`, `asserts_site`, and `namespace`? If the alert expression has aggregations such as `sum`, `min`, `max`, or `avg` with `by()`/`without()`, do these aggregations return the above labels? | If the answer is no, you must change the alert expression to retain these labels, for example by adding them to the aggregation's `by()` clause, as shown in the sketch after this table. |
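To make the last row concrete, the following sketch shows a hypothetical alert whose expression originally aggregated only by `job`, rewritten so that the `by()` clause also retains the scope labels. The alert name, metric name, and threshold are assumptions for illustration only; the label values follow the conventions described above.

```yaml
# Before (hypothetical): the aggregation groups only by job, which drops
# the scope labels the knowledge graph needs.
# expr: sum by (job) (rate(http_requests_errors_total[5m])) > 10

# After: the by() clause also retains namespace, asserts_env, and asserts_site.
- alert: HighErrorRate
  expr: |-
    sum by (job, namespace, asserts_env, asserts_site) (
      rate(http_requests_errors_total{asserts_env!=""}[5m])
    ) > 10
  for: 5m
  labels:
    asserts_severity: warning
    asserts_entity_type: Service
    asserts_alert_category: failure
```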
Example: Service entity
The following example represents an alert on a Service entity. The `count by()` clause retains the `namespace`, `asserts_env`, and `asserts_site` scope labels, along with the `service` label, so the knowledge graph can attribute the problem to the correct Service entity.
```yaml
# Redis Master Missing
# Note this covers both cluster mode and HA mode, thus we are counting by redis_mode
- alert: RedisMissingMaster
  expr: |-
    count by (job, service, redis_mode, namespace, asserts_env, asserts_site) (
      redis_instance_info{asserts_env!="", role="master"}
    ) == 0
  for: 1m
  labels:
    asserts_severity: critical
    asserts_entity_type: Service
    asserts_alert_category: failure
```
Example: Node entity
The following example represents an alert on a Node entity. In this example, no aggregation occurs, so the alert retains all of the labels in the source metric, and only static labels are added.
```yaml
- alert: HostNetworkReceiveErrors
  expr: rate(node_network_receive_errs_total{asserts_env!=""}[5m]) * 300 > 0
  for: 5m
  labels:
    asserts_severity: warning
    asserts_entity_type: Node
    asserts_alert_category: failure
```
Before you begin
Before you begin, ensure that you are familiar with Grafana Alerting and RCA workbench.
Steps
- Sign in to Grafana Cloud and click Alerts & IRM > Alerting > Alert rules.
- Choose the alert rule that you want to duplicate.
- Click More > Duplicate.
- Edit the new rule to include the labels described in the previous sections; a sketch of a finished rule follows these steps.
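If you maintain your alert rules as code instead of editing them in the UI, the same pattern applies: leave the original rule untouched and add a copy that carries the knowledge graph labels. The following sketch uses the Prometheus-style rule group syntax that the examples above follow; the group name and the duplicated rule name are illustrative assumptions.

```yaml
groups:
  - name: knowledge-graph-alerts
    rules:
      # Copy of the existing HostNetworkReceiveErrors alert with the
      # knowledge graph labels added. The original rule keeps firing and
      # routing notifications exactly as before.
      - alert: HostNetworkReceiveErrorsKG
        expr: rate(node_network_receive_errs_total{asserts_env!=""}[5m]) * 300 > 0
        for: 5m
        labels:
          asserts_severity: warning
          asserts_entity_type: Node
          asserts_alert_category: failure
```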