Integrate existing alerts into the knowledge graph
When you use the knowledge graph, it proactively identifies problems and displays signals related to them. These problems can include resource saturation or software rollouts that cause issues such as increased memory usage and errors. You can use RCA workbench to investigate these issues.
If you used other Grafana Observability offerings prior to using the knowledge graph, you may have existing alerts that you want to view in RCA workbench to aid root cause analysis.
In this scenario, we recommend that you duplicate the existing alert rules and then add labels to the copies so that they display in RCA workbench. This method preserves any notifications that you may have set up outside of the knowledge graph.
Labels to add to the alert rule
You must add the following labels to a copy of an existing alert for it to display in the knowledge graph:
- `asserts_entity_type`
- `asserts_alert_category`
- `asserts_severity`
Set the value for these labels based on the answers to the following questions:
| Question | Possible answers |
|---|---|
| To what entity type would you like to attribute the problem detected by this alert? | `Node`, `Service`, `Pod`, `ServiceInstance`, `Volume`, `Namespace`, `NodeGroup`, `KubeCluster`, `DataSource` |
| What is the category of the problem detected by this alert? | For example, `failure`, as used in the examples below |
| What is the severity of the problem? | The knowledge graph supports two severity levels: `critical` and `warning` |
| Does the alert expression return a label that has the entity name? | |
| Is the alert expression returning the knowledge graph scope labels `asserts_env`, `asserts_site`, and `namespace`? If the alert expression has aggregations such as `sum`, `min`, `max`, or `avg` with `by()`/`without()`, do these aggregations return the above labels? | If the answer is no, you must change the alert expression to retain these labels, for example by adding them to the aggregation's `by()` clause, as shown in the sketch after this table. |
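To make the last row concrete, the following sketch shows a hypothetical alert whose expression originally aggregated only by `job`, rewritten so that the `by()` clause also retains the scope labels. The alert name, metric name, and threshold are assumptions for illustration only; the label values follow the conventions described above.

```yaml
# Before (hypothetical): the aggregation groups only by job, which drops
# the scope labels the knowledge graph needs.
# expr: sum by (job) (rate(http_requests_errors_total[5m])) > 10

# After: the by() clause also retains namespace, asserts_env, and asserts_site.
- alert: HighErrorRate
  expr: |-
    sum by (job, namespace, asserts_env, asserts_site) (
      rate(http_requests_errors_total{asserts_env!=""}[5m])
    ) > 10
  for: 5m
  labels:
    asserts_severity: warning
    asserts_entity_type: Service
    asserts_alert_category: failure
```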
Example: Service entity
The following example represents an alert on a Service entity. The `count by()` clause retains the `namespace`, `asserts_env`, and `asserts_site` scope labels, along with the `service` label, so the knowledge graph can attribute the problem to the correct Service entity.
```yaml
# Redis Master Missing
# Note this covers both cluster mode and HA mode, thus we are counting by redis_mode
- alert: RedisMissingMaster
  expr: |-
    count by (job, service, redis_mode, namespace, asserts_env, asserts_site) (
      redis_instance_info{asserts_env!="", role="master"}
    ) == 0
  for: 1m
  labels:
    asserts_severity: critical
    asserts_entity_type: Service
    asserts_alert_category: failure
```
Example: Node entity
The following example represents an alert on a Node entity. In this example, no aggregation occurs, so the alert retains all of the labels in the source metric, and only static labels are added.
```yaml
- alert: HostNetworkReceiveErrors
  expr: rate(node_network_receive_errs_total{asserts_env!=""}[5m]) * 300 > 0
  for: 5m
  labels:
    asserts_severity: warning
    asserts_entity_type: Node
    asserts_alert_category: failure
```
Before you begin
Before you begin, ensure that you are familiar with Grafana Alerting and RCA workbench.
Steps
- Sign in to Grafana Cloud and click Alerts & IRM > Alerting > Alert rules.
- Choose the alert rule that you want to duplicate.
- Click More > Duplicate.
- Edit the new rule to include the labels described in the previous sections; a sketch of a finished rule follows these steps.
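If you maintain your alert rules as code instead of editing them in the UI, the same pattern applies: leave the original rule untouched and add a copy that carries the knowledge graph labels. The following sketch uses the Prometheus-style rule group syntax that the examples above follow; the group name and the duplicated rule name are illustrative assumptions.

```yaml
groups:
  - name: knowledge-graph-alerts
    rules:
      # Copy of the existing HostNetworkReceiveErrors alert with the
      # knowledge graph labels added. The original rule keeps firing and
      # routing notifications exactly as before.
      - alert: HostNetworkReceiveErrorsKG
        expr: rate(node_network_receive_errs_total{asserts_env!=""}[5m]) * 300 > 0
        for: 5m
        labels:
          asserts_severity: warning
          asserts_entity_type: Node
          asserts_alert_category: failure
```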