Escalation Chains and Routes
Alerts from monitoring systems often need to be sent to different escalation chains and messaging channels, based on their severity or other alert content.
Routes
Routes are used to determine which escalation chain should be used for a specific alert group. A route’s [Routing Templates] are evaluated for each alert and the first matching route is used to determine the escalation chain and chatops channels.
Example:
- trigger an escalation chain called Database Critical for alerts with {{ payload.severity == "critical" and payload.service == "database" }} in the payload
- create a different route for alerts with the payload {{ "synthetic-monitoring-dev-" in payload.namespace }} and select an escalation chain called Security.
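The first-match behaviour described above can be sketched in a few lines of Python. This is only an illustration: the `Route` class and `select_escalation_chain` function are hypothetical names, the predicates are plain Python functions standing in for the Jinja2 routing templates, and returning `None` when nothing matches is an assumption of the sketch rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Route:
    # Plain Python predicate standing in for a routing template
    # (assumption: real routing templates are Jinja2 expressions).
    matches: Callable[[dict], bool]
    escalation_chain: str


def select_escalation_chain(routes: list[Route], payload: dict) -> Optional[str]:
    """Return the chain of the first route whose predicate is true, else None."""
    for route in routes:  # order matters: earlier routes win
        if route.matches(payload):
            return route.escalation_chain
    return None


# The two example routes from this section, in priority order.
routes = [
    Route(
        matches=lambda p: p.get("severity") == "critical"
        and p.get("service") == "database",
        escalation_chain="Database Critical",
    ),
    Route(
        matches=lambda p: "synthetic-monitoring-dev-" in p.get("namespace", ""),
        escalation_chain="Security",
    ),
]

print(select_escalation_chain(routes, {"severity": "critical", "service": "database"}))
# prints "Database Critical"
print(select_escalation_chain(routes, {"namespace": "synthetic-monitoring-dev-eu"}))
# prints "Security"
```

Because evaluation stops at the first match, a payload that satisfies both predicates is routed to Database Critical, which is why the order of routes matters.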
Manage routes
- Open Integration page
- Click Add route button to create a new route
- Click Edit button to edit Routing Template. The routing template must evaluate to True for it to apply
- Select channels in Publish to Chatops section
Note: If the Publish to Chatops section doesn’t exist, connect Chatops integrations first. For more information, refer to [Notify people].
- Select Escalation Chain from the list
- If the escalation chain does not exist, click Add new escalation chain button to create a new one; it will open in a new tab.
- Once it is created, click Reload list and select the new escalation chain
- Click Arrow Up and Arrow Down on the right to change the order of routes
- Click Three dots and Delete Route to delete the route
Routing based on labels
Note: Labels are currently available only in cloud.
In addition, there is a labels variable available to your routing templates, which contains all of the labels assigned to the Alert Group, as a dict. This allows you to route based on labels (or a mix of labels and payload-based data):
Example:
{{ labels.foo == "bar" or "hello" in labels.keys() or payload.severity == "critical" }}
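To make the logic of the template above easier to follow, here is a rough Python equivalent. It is a sketch only: `route_matches` is a hypothetical helper, and it approximates Jinja2 attribute access (`labels.foo`) with `dict.get`, which behaves the same way for these comparisons when a key is missing.

```python
def route_matches(labels: dict, payload: dict) -> bool:
    # Approximates the template:
    # {{ labels.foo == "bar" or "hello" in labels.keys() or payload.severity == "critical" }}
    return (
        labels.get("foo") == "bar"            # a label with a specific value
        or "hello" in labels.keys()           # a label key is present, any value
        or payload.get("severity") == "critical"  # payload-based condition
    )


print(route_matches({"foo": "bar"}, {}))                       # True
print(route_matches({"hello": "world"}, {}))                   # True
print(route_matches({}, {"severity": "critical"}))             # True
print(route_matches({"foo": "baz"}, {"severity": "warning"}))  # False
```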
Escalation Chains
Once an alert group is created and assigned to a route with an escalation chain, the escalation chain is executed. It continues to execute until a user performs an action that stops it (e.g. acknowledge, resolve, or silence).
Users can create escalation chains to configure different types of escalation workflows. For example, you can create a chain that will notify on-call users with high priority, and another chain that will only send a message into a Slack channel.
Escalation chains determine Who and When to notify. How to notify is set by the user, based on their own preferences.
Types of escalation steps
- Wait - wait for a specified amount of time before proceeding to the next step. If you need a larger time interval, use multiple wait steps in a row.
- Notify users - send a notification to a user or a group of users.
- Notify users from on-call schedule - send a notification to a user or a group of users from an on-call schedule.
- Notify all users from a team - send a notification to all users in a team.
- Resolve incident automatically - resolve the alert group right now with status Resolved automatically.
- Escalate to all Slack channel members - send a notification to the users in the Slack channel. These users will be notified via the method configured in their user profile.
- Notify Slack User Group - send a notification to each member of a Slack user group. These users will be notified via the method configured in their user profile.
- Trigger outgoing webhook - trigger an [outgoing webhook].
- Notify users one by one (round robin) - notify users sequentially, cycling through users for different alert groups. Example: if users A, B, and C are in the list, the first alert group notifies A, the second alert group notifies B, and the third alert group notifies C. Note: users are sorted alphabetically by their username. To notify multiple users within the same alert group until someone acknowledges, instead use Notify users policies with Wait policies between them in the escalation chain.
- Continue escalation if current time is in range - continue escalation only if the current time is in the specified range. It will wait until the specified time to continue escalation. Useful when you want to get escalation only during working hours.
- Continue escalation if >X alerts per Y minutes (beta) - continue escalation only if the alert group receives more than X alerts within Y minutes.
- Repeat escalation from beginning (5 times max) - loop the escalation chain.
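As an illustration of the round robin step in the list above, the following sketch cycles through alphabetically sorted usernames, picking one user per alert group. The function name `round_robin_notifier` is hypothetical, and wrapping back to the first user after the last one is an assumption based on the "cycling" wording rather than a description of the actual implementation.

```python
from itertools import cycle


def round_robin_notifier(usernames):
    """Return an iterator yielding the user to notify for each new alert group."""
    # Users are sorted alphabetically by username, as the step description states.
    return cycle(sorted(usernames))


next_user = round_robin_notifier(["C", "A", "B"])
for alert_group in ("first", "second", "third", "fourth"):
    print(alert_group, "->", next(next_user))
# first -> A
# second -> B
# third -> C
# fourth -> A
```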
Note: Both “Escalate to all Slack channel members” and “Notify Slack User Group” only notify OnCall users whose profiles are linked to a Slack account that belongs to the Slack channel or Slack User Group (i.e. users must have linked their Slack and OnCall accounts). In both cases, the matching users are notified following their respective notification policies. However, to avoid spamming the Slack channel or thread, these users are not notified in the alert group Slack thread (this is how the feature is currently implemented); they are notified using the other options defined in their respective policies.
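The filtering described in this note amounts to an intersection between OnCall users with a linked Slack account and the members of the Slack channel or user group. A minimal sketch follows, assuming a hypothetical `OnCallUser` record and a list of Slack member IDs obtained elsewhere; none of these names come from the OnCall codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnCallUser:
    username: str
    slack_user_id: Optional[str]  # None when no Slack account is linked


def users_to_notify(oncall_users, slack_member_ids):
    """Keep only OnCall users whose linked Slack account is among the members."""
    members = set(slack_member_ids)
    return [
        user
        for user in oncall_users
        if user.slack_user_id is not None and user.slack_user_id in members
    ]


users = [
    OnCallUser("alice", "U001"),  # linked and in the channel
    OnCallUser("bob", None),      # Slack account not linked
    OnCallUser("carol", "U003"),  # linked but not in the channel
]
print([u.username for u in users_to_notify(users, ["U001", "U002"])])
# prints ['alice']
```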
Notification types
Each escalation step that notifies a user does so by triggering that user’s personal notification steps. These are configured on the Grafana OnCall users page (by clicking “View my profile”) and are executed for each user in the escalation step. A user can configure two types of personal notification chains:
- Default Notifications
- Important Notifications
In the escalation step, the user can select which type of notification to use. For more information, refer to [Notify people].
Manage Escalation Chains
- Open Escalation Chains page
- Click New escalation chain button to create a new escalation chain
- Enter a name and assign it to a team
Note: Name must be unique across the organization.
Note: Alert Groups inherit the team from the Integration, not the Escalation Chain.
- Click Add escalation step button to add a new step
- Click Delete to delete the Escalation Chain, and Edit to edit the name or the team.
Important: Linked Integrations and Routes are displayed in the right panel. Any change in the Escalation Chain will affect all linked Integrations and Routes.