Create and manage Asserts SLOs
Asserts Service Level Objectives (SLOs) provide a framework for measuring the quality of service you provide to users. Use SLOs to collect data on the reliability of your systems over time. This data helps engineering teams reduce alert fatigue, focus on reliability, and provide better service to customers. For more information about SLOs, refer to Introduction to Grafana SLO.
This topic describes the types of SLOs you can create, and how to create and manage Asserts SLOs.
Tip
The SLO builder is interactive and displays performance data while you are creating the SLO. Take advantage of the interactivity to fine-tune your SLO by experimenting with different time ranges, targets, and thresholds.
SLO types
You can choose between two types of Asserts SLOs:
- Availability: Measures the percentage of time that a service is available. The system calculates availability as (1 - Number of Bad Events Query / Number of All Events Query) * 100. This value is then compared against the target percentage that you define to determine whether you are meeting your SLO.
- Latency/Occurrence: Measures how responsive your system is. This type of SLO is based on a single measurement rather than a ratio of measurements. Latency/occurrence SLOs notify you when the time it takes to complete transactions is greater than the threshold you set.
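The availability comparison can be sketched in a few lines of Python. This is a minimal illustration, not the Asserts implementation: it assumes availability is the share of events that are not bad, and the function names, sample counts, and the choice to treat a window with no traffic as fully available are all hypothetical.

```python
def availability_percent(bad_events: float, all_events: float) -> float:
    """Return the percentage of events that were not bad."""
    if all_events <= 0:
        # Assumption: a window with no events counts as fully available.
        return 100.0
    return (1.0 - bad_events / all_events) * 100.0


def meets_target(bad_events: float, all_events: float, target_percent: float) -> bool:
    """Compare the measured availability against the SLO target."""
    return availability_percent(bad_events, all_events) >= target_percent


# Hypothetical sample: 25 bad events out of 10,000 total.
print(f"{availability_percent(25, 10_000):.2f}%")
print(meets_target(25, 10_000, target_percent=99.9))
print(meets_target(25, 10_000, target_percent=99.5))
```

With these sample counts, the measured availability sits between the two targets, so the stricter target is missed and the looser one is met.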
Simple and advanced SLOs
You can define a simple or an advanced SLO.
- Simple: The PromQL expression that defines the SLO is built automatically as you select values from the drop-down menus.
- Advanced: You manually define the PromQL expression, which can include metrics other than Asserts metrics.
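To give a feel for the kind of expression an advanced availability SLO might take, the following Python sketch assembles a pair of PromQL query strings. Everything here is an assumption for illustration: the helper name `ratio_queries`, the metric `http_requests_total`, the `job` and `status` labels, and the use of `sum(rate(...))` are hypothetical and are not taken from the Asserts documentation. Check the query fields in the SLO builder for the shape it actually expects.

```python
def ratio_queries(metric: str, job: str, bad_matcher: str, window: str = "5m") -> dict[str, str]:
    """Build hypothetical 'all events' and 'bad events' PromQL strings."""
    # All events: every request for the job over the window.
    all_events = f'sum(rate({metric}{{job="{job}"}}[{window}]))'
    # Bad events: the same selector narrowed by an extra label matcher.
    bad_events = f'sum(rate({metric}{{job="{job}", {bad_matcher}}}[{window}]))'
    return {"all_events": all_events, "bad_events": bad_events}


# Hypothetical metric and labels; substitute the ones your services expose.
queries = ratio_queries("http_requests_total", "checkout", 'status=~"5.."')
print(queries["all_events"])
print(queries["bad_events"])
```

Keeping both queries on the same selector and window means the bad events are always a subset of the events counted in the total.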
Before you begin
To create an Asserts SLO, you need to:
- Configure Asserts and have metrics flowing into Grafana Cloud
- Be familiar with defining SLOs, SLIs, SLAs, and error budgets
- Understand PromQL
Create a simple SLO
To create a simple SLO, complete the following steps.
Sign in to Grafana and select Asserts > Asserts SLO.
Click Add SLO.
In the Basics section, select Availability or Latency/Occurrence.
Enter an SLO name.
Use the following table to complete the fields in the Service and APIs section.
In the RCA workbench section, enter a search expression.
The search expression defines which entities populate the RCA workbench when you troubleshoot an SLO.
This value automatically populates based on the job you selected. You can override this value by selecting another search expression.
Use the following table to complete the fields in the Target Objectives section.
Click Add new SLO.
Create an advanced availability SLO
To create an advanced availability SLO, complete the following steps.
Sign in to Grafana and select Asserts > Asserts SLO.
Click Add SLO.
Click the Advanced tab.
Click Availability.
Enter an SLO name.
Use the following table to complete the fields in the Service and APIs section.
In the RCA workbench section, enter a search expression.
The search expression defines which entities populate the RCA workbench when you troubleshoot an SLO.
This value automatically populates based on the job identified in the query. You can override this value by selecting another search expression.
Use the following table to complete the fields in the Target Objectives section.
Click Add new SLO.
Create an advanced latency/occurrence SLO
To create an advanced latency/occurrence SLO, complete the following steps.
Sign in to Grafana and select Asserts > Asserts SLO.
Click Add SLO.
Click the Advanced tab.
Click Latency/Occurrence.
Enter an SLO name.
In the SLI Details section, enter a measurement query.
This query defines the service that you want to track.
In the RCA workbench section, enter a search expression.
The search expression defines which entities populate the RCA workbench when you troubleshoot an SLO.
This value automatically populates based on the job identified in the query. You can override this value by selecting another search expression.
Use the following table to complete the fields in the Target Objectives section.
Click Add new SLO.
P99 latency is calculated for each minute. For example, if the threshold is 100 milliseconds, each minute in which the P99 falls within the threshold is a good minute, and each minute in which the P99 is above the threshold is a bad minute.
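The good-minute and bad-minute accounting can be sketched as follows. This is an illustration under stated assumptions, not the Asserts implementation: the function names and the sample P99 values are hypothetical, and the bad-minute budget is assumed to be the share of a 1,440-minute day left over after the target percentage.

```python
MINUTES_PER_DAY = 24 * 60


def classify_minutes(p99_ms_per_minute: list[float], threshold_ms: float) -> tuple[int, int]:
    """Count good minutes (P99 within threshold) and bad minutes (P99 above it)."""
    good = sum(1 for p99 in p99_ms_per_minute if p99 <= threshold_ms)
    bad = len(p99_ms_per_minute) - good
    return good, bad


def tolerated_bad_minutes(target_percent: float, minutes: int = MINUTES_PER_DAY) -> float:
    """Return how many bad minutes a target leaves room for in the period."""
    return minutes * (100.0 - target_percent) / 100.0


# Hypothetical per-minute P99 samples, in milliseconds.
samples = [80.0, 95.0, 100.0, 130.0, 101.0]
good, bad = classify_minutes(samples, threshold_ms=100.0)
print(good, bad)
print(tolerated_bad_minutes(99.0))
```

In this sample, a minute whose P99 is exactly 100 milliseconds counts as good because it falls within the threshold.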
The following table summarizes the SLO in terms of tolerated bad minutes or expected good minutes in a day.
View and edit SLOs
You can edit an SLO at any time.
To view and edit an SLO, complete the following steps:
Sign in to Grafana and select Asserts > Asserts SLO.
You can see SLO performance on the SLO page.
To edit an SLO, click the edit icon next to the SLO.
The edit page opens where you can make changes.
Click Update SLO.