Manage thresholds

Asserts includes predefined alerts that fire when a monitored value breaches a specified threshold. A threshold is a value that you set for an assertion. When the monitored value reaches or surpasses the threshold, the assertion fires.

This topic explains the different types of Asserts thresholds, how to edit them, and how to create request, resource, and health thresholds.

Understanding thresholds

Asserts divides threshold configurations into request, resource, and health thresholds and organizes each group by assertion types.

Request and resource thresholds

Request and resource predefined thresholds are general and apply to all components. For example, the saturation alert for resources applies to disks, CPU, and memory. The default CPU threshold set by Asserts is 60% for Warning and 80% for Critical. While most predefined thresholds might meet your business needs, there could be times when you want to adjust a request or resource threshold so that an assertion fires more or less frequently.

For example, the asserts:error:ratio:threshold_by_stddev threshold of the ErrorRatioAnomaly assertion is 2. This means that an error ratio anomaly assertion fires when the error ratio is more than two standard deviations above the normalized mean.

However, if you determine that two standard deviations is never breached and want the assertion to fire at 1.5 standard deviations instead, you can update the predefined threshold. You can also define your own threshold if the predefined thresholds don’t meet your needs.
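To make the effect of lowering the value concrete, the following is a minimal Python sketch of a standard-deviation check. It is a simplification, not the Asserts algorithm: Asserts also factors in percentage change and seasonality. The function name and the sample error ratios are hypothetical.

```python
from statistics import mean, stdev


def breaches_stddev_threshold(history, current, threshold_by_stddev=2.0):
    """Return True when `current` is more than `threshold_by_stddev`
    sample standard deviations above the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    return current > mu + threshold_by_stddev * sigma


# Hypothetical error ratios (fraction of failed requests) from recent windows.
history = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010, 0.012]
current = 0.0135

# With the predefined value of 2, the current ratio does not fire.
print(breaches_stddev_threshold(history, current, 2.0))  # False
# Lowering the value to 1.5 makes the same ratio fire.
print(breaches_stddev_threshold(history, current, 1.5))  # True
```

In this sample data the mean is 0.011 and the sample standard deviation is about 0.0014, so the firing boundary moves from roughly 0.0138 to roughly 0.0131 when the value changes from 2 to 1.5.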

If you need a different threshold for a specific workload, you can create a rule for that workload instead of modifying the threshold for all resources.

Request thresholds

Request thresholds include rate, latency, and error assertions.

  • Asserts checks each Anomaly against a dynamic range that combines standard deviation and percentage change. Daily and weekly seasonal differences are considered. A sparseness check reduces noise on sparse requests.
  • Each Breach is checked against a static threshold. ErrorLogSpike is treated as a breach assertion.
  • Client Errors are treated as anomalies, and so follow the anomaly algorithms.
  • Server Errors are tracked with an error budget approach, so they’re controlled by fast-burn or slow-burn factors.
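As a rough illustration of the error budget approach mentioned for server errors, the following Python sketch computes a burn rate and compares it against fast-burn and slow-burn factors. Everything here is an assumption made for illustration: the function names, the SLO target, the window error ratios, and the factor values (14.4 and 6) are hypothetical and are not taken from Asserts.

```python
def burn_rate(error_ratio, slo_target):
    """Rate at which the error budget is consumed.

    A value of 1.0 means errors arrive exactly at the budgeted rate.
    """
    error_budget = 1.0 - slo_target
    return error_ratio / error_budget


def classify_burn(short_window_ratio, long_window_ratio, slo_target,
                  fast_burn_factor, slow_burn_factor):
    """Check the short window against the fast-burn factor first,
    then the long window against the slow-burn factor."""
    if burn_rate(short_window_ratio, slo_target) >= fast_burn_factor:
        return "fast-burn"
    if burn_rate(long_window_ratio, slo_target) >= slow_burn_factor:
        return "slow-burn"
    return "ok"


# Hypothetical settings: 99.9% target, factors of 14.4 (fast) and 6 (slow).
SLO, FAST, SLOW = 0.999, 14.4, 6.0

print(classify_burn(0.02, 0.004, SLO, FAST, SLOW))     # fast-burn
print(classify_burn(0.005, 0.008, SLO, FAST, SLOW))    # slow-burn
print(classify_burn(0.0005, 0.0005, SLO, FAST, SLOW))  # ok
```

Raising a factor makes the corresponding assertion fire less often, and lowering it makes the assertion fire more often.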

Resource thresholds

Resource thresholds include CPU, memory, disk, and network assertions.

  • Each Saturation assertion uses two static thresholds, one for warning and one for critical.
  • Some resources like disk have rate metrics (bytes read/write), so there are ResourceRateAnomaly and ResourceRateBreach assertions. They follow the same approach as their request counterparts.
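The two-threshold saturation check can be pictured with a short Python sketch. The default values are the CPU defaults stated earlier (60% for Warning, 80% for Critical). The function name and the use of whole-number percentages are assumptions for illustration.

```python
def saturation_severity(utilization_pct, warning_pct=60, critical_pct=80):
    """Map a utilization percentage to a severity, or None if no
    threshold is reached. A value that reaches a threshold counts."""
    if utilization_pct >= critical_pct:
        return "critical"
    if utilization_pct >= warning_pct:
        return "warning"
    return None


print(saturation_severity(85))  # critical
print(saturation_severity(65))  # warning
print(saturation_severity(40))  # None

# A stricter, hypothetical warning threshold for one container.
print(saturation_severity(45, warning_pct=40))  # warning
```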

Example

The following example shows a latency average breach assertion that fires when an inbound request from the /cities/{code} API to the shipping service exceeds 2 seconds.

[Figure: Latency average breach assertion]

Health thresholds

Asserts includes a library of health metric thresholds grouped by vendor. When you bring your data into Grafana Cloud, Asserts understands which Prometheus metrics your system captures, and the corresponding health thresholds are made available.

For example, if you use Redis, after you bring your data into Grafana Cloud, you see the following health metrics thresholds on the Health tab of the Threshold page.

[Figure: Example Redis health thresholds]

You can click the chevron to view the threshold expression and value determined by Grafana Labs. For system status (up or down), the threshold values are clear: 0 for down and 1 for up. In other cases, Grafana Labs uses domain knowledge to determine the appropriate threshold values.

Because health rules are more specific and target a narrower scope than request or resource thresholds, modifying them alters the query and essentially creates a custom rule. While it’s possible to edit health threshold values, this is rarely necessary.

The following example shows the expression and value of the RedisMasterLinkDown threshold.

[Figure: Expression and value of the RedisMasterLinkDown threshold]

When to create a threshold

While Asserts provides many predefined thresholds and values, there are cases when you might need to define your own thresholds.

  • Consider defining a request or resource threshold when you want the threshold to apply to a specific workload. Changes to a predefined request or resource threshold apply to all requests or resources.
  • Consider defining a health threshold when you send a custom metric for which there isn’t a predefined threshold available. You define a health threshold on the Add new rule page.

Edit and create thresholds

This section shows you how to edit and create all types of thresholds.

Edit a threshold value

If a predefined threshold value doesn’t meet your needs, you can change it. The following steps apply to request, resource, and health thresholds.

To edit a threshold value, complete the following steps.

  1. Sign in to Grafana and select Asserts > Rules.

  2. Locate the threshold you want to modify and click the pencil icon.

  3. Update the value and click the checkmark.

Define a request threshold

In addition to configuring predefined threshold values, you can create and configure your own thresholds. Thresholds are hierarchical. For requests, if you specify a threshold on job, it applies to all the request types for that job. Similarly, if you specify a threshold on request_type, it applies to all the request contexts for that request type and job.
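The hierarchy described above can be sketched as a lookup that tries the most specific key first. This Python sketch assumes that a more specific threshold takes precedence over a broader one, which the text implies but does not state. The function name, the dictionary layout, and the numeric values other than the 2-second /cities/{code} example are hypothetical.

```python
def resolve_request_threshold(thresholds, default, job, request_type, request_context):
    """Return the threshold for a request, checking from most to least
    specific: (job, type, context), then (job, type), then (job)."""
    candidates = (
        (job, request_type, request_context),
        (job, request_type, None),
        (job, None, None),
    )
    for key in candidates:
        if key in thresholds:
            return thresholds[key]
    return default


# Latency thresholds in seconds, keyed by (job, request_type, request_context).
thresholds = {
    ("shipping", None, None): 5.0,
    ("shipping", "inbound", None): 3.0,
    ("shipping", "inbound", "/cities/{code}"): 2.0,
}
DEFAULT = 10.0  # stands in for a predefined threshold value

print(resolve_request_threshold(thresholds, DEFAULT, "shipping", "inbound", "/cities/{code}"))  # 2.0
print(resolve_request_threshold(thresholds, DEFAULT, "shipping", "inbound", "/orders"))         # 3.0
print(resolve_request_threshold(thresholds, DEFAULT, "shipping", "outbound", "/rates"))         # 5.0
print(resolve_request_threshold(thresholds, DEFAULT, "billing", "inbound", "/invoices"))        # 10.0
```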

To define a request threshold, complete the following steps:

  1. On the Threshold page, click Request.

  2. Complete the following fields:

    Field name | Description
    Job | Select a job. The list of jobs is generated from your environment.
    Assertion | Select the type of assertion threshold you are creating.
    Request Type | Optional. The nature of the request. For example: inbound/outbound requests, database queries, gRPC calls, and so on.
    Request Context | Optional. Details about the request. For example: the API path, the operation being performed, the method called, and so on.
    Value | Enter a threshold value. You can use the value picker to determine a threshold value using historical data.

  3. Click Add new.

Define a resource threshold

If the default resource thresholds don’t meet your needs, you can create your own. For example, define a resource threshold when you want a specific container’s CPU to fire an assertion if it reaches 40% instead of the default 60% saturation.

The resource threshold hierarchy begins with source, for example, the exporter, followed by resource_type, and then container. An additional dimension is severity, which is independent of the hierarchy.

To define a resource threshold, complete the following steps:

  1. On the Threshold page, click Resource.

  2. Complete the following fields:

    Field name | Description
    Assertion | Select the type of assertion threshold you are creating.
    Resource Type | The resource and what specifically Asserts is measuring. For example: cpu:load, disk:usage, memory:page_faults, and so on.
    Container | Optional. The name of the container using the resource.
    Source | The source of the metrics for the resource. This could be any exporter or framework serving up resource metrics.
    Severity | The severity of the threshold (warning or critical).
    Value | Enter a threshold value. You can use the value picker to determine a threshold value using historical data.

  3. Click Add new.

Define a health threshold

Consider defining a health threshold when you send a custom metric for which there isn’t a predefined threshold available. Use the New rule tab to define a single rule; use the New rule file tab to define multiple rules.

Before you begin

Before you begin to define a health threshold, ensure that:

  • You are familiar with Prometheus metrics, alerts, and functions
  • You are familiar with PromQL
  • You are familiar with the custom metrics for which you might want to create a threshold

Steps

To define a health threshold, complete the following steps.

  1. Sign in to Grafana Cloud and navigate to Asserts > Rules > Bring Your Own.

  2. To define a health threshold using the user interface, complete the following steps:

    1. Click the New rule tab.

    2. Complete the following fields:

      Field name | Description
      Name | Add a name for the rule you are creating.
      Query | Enter the query.
      For | Duration for which the alert condition must be true before an alert is fired.
      Category | Select the type of assertion you want triggered when the threshold is breached.
      Severity | Select the severity associated with the assertion.
      Entity Type | Select the type of entity for which this threshold is associated.

    3. Click Add new.

  3. To define a health threshold with a file, complete the following steps:

    1. Click the New rule file tab.

    2. In the View/Edit YML file field, enter (or copy and paste) the contents of the rule.

    3. Click Save.
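The For field on the New rule tab follows the usual Prometheus alerting behavior: the condition must stay true for the whole duration before the alert fires, and any false evaluation resets the wait. The following Python sketch models that behavior. The function name and the sample evaluations are hypothetical, and the sketch is an illustrative model, not how Asserts evaluates rules internally.

```python
def first_firing_time(samples, for_seconds):
    """Return the timestamp at which an alert first fires, or None.

    `samples` is a time-ordered list of (timestamp_seconds, condition_is_true)
    pairs, one per rule evaluation.
    """
    pending_since = None
    for timestamp, is_true in samples:
        if not is_true:
            pending_since = None  # condition cleared, so restart the wait
            continue
        if pending_since is None:
            pending_since = timestamp
        if timestamp - pending_since >= for_seconds:
            return timestamp
    return None


# Evaluations every 60 seconds; the condition clears once at t=120.
samples = [(0, True), (60, True), (120, False),
           (180, True), (240, True), (300, True), (360, True)]

print(first_firing_time(samples, for_seconds=120))  # 300
print(first_firing_time(samples, for_seconds=600))  # None
```

A longer For value suppresses short spikes at the cost of a later alert.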