About Asserts

Asserts is a next-generation technology that provides insights into your distributed, multi-cloud, and hybrid applications. With Asserts, your team no longer has to work with disjointed dashboards that fail to keep up with frequent updates, and your engineers no longer need to spend time deciphering visualizations to find crucial information.

Your on-call team will no longer be overwhelmed by irrelevant, noisy alerts that are difficult to manage and quickly become outdated.

This section describes the benefits of Asserts.

Discover a living map of application and infrastructure components

Asserts collects information from your telemetry data sources and uses it to build a visual representation of your application and infrastructure components. It then organizes and indexes this representation so that you can search it and see how the components fit together in real time.

The following Asserts Entity Graph shows the relationships between and among application and infrastructure components.

Asserts curates knowledge of common runtime failure patterns and potential causes, so your team doesn’t have to research and maintain these rules.

Asserts continuously tracks resource Saturation, Amends (for example, deployments and scale events), request and latency Anomalies, systemic Failures, and Errors on your golden signals and health metrics.
The entity graph annotates occurrences of these assertions, making them easy to understand and act on.
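To make the idea of an annotated entity graph concrete, here is a minimal Python sketch, assuming a graph of typed entities connected by named relationships, with assertions attached to individual entities. The class names, relationship names such as `RUNS_ON` and `HOSTED_ON`, and the entity names are hypothetical illustrations, not the Asserts data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node in a hypothetical entity graph, such as a Service, Pod, or Node."""
    kind: str
    name: str
    assertions: list[str] = field(default_factory=list)


@dataclass
class EntityGraph:
    entities: dict[str, Entity] = field(default_factory=dict)
    # Directed relationships stored as (source, relationship, target) triples.
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def relate(self, source: str, relationship: str, target: str) -> None:
        self.edges.append((source, relationship, target))

    def annotate(self, name: str, assertion: str) -> None:
        """Attach an assertion (for example, a deployment amend) to an entity."""
        self.entities[name].assertions.append(assertion)

    def neighbors(self, name: str) -> list[tuple[str, str]]:
        """Return (relationship, target) pairs for an entity's outgoing edges."""
        return [(rel, dst) for src, rel, dst in self.edges if src == name]


graph = EntityGraph()
graph.add(Entity("Service", "shipping"))
graph.add(Entity("Pod", "shipping-pod-1"))
graph.add(Entity("Node", "node-1"))
graph.relate("shipping", "RUNS_ON", "shipping-pod-1")
graph.relate("shipping-pod-1", "HOSTED_ON", "node-1")
graph.annotate("shipping-pod-1", "Amend: deployment")

print(graph.neighbors("shipping"))  # [('RUNS_ON', 'shipping-pod-1')]
```

Keeping relationships as explicit triples means the same structure can answer both "what is this entity connected to?" and "which assertions sit on the connected entities?", which is the kind of question the entity graph is meant to answer visually.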

Explore with unified search

With unified search, you can combine components, relationships, configurations, and associated assertions to state your intent in a single, clear natural language expression.

For example, an advanced query can return all Pods that have assertions, together with their connected Nodes, and all Services whose name contains mysql, together with their connected Pods.
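The logic of that example query can be sketched in Python over a toy graph. This is only an illustration of what the two parts of the query select, assuming a simple dictionary of entities and an undirected edge list; it does not reproduce Asserts search syntax, and every entity name here is hypothetical.

```python
# Hypothetical toy entity graph: each entity has a kind and a list of assertions.
entities = {
    "mysql-svc": {"kind": "Service", "assertions": []},
    "cart-svc": {"kind": "Service", "assertions": []},
    "mysql-0": {"kind": "Pod", "assertions": ["Saturation"]},
    "cart-0": {"kind": "Pod", "assertions": []},
    "node-1": {"kind": "Node", "assertions": []},
    "node-2": {"kind": "Node", "assertions": []},
}
# Undirected relationships between entities.
edges = [
    ("mysql-svc", "mysql-0"),
    ("cart-svc", "cart-0"),
    ("mysql-0", "node-1"),
    ("cart-0", "node-2"),
]


def connected(name: str, kind: str) -> set[str]:
    """Return entities of the given kind that share an edge with `name`."""
    result = set()
    for a, b in edges:
        for src, dst in ((a, b), (b, a)):
            if src == name and entities[dst]["kind"] == kind:
                result.add(dst)
    return result


def run_query() -> set[str]:
    matched = set()
    # Part 1: all Pods that have assertions, plus their connected Nodes.
    for name, e in entities.items():
        if e["kind"] == "Pod" and e["assertions"]:
            matched |= {name} | connected(name, "Node")
    # Part 2: all Services whose name contains "mysql", plus their connected Pods.
    for name, e in entities.items():
        if e["kind"] == "Service" and "mysql" in name:
            matched |= {name} | connected(name, "Pod")
    return matched


print(sorted(run_query()))  # ['mysql-0', 'mysql-svc', 'node-1']
```

The result is the union of both parts: `cart-0` is excluded because it has no assertions, and `cart-svc` is excluded because its name does not contain mysql.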

You can also use the same search expression in RCA Workbench to instantly view all the matching assertions correlated across time and across related entities, giving you quick access to the data you need.

Curated rules detect service unavailability and potential causes

Asserts actively manages and organizes information on common runtime failure patterns and their potential causes. This means your team doesn't have to spend time researching and maintaining complex PromQL recording and alerting rules for each framework you use.

Asserts continuously tracks the following kinds of assertions:

Saturation - Asserts monitors software objects that have built-in limits, such as client connections. When usage approaches the limit, a saturation assertion occurs.
Amends - Asserts captures changes to your environment. Example amend assertions include container deployments, configuration updates, and HPA scale events.
Anomalies - Asserts detects changes in traffic patterns. Example anomaly assertions include unusual changes in request rate, error rate, and latency.
Failures - Asserts detects significant or complete application degradation. Example failure assertions include Pod crash looping and CronJob failures.
Errors - Asserts monitors erroneous events that show how the software handles real-world traffic. Example error assertions include 5XX/4XX status codes and latency threshold breaches on your golden signals and health metrics.

The entity graph annotates occurrences of these assertions, making them easy to understand and act on. For more information about the SAAFE (Saturation, Amends, Anomalies, Failures, Errors) model, refer to Understanding the SAAFE model.
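As a rough illustration of two of these categories, the following Python sketch flags saturation when usage reaches a fraction of a built-in limit, and flags an anomaly with a simple z-score test against a recent baseline. Both thresholds are hypothetical, and the z-score test is a generic stand-in technique chosen for illustration; it is not how Asserts implements its curated rules.

```python
from statistics import mean, stdev

# Hypothetical thresholds chosen for illustration only.
SATURATION_THRESHOLD = 0.9  # flag usage at or above 90% of the limit
ANOMALY_Z_THRESHOLD = 3.0   # flag values 3+ standard deviations from the baseline


def is_saturated(usage: float, limit: float) -> bool:
    """Saturation: a resource with a built-in limit is close to that limit."""
    return limit > 0 and usage / limit >= SATURATION_THRESHOLD


def is_anomalous(baseline: list[float], current: float) -> bool:
    """Anomaly: `current` deviates sharply from the baseline (z-score test).

    `baseline` needs at least two samples for `stdev`.
    """
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as a deviation.
        return current != baseline[0]
    return abs(current - mean(baseline)) / sigma >= ANOMALY_Z_THRESHOLD


# 95 of 100 client connections in use -> saturated.
print(is_saturated(95, 100))  # True
# Request rate jumps from about 100 req/s to 140 req/s -> anomalous.
print(is_anomalous([100, 102, 98, 101, 99], 140))  # True
print(is_anomalous([100, 102, 98, 101, 99], 101))  # False
```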

Reduce mean time to resolution

Because Asserts continuously evaluates assertions, you don't have to wait for SLOs to be breached and alerts to fire before you know to act. You can identify issues quickly using the Asserts Top Insights dashboards, which present a stack-ranked view of the services and nodes that need attention, ordered by severity score. From there, you can navigate to RCA Workbench to perform root cause analysis. For more information about Top Insights, refer to Top Insights.
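A stack-ranked view like this can be sketched in Python as "score each entity, then sort by score". The per-category weights below are hypothetical and the scoring is a simple weighted sum; the actual Asserts severity score is not described here and is likely more involved.

```python
# Hypothetical per-category weights; not the actual Asserts severity score.
WEIGHTS = {"Failure": 5, "Error": 4, "Anomaly": 3, "Saturation": 2, "Amend": 1}


def severity(assertions: list[str]) -> int:
    """Sum the weights of an entity's active assertions."""
    return sum(WEIGHTS.get(a, 0) for a in assertions)


def top_insights(entities: dict[str, list[str]], limit: int = 5) -> list[tuple[str, int]]:
    """Return (entity, score) pairs with a nonzero score, highest score first."""
    scored = [(name, severity(a)) for name, a in entities.items()]
    scored = [s for s in scored if s[1] > 0]
    # Sort by descending score, breaking ties alphabetically for stable output.
    scored.sort(key=lambda s: (-s[1], s[0]))
    return scored[:limit]


entities = {
    "shipping": ["Amend", "Error", "Anomaly"],
    "cart": ["Saturation"],
    "node-1": [],
    "checkout": ["Failure", "Error"],
}
print(top_insights(entities))  # [('checkout', 9), ('shipping', 8), ('cart', 2)]
```

Entities with no active assertions, such as `node-1`, drop out of the list entirely, so the view only shows what needs attention.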

Perform root cause analysis in RCA Workbench

In RCA Workbench, you can explore all potential causes of a particular issue, correlated over time and across dependencies. You also have access to the relevant metrics, logs, and traces.

In the following example, a deployment amend on the shipping service triggered a spike in error rate on the service and a p99 latency spike on the /cities/{code} endpoint.
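The kind of correlation described in this example can be sketched in Python: starting from a symptom, collect assertions on the same entity or its dependencies that occurred within a time window before it. The 600-second window, the timestamps, the `cart` service, and the modeling of the endpoint as an entity that depends on the shipping service are all hypothetical assumptions; this is not the RCA Workbench algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    entity: str
    kind: str       # SAAFE category, such as "Amend" or "Error"
    detail: str
    timestamp: int  # seconds; illustrative values only


def dependency_scope(entity: str, edges: list[tuple[str, str]]) -> set[str]:
    """Return `entity` plus everything reachable from it by following edges."""
    seen, stack = {entity}, [entity]
    while stack:
        current = stack.pop()
        for src, dst in edges:
            if src == current and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def candidate_causes(symptom: Assertion, assertions: list[Assertion],
                     edges: list[tuple[str, str]], window: int = 600) -> list[Assertion]:
    """Assertions in the symptom's dependency scope that occurred within
    `window` seconds before (or at) the symptom, oldest first."""
    scope = dependency_scope(symptom.entity, edges)
    return sorted(
        (a for a in assertions
         if a != symptom and a.entity in scope
         and symptom.timestamp - window <= a.timestamp <= symptom.timestamp),
        key=lambda a: a.timestamp,
    )


assertions = [
    Assertion("shipping", "Amend", "deployment", 1000),
    Assertion("shipping", "Error", "error rate spike", 1060),
    Assertion("shipping /cities/{code}", "Anomaly", "p99 latency spike", 1090),
    Assertion("cart", "Amend", "deployment", 1050),  # unrelated service
]
# Hypothetical edge: the endpoint belongs to (depends on) the shipping service.
edges = [("shipping /cities/{code}", "shipping")]

symptom = assertions[2]
for a in candidate_causes(symptom, assertions, edges):
    print(a.timestamp, a.entity, a.kind, a.detail)
# 1000 shipping Amend deployment
# 1060 shipping Error error rate spike
```

Ordering the candidates oldest first puts the deployment amend at the top, which matches the reading that the deployment preceded, and likely triggered, the error and latency spikes. The `cart` deployment is excluded because it lies outside the symptom's dependency scope.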