How the knowledge graph processes data

This topic describes what happens after you connect your observability data stores, such as Prometheus and CloudWatch, to the knowledge graph.

Discovery

The knowledge graph inspects labels to identify entities and populate their properties. It also establishes relationships between these entities by comparing their properties or using specified metrics to establish direct connections. This allows the knowledge graph to determine which Pod is hosted on which node, which Pods form a Service, and how services interact with each other.

The knowledge graph builds an entity graph of all entities, properties, and relationships, which represents its understanding of the system. The graph is indexed so that it can be searched. The discovery process continually updates the graph while keeping a record of its historical changes.

Normalization

In the next phase, the knowledge graph uses a curated collection of rules to normalize the incoming heterogeneous time series data. This process converts the data into a cohesive set of essential metrics, such as Request, Error, Duration (RED) metrics for application components, and utilization metrics for infrastructure components.

For example, the knowledge graph records the RED metrics from Spring Boot as the Prometheus counters asserts:request:total, asserts:latency:total, and asserts:error:total.

# Request count: inbound HTTP requests handled by the Spring Boot server.
- record: asserts:request:total
  expr: http_server_requests_seconds_count
  labels:
    asserts_request_type: inbound
    asserts_source: spring_boot

# Latency: cumulative time in seconds spent handling inbound requests.
- record: asserts:latency:total
  expr: http_server_requests_seconds_sum
  labels:
    asserts_request_type: inbound
    asserts_source: spring_boot

# Errors: outbound client requests that returned a 5xx status.
- record: asserts:error:total
  expr: http_client_requests_seconds_count{status=~"5.."}
  labels:
    asserts_request_type: outbound
    asserts_error_type: server_errors
    asserts_source: spring_boot

The knowledge graph adds labels, such as asserts_request_type and asserts_error_type, to indicate the level of granularity for further processing during instrumentation.
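Because every source is recorded under the same metric names and labels, downstream rules can be written once and reused across frameworks. The following is a minimal sketch of what such rules could look like, not the rules the knowledge graph actually ships. The group name and the example:* record names are hypothetical. Only the asserts:* metric names and the asserts_request_type label come from the rules above.

```yaml
# Hypothetical downstream rules built on the normalized metrics.
# The example:* record names are illustrative only.
groups:
  - name: example_red_rules
    rules:
      # Inbound requests per second, regardless of the source framework.
      - record: example:inbound_request:rate5m
        expr: rate(asserts:request:total{asserts_request_type="inbound"}[5m])

      # Average inbound latency in seconds:
      # time spent handling requests divided by requests handled.
      - record: example:inbound_latency:avg5m
        expr: |
          rate(asserts:latency:total{asserts_request_type="inbound"}[5m])
          /
          rate(asserts:request:total{asserts_request_type="inbound"}[5m])
```

Both sides of the division carry the same label set, so Prometheus matches the series one to one without any extra matching clause.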

To capture additional dynamic and contextual information, such as HTTP paths, the knowledge graph applies a Prometheus relabeling rule during the data ingestion process. This information is then stored in asserts_request_context.
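As an illustration only, a relabeling rule of this kind could look like the following sketch. It assumes the HTTP path is exposed in the uri label, as it is on Spring Boot's Micrometer HTTP metrics. The job name and target are hypothetical, and the rule that the knowledge graph actually applies may differ.

```yaml
scrape_configs:
  - job_name: spring-boot-app                     # hypothetical job name
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ["app.example.internal:8080"]    # hypothetical target
    metric_relabel_configs:
      # For Spring Boot server request metrics, copy the value of the
      # uri label into asserts_request_context.
      # source_labels are joined with ";" before the regex is applied,
      # and the regex must match the whole joined string.
      - source_labels: [__name__, uri]
        regex: http_server_requests_seconds_(?:count|sum|bucket);(.+)
        target_label: asserts_request_context
        replacement: $1
        action: replace
```

Series whose name or uri label does not match the regex pass through unchanged, so the rule only affects the metrics it targets.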

If you have different environments (development, stage, and production), each with one or more sites, you can use external labels or relabeling rules to add asserts_env and asserts_site labels that scope the metrics and entities discovered from them.
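A minimal sketch of both options in a Prometheus configuration follows. The values production and us-west-2, the job name, and the target are hypothetical placeholders. In practice you would use one of the two approaches rather than both.

```yaml
# Option 1: external labels.
# Prometheus attaches these to every series it sends to external systems,
# for example through remote write or federation.
global:
  external_labels:
    asserts_env: production       # hypothetical environment name
    asserts_site: us-west-2       # hypothetical site name

# Option 2: relabeling rules on a scrape job.
# With no source_labels, the default regex matches and the rule sets
# target_label to the fixed replacement value on every scraped target.
scrape_configs:
  - job_name: example-app                          # hypothetical job name
    static_configs:
      - targets: ["app.example.internal:8080"]     # hypothetical target
    relabel_configs:
      - target_label: asserts_env
        replacement: production
      - target_label: asserts_site
        replacement: us-west-2
```

External labels suit the case where one Prometheus instance serves a single environment and site. Per-job relabeling is the better fit when one instance scrapes targets from several.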

Insights

The knowledge graph applies its extensive domain knowledge to instrument these normalized metrics. The knowledge graph automatically instruments application frameworks like Spring Boot, Flask, and Loopback, infrastructure components like Kubernetes resources, and third-party services like Redis server, Kafka clusters, and many more.

With instrumentation in place, the knowledge graph captures events as the following types of insights:

Saturation indicates whether a resource (CPU, memory, and so on) is saturated.
Amend captures changes in the system, such as deployments, scaling, and configuration map changes.
Anomaly captures abnormal shifts in request rate, latency, or resource consumption.
Failure records failure states in the system, such as primary-standby sync failures and Pod crash looping.
Error records problematic requests, for example, 5xx and 4xx responses, or breaches of latency thresholds.

Insights are condensed time series data that capture only the significant events in the system. They cover the different components of a modern application, from application frameworks to infrastructure and third-party services.

Insights serve a different purpose from traditional alerts: they are not designed to notify on-call personnel. Instead, they act as automated vital signs that the knowledge graph makes readily available for troubleshooting. However, you can subscribe to specific insights and use them as traditional alerts if desired.

For more information, refer to Insights categories.

Correlation

The knowledge graph story doesn’t end with automatic instrumentation. When insights arise, the knowledge graph:

Attaches them back to the graph and indexes them for search. This way, a single graph search phrase can become a powerful way to navigate both entities and their health status.
Enriches the insights with contextual information from the graph. For example, an insight raised on a Pod is tagged back to the node and service the Pod belongs to. This way, insights that happened on ephemeral entities (for example, Pods) can bubble up to long-lived entities (for example, nodes and services), thus forming an aggregated view with a continuous timeline.

Because the knowledge graph condenses and contextualizes the insights, they are much faster to query and aggregate and much easier to correlate or rank, which enables quick and precise root cause analysis.