Introduction to the knowledge graph

The knowledge graph helps you troubleshoot application and infrastructure issues faster by automatically connecting the dots across your observability data. Instead of manually jumping between dashboards, writing complex queries, or sifting through noisy alerts, the knowledge graph surfaces the issues that matter and guides you to root causes.

Built directly into Grafana Cloud, the knowledge graph works with data you’re already collecting from Grafana Cloud Observability tools. Activation takes just a few clicks.

What problems does it solve?

Modern cloud environments are complex. A single user-facing issue might involve dozens of services, databases, and infrastructure components. Traditional monitoring approaches require you to:

Write queries to find problems hidden in your data
Manually correlate metrics, logs, and traces across multiple tools
Understand every service and dependency in your environment
Create and maintain hundreds of alert rules

The knowledge graph eliminates this toil by automatically discovering what’s running in your environment and continuously analyzing it for issues.

How does it work?

The knowledge graph automatically discovers entities in your environment, such as services, pods, nodes, and databases, and understands the relationships between them. It then continuously generates insights that highlight issues across these entities.

Insights: The connective tissue

Insights are the core of how the knowledge graph helps you troubleshoot. They’re automatically generated alerts that track different types of issues:

Saturation - Resources approaching limits (CPU at 90%, disk filling up)
Amend - Configuration changes that might cause issues (deployments, scaling events)
Anomaly - Unusual patterns in traffic or behavior (request rate spike, latency increase)
Failure - Critical system failures (pod crashes, controller errors)
Error - Request failures and latency threshold breaches (500 errors, slow responses)

When you see a service with a red indicator in the entity catalog, that’s an insight telling you something’s wrong. Click into it, and you’ll see exactly which type of issue is occurring, when it started, and what signals support it.
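To make the five categories concrete, here is a minimal sketch of how insights could be modeled and grouped by category. It is an illustration only, not the knowledge graph's actual data model or API: `InsightCategory`, `Insight`, `insights_by_category`, and every field name are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class InsightCategory(Enum):
    """The five insight categories listed above."""
    SATURATION = "saturation"
    AMEND = "amend"
    ANOMALY = "anomaly"
    FAILURE = "failure"
    ERROR = "error"


@dataclass(frozen=True)
class Insight:
    # Hypothetical fields, chosen for illustration only.
    entity: str
    category: InsightCategory
    name: str
    started_at: datetime


def insights_by_category(insights):
    """Group insights by category, keeping input order within each group."""
    grouped = {}
    for insight in insights:
        grouped.setdefault(insight.category, []).append(insight)
    return grouped


# Made-up sample data.
t0 = datetime(2024, 1, 1, 12, 0)
active = [
    Insight("checkout-service", InsightCategory.ERROR, "Latency threshold breach", t0),
    Insight("node-1", InsightCategory.SATURATION, "CPU at 90%", t0),
    Insight("checkout-service", InsightCategory.AMEND, "Deployment", t0),
]

grouped = insights_by_category(active)
for category, items in grouped.items():
    print(category.value, [item.entity for item in items])
```

Grouping by category mirrors the kind of filtering the entity catalog offers, where you narrow the view to one insight category at a time.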

Key features

The knowledge graph provides several features to help you monitor and troubleshoot:

Entity catalog

Your primary entry point for monitoring. The entity catalog shows all discovered entities with their health status, insights, and key metrics. Filter by entity type, environment, or insight category to quickly find what needs attention.

RCA workbench

Add entities to the RCA workbench to investigate incidents involving multiple components. It shows insights firing across different entities on a unified timeline, so you can spot which issues started together and identify root causes.
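The idea of spotting issues that started together can be sketched as a simple clustering of insights by start time. This is a hypothetical illustration, not how the RCA workbench is implemented: the `group_by_onset` function, the two-minute window, and the sample data are all assumptions.

```python
from datetime import datetime, timedelta


def group_by_onset(insights, window=timedelta(minutes=2)):
    """Cluster (entity, insight_name, started_at) tuples by start time.

    Insights are sorted by start time; a new group begins whenever the gap
    to the previous insight exceeds `window`.
    """
    ordered = sorted(insights, key=lambda item: item[2])
    groups = []
    for item in ordered:
        if groups and item[2] - groups[-1][-1][2] <= window:
            groups[-1].append(item)
        else:
            groups.append([item])
    return groups


# Made-up sample data.
t0 = datetime(2024, 1, 1, 12, 0)
insights = [
    ("checkout-service", "Latency threshold breach", t0 + timedelta(minutes=1)),
    ("orders-db", "CPU saturation", t0),
    ("frontend", "Request rate anomaly", t0 + timedelta(minutes=30)),
]

groups = group_by_onset(insights)
for group in groups:
    print([f"{entity}: {name}" for entity, name, _ in group])
```

In this sample, the `orders-db` and `checkout-service` insights land in the same group because they started a minute apart. The earliest insight in a group is only a candidate root cause, so you would still confirm it against the underlying signals.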

Grafana Assistant integration

Grafana Assistant analyzes the entities and insights in your RCA workbench to suggest root cause hypotheses and recommended troubleshooting steps. Grafana Assistant is available across the entire Grafana platform.

Used in these workflows:

Investigate incidents across multiple entities with AI-assisted analysis

Entity graph

The entity graph provides a visual network view of relationships between your services and infrastructure. This is an optional accelerator for understanding dependencies. Most troubleshooting workflows start with the entity catalog.

Core concepts

The knowledge graph is built on entities, context-sensitive dashboards, and insights that work together to surface issues and guide troubleshooting.

Entities

Entities represent the components in your environment. The knowledge graph automatically discovers:

Services - Microservices instrumented with OpenTelemetry or other APM tools
Pods - Kubernetes workload units
Nodes - Kubernetes cluster nodes or virtual machines
Databases - MySQL, PostgreSQL, Redis, and other data stores
Infrastructure - Load balancers, clusters, namespaces

Each entity has properties (environment, version, cluster) and relationships to other entities (service depends on database).
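As a rough mental model, entities with properties and typed relationships can be sketched as a small in-memory graph. The class names, the `DEPENDS_ON` relation label, and the sample entities below are assumptions for illustration; they do not reflect the knowledge graph's internal schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    entity_type: str  # for example "Service", "Pod", "Node", "Database"
    properties: dict = field(default_factory=dict)


@dataclass
class EntityGraph:
    entities: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source, relation, target)

    def add_entity(self, entity):
        self.entities[entity.name] = entity

    def relate(self, source, relation, target):
        # Only allow relationships between entities that are already known.
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.edges.append((source, relation, target))

    def related(self, source, relation):
        """Return names of entities that `source` points to via `relation`."""
        return [t for s, r, t in self.edges if s == source and r == relation]


# Made-up sample data.
graph = EntityGraph()
graph.add_entity(Entity("checkout-service", "Service",
                        {"environment": "prod", "version": "1.4.2", "cluster": "eu-1"}))
graph.add_entity(Entity("orders-db", "Database", {"environment": "prod"}))
graph.relate("checkout-service", "DEPENDS_ON", "orders-db")

print(graph.related("checkout-service", "DEPENDS_ON"))
```

Keeping relationships as explicit edges is what makes it possible to walk from a failing service to the components it depends on.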

Context-sensitive dashboards

When you click on an entity, you see dashboards tailored to that entity type. Service entities show RED metrics (Request rate, Error rate, Duration). Infrastructure entities show CPU, memory, and disk usage. Logs, traces, and profiles are pre-filtered to that entity.
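For readers new to RED metrics, the following sketch computes the three values from a handful of request records. It rests on simplifying assumptions that are not taken from the product: the `Request` type is hypothetical, errors are counted as HTTP status 500 and above, and duration is summarized as a mean, whereas dashboards typically show percentiles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    duration_ms: float
    status: int  # HTTP status code


def red_metrics(requests, window_seconds):
    """Compute request rate, error rate, and mean duration over a time window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_rate": 0.0, "mean_duration_ms": 0.0}
    # Assumption: a status of 500 or above counts as an error.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rate_per_s": total / window_seconds,
        "error_rate": errors / total,
        "mean_duration_ms": sum(r.duration_ms for r in requests) / total,
    }


# Made-up sample data: four requests observed over a 60-second window.
sample = [
    Request(120.0, 200),
    Request(80.0, 200),
    Request(400.0, 500),
    Request(200.0, 200),
]
metrics = red_metrics(sample, window_seconds=60)
print(metrics)
```

With this sample, one of four requests failed, so the error rate is 0.25, and the mean duration is 200 ms.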

How insights and signals work together

Insights don’t replace your metrics, logs, and traces; they enhance them. An insight points you to a problem, then you drill into the underlying signals to understand why. For example:

Insight: “Latency threshold breach” on checkout-service
Click to view entity details
See latency spike on dashboard
Click the Traces tab to see which operations are slow
Identify bottleneck in database query

Because each step is a click rather than a hand-written query, this workflow can take seconds instead of minutes.

What makes it different?

Unlike traditional APM or monitoring tools:

No setup tax - Automatically discovers entities from existing telemetry
Zero queries required - Entity catalog and insights surface issues automatically
Unified workflow - Metrics, logs, and traces in one place, pre-filtered by entity
Built into Grafana Cloud - Not a separate tool to learn

Next steps

Ready to get started? Activation takes just a few clicks. Continue with these pages:

Get started with the knowledge graph - Activate Kubernetes or Application Observability monitoring
Entity catalog overview - Learn about your primary entry point
Insights explained - Understand insight categories in depth