Introduction to the knowledge graph
The knowledge graph helps you troubleshoot application and infrastructure issues faster by automatically connecting the dots across your observability data. Instead of manually jumping between dashboards, writing complex queries, or sifting through noisy alerts, the knowledge graph surfaces the issues that matter and guides you to root causes.
Built directly into Grafana Cloud, the knowledge graph works with data you’re already collecting from Grafana Cloud Observability tools. Activation takes just a few clicks.
What problems does it solve?
Modern cloud environments are complex. A single user-facing issue might involve dozens of services, databases, and infrastructure components. Traditional monitoring approaches require you to:
- Write queries to find problems hidden in your data
- Manually correlate metrics, logs, and traces across multiple tools
- Understand every service and dependency in your environment
- Create and maintain hundreds of alert rules
The knowledge graph eliminates this toil by automatically discovering what’s running in your environment and continuously analyzing it for issues.
How does it work?
The knowledge graph automatically discovers entities in your environment, such as services, pods, nodes, and databases, and understands the relationships between them. It then continuously generates insights that highlight issues across these entities.
Insights: The connective tissue
Insights are the core of how the knowledge graph helps you troubleshoot. They’re automatically generated alerts that track different types of issues:
- Saturation - Resources approaching limits (CPU at 90%, disk filling up)
- Amend - Configuration changes that might cause issues (deployments, scaling events)
- Anomaly - Unusual patterns in traffic or behavior (request rate spike, latency increase)
- Failure - Critical system failures (Pod crashes, controller errors)
- Error - Request failures and latency threshold breaches (500 errors, slow responses)
When you see a service with a red indicator in the entity catalog, that’s an insight telling you something’s wrong. Click into it, and you’ll see exactly which type of issue is occurring, when it started, and what signals support it.
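To make the categories concrete, here is a minimal Python sketch that models them as an enumeration and counts active insights per category. It is an illustration only: `InsightCategory`, `Insight`, and `summarize` are hypothetical names, not part of any Grafana API, and the sample records are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class InsightCategory(Enum):
    """The five insight categories listed above."""
    SATURATION = "saturation"
    AMEND = "amend"
    ANOMALY = "anomaly"
    FAILURE = "failure"
    ERROR = "error"


@dataclass(frozen=True)
class Insight:
    """Hypothetical record: which entity, what kind of issue, when it started."""
    entity: str
    category: InsightCategory
    name: str
    started_at: datetime


def summarize(insights: list[Insight]) -> dict[str, int]:
    """Count insights per category, keyed by the category's string value."""
    return dict(Counter(i.category.value for i in insights))


# Made-up sample data for illustration.
insights = [
    Insight("checkout-service", InsightCategory.ERROR,
            "Latency threshold breach", datetime(2024, 1, 1, 12, 0)),
    Insight("checkout-service", InsightCategory.ANOMALY,
            "Request rate spike", datetime(2024, 1, 1, 11, 58)),
    Insight("node-1", InsightCategory.SATURATION,
            "CPU at 90%", datetime(2024, 1, 1, 11, 55)),
]
summary = summarize(insights)
print(summary)
```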
Key features
The knowledge graph provides several features to help you monitor and troubleshoot:
Entity catalog
Your primary entry point for monitoring. The entity catalog shows all discovered entities with their health status, insights, and key metrics. Filter by entity type, environment, or insight category to quickly find what needs attention.
RCA workbench
Add entities to the RCA workbench to investigate incidents that involve multiple components. Insights firing across different entities appear on a unified timeline, so you can spot which issues started together and identify root causes.
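The idea of spotting issues that started together can be sketched in a few lines of Python. This is not how the RCA workbench is implemented; it only illustrates the reasoning a unified timeline supports. The `Insight` record, the `group_by_onset` helper, the five-minute window, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Insight:
    """Hypothetical insight record with the entity it fired on."""
    entity: str
    name: str
    started_at: datetime


def group_by_onset(insights, window=timedelta(minutes=5)):
    """Cluster insights whose start time is within `window` of the previous one."""
    groups = []
    for insight in sorted(insights, key=lambda i: i.started_at):
        if groups and insight.started_at - groups[-1][-1].started_at <= window:
            groups[-1].append(insight)   # close to the previous onset: same cluster
        else:
            groups.append([insight])     # gap too large: start a new cluster
    return groups


# Made-up sample data for illustration.
timeline = [
    Insight("checkout-service", "Latency threshold breach", datetime(2024, 1, 1, 12, 2)),
    Insight("node-7", "Disk filling up", datetime(2024, 1, 1, 14, 30)),
    Insight("orders-db", "CPU saturation", datetime(2024, 1, 1, 12, 0)),
    Insight("cart-service", "500 errors", datetime(2024, 1, 1, 12, 3)),
]
groups = group_by_onset(timeline)
for group in groups:
    print([f"{i.entity}: {i.name}" for i in group])
```

In this sample, the first cluster starts with the `orders-db` saturation, which makes the database a reasonable first place to look. The earliest insight in a cluster is a hint, not proof of causation.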
Grafana Assistant integration
Grafana Assistant analyzes the entities and insights in your RCA workbench to suggest root cause hypotheses and recommended troubleshooting steps. Grafana Assistant is available across the entire Grafana platform.
Used in these workflows:
- Investigate incidents across multiple entities with AI-assisted analysis
Entity graph
The entity graph provides a visual network view of relationships between your services and infrastructure. This is an optional accelerator for understanding dependencies. Most troubleshooting workflows start with the entity catalog.
Core concepts
The knowledge graph is built on entities, context-sensitive dashboards, and insights that work together to surface issues and guide troubleshooting.
Entities
Entities represent the components in your environment. The knowledge graph automatically discovers:
- Services - Microservices instrumented with OpenTelemetry or other APM tools
- Pods - Kubernetes workload units
- Nodes - Kubernetes cluster nodes or virtual machines
- Databases - MySQL, PostgreSQL, Redis, and other data stores
- Infrastructure - Load balancers, clusters, namespaces
Each entity has properties (environment, version, cluster) and relationships to other entities (service depends on database).
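One way to picture this model is as a small property graph. The Python sketch below is a simplified illustration, not the knowledge graph's actual schema or API: the `Entity` and `Graph` classes, the `DEPENDS_ON` relation name, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A discovered component with a type and free-form properties."""
    name: str
    type: str
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class Graph:
    """Entities keyed by name, plus (source, relation, target) edges."""
    entities: dict[str, Entity] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def neighbors(self, name: str, relation: str) -> list[str]:
        """Names of entities reachable from `name` over `relation`."""
        return [t for s, r, t in self.edges if s == name and r == relation]


# Made-up sample data for illustration.
graph = Graph()
graph.add(Entity("checkout-service", "Service",
                 {"environment": "prod", "version": "1.4.2", "cluster": "eu-west"}))
graph.add(Entity("orders-db", "Database", {"environment": "prod"}))
graph.relate("checkout-service", "DEPENDS_ON", "orders-db")

dependencies = graph.neighbors("checkout-service", "DEPENDS_ON")
print(dependencies)
```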
Context-sensitive dashboards
When you click on an entity, you see dashboards tailored to that entity type. Services show RED metrics (Request rate, Error rate, Duration). Infrastructure shows CPU, memory, and disk usage. Logs, traces, and profiles are pre-filtered to that entity.
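For readers new to RED metrics, the following Python sketch shows how the three numbers can be derived from raw request records. It rests on several assumptions that are not from the knowledge graph itself: the `Request` record and `red_metrics` function are hypothetical, errors are counted as HTTP 5xx responses, and duration is summarized as a nearest-rank 95th percentile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical request record: how long it took and its HTTP status."""
    duration_ms: float
    status: int


def red_metrics(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Compute Rate, Errors, and Duration over one observation window."""
    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: 5xx = error
    durations = sorted(r.duration_ms for r in requests)
    rank = (95 * total + 99) // 100  # nearest-rank p95: integer ceil of 0.95 * total
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total,
        "p95_ms": durations[rank - 1],
    }


# Made-up sample: 20 requests in 10 seconds, durations 10..200 ms, two 500 responses.
sample = [Request(duration_ms=10.0 * n, status=500 if n > 18 else 200)
          for n in range(1, 21)]
metrics = red_metrics(sample, window_seconds=10)
print(metrics)
```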
How insights and signals work together
Insights don’t replace your metrics, logs, and traces; they enhance them. An insight points you to a problem, then you drill into the underlying signals to understand why. For example:
- Insight: “Latency threshold breach” on checkout-service
- Click to view entity details
- See latency spike on dashboard
- Click the Traces tab to see which operations are slow
- Identify bottleneck in database query
This workflow takes seconds instead of minutes.
What makes it different?
Unlike traditional APM or monitoring tools:
- No setup tax - Automatically discovers entities from existing telemetry
- Zero queries required - Entity catalog and insights surface issues automatically
- Unified workflow - Metrics, logs, and traces in one place, pre-filtered by entity
- Built into Grafana Cloud - Not a separate tool to learn
Next steps
Ready to get started? Activation takes just a few clicks:
- Get started with the knowledge graph - Activate Kubernetes or Application Observability monitoring
- Entity catalog overview - Learn about your primary entry point
- Insights explained - Understand insight categories in depth



