Troubleshoot data issues

This topic provides guidance for troubleshooting data issues in the knowledge graph.

Required metrics and labels

If the knowledge graph isn’t discovering entities or your dashboards show empty panels, Grafana Cloud Adaptive Metrics might be dropping or aggregating metrics or labels that the knowledge graph needs. If Adaptive Metrics affects any of the required metrics, exclude those metrics from Adaptive Metrics. To learn how, refer to Recommendation exemptions.

Application Observability required metrics and labels

For an overview of the metrics and labels necessary for the knowledge graph to monitor your environment when using Application Observability, refer to Application Observability required metrics and labels. If the labels are present but issues persist, open a support ticket for further assistance.

For more information on how to send traces_host_info, refer to Host-hours pricing.

Kubernetes metrics

The table below shows the metrics and labels necessary for the knowledge graph to monitor your Kubernetes environment. If the labels are present but issues persist, open a support ticket for further assistance.

Metric name	Required labels
kube_pod_info	cluster, namespace, node, pod
kube_pod_owner	cluster, namespace, node, owner_kind, owner_name
kube_pod_container_resource_requests	cluster, namespace, pod, container, resource
kube_pod_status_phase	cluster, namespace, pod, phase
kube_replicaset_owner	cluster, namespace, replicaset, owner_name, owner_kind
kube_pod_container_info	cluster, namespace, container, image_id
kube_pod_container_resource_limits	cluster, namespace, pod, container, resource
kube_configmap_metadata_resource_version	cluster, namespace, configmap
kube_secret_metadata_resource_version	cluster, namespace, secret
kube_deployment_metadata_generation	cluster, namespace, and one of deployment, statefulset, or daemonset
kube_node_info	cluster, node
kubelet_node_name	cluster, node, instance
kube_node_labels (AWS)	label_beta_kubernetes_io_instance_type, and one of the following:
	label_eks_amazonaws_com_nodegroup, or
	label_karpenter_sh_nodepool, or
	label_alpha_eksctl_io_cluster_name and label_alpha_eksctl_io_nodegroup_name, or
	label_ec2_amazonaws_com_Name and label_ec2_amazonaws_com_aws_autoscaling_groupName, or
	label_ec2_amazonaws_com_name and label_ec2_amazonaws_com_aws_autoscaling_group_name, or
	label_k8s_io_cloud_provider_aws
kube_node_labels (GCP)	label_node_kubernetes_io_instance_type, label_cluster_name, label_cloud_google_com_gke_nodepool
kube_node_labels (Azure)	label_agentpool, label_kubernetes_azure_com_cluster
kube_node_status_allocatable	cluster, node, resource
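
As a quick way to apply the table above, the following minimal sketch reports which required labels are absent from a series. It assumes you already have the label sets of your series as dictionaries, for example as returned by the Prometheus HTTP API endpoint /api/v1/series. The helper name missing_labels, the sample values, and the choice to cover only three rows of the table are illustrative assumptions.

```python
# Required labels per metric, copied from a few rows of the table above.
REQUIRED_LABELS = {
    "kube_pod_info": {"cluster", "namespace", "node", "pod"},
    "kube_pod_status_phase": {"cluster", "namespace", "pod", "phase"},
    "kube_node_info": {"cluster", "node"},
}


def missing_labels(series):
    """Return the required labels that are absent from one series.

    `series` is a dict of label name to value that includes `__name__`.
    A label with an empty value counts as missing, because Prometheus
    treats an empty label value the same as an absent label.
    """
    required = REQUIRED_LABELS.get(series.get("__name__", ""), set())
    present = {name for name, value in series.items() if value}
    return sorted(required - present)


# Hypothetical series for illustration only.
sample = {
    "__name__": "kube_pod_info",
    "cluster": "prod",
    "namespace": "shop",
    "pod": "cart-0",
}
print(missing_labels(sample))
```

For the sample series, the function reports that the node label is missing. Extend REQUIRED_LABELS with the remaining rows of the table to check other metrics.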

Container resource utilization observability

The following table lists metrics and labels required for Kubernetes container resource utilization observability.

Metric name	Required labels
container_cpu_cfs_throttled_periods_total	cluster, namespace, pod, container, node
container_cpu_cfs_periods_total	cluster, namespace, pod, container, node
container_memory_working_set_bytes	cluster, namespace, pod, container, node
container_memory_usage_bytes	cluster, namespace, pod, container, node
container_memory_cache	cluster, namespace, pod, container, node

RED metrics troubleshooting

For the knowledge graph to associate the RED metrics with the Kubernetes entities it identifies, the entities must have labels that specify their source. For instance, span metrics require labels such as k8s.namespace.name, k8s.cluster.name, and k8s.pod.name.
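
When you inspect span metrics in a Prometheus-compatible data source, these attributes often appear with underscores instead of dots, for example k8s_namespace_name, because classic Prometheus label names can't contain dots. The following minimal sketch assumes that translation and accepts either form. The helper names and the sample label set are hypothetical, and the character replacement is a simplified version of what a real pipeline does.

```python
import re

# Attributes named in the paragraph above.
REQUIRED_ATTRIBUTES = ["k8s.namespace.name", "k8s.cluster.name", "k8s.pod.name"]


def prometheus_label_name(attribute):
    """Replace characters that are not letters, digits, or underscores."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", attribute)


def missing_source_labels(series_labels):
    """Return the attributes found in neither dotted nor underscored form."""
    missing = []
    for attribute in REQUIRED_ATTRIBUTES:
        dotted = series_labels.get(attribute)
        underscored = series_labels.get(prometheus_label_name(attribute))
        if not (dotted or underscored):
            missing.append(attribute)
    return missing


# Hypothetical span metric labels for illustration only.
sample = {"k8s_namespace_name": "shop", "k8s_pod_name": "cart-0"}
print(missing_source_labels(sample))
```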

You can use the Kubernetes Attributes Processor to assign these labels. Make sure you follow the Kubernetes monitoring recommendations.

If you still encounter problems, submit a support ticket for further assistance.

Prometheus troubleshooting

In addition to using Grafana Cloud Application Observability or Grafana Cloud Kubernetes Monitoring, you might use Prometheus to scrape some metrics. If you do, follow the guidelines in this section so that the knowledge graph works correctly.

If you use a single Prometheus job to scrape multiple entities, it can create the following issues:

The knowledge graph might not be able to detect all your entities.
RED metrics might not get associated with entities.
RED metrics might get aggregated across workloads that share the same job.

To avoid issues, we recommend the following:

Make the entities easily identifiable by using one of the following methods:
- Use a separate job for each service instead of a single job that scrapes multiple services.
- Identify your entities by adding a service label to your metrics.
If you use annotation-based Kubernetes service discovery in your Prometheus configuration, you can set the service label with the following relabeling rule:

relabel_configs:
  # Derive the service label from the Pod name by stripping generated suffixes.
  - source_labels: [__meta_kubernetes_pod_name]
    regex: ^(.*?)([-][a-zA-Z0-9]{5,10}(-[a-zA-Z0-9]{5})?|-[0-9]+)?$
    target_label: service
    replacement: $1
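
To preview what this relabeling regex yields before you deploy it, the following minimal sketch applies the same pattern to a few Pod names. It uses Python's re module as a stand-in for the regex engine in Prometheus, so treat the results as an approximation. The function name and the Pod names are hypothetical.

```python
import re

# The regex from the relabeling rule above, unchanged.
POD_NAME_PATTERN = re.compile(
    r"^(.*?)([-][a-zA-Z0-9]{5,10}(-[a-zA-Z0-9]{5})?|-[0-9]+)?$"
)


def service_from_pod_name(pod_name):
    """Return the first capture group, which the rule writes to `service`.

    Returns None when the pattern doesn't match.
    """
    match = POD_NAME_PATTERN.match(pod_name)
    return match.group(1) if match else None


# Hypothetical Pod names: a Deployment Pod, a StatefulSet Pod, and a bare name.
for pod in ["checkout-7d9f8b6c5d-x2k4p", "postgres-0", "frontend"]:
    print(pod, "->", service_from_pod_name(pod))
```

In this sketch, the three names resolve to checkout, postgres, and frontend. Because the first group matches lazily, a name such as node-exporter-abcde resolves to node rather than node-exporter, so check the result against your own naming scheme.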

AWS troubleshooting

The following sections list the metrics and labels necessary for the knowledge graph to discover Amazon Web Services (AWS) entities and build relationships.

AWS Cloud Provider Observability entity discovery

The following table lists the metrics and labels necessary for the knowledge graph to discover Amazon Web Services (AWS) entities from AWS Cloud Provider Observability. All AWS entities require the asserts_env and asserts_site labels for scoping.

Amazon Web Services (AWS) Cloud Provider Observability metrics carry the Amazon Resource Name in the name label. The knowledge graph extracts the entity name from the Amazon Resource Name using pattern matching.

Entity type	Metric name	Required labels	Name derivation
Amazon EC2 Instance	aws_ec2_info	asserts_env, asserts_site, name	Extracted from Amazon Resource Name last segment
Amazon RDS Instance	aws_rds_info	asserts_env, asserts_site, name	Extracted from Amazon Resource Name via pattern `^.+:db:(.+)$`
AWS Application Load Balancer	aws_applicationelb_info	asserts_env, asserts_site, name	Extracted from Amazon Resource Name path
AWS Network Load Balancer	aws_networkelb_info	asserts_env, asserts_site, name	Extracted from Amazon Resource Name path
AWS Lambda function	aws_lambda_info	asserts_env, asserts_site, name	Extracted from Amazon Resource Name via pattern `^.+:(.+)$`
Amazon Simple Storage Service	aws_s3_info	asserts_env, asserts_site, name	Extracted from Amazon Resource Name via pattern `arn:aws:s3:::(.+)$`
Amazon Simple Queue Service	aws_sqs_info	asserts_env, asserts_site, name	Extracted from Amazon Resource Name via pattern `^.+:(.+)$`
Amazon DynamoDB table	aws_dynamodb_info	asserts_env, asserts_site, name	Extracted from Amazon Resource Name path
Amazon ECS Service	aws_ecs_*	asserts_env, asserts_site, namespace, service	Direct from `service` label
Amazon API Gateway	aws_apigateway_*	asserts_env, asserts_site, namespace, service	Direct from `service` label
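
To check what entity name a given Amazon Resource Name produces, the following minimal sketch applies the patterns from the Name derivation column. The ARNs, account ID, and resource names are made up for illustration, and the function is a hypothetical helper, not the knowledge graph's implementation. Rows that describe the derivation only in words, such as "last segment" or "path", are left out because the table gives no pattern for them.

```python
import re

# Patterns copied from the "Name derivation" column above.
NAME_PATTERNS = {
    "aws_rds_info": r"^.+:db:(.+)$",
    "aws_lambda_info": r"^.+:(.+)$",
    "aws_s3_info": r"arn:aws:s3:::(.+)$",
    "aws_sqs_info": r"^.+:(.+)$",
}


def entity_name(metric, arn):
    """Return the name captured from `arn`, or None if nothing matches."""
    pattern = NAME_PATTERNS.get(metric)
    if pattern is None:
        return None
    match = re.search(pattern, arn)
    return match.group(1) if match else None


# Hypothetical values of the `name` label.
examples = [
    ("aws_rds_info", "arn:aws:rds:us-east-1:123456789012:db:orders-db"),
    ("aws_lambda_info", "arn:aws:lambda:us-east-1:123456789012:function:resize-image"),
    ("aws_s3_info", "arn:aws:s3:::invoices-bucket"),
    ("aws_sqs_info", "arn:aws:sqs:us-east-1:123456789012:orders-queue"),
]
for metric, arn in examples:
    print(metric, "->", entity_name(metric, arn))
```

If the function returns None for one of your own name values, the label likely doesn't contain a full Amazon Resource Name in the expected shape.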

Amazon RDS relationships

The following table lists the metrics and labels necessary for the knowledge graph to build Amazon RDS entities and relationships. These metrics are generated from span metrics sources and help identify relationships to Amazon RDS instances by matching *.rds.amazonaws.com hostname patterns in the required labels.

Metric name	Required labels
traces_service_graph_request_client_seconds_count	client_server_address
traces_service_graph_request_client_seconds_count	server
traces_span_metrics_calls_total	server_address
traces_span_metrics_calls_total	net_peer_name
traces_spanmetrics_calls_total	net_peer_name
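
To see whether a series carries a value that fits the *.rds.amazonaws.com pattern, the following minimal sketch scans the labels from the table above. It assumes the label value is a bare hostname without a port, and it uses a shell-style wildcard match as an approximation, because the exact matching logic isn't documented here. The function name and hostnames are hypothetical.

```python
from fnmatch import fnmatchcase

# Labels listed in the table above.
RDS_LABELS = ("client_server_address", "server", "server_address", "net_peer_name")
RDS_PATTERN = "*.rds.amazonaws.com"


def rds_hosts(series_labels):
    """Return the sorted, distinct label values that look like RDS hostnames."""
    hosts = set()
    for label in RDS_LABELS:
        value = series_labels.get(label, "")
        if fnmatchcase(value.lower(), RDS_PATTERN):
            hosts.add(value)
    return sorted(hosts)


# Hypothetical span metric labels for illustration only.
sample = {
    "service": "checkout",
    "server_address": "orders-db.abc123xyz.us-east-1.rds.amazonaws.com",
}
print(rds_hosts(sample))
print(rds_hosts({"server_address": "redis.internal"}))
```

An empty result for a service that calls Amazon RDS suggests that none of the listed labels carries the database hostname.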

Azure troubleshooting

The following table lists the metrics and labels necessary for the knowledge graph to discover Azure entities from Cloud Provider Observability. All Azure entities require the asserts_env and asserts_site labels for scoping.

Azure metrics carry the resourceName label natively. The knowledge graph uses this label directly without derivation.

Entity type	Metric name	Required labels	Name derivation
Azure Virtual Machine	azure_microsoft_compute_virtualmachines_vmavailabilitymetric_average_count	asserts_env, asserts_site, resourceName	Direct from resourceName
Azure Flexible Server	azure_microsoft_dbforpostgresql_flexibleservers_active_connections_average_count	asserts_env, asserts_site, resourceName	Direct from resourceName
Azure Flexible Server	azure_microsoft_dbformysql_flexibleservers_active_connections_average_count	asserts_env, asserts_site, resourceName	Direct from resourceName
Azure Blob Storage	azure_microsoft_storage_storageaccounts_blobservices_blobcount_average_count	asserts_env, asserts_site, resourceName	Direct from resourceName

Google Cloud troubleshooting

The following table lists the metrics and labels necessary for the knowledge graph to discover GCP entities from Cloud Provider Observability. All GCP entities require the asserts_env and asserts_site labels for scoping.

Stackdriver metrics carry the name natively in the instance_name or database_id label. For Cloud SQL, the knowledge graph extracts the instance name from database_id using pattern matching.

Entity type	Metric name	Required labels	Name derivation
Compute Engine Instance	stackdriver_gce_instance_compute_googleapis_com_instance_cpu_utilization	asserts_env, asserts_site, instance_name	Direct from instance_name
Cloud SQL Instance	stackdriver_cloudsql_database_cloudsql_googleapis_com_database_up	asserts_env, asserts_site, database_id	Extracted from database_id via pattern `.+:(.+)$`
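
To check what instance name a database_id value yields, the following minimal sketch applies the pattern from the Cloud SQL row. It assumes database_id has the form project:instance. The project and instance names are made up, and the function is a hypothetical helper.

```python
import re

# Pattern copied from the Cloud SQL row above.
DATABASE_ID_PATTERN = re.compile(r".+:(.+)$")


def cloudsql_instance_name(database_id):
    """Return the text after the last colon, or None if there is no match."""
    match = DATABASE_ID_PATTERN.search(database_id)
    return match.group(1) if match else None


# Hypothetical database_id value.
print(cloudsql_instance_name("my-project:orders-db"))
```

A value without a colon returns None in this sketch, which would indicate that database_id isn't in the expected form.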