Troubleshoot data issues

This topic provides guidance for troubleshooting data issues in the knowledge graph.

Required metrics and labels

If the knowledge graph isn’t discovering entities, or if panels in your dashboards are empty, Grafana Cloud Adaptive Metrics might be dropping or aggregating metrics or labels that the knowledge graph needs. If Adaptive Metrics affects any required metrics, exempt those metrics from Adaptive Metrics. To learn how, refer to Recommendation exemptions.

Application Observability required metrics and labels

For an overview of the metrics and labels necessary for the knowledge graph to monitor your environment when using Application Observability, refer to Application Observability required metrics and labels. If the labels are present but issues persist, open a support ticket for further assistance.

For more information on how to send traces_host_info, refer to Host-hours pricing.

Kubernetes metrics

The table below shows the metrics and labels necessary for the knowledge graph to monitor your Kubernetes environment. If the labels are present but issues persist, open a support ticket for further assistance.

Metric name | Required labels
kube_pod_info | cluster, namespace, node, pod
kube_pod_owner | cluster, namespace, node, owner_kind, owner_name
kube_pod_container_resource_requests | cluster, namespace, pod, container, resource
kube_pod_status_phase | cluster, namespace, pod, phase
kube_replicaset_owner | cluster, namespace, replicaset, owner_name, owner_kind
kube_pod_container_info | cluster, namespace, container, image_id
kube_pod_container_resource_limits | cluster, namespace, pod, container, resource
kube_configmap_metadata_resource_version | cluster, namespace, configmap
kube_secret_metadata_resource_version | cluster, namespace, secret
kube_deployment_metadata_generation | cluster, namespace, deployment, statefulset, daemonset
kube_node_info | cluster, node
kubelet_node_name | cluster, node, instance
kube_node_labels (AWS) | label_beta_kubernetes_io_instance_type, and one of: label_eks_amazonaws_com_nodegroup; or label_karpenter_sh_nodepool; or label_alpha_eksctl_io_cluster_name, label_alpha_eksctl_io_nodegroup_name; or label_ec2_amazonaws_com_Name, label_ec2_amazonaws_com_aws_autoscaling_groupName; or label_ec2_amazonaws_com_name, label_ec2_amazonaws_com_aws_autoscaling_group_name; or label_k8s_io_cloud_provider_aws
kube_node_labels (GCP) | label_node_kubernetes_io_instance_type, label_cluster_name, label_cloud_google_com_gke_nodepool
kube_node_labels (Azure) | label_agentpool, label_kubernetes_azure_com_cluster
kube_node_status_allocatable | cluster, node, resource
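
One way to check a series against the table above is to compare its label set with the required labels. The following is a minimal sketch, not part of the knowledge graph itself: `REQUIRED_LABELS` copies only a few rows from the table, and `missing_labels` and the sample series are hypothetical names used for illustration.

```python
# Hypothetical helper: report which required labels a series lacks.
# REQUIRED_LABELS copies a few rows from the table above; extend as needed.
REQUIRED_LABELS = {
    "kube_pod_info": {"cluster", "namespace", "node", "pod"},
    "kube_node_info": {"cluster", "node"},
    "kube_pod_status_phase": {"cluster", "namespace", "pod", "phase"},
}


def missing_labels(metric: str, series_labels: dict) -> set:
    """Return the required labels that are absent or empty on the series."""
    required = REQUIRED_LABELS.get(metric, set())
    present = {name for name, value in series_labels.items() if value}
    return required - present


# Illustrative series, as it might look after an aggregation dropped `node`.
sample = {"cluster": "prod", "namespace": "shop", "pod": "checkout-0"}
print(sorted(missing_labels("kube_pod_info", sample)))  # ['node']
```

An empty result means every label listed for that metric is present on the series you inspected. It does not rule out other causes.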

Container resource utilization observability

The following table lists metrics and labels required for Kubernetes container resource utilization observability.

Metric name | Required labels
container_cpu_cfs_throttled_periods_total | cluster, namespace, pod, container, node
container_cpu_cfs_periods_total | cluster, namespace, pod, container, node
container_memory_working_set_bytes | cluster, namespace, pod, container, node
container_memory_usage_bytes | cluster, namespace, pod, container, node
container_memory_cache | cluster, namespace, pod, container, node
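
To look for series that lack one of these labels, you can rely on the PromQL rule that a matcher such as `{node=""}` selects series where the label is missing or empty. The sketch below only builds query strings for you to run yourself; `missing_label_queries` is a hypothetical helper name, and the metric and label lists are copied from the table above.

```python
# Metrics and labels copied from the table above.
CONTAINER_METRICS = [
    "container_cpu_cfs_throttled_periods_total",
    "container_cpu_cfs_periods_total",
    "container_memory_working_set_bytes",
    "container_memory_usage_bytes",
    "container_memory_cache",
]
CONTAINER_LABELS = ["cluster", "namespace", "pod", "container", "node"]


def missing_label_queries(metrics, labels):
    """Build one PromQL query per metric/label pair.

    Each query counts the series where the label is absent or empty.
    """
    return [
        f'count({metric}{{{label}=""}})'
        for metric in metrics
        for label in labels
    ]


for query in missing_label_queries(CONTAINER_METRICS, CONTAINER_LABELS):
    print(query)
```

Some collectors emit series with an empty `container` label on purpose, so treat a non-empty result as a prompt to investigate, not as proof of a problem.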

RED metrics troubleshooting

For the knowledge graph to associate the RED metrics with the Kubernetes entities it identifies, the entities must have labels that specify their source. For instance, span metrics require labels such as k8s.namespace.name, k8s.cluster.name, and k8s.pod.name.

You can use the Kubernetes Attributes Processor to assign these labels. Make sure you follow the Kubernetes monitoring recommendations.
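
When you search for these labels in a Prometheus-compatible store, they might appear with underscores instead of dots. The sketch below assumes that your pipeline follows the common convention of replacing characters outside `[a-zA-Z0-9_]` with underscores. Verify the naming against your own data, because this is an assumption and not documented behavior of the knowledge graph. `to_prometheus_label` is a hypothetical helper name.

```python
import re


def to_prometheus_label(attribute: str) -> str:
    """Replace every character outside [a-zA-Z0-9_] with an underscore."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", attribute)


# Attribute names taken from the paragraph above.
for attribute in ("k8s.namespace.name", "k8s.cluster.name", "k8s.pod.name"):
    print(attribute, "->", to_prometheus_label(attribute))
```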

If you still encounter problems, submit a support ticket for further assistance.

Prometheus troubleshooting

In addition to using Grafana Cloud Application Observability or Grafana Cloud Kubernetes Monitoring, you might use Prometheus to scrape some metrics. If you do, follow the guidelines in this section so that the knowledge graph works correctly.

If you use a single Prometheus job to scrape multiple entities, it can create the following issues:

  • The knowledge graph might not be able to detect all your entities.
  • RED metrics might not get associated with entities.
  • RED metrics might get aggregated across workloads that share the same job.

To avoid issues, we recommend the following:

  • Make the entities easily identifiable. You can do this by applying one of the following methods:
    • Use one job per service instead of a single job that scrapes multiple services.
    • Identify your entities by adding a service label to your metrics.
  • If you are using annotation-based Kubernetes service discovery in your Prometheus configuration, you can use the following relabeling rules:
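YAML
relabel_configs:
  # Derive the service label from the pod name by stripping a trailing
  # ReplicaSet and pod hash, or a numeric ordinal.
  - source_labels: [__meta_kubernetes_pod_name]
    regex: ^(.*?)([-][a-zA-Z0-9]{5,10}(-[a-zA-Z0-9]{5})?|-[0-9]+)?$
    target_label: service
    replacement: $1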
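
To preview what the relabeling rule above produces, you can apply the same pattern to a few pod names. This is a minimal sketch that uses Python's `re` module as a stand-in for the Prometheus regex engine. The two engines are not identical, so treat the output as an approximation. The pod names and the `service_from_pod` helper are hypothetical.

```python
import re

# Same pattern as the relabeling rule. Prometheus anchors relabel regexes,
# so fullmatch is used here to approximate that behavior.
POD_NAME = re.compile(
    r"^(.*?)([-][a-zA-Z0-9]{5,10}(-[a-zA-Z0-9]{5})?|-[0-9]+)?$"
)


def service_from_pod(pod_name: str) -> str:
    """Return capture group 1, which the rule writes to the service label."""
    match = POD_NAME.fullmatch(pod_name)
    return match.group(1) if match else pod_name


# Made-up pod names: a Deployment pod, a StatefulSet pod, and a bare pod.
for pod in ("checkout-7d9f8b6c5-x2k4q", "postgres-0", "standalone"):
    print(pod, "->", service_from_pod(pod))
# checkout-7d9f8b6c5-x2k4q -> checkout
# postgres-0 -> postgres
# standalone -> standalone
```

Because the first group is lazy, a name such as `node-exporter-abcde` resolves to `node` under this pattern, so check the output against your own naming scheme before relying on it.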

AWS troubleshooting

The following sections list the metrics and labels necessary for the knowledge graph to discover Amazon Web Services (AWS) entities and build relationships.

AWS Cloud Provider Observability entity discovery

The following table lists the metrics and labels necessary for the knowledge graph to discover AWS entities from AWS Cloud Provider Observability. All AWS entities require the asserts_env and asserts_site labels for scoping.

AWS Cloud Provider Observability metrics carry the Amazon Resource Name (ARN) in the name label. The knowledge graph extracts the entity name from the Amazon Resource Name using pattern matching.

Entity type | Metric name | Required labels | Name derivation
Amazon EC2 Instance | aws_ec2_info | asserts_env, asserts_site, name | Extracted from Amazon Resource Name last segment
Amazon RDS Instance | aws_rds_info | asserts_env, asserts_site, name | Extracted from Amazon Resource Name via pattern ^.+:db:(.+)$
AWS Application Load Balancer | aws_applicationelb_info | asserts_env, asserts_site, name | Extracted from Amazon Resource Name path
AWS Network Load Balancer | aws_networkelb_info | asserts_env, asserts_site, name | Extracted from Amazon Resource Name path
AWS Lambda function | aws_lambda_info | asserts_env, asserts_site, name | Extracted from Amazon Resource Name via pattern ^.+:(.+)$
Amazon Simple Storage Service | aws_s3_info | asserts_env, asserts_site, name | Extracted from Amazon Resource Name via pattern arn:aws:s3:::(.+)$
Amazon Simple Queue Service | aws_sqs_info | asserts_env, asserts_site, name | Extracted from Amazon Resource Name via pattern ^.+:(.+)$
Amazon DynamoDB table | aws_dynamodb_info | asserts_env, asserts_site, name | Extracted from Amazon Resource Name path
Amazon ECS Service | aws_ecs_* | asserts_env, asserts_site, namespace, service | Direct from service label
Amazon API Gateway | aws_apigateway_* | asserts_env, asserts_site, namespace, service | Direct from service label
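
To see what the patterns in the table above capture, you can apply them to sample values of the name label. This is a minimal sketch: the patterns are quoted from the table, the Amazon Resource Names are made-up examples, `entity_name` is a hypothetical helper, and Python's `re` module stands in for whatever matching the knowledge graph performs internally.

```python
import re

# Patterns quoted from the table above, keyed by an illustrative short name.
PATTERNS = {
    "rds": r"^.+:db:(.+)$",
    "lambda": r"^.+:(.+)$",
    "s3": r"arn:aws:s3:::(.+)$",
    "sqs": r"^.+:(.+)$",
}


def entity_name(kind: str, arn: str):
    """Return the first capture group, or None if the pattern does not match."""
    match = re.search(PATTERNS[kind], arn)
    return match.group(1) if match else None


# Made-up Amazon Resource Names for illustration only.
samples = {
    "rds": "arn:aws:rds:us-east-1:123456789012:db:orders-db",
    "lambda": "arn:aws:lambda:us-east-1:123456789012:function:checkout",
    "s3": "arn:aws:s3:::my-bucket",
    "sqs": "arn:aws:sqs:us-east-1:123456789012:orders-queue",
}
for kind, arn in samples.items():
    print(kind, "->", entity_name(kind, arn))
```

If a name label in your environment yields `None` or an unexpected value here, the entity might be missing or misnamed in the knowledge graph for the same reason.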

Amazon RDS relationships

The following table lists the metrics and labels necessary for the knowledge graph to build Amazon RDS entities and relationships. These metrics are generated from span metrics sources and help identify relationships to Amazon RDS instances by matching *.rds.amazonaws.com hostname patterns in the required labels.

Metric name | Required labels
traces_service_graph_request_client_seconds_count | client_server_address
traces_service_graph_request_client_seconds_count | server
traces_span_metrics_calls_total | server_address
traces_span_metrics_calls_total | net_peer_name
traces_spanmetrics_calls_total | net_peer_name
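
To check whether a label value would match the *.rds.amazonaws.com hostname pattern, you can test it with a shell-style wildcard match. This is a minimal sketch: `is_rds_host` is a hypothetical helper, the hostnames are made up, and the knowledge graph's own matching might differ in details such as case or port handling.

```python
from fnmatch import fnmatchcase

# Hostname pattern quoted from the paragraph above.
RDS_HOST_PATTERN = "*.rds.amazonaws.com"


def is_rds_host(label_value: str) -> bool:
    """Return True if the value looks like an Amazon RDS endpoint hostname."""
    return fnmatchcase(label_value.lower(), RDS_HOST_PATTERN)


# Made-up label values for illustration only.
print(is_rds_host("orders-db.abc123.us-east-1.rds.amazonaws.com"))  # True
print(is_rds_host("db.internal.example.com"))  # False
```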

Azure troubleshooting

The following table lists the metrics and labels necessary for the knowledge graph to discover Azure entities from Cloud Provider Observability. All Azure entities require the asserts_env and asserts_site labels for scoping.

Azure metrics carry the resourceName label natively. The knowledge graph uses this label directly without derivation.

Entity type | Metric name | Required labels | Name derivation
Azure Virtual Machine | azure_microsoft_compute_virtualmachines_vmavailabilitymetric_average_count | asserts_env, asserts_site, resourceName | Direct from resourceName
Azure Flexible Server | azure_microsoft_dbforpostgresql_flexibleservers_active_connections_average_count | asserts_env, asserts_site, resourceName | Direct from resourceName
Azure Flexible Server | azure_microsoft_dbformysql_flexibleservers_active_connections_average_count | asserts_env, asserts_site, resourceName | Direct from resourceName
Azure Blob Storage | azure_microsoft_storage_storageaccounts_blobservices_blobcount_average_count | asserts_env, asserts_site, resourceName | Direct from resourceName

Google Cloud troubleshooting

The following table lists the metrics and labels necessary for the knowledge graph to discover GCP entities from Cloud Provider Observability. All GCP entities require the asserts_env and asserts_site labels for scoping.

Stackdriver metrics carry the name natively in the instance_name or database_id label. For Cloud SQL, the knowledge graph extracts the instance name from database_id using pattern matching.

Entity type | Metric name | Required labels | Name derivation
Compute Engine Instance | stackdriver_gce_instance_compute_googleapis_com_instance_cpu_utilization | asserts_env, asserts_site, instance_name | Direct from instance_name
Cloud SQL Instance | stackdriver_cloudsql_database_cloudsql_googleapis_com_database_up | asserts_env, asserts_site, database_id | Extracted from database_id via pattern .+:(.+)$
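
To preview the Cloud SQL name derivation, you can apply the pattern from the table to a database_id value. This is a minimal sketch: it assumes a value in project:instance form, the sample value is made up, and `instance_from_database_id` is a hypothetical helper name.

```python
import re

# Pattern quoted from the Cloud SQL row in the table above.
CLOUD_SQL_PATTERN = re.compile(r".+:(.+)$")


def instance_from_database_id(database_id: str):
    """Return the text after the last colon, or None if there is no match."""
    match = CLOUD_SQL_PATTERN.search(database_id)
    return match.group(1) if match else None


# Made-up database_id, assumed to be in project:instance form.
print(instance_from_database_id("my-project:orders-sql"))  # orders-sql
```

A value without a colon does not match the pattern, so the helper returns `None` for it.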