Troubleshooting

This topic provides guidance for troubleshooting data issues in the knowledge graph.

Required metrics and labels

If the knowledge graph isn’t discovering entities, or if panels in your dashboards are empty, Grafana Cloud Adaptive Metrics might be dropping or aggregating metrics or labels that the knowledge graph needs. If Adaptive Metrics affects any of the required metrics, remove those metrics from Adaptive Metrics. To learn how, refer to Recommendation exemptions.

Application Observability required metrics and labels

For an overview of the metrics and labels necessary for the knowledge graph to monitor your environment when using Application Observability, refer to Application Observability required metrics and labels. If the labels are present but issues persist, open a support ticket for further assistance.

For more information on how to send traces_host_info, refer to Host-hours pricing.

Kubernetes metrics

The table below shows the metrics and labels necessary for the knowledge graph to monitor your Kubernetes environment. If the labels are present but issues persist, open a support ticket for further assistance.

| Metric name | Required labels |
| --- | --- |
| kube_pod_info | cluster, namespace, node, pod |
| kube_pod_owner | cluster, namespace, node, owner_kind, owner_name |
| kube_pod_container_resource_requests | cluster, namespace, pod, container, resource |
| kube_pod_status_phase | cluster, namespace, pod, phase |
| kube_replicaset_owner | cluster, namespace, replicaset, owner_name, owner_kind |
| kube_pod_container_info | cluster, namespace, container, image_id |
| kube_pod_container_resource_limits | cluster, namespace, pod, container, resource |
| kube_configmap_metadata_resource_version | cluster, namespace, configmap |
| kube_secret_metadata_resource_version | cluster, namespace, secret |
| kube_deployment_metadata_generation | cluster, namespace, deployment / statefulset / daemonset |
| kube_node_info | cluster, node |
| kubelet_node_name | cluster, node, instance |
| kube_node_labels (AWS) | label_beta_kubernetes_io_instance_type, and label_eks_amazonaws_com_nodegroup; or label_karpenter_sh_nodepool; or label_alpha_eksctl_io_cluster_name, label_alpha_eksctl_io_nodegroup_name; or label_ec2_amazonaws_com_Name, label_ec2_amazonaws_com_aws_autoscaling_groupName; or label_ec2_amazonaws_com_name, label_ec2_amazonaws_com_aws_autoscaling_group_name; or label_k8s_io_cloud_provider_aws |
| kube_node_labels (GCP) | label_node_kubernetes_io_instance_type, label_cluster_name, label_cloud_google_com_gke_nodepool |
| kube_node_labels (Azure) | label_agentpool, label_kubernetes_azure_com_cluster |
| kube_node_status_allocatable | cluster, node, resource |
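
One way to spot a dropped label is to compare the label set of a series against the table above. The following is a minimal sketch, not part of the product: the `REQUIRED_LABELS` mapping copies only three rows of the table, and `missing_labels` is a hypothetical helper. It assumes each series is a plain label dictionary that includes `__name__`, which is the shape the Prometheus HTTP API returns from its `/api/v1/series` endpoint.

```python
# Required labels for a few of the Kubernetes metrics listed in the table.
# Extend this mapping with the remaining rows as needed.
REQUIRED_LABELS = {
    "kube_pod_info": {"cluster", "namespace", "node", "pod"},
    "kube_pod_status_phase": {"cluster", "namespace", "pod", "phase"},
    "kube_node_info": {"cluster", "node"},
}


def missing_labels(series: dict[str, str]) -> set[str]:
    """Return the required labels that are absent from one series.

    `series` is a label set that includes `__name__`. Metrics that are
    not listed in REQUIRED_LABELS yield an empty set.
    """
    required = REQUIRED_LABELS.get(series.get("__name__", ""), set())
    return required - set(series)


if __name__ == "__main__":
    # Hypothetical series in which the `node` label was dropped.
    sample = {
        "__name__": "kube_pod_info",
        "cluster": "prod",
        "namespace": "shop",
        "pod": "checkout-abc",
    }
    print(missing_labels(sample))
```

A non-empty result for a metric in the table suggests that something in the pipeline, such as an aggregation rule, is removing a label the knowledge graph needs.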

Container resource utilization observability

The following table lists metrics and labels required for Kubernetes container resource utilization observability.

| Metric name | Required labels |
| --- | --- |
| container_cpu_cfs_throttled_periods_total | cluster, namespace, pod, container, node |
| container_cpu_cfs_periods_total | cluster, namespace, pod, container, node |
| container_memory_working_set_bytes | cluster, namespace, pod, container, node |
| container_memory_usage_bytes | cluster, namespace, pod, container, node |
| container_memory_cache | cluster, namespace, pod, container, node |

RED metrics troubleshooting

For the knowledge graph to associate the RED metrics with the Kubernetes entities it identifies, the entities must have labels that specify their source. For instance, span metrics require labels such as k8s.namespace.name, k8s.cluster.name, and k8s.pod.name.

You can use the Kubernetes Attributes Processor to assign these labels. Also make sure that you follow the Kubernetes monitoring recommendations.

If you still encounter problems, submit a support ticket for further assistance.

Prometheus troubleshooting

In addition to using Grafana Cloud Application Observability or Grafana Cloud Kubernetes Monitoring, you might use Prometheus to scrape some metrics. For the knowledge graph to work correctly with those metrics, follow the guidelines in this section.

If you use a single Prometheus job to scrape multiple entities, it can create the following issues:

  • The knowledge graph might not be able to detect all your entities.
  • RED metrics might not get associated with entities.
  • RED metrics might get aggregated across workloads that share the same job.

To avoid issues, we recommend the following:

  • Make the entities easily identifiable. You can do this by applying one of the following methods:
    • Use one job per service instead of a single job that scrapes multiple services.
    • Identify your entities by adding a service label to your metrics.
  • If you are using annotation-based Kubernetes service discovery in your Prometheus configuration, you can use the following relabeling rules:
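YAML
relabel_configs:
  # Derive the service label from the pod name by stripping a trailing
  # hash suffix (such as -7d9f8b6c5d-x2k4q) or a numeric ordinal (such as -0).
  - source_labels: [__meta_kubernetes_pod_name]
    regex: '^(.*?)([-][a-zA-Z0-9]{5,10}(-[a-zA-Z0-9]{5})?|-[0-9]+)?$'
    target_label: service
    replacement: $1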
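
To see what the relabeling rule above produces, you can try its regular expression against a few pod names. The following is a minimal sketch in Python rather than Prometheus itself: `service_from_pod` is a hypothetical helper, the pod names are made up, and `re.fullmatch` stands in for the way Prometheus anchors relabel regexes at both ends.

```python
import re

# The regular expression from the relabeling rule, unchanged.
POD_NAME_RE = re.compile(
    r"^(.*?)([-][a-zA-Z0-9]{5,10}(-[a-zA-Z0-9]{5})?|-[0-9]+)?$"
)


def service_from_pod(pod_name: str) -> str:
    """Return the value the rule would write to the `service` label."""
    match = POD_NAME_RE.fullmatch(pod_name)
    # Fall back to the unmodified name if the pattern does not match.
    return match.group(1) if match else pod_name


if __name__ == "__main__":
    # Hypothetical pod names: a Deployment pod, a StatefulSet pod,
    # and a pod without any generated suffix.
    for name in ("checkout-7d9f8b6c5d-x2k4q", "web-0", "standalone"):
        print(name, "->", service_from_pod(name))
```

With these inputs, `checkout-7d9f8b6c5d-x2k4q` maps to `checkout`, `web-0` maps to `web`, and `standalone` is left unchanged. Because the first group is non-greedy, the suffix is stripped at the earliest hyphen where the pattern fits, so it is worth checking the result against your own naming scheme.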

AWS troubleshooting

The following table lists the metrics and labels necessary for the knowledge graph to build AWS RDS entities and relationships. These metrics are generated from span metrics sources and help identify relationships to AWS RDS instances by matching *.rds.amazonaws.com hostname patterns in the required labels.

| Metric name | Required labels |
| --- | --- |
| traces_service_graph_request_client_seconds_count | client_server_address |
| traces_service_graph_request_client_seconds_count | server |
| traces_span_metrics_calls_total | server_address |
| traces_span_metrics_calls_total | net_peer_name |
| traces_spanmetrics_calls_total | net_peer_name |
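
If AWS RDS entities are missing, it can help to confirm that at least one of the labels above carries a hostname matching the *.rds.amazonaws.com pattern. The following is a minimal sketch of such a check, not the knowledge graph's actual matching logic: `rds_label_values` is a hypothetical helper, the hostname is made up, and each series is assumed to be a plain label dictionary.

```python
from fnmatch import fnmatchcase

RDS_HOST_PATTERN = "*.rds.amazonaws.com"

# Labels from the table above that can carry the database hostname.
HOST_LABELS = ("client_server_address", "server", "server_address", "net_peer_name")


def rds_label_values(series: dict[str, str]) -> dict[str, str]:
    """Return the host labels of a series whose values match the RDS pattern."""
    return {
        label: series[label]
        for label in HOST_LABELS
        if label in series
        and fnmatchcase(series[label].lower(), RDS_HOST_PATTERN)
    }


if __name__ == "__main__":
    # Hypothetical span-metrics series that points at an RDS instance.
    sample = {
        "__name__": "traces_span_metrics_calls_total",
        "service": "checkout",
        "server_address": "orders.abc123.us-east-1.rds.amazonaws.com",
    }
    print(rds_label_values(sample))
```

An empty result for every series of these metrics suggests that the hostname label is missing or is being rewritten before the data reaches Grafana Cloud.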