
Troubleshoot Kubernetes Monitoring

This guide helps you troubleshoot common errors encountered while installing and configuring Kubernetes Monitoring components.

Topic | Description
----- | -----------
Resolve issue of missing Kubernetes efficiency data | If you are missing resource efficiency data, you can follow instructions to add Node Exporter metrics.
Issues with OpenShift | Get support for handling errors with your OpenShift configuration.

Resolve issue of missing Kubernetes Efficiency data

If your Efficiency view in Kubernetes Monitoring shows no data, it might be due to missing Node Exporter metrics. The steps you take to resolve the issue depend on how you configured Kubernetes Monitoring. Select one of the methods:

Deploy Node Exporter metrics if you configured using Grafana Agent

If you have a Grafana Agent deployment, follow these steps to deploy Node Exporter metrics:

  1. Create a Node Exporter deployment using Helm by running the following commands in your terminal, where ${NAMESPACE} is the namespace where Grafana Agent is deployed:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm install nodeexporter prometheus-community/prometheus-node-exporter -n ${NAMESPACE}
    
  2. Obtain the current ConfigMap by running the following command in your terminal:

    # Set NAMESPACE to the namespace where Grafana Agent is deployed
    export NAMESPACE=default
    kubectl get configmap grafana-agent -o jsonpath='{.data.agent\.yaml}' -n "${NAMESPACE}" > agent.yaml

  3. Edit the resulting agent.yaml file and add the following scrape job under metrics.configs[0].scrape_configs:

    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: integrations/node_exporter
      kubernetes_sd_configs:
        - namespaces:
            names:
              - ${NAMESPACE}
          role: pod
      relabel_configs:
        - action: keep
          regex: prometheus-node-exporter.*
          source_labels:
            - __meta_kubernetes_pod_label_app_kubernetes_io_name
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_node_name
          target_label: instance
        - action: replace
          source_labels:
            - __meta_kubernetes_namespace
          target_label: namespace
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: false
    
  4. Update the ConfigMap by running this command in your terminal:

    # NAMESPACE must be exported so that envsubst can substitute ${NAMESPACE} in agent.yaml
    export NAMESPACE=default
    kubectl create configmap grafana-agent --from-literal=agent.yaml="$(envsubst < agent.yaml)" -n "${NAMESPACE}" --dry-run=client -o yaml | kubectl apply -n "${NAMESPACE}" -f -

  5. In your Grafana Cloud stack, click Kubernetes Monitoring, then select the Efficiency tab.

    Your efficiency data should now appear in the view.
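
If the Efficiency view still shows no data a few minutes after completing these steps, you can check the two things the procedure depends on: that the Node Exporter pods are running with the label the scrape job keys on, and that the new scrape job landed in the ConfigMap. The commands below are a quick sketch using standard kubectl queries and assume NAMESPACE is still exported from the steps above.

# Node Exporter pods installed by the Helm chart carry this app.kubernetes.io/name label
kubectl get pods -n "${NAMESPACE}" -l app.kubernetes.io/name=prometheus-node-exporter

# Confirm the integrations/node_exporter scrape job is present in the updated ConfigMap
kubectl get configmap grafana-agent -n "${NAMESPACE}" -o jsonpath='{.data.agent\.yaml}' | grep -A 2 'integrations/node_exporter'

If both checks pass but data still does not appear, the Agent pods may need a restart to pick up the updated ConfigMap.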

Deploy Node Exporter metrics if you configured using Grafana Agent Operator

If you have a Grafana Agent Operator deployment, follow these steps to deploy Node Exporter metrics:

  1. Copy the following Integration manifest to a file.

    apiVersion: monitoring.grafana.com/v1alpha1
    kind: Integration
    metadata:
      labels:
        agent: grafana-agent
      name: node-exporter
      namespace: ${NAMESPACE}
    spec:
      config:
        autoscrape:
          enable: true
          metrics_instance: ${NAMESPACE}/grafana-agent-metrics
        procfs_path: /host/proc
        rootfs_path: /host/root
        sysfs_path: /host/sys
      name: node_exporter
      type:
        allNodes: true
        unique: true
      volumeMounts:
        - mountPath: /host/root
          name: rootfs
        - mountPath: /host/sys
          name: sysfs
        - mountPath: /host/proc
          name: procfs
      volumes:
        - hostPath:
            path: /
          name: rootfs
        - hostPath:
            path: /sys
          name: sysfs
        - hostPath:
            path: /proc
          name: procfs
    
  2. Change ${NAMESPACE} (it appears in both metadata.namespace and spec.config.autoscrape.metrics_instance) to the namespace you specified when you installed Kubernetes Monitoring using Agent Operator.

  3. Use kubectl apply -f followed by your filename to roll this out to your cluster.

  4. In your Grafana Cloud stack, click Kubernetes Monitoring, then select the Efficiency tab.

    Your efficiency data should now appear in the view.
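
If the data still does not appear, you can confirm that the Integration resource from step 1 was accepted by the cluster. The commands below are a sketch using the resource names from the manifest above; set NAMESPACE to the namespace you chose in step 2.

# List Integration resources created for Grafana Agent Operator
kubectl get integrations.monitoring.grafana.com -n "${NAMESPACE}"

# Inspect the node-exporter Integration for status or validation problems
kubectl describe integrations.monitoring.grafana.com node-exporter -n "${NAMESPACE}"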

OpenShift Support

With OpenShift’s default SecurityContextConstraints (SCC) of restricted (see the SCC documentation for more information), you might run into the following errors while deploying Grafana Agent using the default generated manifests:

msg="error creating the agent server entrypoint" err="creating HTTP listener: listen tcp 0.0.0.0:80: bind: permission denied"

By default, the Agent StatefulSet container attempts to bind to port 80, which only the root user (UID 0) and other privileged users are allowed to do. With the default restricted SCC on OpenShift, this results in the error above.
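
To confirm which port the container is configured to bind, you can inspect the generated StatefulSet. This assumes the StatefulSet is named grafana-agent, as in the default generated manifests; adjust the name and namespace to your deployment.

# Print the container port(s) declared on the Agent StatefulSet
kubectl get statefulset grafana-agent -n "${NAMESPACE}" -o jsonpath='{.spec.template.spec.containers[0].ports[*].containerPort}'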

Events:
  Type     Reason        Age                   From                  Message
  ----     ------        ----                  ----                  -------
  Warning  FailedCreate  3m55s (x19 over 15m)  daemonset-controller  Error creating: pods "grafana-agent-logs-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.containers[0].securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000650000, 1000659999], spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

By default, the Agent DaemonSet attempts to run as the root user and to access directories on the host (to tail logs). With the default restricted SCC on OpenShift, this results in the error above.
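
You can see exactly what the restricted SCC is rejecting by printing the security context and volumes the logs DaemonSet requests. The DaemonSet name grafana-agent-logs matches the Events output above; adjust the namespace to your deployment.

# Show the requested user/privileges and the hostPath volumes of the logs DaemonSet
kubectl get daemonset grafana-agent-logs -n "${NAMESPACE}" -o jsonpath='{.spec.template.spec.containers[0].securityContext}{"\n"}{.spec.template.spec.volumes}'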

To solve these errors, use the hostmount-anyuid SCC provided by OpenShift, which allows containers to run as root and mount directories on the host.

If this does not meet your security needs, create a new SCC with only the tailored permissions you require, or investigate running the Agent as a non-root container; both options are beyond the scope of this troubleshooting guide.
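
As a rough starting point for the first option, the following sketch shows the general shape of a custom SCC that allows hostPath volumes and running as root while keeping privileged containers disabled. The name and every field value here are assumptions to tighten against your own policy, not a recommended configuration.

# Hypothetical custom SCC: review and restrict every field before applying
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: grafana-agent-scc # assumed name; choose your own
allowHostDirVolumePlugin: true # required for the hostPath log mounts
allowPrivilegedContainer: false
allowHostNetwork: false
allowHostPID: false
allowHostIPC: false
allowHostPorts: false
runAsUser:
  type: RunAsAny # permits root, which the Agent needs to bind port 80 and tail host logs
seLinuxContext:
  type: RunAsAny
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - hostPath
  - projected
  - secret

If you create such an SCC, grant it the same way as described below, replacing hostmount-anyuid in resourceNames with your SCC’s name.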

To use the hostmount-anyuid SCC, add the following stanza to the grafana-agent and grafana-agent-logs ClusterRoles:

. . .
- apiGroups:
  - security.openshift.io
  resources:
  - securitycontextconstraints
  verbs:
  - use
  resourceNames:
  - hostmount-anyuid
. . .
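
If you prefer not to edit the ClusterRoles by hand, the same rule can be appended with a JSON patch. This is one possible way to apply the change, assuming the default grafana-agent and grafana-agent-logs ClusterRole names shown above.

# Append the SCC "use" rule to both Agent ClusterRoles
for role in grafana-agent grafana-agent-logs; do
  kubectl patch clusterrole "${role}" --type=json -p='[
    {
      "op": "add",
      "path": "/rules/-",
      "value": {
        "apiGroups": ["security.openshift.io"],
        "resources": ["securitycontextconstraints"],
        "resourceNames": ["hostmount-anyuid"],
        "verbs": ["use"]
      }
    }
  ]'
done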