Troubleshoot Kubernetes Monitoring
This guide will help you troubleshoot common errors encountered while installing and configuring Kubernetes Monitoring components.
Topic | Description |
---|---|
Resolve issue of missing Kubernetes efficiency data | If you are missing resource efficiency data, you can follow instructions to add Node Exporter metrics. |
Issues with OpenShift | Get support for handling errors with your OpenShift configuration. |
Resolve issue of missing Kubernetes Efficiency data
If your Efficiency view in Kubernetes Monitoring shows no data, it might be due to missing Node Exporter metrics. The steps you take to resolve the issue depend on how you configured Kubernetes Monitoring. Select one of the methods:
- Deploy Node Exporter metrics if you configured using Grafana Agent
- Deploy Node Exporter metrics if you configured using Grafana Agent Operator
Deploy Node Exporter metrics if you configured using Grafana Agent
If you have a Grafana Agent deployment, follow these steps to deploy Node Exporter metrics:
Create a Node Exporter deployment using Helm by running the following commands in your terminal. Set NAMESPACE first to the namespace you use in the following steps:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install nodeexporter prometheus-community/prometheus-node-exporter -n ${NAMESPACE}
Obtain the current ConfigMap by running the following command in your terminal:
NAMESPACE=default
kubectl get configmap grafana-agent -o jsonpath='{.data.agent\.yaml}' -n "${NAMESPACE}" > agent.yaml
Edit the resulting agent.yaml file and add the following configuration after metrics.configs[0].scrape_configs in the yaml file:
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: integrations/node_exporter
  kubernetes_sd_configs:
    - namespaces:
        names:
          - ${NAMESPACE}
      role: pod
  relabel_configs:
    - action: keep
      regex: prometheus-node-exporter.*
      source_labels:
        - __meta_kubernetes_pod_label_app_kubernetes_io_name
    - action: replace
      source_labels:
        - __meta_kubernetes_pod_node_name
      target_label: instance
    - action: replace
      source_labels:
        - __meta_kubernetes_namespace
      target_label: namespace
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: false
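Before adding the entry, it can help to confirm that agent.yaml does not already define this job, since a duplicate job_name is typically rejected when the configuration is loaded. The following is a minimal sketch; has_node_exporter_job is a hypothetical helper name, not part of kubectl or Grafana Agent.

```shell
# Hypothetical helper (not part of kubectl or Grafana Agent):
# succeeds if the given config file already defines the node_exporter job.
has_node_exporter_job() {
  config_file="${1:?config file required}"
  # -q: quiet, report only through the exit status; -F: match the string literally.
  grep -qF 'job_name: integrations/node_exporter' "${config_file}"
}

# Example use:
#   if has_node_exporter_job agent.yaml; then
#     echo "node_exporter job already present; skip adding it again"
#   fi
```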
Update the ConfigMap by running these commands in your terminal. NAMESPACE is exported because envsubst only substitutes variables that are present in its environment:
export NAMESPACE=default
kubectl create configmap grafana-agent \
  --from-literal=agent.yaml="$(envsubst < agent.yaml)" \
  -n "${NAMESPACE}" --dry-run=client -o yaml | kubectl apply -n "${NAMESPACE}" -f -
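Because envsubst reads only environment variables, a NAMESPACE that is set in the shell but not exported is rendered as an empty string. A minimal guard, assuming you want the update to stop early in that case, might look like the sketch below. The function name require_exported_namespace is an assumption for illustration.

```shell
# Hypothetical guard (the function name is an assumption, not an existing tool):
# fails unless NAMESPACE is exported with a non-empty value.
require_exported_namespace() {
  # printenv reports only variables that are in the environment,
  # which is the same set of variables envsubst can see.
  if [ -z "$(printenv NAMESPACE)" ]; then
    echo "NAMESPACE is not exported; envsubst would render it as an empty string" >&2
    return 1
  fi
}
```

Calling require_exported_namespace before the kubectl create configmap command lets you catch the problem before an empty namespace reaches the ConfigMap.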
In your Grafana Cloud stack, click Kubernetes Monitoring, then select the Efficiency tab.
Your efficiency data should now appear in the view.
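If the Efficiency tab stays empty, one thing worth checking is whether Node Exporter pods are running in the namespace the scrape job watches. The sketch below wraps kubectl get pods in a hypothetical helper named list_node_exporter_pods. The label selector assumes the pods carry app.kubernetes.io/name=prometheus-node-exporter, which is the label the keep rule in the scrape configuration matches on.

```shell
# Hypothetical helper: list Node Exporter pods in a namespace.
# Assumes the pods carry the app.kubernetes.io/name=prometheus-node-exporter label.
list_node_exporter_pods() {
  namespace="${1:?namespace required}"
  kubectl get pods -n "${namespace}" \
    -l app.kubernetes.io/name=prometheus-node-exporter -o wide
}

# Example use:
#   list_node_exporter_pods default
```

With -o wide, the output includes the node each pod runs on, so a node without a Node Exporter pod is easy to spot.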
Deploy Node Exporter metrics if you configured using Grafana Agent Operator
If you have a Grafana Agent Operator deployment, follow these steps to deploy Node Exporter metrics:
Copy the following deployment schema to a file.
apiVersion: monitoring.grafana.com/v1alpha1
kind: Integration
metadata:
  labels:
    agent: grafana-agent
  name: node-exporter
  namespace: ${NAMESPACE}
spec:
  config:
    autoscrape:
      enable: true
      metrics_instance: ${NAMESPACE}/grafana-agent-metrics
    procfs_path: host/proc
    rootfs_path: /host/root
    sysfs_path: /host/sys
  name: node_exporter
  type:
    allNodes: true
    unique: true
  volumeMounts:
    - mountPath: /host/root
      name: rootfs
    - mountPath: /host/sys
      name: sysfs
    - mountPath: /host/proc
      name: procfs
  volumes:
    - hostPath:
        path: /
      name: rootfs
    - hostPath:
        path: /sys
      name: sysfs
    - hostPath:
        path: /proc
      name: procfs
Change ${NAMESPACE} to the namespace you specified when you installed Kubernetes Monitoring using Agent Operator.
Use kubectl apply -f followed by your filename to roll this out to your cluster.
In your Grafana Cloud stack, click Kubernetes Monitoring, then select the Efficiency tab.
Your efficiency data should now appear in the view.
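The namespace substitution and the kubectl apply step above can also be combined so the file on disk keeps its ${NAMESPACE} placeholders. This is a minimal sketch, not the documented procedure: apply_with_namespace is a hypothetical helper, and the file name in the example is a placeholder for whatever you saved the manifest as.

```shell
# Hypothetical helper: replace the literal ${NAMESPACE} placeholder in a
# manifest and apply the result without modifying the file on disk.
apply_with_namespace() {
  manifest="${1:?manifest file required}"
  namespace="${2:?namespace required}"
  # \$ matches a literal dollar sign; the result is piped to kubectl on stdin.
  sed 's|\${NAMESPACE}|'"${namespace}"'|g' "${manifest}" | kubectl apply -f -
}

# Example use (file name is a placeholder):
#   apply_with_namespace node-exporter-integration.yaml default
```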
OpenShift Support
With OpenShift's default SecurityContextConstraints (SCC) of restricted (see the SCC documentation for more info), you may run into the following errors while deploying Grafana Agent using the default generated manifests:
msg="error creating the agent server entrypoint" err="creating HTTP listener: listen tcp 0.0.0.0:80: bind: permission denied"
By default, the Agent StatefulSet container attempts to bind to port 80, which only the root user (UID 0) and other privileged users are allowed to do. With the default restricted SCC on OpenShift, this results in the above error.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 3m55s (x19 over 15m) daemonset-controller Error creating: pods "grafana-agent-logs-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.containers[0].securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000650000, 1000659999], spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
By default, the Agent DaemonSet attempts to run as the root user and also attempts to access directories on the host (to tail logs). With the default restricted SCC on OpenShift, this results in the above error.
To solve these errors, use the hostmount-anyuid SCC provided by OpenShift, which allows containers to run as root and mount directories on the host.
If this does not meet your security needs, you should create a new SCC with the required tailored permissions, or investigate running Agent as a non-root container, which goes beyond the scope of this troubleshooting guide.
To use the hostmount-anyuid SCC, add the following stanza to the grafana-agent and grafana-agent-logs ClusterRoles:
. . .
- apiGroups:
    - security.openshift.io
  resources:
    - securitycontextconstraints
  verbs:
    - use
  resourceNames:
    - hostmount-anyuid
. . .
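After updating the ClusterRoles and letting the Agent pods be recreated, one way to confirm the change took effect is to check which SCC admitted a pod. OpenShift records this in the openshift.io/scc pod annotation. The sketch below assumes a hypothetical helper named pod_scc; the pod name and namespace in the example are placeholders.

```shell
# Hypothetical helper: print the SCC that admitted a given pod.
# OpenShift stores it in the openshift.io/scc annotation on the pod.
pod_scc() {
  pod="${1:?pod name required}"
  namespace="${2:?namespace required}"
  # The dot in the annotation key is escaped so jsonpath treats it literally.
  kubectl get pod "${pod}" -n "${namespace}" \
    -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
}

# Example use (pod name and namespace are placeholders):
#   pod_scc grafana-agent-0 default
```

If the command still prints restricted for an Agent pod, the pod was most likely admitted before the ClusterRole change or its service account is not bound to the updated role.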