Enabling logs with the OpenTelemetry Logs Collector
Enabling the capture of Cluster events with the OpenTelemetry Collector
After connecting, you can view your resources, as well as their metrics and logs, in the Grafana Cloud Kubernetes integration.
Note
To gather metrics and logs, you perform two separate deployments of the OpenTelemetry Collector: 1) a Deployment or StatefulSet on a single Pod for metrics, and 2) a DaemonSet that puts a collector on each Node to gather the Pod logs.
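If you deploy the collectors with the OpenTelemetry Collector Helm chart, the split can look like the following sketch. The values shown here are illustrative assumptions, not part of the integration itself.
yaml
# Sketch of Helm chart values for the metrics collector:
# a single-replica Deployment that handles Cluster-wide metric scraping.
mode: deployment
replicaCount: 1

# The log collector described later in this guide is a second, separate
# release that uses "mode: daemonset" so one collector runs on every Node.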
Before you begin
Before you begin the configuration steps, have the following available:
A Grafana Cloud account. To create an account, navigate to Grafana Cloud, and click Create free account.
The kubectl command-line tool installed on your local machine, configured to connect to your Cluster
The helm command-line tool installed on your local machine. If you already have working kube-state-metrics and node-exporter instances installed in your Cluster, you can skip this prerequisite.
This configuration adds four scrape targets, each with its own discovery and scraping role (a sketch of one such target follows the notes below):
All Nodes, scraping their cAdvisor endpoint (integrations/kubernetes/cadvisor)
All Nodes, scraping their Kubelet metrics endpoint (integrations/kubernetes/kubelet)
All Pods with the app.kubernetes.io/name=kube-state-metrics label, scraping their /metrics endpoint (integrations/kubernetes/kube-state-metrics)
All Pods with the app.kubernetes.io/name=prometheus-node-exporter.* label, scraping their /metrics endpoint (integrations/node_exporter)
Warning
For the Kubernetes integration to work correctly, you must set the job and instance labels exactly as prescribed in the preceding steps; otherwise your Cluster does not appear in Kubernetes Monitoring.
Each scrape target has a list of metrics to keep, which reduces the amount of unnecessary metrics sent to Grafana Cloud.
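For illustration, the following is a minimal sketch of one of these four targets, written as a prometheus receiver scrape job for kube-state-metrics. The job name matches the label prescribed above; the keep list of metric names is a hypothetical subset, not the exact list the integration uses.
yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: integrations/kubernetes/kube-state-metrics
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Discover all Pods, then keep only those labeled as kube-state-metrics
            - action: keep
              source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
              regex: kube-state-metrics
          metric_relabel_configs:
            # Drop everything except the metrics you actually need (hypothetical subset)
            - action: keep
              source_labels: [__name__]
              regex: kube_pod_status_phase|kube_deployment_status_replicas|kube_node_status_condition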
Set up RBAC for OpenTelemetry Metrics Collector
This configuration uses the built-in Kubernetes service discovery, so you must grant the service account running the OpenTelemetry Collector broader permissions than the default set. The following ClusterRole provides a good starting point:
yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups:
      - ''
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
      - events
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
To bind this to a ServiceAccount, use the following ClusterRoleBinding:
yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: otel-collector # replace with your service account name
    namespace: default # replace with your namespace
roleRef:
  kind: ClusterRole
  name: otel-collector
  apiGroup: rbac.authorization.k8s.io
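If the referenced service account does not exist yet, a minimal ServiceAccount like the following sketch works; the name and namespace are placeholders and must match the subject in the ClusterRoleBinding.
yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: otel-collector # must match the subject in the ClusterRoleBinding
  namespace: default # replace with your namespace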
Configure the remote write exporter
To send the metrics to Grafana Cloud, add the following to your OpenTelemetry Collector configuration:
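A minimal sketch, assuming the prometheusremotewrite exporter and a metrics pipeline fed by the prometheus receiver; PROMETHEUS_USERNAME, ACCESS_POLICY_TOKEN, and PROMETHEUS_URL are placeholders for your Grafana Cloud Prometheus credentials and remote write endpoint.
yaml
exporters:
  prometheusremotewrite:
    endpoint: https://PROMETHEUS_USERNAME:ACCESS_POLICY_TOKEN@PROMETHEUS_URL/api/prom/push

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]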
After restarting your OpenTelemetry Collector, you should see the first metrics arriving in Grafana Cloud after a few minutes.
Configure the OpenTelemetry Logs Collector
Kubernetes writes logs to a specific file on the respective Node, so you must schedule a Pod on each Node to scrape these files. Do this with a separate DaemonSet.
The following configuration file configures the OpenTelemetry Collector to scrape logs from the default logging location for Kubernetes. Make sure you use the same Cluster name as for your metrics; otherwise, correlation between metrics and logs doesn't work.
yaml
# This is a new configuration file - do not merge this with your metrics configuration!
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    start_at: beginning
    include_file_path: true
    include_file_name: false
    operators:
      # Find out which format is used by kubernetes
      - type: router
        id: get-format
        routes:
          - output: parser-docker
            expr: 'body matches "^\\{"'
          - output: parser-crio
            expr: 'body matches "^[^ Z]+ "'
          - output: parser-containerd
            expr: 'body matches "^[^ Z]+Z"'
      # Parse CRI-O format
      - type: regex_parser
        id: parser-crio
        regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
        output: extract_metadata_from_filepath
        timestamp:
          parse_from: attributes.time
          layout_type: gotime
          layout: '2006-01-02T15:04:05.999999999Z07:00'
      # Parse CRI-Containerd format
      - type: regex_parser
        id: parser-containerd
        regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
        output: extract_metadata_from_filepath
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # Parse Docker format
      - type: json_parser
        id: parser-docker
        output: extract_metadata_from_filepath
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # Extract metadata from file path
      - type: regex_parser
        id: extract_metadata_from_filepath
        # Pod UID is not always 36 characters long
        regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{16,36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
        parse_from: attributes["log.file.path"]
        cache:
          size: 128 # default maximum amount of Pods per Node is 110
      # Rename attributes
      - type: move
        from: attributes["log.file.path"]
        to: resource["filename"]
      - type: move
        from: attributes.container_name
        to: resource["container"]
      - type: move
        from: attributes.namespace
        to: resource["namespace"]
      - type: move
        from: attributes.pod_name
        to: resource["pod"]
      - type: add
        field: resource["cluster"]
        value: 'your-cluster-name' # Set your cluster name here
      - type: move
        from: attributes.log
        to: body

processors:
  resource:
    attributes:
      - action: insert
        key: loki.format
        value: raw
      - action: insert
        key: loki.resource.labels
        value: pod, namespace, container, cluster, filename

exporters:
  loki:
    endpoint: https://LOKI_USERNAME:ACCESS_POLICY_TOKEN@LOKI_URL/loki/api/v1/push

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [resource]
      exporters: [loki]
Configure the OpenTelemetry Collector for Cluster events
Kubernetes controllers emit Events as they perform operations in your Cluster, such as starting containers or scheduling Pods. These Events can be a rich source of logging information to help you debug, monitor, and alert on your Kubernetes workloads.
Generally, these Events can be queried using kubectl get event or kubectl describe.
By enabling the OpenTelemetry Collector to capture these events and ship them to Grafana Cloud Loki, you can query these directly in Grafana Cloud.
To configure the OpenTelemetry Collector:
Add the k8s_events integration.
Include the exporter for it to send events as logs to Grafana Cloud Loki.
Link the collected events to the exporter.
Add the Kubernetes events integration
Add the following to your OpenTelemetry Collector configuration. You can usually find the configuration in a ConfigMap.
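A minimal sketch of the receiver block, assuming the k8s_events receiver from the OpenTelemetry Collector Contrib distribution and in-Cluster service account authentication:
yaml
receivers:
  k8s_events:
    auth_type: serviceAccount
    # Optionally limit collection to specific namespaces, for example:
    # namespaces: [default, my-app]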
To give the OpenTelemetry Collector the correct permissions to scrape Kubernetes Cluster events, you must grant the service account running the OpenTelemetry Collector broader permissions than the default set.
The following ClusterRole provides a good starting point:
yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups:
      - ''
    resources:
      - events
      - namespaces
      - namespaces/status
      - nodes
      - nodes/spec
      - pods
      - pods/status
      - replicationcontrollers
      - replicationcontrollers/status
      - resourcequotas
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - daemonsets
      - deployments
      - replicasets
      - statefulsets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - daemonsets
      - deployments
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - jobs
      - cronjobs
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - list
      - watch
Configure the exporter
To send the events to Grafana Cloud, add the following to your OpenTelemetry Collector configuration:
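A minimal sketch that reuses the Loki exporter pattern shown earlier in this guide and wires the events receiver into its own logs pipeline; the credential placeholders follow the same convention as the log configuration above.
yaml
exporters:
  loki:
    endpoint: https://LOKI_USERNAME:ACCESS_POLICY_TOKEN@LOKI_URL/loki/api/v1/push

service:
  pipelines:
    logs/events:
      receivers: [k8s_events]
      exporters: [loki]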
The following deploys an OpenTelemetry Collector as a Kubernetes DaemonSet that gathers Pod logs.
yaml
# Search for and replace the "REPLACE ME" fields
mode: daemonset

extraVolumes:
  - name: varlog
    hostPath:
      path: /var/log

extraVolumeMounts:
  - name: varlog
    mountPath: /var/log
    readOnly: true

config:
  extensions:
    basicauth/logsService:
      client_auth:
        username: '' # REPLACE ME
        password: '' # REPLACE ME

  receivers:
    filelog:
      include:
        - /var/log/pods/*/*/*.log
      start_at: beginning
      include_file_path: true
      include_file_name: false
      operators:
        # Find out which format is used by kubernetes
        - type: router
          id: get-format
          routes:
            - output: parser-docker
              expr: 'body matches "^\\{"'
            - output: parser-crio
              expr: 'body matches "^[^ Z]+ "'
            - output: parser-containerd
              expr: 'body matches "^[^ Z]+Z"'
        # Parse CRI-O format
        - type: regex_parser
          id: parser-crio
          regex: '^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout_type: gotime
            layout: '2006-01-02T15:04:05.999999999Z07:00'
        # Parse CRI-Containerd format
        - type: regex_parser
          id: parser-containerd
          regex: '^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$'
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        # Parse Docker format
        - type: json_parser
          id: parser-docker
          output: extract_metadata_from_filepath
          timestamp:
            parse_from: attributes.time
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        - type: move
          from: attributes.log
          to: body
        # Extract metadata from file path
        - type: regex_parser
          id: extract_metadata_from_filepath
          # Pod UID is not always 36 characters long
          regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{16,36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$'
          parse_from: attributes["log.file.path"]
          cache:
            size: 128 # default maximum amount of Pods per Node is 110
        # Rename attributes
        - type: move
          from: attributes["log.file.path"]
          to: resource["filename"]
        - type: move
          from: attributes.container_name
          to: resource["container"]
        - type: move
          from: attributes.namespace
          to: resource["namespace"]
        - type: move
          from: attributes.pod_name
          to: resource["pod"]
        - type: add
          field: resource["cluster"]
          value: 'cluster-name' # REPLACE ME

  processors:
    resource:
      attributes:
        - action: insert
          key: loki.format
          value: raw
        - action: insert
          key: loki.resource.labels
          value: pod, namespace, container, cluster, filename

  exporters:
    loki:
      endpoint: https://loki.example.com/loki/api/v1/push # REPLACE ME
      auth:
        authenticator: basicauth/logsService

  service:
    extensions:
      - health_check
      - memory_ballast
      - basicauth/logsService
    pipelines:
      logs:
        receivers: [filelog]
        processors: [resource]
        exporters: [loki]
Set up the Kubernetes integration in Grafana Cloud
The Kubernetes integration comes with a set of predefined recording and alerting rules. To install them, navigate to the Kubernetes integration configuration page at Observability -> Kubernetes -> Configuration and click Install.
After these steps, you can see your resources and metrics in the Kubernetes Integration.
Troubleshoot absence of resources
If the Kubernetes integration shows no resources, navigate to the Explore page in Grafana and enter the following query:
promql
up{cluster="your-cluster-name"}
This query should return at least one series for each of the scrape targets defined previously. If you do not see any series or some of the series have a value of 0, enable debug logging in the OpenTelemetry Collector with the following config snippet:
yaml
service:
  telemetry:
    logs:
      level: 'debug'
If you can see the collected metrics but the Kubernetes integration does not list your resources, make sure that each time series has a cluster label set and that its job label matches the names in the preceding configuration.