# Collect logs in Kubernetes with the OpenTelemetry Collector
In Kubernetes, when applications log to stdout, the container runtime captures the output and writes it to files on the node running the application. It is recommended to run a log collector on each node to pick up these files and send them to centralized log storage. In this guide we will show you how to use the OpenTelemetry Collector to collect these logs from the nodes.
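Concretely, a node-level agent reaches these files by mounting the node's log directory into its own pod. The following is a minimal sketch of the relevant parts of such a DaemonSet manifest; the name, labels, and image are illustrative placeholders, and the Helm chart shown at the end of this guide generates the equivalent for you:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent # hypothetical name
spec:
  selector:
    matchLabels: { app: otel-agent }
  template:
    metadata:
      labels: { app: otel-agent }
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest # illustrative tag
          volumeMounts:
            # Read-only access to the node's pod log files
            - name: varlogpods
              mountPath: /var/log/pods
              readOnly: true
      volumes:
        # Expose the node's log directory to the Collector pod
        - name: varlogpods
          hostPath:
            path: /var/log/pods
```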
## Kubernetes Logging
In general, Kubernetes stores logs at `/var/log/pods/<namespace>_<pod_name>_<pod_id>/<container_name>/<run_id>.log`. We will use the filelog receiver in the Collector to collect the logs from this directory. To read these files on every node, the Collector needs to run as a DaemonSet. The format in which the logs are stored depends on the container runtime being used. For example, if the runtime is containerd, the format is `^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$`.
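To make the differences concrete, here is roughly what the same message looks like under each of the common runtimes (illustrative lines, not taken from a real cluster):

```text
# containerd (CRI): RFC 3339 UTC timestamp, stream, partial/full tag, message
2024-05-21T18:22:03.123456789Z stdout F my application log line

# CRI-O: same layout, but the timestamp typically carries a numeric timezone offset
2024-05-21T18:22:03.123456789+02:00 stdout F my application log line

# Docker (json-file driver): one JSON object per line
{"log":"my application log line\n","stream":"stdout","time":"2024-05-21T18:22:03.123456789Z"}
```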
## Configure the Collector
The different formats are confusing and not trivial to navigate, so we recommend the following configuration for the filelog receiver. It automatically detects the format and parses the logs accordingly:
```yaml
receivers:
  filelog:
    include:
      - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    start_at: beginning
    operators:
      # Route each log line to the right parser based on its leading characters
      - type: router
        id: get-format
        routes:
          - expr: body matches "^\\{"
            output: parser-docker
          - expr: body matches "^[^ Z]+ "
            output: parser-crio
          - expr: body matches "^[^ Z]+Z"
            output: parser-containerd
      # Parse CRI-O log lines
      - type: regex_parser
        id: parser-crio
        regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        output: extract_metadata_from_filepath
        timestamp:
          parse_from: attributes.time
          layout_type: gotime
          layout: '2006-01-02T15:04:05.999999999Z07:00'
      # Parse containerd (CRI) log lines
      - type: regex_parser
        id: parser-containerd
        regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
        output: extract_metadata_from_filepath
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # Parse Docker json-file log lines
      - type: json_parser
        id: parser-docker
        output: extract_metadata_from_filepath
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%dT%H:%M:%S.%LZ'
      # Extract namespace, pod, container, and restart count from the file path
      - type: regex_parser
        id: extract_metadata_from_filepath
        parse_from: attributes["log.file.path"]
        regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
      # Move the extracted fields to the conventional attribute and resource keys
      - type: move
        from: attributes.stream
        to: attributes["log.iostream"]
      - type: move
        from: attributes.container_name
        to: resource["k8s.container.name"]
      - type: move
        from: attributes.namespace
        to: resource["k8s.namespace.name"]
      - type: move
        from: attributes.pod_name
        to: resource["k8s.pod.name"]
      - type: move
        from: attributes.restart_count
        to: resource["k8s.container.restart_count"]
      - type: move
        from: attributes.uid
        to: resource["k8s.pod.uid"]
      # Keep only the log message itself as the body
      - type: move
        from: attributes.log
        to: body
```
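To actually run this, the receiver has to be wired into a logs pipeline. The following is a minimal end-to-end sketch; the debug exporter (or the older logging exporter, depending on your Collector version) is just a stand-in, and in practice you would export to your log backend:

```yaml
receivers:
  filelog:
    # ... the filelog configuration from above ...

exporters:
  # Prints received logs to the Collector's own output; replace with your backend's exporter
  debug: {}

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
```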
## Helm chart
Configuring all of this yourself might seem daunting, but it has been made extremely easy if you use the OpenTelemetry Collector Helm chart, which you can find in the open-telemetry/opentelemetry-helm-charts repository. To enable log collection, set the following values:
```yaml
mode: daemonset

presets:
  logsCollection:
    enabled: true
    includeCollectorLogs: true
```
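With those values saved to a file (assumed here to be called `values.yaml`, with `my-collector` as an arbitrary release name), installation is a sketch like this:

```sh
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install my-collector open-telemetry/opentelemetry-collector --values values.yaml
```

The `logsCollection` preset generates a filelog receiver configuration equivalent to the one shown above, and `mode: daemonset` ensures the Collector runs on every node with the required host log directories mounted.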