Send Kubernetes metrics, logs, and events using the OpenTelemetry Collector
If you currently have an OpenTelemetry Collector-based system in your Cluster, you can use these instructions.
Note
This method of configuration is not currently supported. If you do not have an OpenTelemetry Collector-based system set up in your Cluster, it's recommended that you configure monitoring with the Grafana Kubernetes Monitoring Helm chart instead. That option offers more features and easier integration.
These instructions include:
- Using the OpenTelemetry Collector to send metrics to Grafana Cloud
- Enabling logs with the OpenTelemetry Logs Collector
- Enabling capturing Cluster events with the OpenTelemetry Collector
After connecting, you can view your resources, as well as their metrics and logs, in the Grafana Cloud Kubernetes integration.
Note
To gather metrics and logs, you perform two separate deployments of the OTel collector: 1) A Deployment or StatefulSet on a single Pod for metrics, and 2) A DaemonSet to put a collector on each Node to gather the Pod logs.
Before you begin
Before you begin the configuration steps, have the following available:
- A Kubernetes Cluster with role-based access control (RBAC) enabled
- A Grafana Cloud account. To create an account, navigate to Grafana Cloud, and click Create free account.
- The kubectl command-line tool installed on your local machine, configured to connect to your Cluster
- The helm command-line tool installed on your local machine. If you already have working kube-state-metrics and node-exporter instances installed in your Cluster, skip this step.
- A working OpenTelemetry Collector deployment. For more information, refer to the OpenTelemetry Collector documentation.
Configuration steps
Follow these steps to configure sending metrics and logs:
- Set up the metrics sources.
- Configure the OpenTelemetry Collector for metrics.
- Configure the OpenTelemetry Collector for logs.
- Configure the OpenTelemetry Collector for Cluster events.
- Set up the Kubernetes integration in Grafana Cloud.
Set up metrics sources
The Grafana Cloud Kubernetes integration requires metrics from specific sources. Some are embedded in the kubelet, while others require deployment.
The following metric sources are embedded in the kubelet:
- kubelet metrics for utilization and efficiency analysis
- cAdvisor for usage statistics on a container level
These metric sources require deployment:
- kube-state-metrics to represent resources within the Cluster
- node-exporter for Node-level metrics
Note
Due to differences in the metrics they return, Grafana Kubernetes Monitoring cannot use the integrated Kubernetes Cluster Receiver and Kubelet Stats Receiver.
If you already have kube-state-metrics and node_exporter instances deployed in your Cluster, skip the next two steps.
Set up kube-state-metrics
Run the following commands from your shell to install kube-state-metrics into the default namespace of your Kubernetes Cluster:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install ksm prometheus-community/kube-state-metrics -n "default"
To deploy kube-state-metrics into a different namespace, replace default in the preceding command with a different value.
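For example, a hypothetical install into a monitoring namespace, followed by a check that the Pod is running and carries the label the scrape configuration later relies on:
helm install ksm prometheus-community/kube-state-metrics -n "monitoring" --create-namespace
kubectl get pods -n monitoring -l app.kubernetes.io/name=kube-state-metrics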
Set up node_exporter
Run the following commands from your shell to install node_exporter into the default namespace of your Kubernetes Cluster:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install nodeexporter prometheus-community/prometheus-node-exporter -n "default"
This creates a DaemonSet to expose metrics on every Node in your Cluster.
To deploy the node_exporter into a different namespace, replace default in the previous command with a different value.
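To confirm that the DaemonSet placed a node_exporter Pod on every Node, something like the following should show matching counts:
kubectl get daemonset -n default -l app.kubernetes.io/name=prometheus-node-exporter
kubectl get pods -n default -l app.kubernetes.io/name=prometheus-node-exporter -o wide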
Configure the OpenTelemetry Metrics Collector
To configure the OpenTelemetry Collector:
- Add targeted endpoints for scraping.
- Include the remote write exporter to send metrics to Grafana Cloud.
- Link collected metrics to the remote write exporter.
Note
The OTel collector for metrics must be a Deployment, not a DaemonSet. Otherwise, the metrics are duplicated.
Add scrape targets
Add the following to your OpenTelemetry Collector configuration by modifying the values used in the OpenTelemetry Collector Helm chart.
config:
receivers:
prometheus:
config:
scrape_configs:
- job_name: integrations/kubernetes/cadvisor
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
server_name: kubernetes
kubernetes_sd_configs:
- role: node
relabel_configs:
- target_label: __address__
replacement: kubernetes.default.svc.cluster.local:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor
target_label: __metrics_path__
metric_relabel_configs:
- source_labels: [__name__]
action: keep
regex: container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|container_cpu_usage_seconds_total|container_fs_reads_bytes_total|container_fs_reads_total|container_fs_writes_bytes_total|container_fs_writes_total|container_memory_cache|container_memory_rss|container_memory_swap|container_memory_working_set_bytes|container_network_receive_bytes_total|container_network_receive_packets_dropped_total|container_network_receive_packets_total|container_network_transmit_bytes_total|container_network_transmit_packets_dropped_total|container_network_transmit_packets_total|machine_memory_bytes
- job_name: integrations/kubernetes/kubelet
scheme: https
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
server_name: kubernetes
kubernetes_sd_configs:
- role: node
relabel_configs:
- target_label: __address__
replacement: kubernetes.default.svc.cluster.local:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
replacement: /api/v1/nodes/$${1}/proxy/metrics
target_label: __metrics_path__
metric_relabel_configs:
- source_labels: [__name__]
action: keep
regex: go_goroutines|kubelet_certificate_manager_client_expiration_renew_errors|kubelet_certificate_manager_client_ttl_seconds|kubelet_certificate_manager_server_ttl_seconds|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_cgroup_manager_duration_seconds_count|kubelet_node_config_error|kubelet_node_name|kubelet_pleg_relist_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_start_duration_seconds_bucket|kubelet_pod_start_duration_seconds_count|kubelet_pod_worker_duration_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kubelet_running_container_count|kubelet_running_containers|kubelet_running_pod_count|kubelet_running_pods|kubelet_runtime_operations_errors_total|kubelet_runtime_operations_total|kubelet_server_expiration_renew_errors|kubelet_volume_stats_available_bytes|kubelet_volume_stats_capacity_bytes|kubelet_volume_stats_inodes|kubelet_volume_stats_inodes_free|kubelet_volume_stats_inodes_used|kubelet_volume_stats_used_bytes|kubernetes_build_info|namespace_workload_pod|process_cpu_seconds_total|process_resident_memory_bytes|rest_client_requests_total|storage_operation_duration_seconds_count|storage_operation_errors_total|volume_manager_total_volumes
- job_name: integrations/kubernetes/kube-state-metrics
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
action: keep
regex: kube-state-metrics
metric_relabel_configs:
- source_labels: [__name__]
action: keep
regex: kube_configmap_info|kube_configmap_metadata_resource_version|kube_daemonset.*|kube_deployment_metadata_generation|kube_deployment_spec_replicas|kube_deployment_status_condition|kube_deployment_status_observed_generation|kube_deployment_status_replicas_available|kube_deployment_status_replicas_updated|kube_horizontalpodautoscaler_spec_max_replicas|kube_horizontalpodautoscaler_spec_min_replicas|kube_horizontalpodautoscaler_status_current_replicas|kube_horizontalpodautoscaler_status_desired_replicas|kube_job.*|kube_namespace_status_phase|kube_node.*|kube_persistentvolume_status_phase|kube_persistentvolumeclaim_access_mode|kube_persistentvolumeclaim_info|kube_persistentvolumeclaim_labels|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_persistentvolumeclaim_status_phase|kube_pod_container_info|kube_pod_container_resource_limits|kube_pod_container_resource_requests|kube_pod_container_status_last_terminated_reason|kube_pod_container_status_restarts_total|kube_pod_container_status_waiting_reason|kube_pod_info|kube_pod_owner|kube_pod_spec_volumes_persistentvolumeclaims_info|kube_pod_start_time|kube_pod_status_phase|kube_pod_status_reason|kube_replicaset.*|kube_resourcequota|kube_secret_metadata_resource_version|kube_statefulset.*
- job_name: integrations/node_exporter
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
action: keep
regex: prometheus-node-exporter.*
- source_labels: [__meta_kubernetes_pod_node_name]
target_label: instance
- source_labels: [__meta_kubernetes_namespace]
target_label: namespace
metric_relabel_configs:
- source_labels: [__name__]
action: keep
regex: node_cpu.*|node_exporter_build_info|node_filesystem.*|node_memory.*|node_network_receive_bytes_total|node_network_receive_drop_total|node_network_transmit_bytes_total|node_network_transmit_drop_total|process_cpu_seconds_total|process_resident_memory_bytes
This configuration adds four scrape targets with specific functions for discovery and scraping:
- All Nodes, scraping their cAdvisor endpoint (integrations/kubernetes/cadvisor)
- All Nodes, scraping their Kubelet metrics endpoint (integrations/kubernetes/kubelet)
- All Pods with the app.kubernetes.io/name=kube-state-metrics label, scraping their /metrics endpoint (integrations/kubernetes/kube-state-metrics)
- All Pods with the app.kubernetes.io/name=prometheus-node-exporter.* label, scraping their /metrics endpoint (integrations/node_exporter)
Warning
For the Kubernetes integration to work correctly, you must set the job and instance labels exactly as prescribed in the preceding steps to be able to see your Cluster in Kubernetes Monitoring.
Each scrape target has a list of metrics to keep, which reduces the amount of unnecessary metrics sent to Grafana Cloud.
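If you want to send additional series beyond this allowlist, append their names to the keep regex of the relevant job. As a hypothetical sketch, keeping node_load1 and node_load5 in addition to the defaults for the integrations/node_exporter job would look like this (the existing names must stay in place):
metric_relabel_configs:
  - source_labels: [__name__]
    action: keep
    regex: node_cpu.*|node_exporter_build_info|node_filesystem.*|node_memory.*|node_network_receive_bytes_total|node_network_receive_drop_total|node_network_transmit_bytes_total|node_network_transmit_drop_total|process_cpu_seconds_total|process_resident_memory_bytes|node_load1|node_load5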
Set up RBAC for OpenTelemetry Metrics Collector
This configuration uses the built-in Kubernetes service discovery, so you must set up the service account running the OpenTelemetry Collector with advanced permissions (compared to the default set). The following ClusterRole provides a good starting point:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: otel-collector
rules:
- apiGroups:
- ''
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
- events
verbs:
- get
- list
- watch
- nonResourceURLs:
- /metrics
verbs:
- get
To bind this to a ServiceAccount, use the following ClusterRoleBinding:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: otel-collector
subjects:
- kind: ServiceAccount
name: otel-collector # replace with your service account name
namespace: default # replace with your namespace
roleRef:
kind: ClusterRole
name: otel-collector
apiGroup: rbac.authorization.k8s.io
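A quick way to confirm the permissions took effect, assuming the manifests above are saved to otel-collector-rbac.yaml and the ServiceAccount lives in the default namespace:
kubectl apply -f otel-collector-rbac.yaml

# Check that the ServiceAccount can reach the node proxy endpoints and discover Pods
kubectl auth can-i get nodes/proxy --as=system:serviceaccount:default:otel-collector
kubectl auth can-i list pods --as=system:serviceaccount:default:otel-collector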
Configure the remote write exporter
To send the metrics to Grafana Cloud, add the following to your OpenTelemetry Collector configuration:
Note
You must add the cluster label on all metrics sent from this Kubernetes Cluster.
config:
exporters:
prometheusremotewrite/grafanaCloudMetrics:
external_labels:
cluster: 'your-cluster-name'
k8s.cluster.name: 'your-cluster-name'
endpoint: 'https://PROMETHEUS_USERNAME:ACCESS_POLICY_TOKEN@PROMETHEUS_URL/api/prom/push'
To retrieve your connection information:
- Go to your Grafana Cloud account.
- Select the correct organization in the dropdown menu.
- Select your desired stack in the main navigation on the left.
- Click the Send Metrics button on the Prometheus card to find your connection information on the page that displays.
For the token, it is recommended that you:
- Place it in a secure location.
- Inject it into the configuration using environment variables.
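One hedged way to do this (all names are illustrative) is to store the token in a Kubernetes Secret, expose it to the collector Pod through the Helm chart's extraEnvs value, and reference it in the endpoint using the collector's ${env:...} substitution:
# Create a Secret holding the Access Policy Token (hypothetical names)
kubectl create secret generic grafana-cloud-credentials \
  --from-literal=prometheus-token='ACCESS_POLICY_TOKEN' -n default
Then, in the Helm values:
extraEnvs:
  - name: GRAFANA_CLOUD_PROMETHEUS_TOKEN
    valueFrom:
      secretKeyRef:
        name: grafana-cloud-credentials
        key: prometheus-token
config:
  exporters:
    prometheusremotewrite/grafanaCloudMetrics:
      endpoint: 'https://PROMETHEUS_USERNAME:${env:GRAFANA_CLOUD_PROMETHEUS_TOKEN}@PROMETHEUS_URL/api/prom/push'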
Link collected metrics
Link the collected metrics to the remote write exporter. As a good practice, add a batch processor, which improves performance.
Add the following to the OpenTelemetry Collector configuration:
config:
processors:
batch: {}
service:
pipelines:
metrics/prod:
receivers: [prometheus]
processors: [batch]
exporters: [prometheusremotewrite/grafanaCloudMetrics]
After restarting your OpenTelemetry Collector, you should see the first metrics arriving in Grafana Cloud after a few minutes.
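If you manage the collector with the OpenTelemetry Collector Helm chart, applying the updated values is one way to trigger that restart. The release, chart, and file names below are assumptions:
helm upgrade otel-collector open-telemetry/opentelemetry-collector \
  -n default -f values.yaml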
Configure the OpenTelemetry Logs Collector
Kubernetes writes logs to a specific file on the respective Node, so you must schedule a Pod on each Node to scrape these files. Do this with a separate DaemonSet.
The following configuration file configures the OpenTelemetry Collector to use its logs collection preset, which gathers logs from the default logging location for Kubernetes. Make sure you use the same Cluster name as with your metrics, otherwise the correlation won’t work.
# This is a new configuration file - do not merge this with your metrics configuration!
presets:
logsCollection:
enabled: true
config:
processors:
resource/clusterName:
attributes:
- action: insert
key: k8s.cluster.name
value: 'your-cluster-name'
exporters:
otlphttp/grafanaCloudOTLPEndpoint:
endpoint: https://OTLP_USERNAME:ACCESS_POLICY_TOKEN@OTLP_ENDPOINT_URL/otlp
service:
pipelines:
logs:
processors:
- resource/clusterName
- memory_limiter
- batch
exporters:
- otlphttp/grafanaCloudOTLPEndpoint
Configure the OpenTelemetry Collector for Cluster events
Kubernetes controllers emit Events as they perform operations in your Cluster (like starting containers, scheduling Pods, etc.) and these can be a rich source of logging information to help you debug, monitor, and alert on your Kubernetes workloads.
Generally, these Events can be queried using kubectl get event or kubectl describe.
By enabling the OpenTelemetry Collector to capture these events and ship them to Grafana Cloud Loki, you can query these directly in Grafana Cloud.
To configure the OpenTelemetry Collector:
- Use the kubernetesEvents preset.
- Include the exporter to send events as logs to the Grafana Cloud OTLP endpoint.
- Link the collected events to the exporter.
Add the Kubernetes events integration
Add the following to your OpenTelemetry Collector configuration.
presets:
kubernetesEvents:
enabled: true
config:
processors:
resource/k8sEvents:
attributes:
- action: insert
key: k8s.cluster.name
value: 'your-cluster-name'
- action: upsert
key: service.name
value: 'integrations/kubernetes/eventhandler'
Set up RBAC for OpenTelemetry Collector
To allow the OpenTelemetry Collector the correct permissions to scrape Kubernetes Cluster events, you must modify the service account running the OpenTelemetry Collector with advanced permissions (compared to the default set). The following ClusterRole provides a good starting point:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: otel-collector
rules:
- apiGroups:
- ''
resources:
- events
- namespaces
- namespaces/status
- nodes
- nodes/spec
- pods
- pods/status
- replicationcontrollers
- replicationcontrollers/status
- resourcequotas
- services
verbs:
- get
- list
- watch
- apiGroups:
- apps
resources:
- daemonsets
- deployments
- replicasets
- statefulsets
verbs:
- get
- list
- watch
- apiGroups:
- extensions
resources:
- daemonsets
- deployments
- replicasets
verbs:
- get
- list
- watch
- apiGroups:
- batch
resources:
- jobs
- cronjobs
verbs:
- get
- list
- watch
- apiGroups:
- autoscaling
resources:
- horizontalpodautoscalers
verbs:
- get
- list
- watch
Configure the exporter
To send the events to Grafana Cloud, add the following to your OpenTelemetry Collector configuration:
config:
exporters:
otlphttp/grafanaCloudOTLPEndpoint:
endpoint: https://OTLP_USERNAME:ACCESS_POLICY_TOKEN@OTLP_ENDPOINT_URL/otlp
To retrieve your connection information:
- Go to your Grafana Cloud account.
- Select the correct organization in the drop-down menu.
- Select your desired stack in the main navigation on the left.
- Find the OTLP connection information for your stack on the page that displays.
For the token, it is recommended that you:
- Place it in a secure location.
- Inject it into the configuration using environment variables.
Link collected events
Link the collected events to the exporter. It is also good practice to add a batch processor to improve performance.
Add the following to the OpenTelemetry Collector configuration:
config:
service:
pipelines:
logs:
processors:
- resource/k8sEvents
- memory_limiter
- batch
exporters:
- debug
- otlphttp/grafanaCloudOTLPEndpoint
After restarting your OpenTelemetry Collector, you should see Kubernetes Cluster events arriving in Grafana Cloud after a few minutes.
Full example
You can perform the configuration of all the preceding steps with two deployments of the OpenTelemetry Collector Helm chart.
Deployment values
The following deploys an OpenTelemetry Collector as a single-instance Kubernetes Deployment that scrapes metrics and gathers Cluster events.
# Search for and replace the "REPLACE ME" fields
image:
repository: 'otel/opentelemetry-collector-contrib'
tag: latest
mode: deployment
presets:
kubernetesEvents:
enabled: true
config:
extensions:
basicauth/grafanaCloudMetrics:
client_auth:
username: # REPLACE ME
password: # REPLACE ME
basicauth/grafanaCloudOTLPEndpoint:
client_auth:
username: # REPLACE ME
password: # REPLACE ME
exporters:
prometheusremotewrite/grafanaCloudMetrics:
endpoint: # REPLACE ME
external_labels:
cluster: # REPLACE ME
k8s.cluster.name: # REPLACE ME
auth:
authenticator: basicauth/grafanaCloudMetrics
otlphttp/grafanaCloudOTLPEndpoint:
endpoint: # REPLACE ME
auth:
authenticator: basicauth/grafanaCloudOTLPEndpoint
processors:
resource/k8sEvents:
attributes:
- action: insert
key: k8s.cluster.name
value: # REPLACE ME
- action: upsert
key: service.name
value: 'integrations/kubernetes/eventhandler'
receivers:
prometheus:
config:
scrape_configs:
- job_name: integrations/kubernetes/cadvisor
scheme: https
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
server_name: kubernetes
kubernetes_sd_configs:
- role: node
relabel_configs:
- target_label: __address__
replacement: kubernetes.default.svc.cluster.local:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor
target_label: __metrics_path__
metric_relabel_configs:
- source_labels: [__name__]
action: keep
regex: container_cpu_cfs_periods_total|container_cpu_cfs_throttled_periods_total|container_cpu_usage_seconds_total|container_fs_reads_bytes_total|container_fs_reads_total|container_fs_writes_bytes_total|container_fs_writes_total|container_memory_cache|container_memory_rss|container_memory_swap|container_memory_working_set_bytes|container_network_receive_bytes_total|container_network_receive_packets_dropped_total|container_network_receive_packets_total|container_network_transmit_bytes_total|container_network_transmit_packets_dropped_total|container_network_transmit_packets_total|machine_memory_bytes
- job_name: integrations/kubernetes/kubelet
scheme: https
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecure_skip_verify: true
server_name: kubernetes
kubernetes_sd_configs:
- role: node
relabel_configs:
- target_label: __address__
replacement: kubernetes.default.svc.cluster.local:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
replacement: /api/v1/nodes/$${1}/proxy/metrics
target_label: __metrics_path__
metric_relabel_configs:
- source_labels: [__name__]
action: keep
regex: go_goroutines|kubelet_certificate_manager_client_expiration_renew_errors|kubelet_certificate_manager_client_ttl_seconds|kubelet_certificate_manager_server_ttl_seconds|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_cgroup_manager_duration_seconds_count|kubelet_node_config_error|kubelet_node_name|kubelet_pleg_relist_duration_seconds_bucket|kubelet_pleg_relist_duration_seconds_count|kubelet_pleg_relist_interval_seconds_bucket|kubelet_pod_start_duration_seconds_bucket|kubelet_pod_start_duration_seconds_count|kubelet_pod_worker_duration_seconds_bucket|kubelet_pod_worker_duration_seconds_count|kubelet_running_container_count|kubelet_running_containers|kubelet_running_pod_count|kubelet_running_pods|kubelet_runtime_operations_errors_total|kubelet_runtime_operations_total|kubelet_server_expiration_renew_errors|kubelet_volume_stats_available_bytes|kubelet_volume_stats_capacity_bytes|kubelet_volume_stats_inodes|kubelet_volume_stats_inodes_free|kubelet_volume_stats_inodes_used|kubelet_volume_stats_used_bytes|kubernetes_build_info|namespace_workload_pod|process_cpu_seconds_total|process_resident_memory_bytes|rest_client_requests_total|storage_operation_duration_seconds_count|storage_operation_errors_total|volume_manager_total_volumes
- job_name: integrations/kubernetes/kube-state-metrics
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
action: keep
regex: kube-state-metrics
metric_relabel_configs:
- source_labels: [__name__]
action: keep
regex: kube_configmap_info|kube_configmap_metadata_resource_version|kube_daemonset.*|kube_deployment_metadata_generation|kube_deployment_spec_replicas|kube_deployment_status_condition|kube_deployment_status_observed_generation|kube_deployment_status_replicas_available|kube_deployment_status_replicas_updated|kube_horizontalpodautoscaler_spec_max_replicas|kube_horizontalpodautoscaler_spec_min_replicas|kube_horizontalpodautoscaler_status_current_replicas|kube_horizontalpodautoscaler_status_desired_replicas|kube_job.*|kube_namespace_status_phase|kube_node.*|kube_persistentvolume_status_phase|kube_persistentvolumeclaim_access_mode|kube_persistentvolumeclaim_info|kube_persistentvolumeclaim_labels|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_persistentvolumeclaim_status_phase|kube_pod_container_info|kube_pod_container_resource_limits|kube_pod_container_resource_requests|kube_pod_container_status_last_terminated_reason|kube_pod_container_status_restarts_total|kube_pod_container_status_waiting_reason|kube_pod_info|kube_pod_owner|kube_pod_spec_volumes_persistentvolumeclaims_info|kube_pod_start_time|kube_pod_status_phase|kube_pod_status_reason|kube_replicaset.*|kube_resourcequota|kube_secret_metadata_resource_version|kube_statefulset.*
- job_name: integrations/node_exporter
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
action: keep
regex: prometheus-node-exporter.*
- source_labels: [__meta_kubernetes_pod_node_name]
target_label: instance
- source_labels: [__meta_kubernetes_namespace]
target_label: namespace
metric_relabel_configs:
- source_labels: [__name__]
action: keep
regex: node_cpu.*|node_exporter_build_info|node_filesystem.*|node_memory.*|node_network_receive_bytes_total|node_network_receive_drop_total|node_network_transmit_bytes_total|node_network_transmit_drop_total|process_cpu_seconds_total|process_resident_memory_bytes
- job_name: opentelemetry-collector
scrape_interval: 10s
static_configs:
- targets:
- ${env:MY_POD_IP}:8888
- job_name: integrations/opencost
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
action: keep
regex: opencost
metric_relabel_configs:
- source_labels: [__name__]
action: keep
regex: container_cpu_allocation|container_gpu_allocation|container_memory_allocation_bytes|deployment_match_labels|kubecost_cluster_info|kubecost_cluster_management_cost|kubecost_cluster_memory_working_set_bytes|kubecost_http_requests_total|kubecost_http_response_size_bytes|kubecost_http_response_time_seconds|kubecost_load_balancer_cost|kubecost_network_internet_egress_cost|kubecost_network_region_egress_cost|kubecost_network_zone_egress_cost|kubecost_node_is_spot|node_cpu_hourly_cost|node_gpu_count|node_gpu_hourly_cost|node_ram_hourly_cost|node_total_hourly_cost|opencost_build_info|pod_pvc_allocation|pv_hourly_cost|service_selector_labels|statefulSet_match_labels
service:
extensions:
- health_check
- basicauth/grafanaCloudMetrics
- basicauth/grafanaCloudOTLPEndpoint
pipelines:
metrics:
exporters:
- debug
- prometheusremotewrite/grafanaCloudMetrics
logs:
processors:
- resource/k8sEvents
- memory_limiter
- batch
exporters:
- debug
- otlphttp/grafanaCloudOTLPEndpoint
clusterRole:
create: true
rules:
- apiGroups:
- ''
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
- events
- namespaces
- namespaces/status
- pods/status
- replicationcontrollers
- replicationcontrollers/status
- resourcequotas
verbs:
- get
- list
- watch
- nonResourceURLs:
- /metrics
verbs:
- get
- apiGroups:
- apps
resources:
- daemonsets
- deployments
- replicasets
- statefulsets
verbs:
- get
- list
- watch
- apiGroups:
- extensions
resources:
- daemonsets
- deployments
- replicasets
verbs:
- get
- list
- watch
- apiGroups:
- batch
resources:
- jobs
- cronjobs
verbs:
- get
- list
- watch
- apiGroups:
- autoscaling
resources:
- horizontalpodautoscalers
verbs:
- get
- list
- watch
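Assuming you saved the preceding values to a file such as deployment-values.yaml, a hedged install of this single-instance collector from the OpenTelemetry Helm repository could look like this (release name and namespace are illustrative):
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm upgrade --install otel-metrics open-telemetry/opentelemetry-collector \
  -n default -f deployment-values.yaml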
DaemonSet configuration
The following deploys an OpenTelemetry Collector as a Kubernetes DaemonSet that gathers Pod logs.
# Search for and replace the "REPLACE ME" fields
image:
repository: 'otel/opentelemetry-collector-contrib'
tag: latest
mode: daemonset
presets:
logsCollection:
enabled: true
config:
extensions:
basicauth/grafanaCloudOTLPEndpoint:
client_auth:
username: # REPLACE ME
password: # REPLACE ME
exporters:
otlphttp/grafanaCloudOTLPEndpoint:
endpoint: # REPLACE ME
auth:
authenticator: basicauth/grafanaCloudOTLPEndpoint
processors:
resource/clusterName:
attributes:
- action: insert
key: k8s.cluster.name
value: # REPLACE ME
service:
extensions:
- health_check
- basicauth/grafanaCloudOTLPEndpoint
pipelines:
logs:
processors:
- resource/clusterName
- memory_limiter
- batch
exporters:
- debug
- otlphttp/grafanaCloudOTLPEndpoint
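To complete the full example, a second release of the same chart can be installed with these DaemonSet values, then both workloads checked. Release names, file names, and the label selector are assumptions:
helm upgrade --install otel-logs open-telemetry/opentelemetry-collector \
  -n default -f daemonset-values.yaml

# One Deployment for metrics and events, one DaemonSet for logs
kubectl get deployment,daemonset -n default -l app.kubernetes.io/name=opentelemetry-collector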
Set up the Kubernetes integration in Grafana Cloud
The Kubernetes integration comes with a set of predefined recording and alerting rules. To install them, navigate to the Kubernetes integration configuration page located at Observability -> Kubernetes -> Configuration. To install the components, click the Install button.
After these steps, you can see your resources and metrics in the Kubernetes Integration.
Troubleshoot absence of resources
If the Kubernetes integration shows no resources, navigate to the Explore page in Grafana and enter the following query:
up{cluster="your-cluster-name"}
This query should return at least one series for each of the scrape targets defined previously. If you do not see any series, or some of the series have a value of 0, enable debug logging in the OpenTelemetry Collector with the following config snippet:
config:
service:
telemetry:
logs:
level: 'debug'
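With debug logging enabled, the collector's own logs are usually the quickest way to spot scrape or export failures. Assuming the chart's standard labels and the default namespace:
kubectl logs -n default -l app.kubernetes.io/name=opentelemetry-collector --tail=100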
If you can see the collected metrics but the Kubernetes integration does not list your resources, make sure that each time series has a cluster label set, and the job label matches the names in the preceding configuration.