Send metrics, logs, and events with Grafana Agent
When you configure Kubernetes Monitoring using Grafana Agent, Agent scrapes the following targets by default:
- cAdvisor (one per node): cAdvisor is present on each node in your cluster and emits container resource usage metrics like CPU usage, memory usage, and disk usage.
- kubelet (one per node): kubelet is present on each node and emits metrics specific to the kubelet process like kubelet_running_pods and kubelet_running_container_count.
- kube-state-metrics (one replica, by default): kube-state-metrics runs as a Deployment and Service in your cluster and emits Prometheus metrics that track the state of objects in your cluster, like pods, deployments, and daemonSets.
The default ConfigMap that results from the configuration process creates an allowlist. This allowlist is configured to drop all metrics not referenced in the Kubernetes Monitoring dashboards, alerts, and recording rules. You can optionally do any of these with the allowlist:
- Modify it.
- Replace it with a denylist (by using the drop directive).
- Omit it entirely.
- Move it to the remote_write level so that it applies globally to all configured scrape jobs.
To learn more, see:
- Reducing Prometheus metrics usage with relabeling
- Reducing Kubernetes metrics usage for other methods of controlling usage
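Before editing the allowlist, it can help to test metric names against a candidate regex locally. The sketch below assumes Prometheus-style relabel semantics, where the regex must match the whole metric name. `allowlist_keeps` is a hypothetical helper and the short `ALLOWLIST` is illustrative, not the full default. `grep -E` only approximates the RE2 syntax Prometheus uses, which should be enough for simple alternations like these.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if NAME fully matches the allowlist REGEX.
# Prometheus anchors relabel regexes, so -x (whole-line match) mimics that.
allowlist_keeps() {
  regex=$1
  name=$2
  printf '%s\n' "$name" | grep -Eqx -- "$regex"
}

# A shortened, illustrative allowlist (not the full default one).
ALLOWLIST='kube_pod_info|kube_node.*|container_memory_rss'

for metric in kube_node_info kube_pod_info_extra container_memory_rss; do
  if allowlist_keeps "$ALLOWLIST" "$metric"; then
    echo "keep $metric"
  else
    echo "drop $metric"
  fi
done
```

In this run, kube_pod_info_extra is dropped because no alternative matches the whole name, while kube_node_info is kept through the kube_node.* wildcard.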
Before you start
To deploy Kubernetes Monitoring, you need:
- A Kubernetes cluster, environment, or fleet you want to monitor
- The kubectl, curl, envsubst, and Helm command-line tools
Note: Make sure you deploy the required resources in the same namespace to avoid any missing data. For example, if Grafana Agent is deployed in one namespace and node_exporter is deployed in another, you will have missing resource efficiency data.
1. Use a Grafana Cloud Access Policy Token
You can create a new access policy token or use an existing token. See Grafana Cloud Access Policies for more information.
You’ll use this token in future steps.
2. Deploy Agent ConfigMap & StatefulSet for metrics and events
The ConfigMap configures:
- The Grafana Agent StatefulSet to scrape the cadvisor, kubelet, and kube-state-metrics endpoints in your cluster
- The Agent to collect Kubernetes events from your cluster’s control plane and send them to your Cloud Loki instance
Save the following ConfigMap to a file, and replace within it the following:
- NAMESPACE with the namespace for the Grafana Agent for metrics
- CLUSTER_NAME with the name of your cluster
- METRICS_HOST with the hostname for your Prometheus instance
- METRICS_USERNAME with the username for your Prometheus instance
- METRICS_PASSWORD with your Access Policy Token from earlier
- LOGS_HOST with the hostname for your Loki instance
- LOGS_USERNAME with the username for your Loki instance
- LOGS_PASSWORD with your Access Policy Token from earlier
kind: ConfigMap
metadata:
  name: grafana-agent
  namespace: NAMESPACE
apiVersion: v1
data:
  agent.yaml: |
    metrics:
      wal_directory: /var/lib/agent/wal
      global:
        scrape_interval: 60s
        external_labels:
          cluster: CLUSTER_NAME
      configs:
      - name: integrations
        remote_write:
        - url: METRICS_HOST/api/prom/push
          basic_auth:
            username: METRICS_USERNAME
            password: METRICS_PASSWORD
        scrape_configs:
        - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          job_name: integrations/kubernetes/cadvisor
          kubernetes_sd_configs:
          - role: node
          metric_relabel_configs:
          - source_labels: [__name__]
            regex: kubelet_running_containers|go_goroutines|kubelet_runtime_operations_errors_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|namespace_memory:kube_pod_container_resource_limits:sum|kubelet_volume_stats_inodes_used|kubelet_certificate_manager_server_ttl_seconds|namespace_workload_pod:kube_pod_owner:relabel|kubelet_node_config_error|kube_daemonset_status_number_misscheduled|kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_working_set_bytes|container_fs_reads_bytes_total|kube_node_status_condition|namespace_cpu:kube_pod_container_resource_requests:sum|kubelet_server_expiration_renew_errors|container_fs_writes_total|kube_horizontalpodautoscaler_status_desired_replicas|node_filesystem_avail_bytes|kube_pod_status_reason|node_filesystem_size_bytes|kube_deployment_spec_replicas|kube_statefulset_metadata_generation|namespace_workload_pod|storage_operation_duration_seconds_count|kubelet_certificate_manager_client_expiration_renew_errors|kube_pod_container_resource_limits|kube_statefulset_status_replicas_updated|node_namespace_pod_container:container_memory_rss|kube_statefulset_status_observed_generation|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|kubelet_pleg_relist_interval_seconds_bucket|kube_job_status_start_time|kube_deployment_status_observed_generation|kubelet_pod_worker_duration_seconds_bucket|container_memory_cache|kube_resourcequota|kube_horizontalpodautoscaler_spec_min_replicas|namespace_memory:kube_pod_container_resource_requests:sum|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_daemonset_status_number_available|kube_job_failed|storage_operation_errors_total|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_fs_writes_bytes_total|kube_statefulset_replicas|kube_replicaset_owner|container_network_receive_bytes_total|volume_manager_total_volumes|kube_horizontalpodautoscaler_spec_max_replicas|kube_daemonset_status_desired_number_scheduled|kube_pod_container_status_waiting_reason|process_cpu_seconds_total|kube_node_status_allocatable|kube_deployment_status_replicas_available|kube_daemonset_status_updated_number_scheduled|container_network_receive_packets_total|container_memory_rss|container_cpu_usage_seconds_total|kube_namespace_status_phase|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_volume_stats_available_bytes|kube_deployment_status_replicas_updated|kubelet_running_container_count|kube_node_info|container_network_transmit_packets_dropped_total|kubelet_certificate_manager_client_ttl_seconds|kube_pod_owner|kubelet_volume_stats_inodes|kubelet_runtime_operations_total|container_cpu_cfs_throttled_periods_total|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_pod_count|container_network_transmit_packets_total|kubelet_node_name|kube_daemonset_status_current_number_scheduled|kube_statefulset_status_replicas_ready|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kubelet_volume_stats_capacity_bytes|kube_horizontalpodautoscaler_status_current_replicas|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|kube_node_spec_taint|kubelet_pleg_relist_duration_seconds_bucket|kube_pod_status_phase|container_cpu_cfs_periods_total|kube_deployment_metadata_generation|node_namespace_pod_container:container_memory_cache|kube_statefulset_status_current_revision|kubelet_pleg_relist_duration_seconds_count|container_fs_reads_total|kube_statefulset_status_update_revision|container_network_receive_packets_dropped_total|kube_pod_info|kubelet_running_pods|process_resident_memory_bytes|kubelet_pod_worker_duration_seconds_count|kubelet_pod_start_duration_seconds_count|kubelet_cgroup_manager_duration_seconds_count|kube_node_status_capacity|container_network_transmit_bytes_total|rest_client_requests_total|kubernetes_build_info|machine_memory_bytes|kube_statefulset_status_replicas|container_memory_swap|kube_job_status_active|kubelet_pod_start_duration_seconds_bucket|node_namespace_pod_container:container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*|node_network_transmit_bytes_total
            action: keep
          relabel_configs:
          - replacement: kubernetes.default.svc.cluster.local:443
            target_label: __address__
          - regex: (.+)
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
            source_labels:
            - __meta_kubernetes_node_name
            target_label: __metrics_path__
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: false
            server_name: kubernetes
        - job_name: integrations/kubernetes/kubelet
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - role: node
          metric_relabel_configs:
          - source_labels: [__name__]
            regex: kubelet_running_containers|go_goroutines|kubelet_runtime_operations_errors_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|namespace_memory:kube_pod_container_resource_limits:sum|kubelet_volume_stats_inodes_used|kubelet_certificate_manager_server_ttl_seconds|namespace_workload_pod:kube_pod_owner:relabel|kubelet_node_config_error|kube_daemonset_status_number_misscheduled|kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_working_set_bytes|container_fs_reads_bytes_total|kube_node_status_condition|namespace_cpu:kube_pod_container_resource_requests:sum|kubelet_server_expiration_renew_errors|container_fs_writes_total|kube_horizontalpodautoscaler_status_desired_replicas|node_filesystem_avail_bytes|kube_pod_status_reason|node_filesystem_size_bytes|kube_deployment_spec_replicas|kube_statefulset_metadata_generation|namespace_workload_pod|storage_operation_duration_seconds_count|kubelet_certificate_manager_client_expiration_renew_errors|kube_pod_container_resource_limits|kube_statefulset_status_replicas_updated|node_namespace_pod_container:container_memory_rss|kube_statefulset_status_observed_generation|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|kubelet_pleg_relist_interval_seconds_bucket|kube_job_status_start_time|kube_deployment_status_observed_generation|kubelet_pod_worker_duration_seconds_bucket|container_memory_cache|kube_resourcequota|kube_horizontalpodautoscaler_spec_min_replicas|namespace_memory:kube_pod_container_resource_requests:sum|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_daemonset_status_number_available|kube_job_failed|storage_operation_errors_total|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_fs_writes_bytes_total|kube_statefulset_replicas|kube_replicaset_owner|container_network_receive_bytes_total|volume_manager_total_volumes|kube_horizontalpodautoscaler_spec_max_replicas|kube_daemonset_status_desired_number_scheduled|kube_pod_container_status_waiting_reason|process_cpu_seconds_total|kube_node_status_allocatable|kube_deployment_status_replicas_available|kube_daemonset_status_updated_number_scheduled|container_network_receive_packets_total|container_memory_rss|container_cpu_usage_seconds_total|kube_namespace_status_phase|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_volume_stats_available_bytes|kube_deployment_status_replicas_updated|kubelet_running_container_count|kube_node_info|container_network_transmit_packets_dropped_total|kubelet_certificate_manager_client_ttl_seconds|kube_pod_owner|kubelet_volume_stats_inodes|kubelet_runtime_operations_total|container_cpu_cfs_throttled_periods_total|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_pod_count|container_network_transmit_packets_total|kubelet_node_name|kube_daemonset_status_current_number_scheduled|kube_statefulset_status_replicas_ready|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kubelet_volume_stats_capacity_bytes|kube_horizontalpodautoscaler_status_current_replicas|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|kube_node_spec_taint|kubelet_pleg_relist_duration_seconds_bucket|kube_pod_status_phase|container_cpu_cfs_periods_total|kube_deployment_metadata_generation|node_namespace_pod_container:container_memory_cache|kube_statefulset_status_current_revision|kubelet_pleg_relist_duration_seconds_count|container_fs_reads_total|kube_statefulset_status_update_revision|container_network_receive_packets_dropped_total|kube_pod_info|kubelet_running_pods|process_resident_memory_bytes|kubelet_pod_worker_duration_seconds_count|kubelet_pod_start_duration_seconds_count|kubelet_cgroup_manager_duration_seconds_count|kube_node_status_capacity|container_network_transmit_bytes_total|rest_client_requests_total|kubernetes_build_info|machine_memory_bytes|kube_statefulset_status_replicas|container_memory_swap|kube_job_status_active|kubelet_pod_start_duration_seconds_bucket|node_namespace_pod_container:container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*|node_network_transmit_bytes_total
            action: keep
          relabel_configs:
          - replacement: kubernetes.default.svc.cluster.local:443
            target_label: __address__
          - regex: (.+)
            replacement: /api/v1/nodes/${1}/proxy/metrics
            source_labels:
            - __meta_kubernetes_node_name
            target_label: __metrics_path__
          scheme: https
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: false
            server_name: kubernetes
        - job_name: integrations/kubernetes/kube-state-metrics
          kubernetes_sd_configs:
          - role: pod
          metric_relabel_configs:
          - source_labels: [__name__]
            regex: kubelet_running_containers|go_goroutines|kubelet_runtime_operations_errors_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|namespace_memory:kube_pod_container_resource_limits:sum|kubelet_volume_stats_inodes_used|kubelet_certificate_manager_server_ttl_seconds|namespace_workload_pod:kube_pod_owner:relabel|kubelet_node_config_error|kube_daemonset_status_number_misscheduled|kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_working_set_bytes|container_fs_reads_bytes_total|kube_node_status_condition|namespace_cpu:kube_pod_container_resource_requests:sum|kubelet_server_expiration_renew_errors|container_fs_writes_total|kube_horizontalpodautoscaler_status_desired_replicas|node_filesystem_avail_bytes|kube_pod_status_reason|node_filesystem_size_bytes|kube_deployment_spec_replicas|kube_statefulset_metadata_generation|namespace_workload_pod|storage_operation_duration_seconds_count|kubelet_certificate_manager_client_expiration_renew_errors|kube_pod_container_resource_limits|kube_statefulset_status_replicas_updated|node_namespace_pod_container:container_memory_rss|kube_statefulset_status_observed_generation|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|kubelet_pleg_relist_interval_seconds_bucket|kube_job_status_start_time|kube_deployment_status_observed_generation|kubelet_pod_worker_duration_seconds_bucket|container_memory_cache|kube_resourcequota|kube_horizontalpodautoscaler_spec_min_replicas|namespace_memory:kube_pod_container_resource_requests:sum|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_daemonset_status_number_available|kube_job_failed|storage_operation_errors_total|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_fs_writes_bytes_total|kube_statefulset_replicas|kube_replicaset_owner|container_network_receive_bytes_total|volume_manager_total_volumes|kube_horizontalpodautoscaler_spec_max_replicas|kube_daemonset_status_desired_number_scheduled|kube_pod_container_status_waiting_reason|process_cpu_seconds_total|kube_node_status_allocatable|kube_deployment_status_replicas_available|kube_daemonset_status_updated_number_scheduled|container_network_receive_packets_total|container_memory_rss|container_cpu_usage_seconds_total|kube_namespace_status_phase|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_volume_stats_available_bytes|kube_deployment_status_replicas_updated|kubelet_running_container_count|kube_node_info|container_network_transmit_packets_dropped_total|kubelet_certificate_manager_client_ttl_seconds|kube_pod_owner|kubelet_volume_stats_inodes|kubelet_runtime_operations_total|container_cpu_cfs_throttled_periods_total|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_pod_count|container_network_transmit_packets_total|kubelet_node_name|kube_daemonset_status_current_number_scheduled|kube_statefulset_status_replicas_ready|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kubelet_volume_stats_capacity_bytes|kube_horizontalpodautoscaler_status_current_replicas|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|kube_node_spec_taint|kubelet_pleg_relist_duration_seconds_bucket|kube_pod_status_phase|container_cpu_cfs_periods_total|kube_deployment_metadata_generation|node_namespace_pod_container:container_memory_cache|kube_statefulset_status_current_revision|kubelet_pleg_relist_duration_seconds_count|container_fs_reads_total|kube_statefulset_status_update_revision|container_network_receive_packets_dropped_total|kube_pod_info|kubelet_running_pods|process_resident_memory_bytes|kubelet_pod_worker_duration_seconds_count|kubelet_pod_start_duration_seconds_count|kubelet_cgroup_manager_duration_seconds_count|kube_node_status_capacity|container_network_transmit_bytes_total|rest_client_requests_total|kubernetes_build_info|machine_memory_bytes|kube_statefulset_status_replicas|container_memory_swap|kube_job_status_active|kubelet_pod_start_duration_seconds_bucket|node_namespace_pod_container:container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*|node_network_transmit_bytes_total
            action: keep
          relabel_configs:
          - action: keep
            regex: kube-state-metrics
            source_labels:
            - __meta_kubernetes_pod_label_app_kubernetes_io_name
        - job_name: integrations/node_exporter
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          kubernetes_sd_configs:
          - namespaces:
              names:
              - NAMESPACE
            role: pod
          metric_relabel_configs:
          - source_labels: [__name__]
            regex: kubelet_running_containers|go_goroutines|kubelet_runtime_operations_errors_total|cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits|namespace_memory:kube_pod_container_resource_limits:sum|kubelet_volume_stats_inodes_used|kubelet_certificate_manager_server_ttl_seconds|namespace_workload_pod:kube_pod_owner:relabel|kubelet_node_config_error|kube_daemonset_status_number_misscheduled|kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_limits:sum|container_memory_working_set_bytes|container_fs_reads_bytes_total|kube_node_status_condition|namespace_cpu:kube_pod_container_resource_requests:sum|kubelet_server_expiration_renew_errors|container_fs_writes_total|kube_horizontalpodautoscaler_status_desired_replicas|node_filesystem_avail_bytes|kube_pod_status_reason|node_filesystem_size_bytes|kube_deployment_spec_replicas|kube_statefulset_metadata_generation|namespace_workload_pod|storage_operation_duration_seconds_count|kubelet_certificate_manager_client_expiration_renew_errors|kube_pod_container_resource_limits|kube_statefulset_status_replicas_updated|node_namespace_pod_container:container_memory_rss|kube_statefulset_status_observed_generation|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|kubelet_pleg_relist_interval_seconds_bucket|kube_job_status_start_time|kube_deployment_status_observed_generation|kubelet_pod_worker_duration_seconds_bucket|container_memory_cache|kube_resourcequota|kube_horizontalpodautoscaler_spec_min_replicas|namespace_memory:kube_pod_container_resource_requests:sum|kube_persistentvolumeclaim_resource_requests_storage_bytes|kube_daemonset_status_number_available|kube_job_failed|storage_operation_errors_total|cluster:namespace:pod_memory:active:kube_pod_container_resource_limits|container_fs_writes_bytes_total|kube_statefulset_replicas|kube_replicaset_owner|container_network_receive_bytes_total|volume_manager_total_volumes|kube_horizontalpodautoscaler_spec_max_replicas|kube_daemonset_status_desired_number_scheduled|kube_pod_container_status_waiting_reason|process_cpu_seconds_total|kube_node_status_allocatable|kube_deployment_status_replicas_available|kube_daemonset_status_updated_number_scheduled|container_network_receive_packets_total|container_memory_rss|container_cpu_usage_seconds_total|kube_namespace_status_phase|cluster:namespace:pod_memory:active:kube_pod_container_resource_requests|kubelet_volume_stats_available_bytes|kube_deployment_status_replicas_updated|kubelet_running_container_count|kube_node_info|container_network_transmit_packets_dropped_total|kubelet_certificate_manager_client_ttl_seconds|kube_pod_owner|kubelet_volume_stats_inodes|kubelet_runtime_operations_total|container_cpu_cfs_throttled_periods_total|kubelet_cgroup_manager_duration_seconds_bucket|kubelet_running_pod_count|container_network_transmit_packets_total|kubelet_node_name|kube_daemonset_status_current_number_scheduled|kube_statefulset_status_replicas_ready|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|kubelet_volume_stats_capacity_bytes|kube_horizontalpodautoscaler_status_current_replicas|node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile|kube_node_spec_taint|kubelet_pleg_relist_duration_seconds_bucket|kube_pod_status_phase|container_cpu_cfs_periods_total|kube_deployment_metadata_generation|node_namespace_pod_container:container_memory_cache|kube_statefulset_status_current_revision|kubelet_pleg_relist_duration_seconds_count|container_fs_reads_total|kube_statefulset_status_update_revision|container_network_receive_packets_dropped_total|kube_pod_info|kubelet_running_pods|process_resident_memory_bytes|kubelet_pod_worker_duration_seconds_count|kubelet_pod_start_duration_seconds_count|kubelet_cgroup_manager_duration_seconds_count|kube_node_status_capacity|container_network_transmit_bytes_total|rest_client_requests_total|kubernetes_build_info|machine_memory_bytes|kube_statefulset_status_replicas|container_memory_swap|kube_job_status_active|kubelet_pod_start_duration_seconds_bucket|node_namespace_pod_container:container_memory_working_set_bytes|node_namespace_pod_container:container_memory_swap|kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate|cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests|namespace_cpu:kube_pod_container_resource_requests:sum|node_cpu.*|node_memory.*|node_filesystem.*|node_network_transmit_bytes_total
            action: keep
          relabel_configs:
          - action: keep
            regex: prometheus-node-exporter.*
            source_labels:
            - __meta_kubernetes_pod_label_app_kubernetes_io_name
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_node_name
            target_label: instance
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: namespace
          tls_config:
            ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
            insecure_skip_verify: false
        - job_name: integrations/kubernetes/opencost
          kubernetes_sd_configs:
          - role: pod
          relabel_configs:
          - action: keep
            regex: opencost-*
            source_labels:
            - __meta_kubernetes_pod_label_app_kubernetes_io_name
    integrations:
      eventhandler:
        cache_path: /var/lib/agent/eventhandler.cache
        logs_instance: integrations
    logs:
      configs:
      - name: integrations
        clients:
        - url: LOGS_HOST/loki/api/v1/push
          basic_auth:
            username: LOGS_USERNAME
            password: LOGS_PASSWORD
          external_labels:
            cluster: CLUSTER_NAME
            job: integrations/kubernetes/eventhandler
        positions:
          filename: /tmp/positions.yaml
        target_config:
          sync_period: 10s
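One way to fill in the placeholders in the saved ConfigMap is with `sed`, sketched below. The placeholders are bare words rather than `$VAR` references, so `envsubst` would leave them untouched. The function name `fill_placeholders` and every example value are hypothetical, and the sketch assumes the values contain no `|`, `&`, or backslash characters (those would need escaping first).

```shell
#!/bin/sh
# Hypothetical helper: replace the bare-word placeholders in a manifest.
# Reads the named file(s), or stdin if none are given, and writes to stdout.
# Assumes the values contain no '|', '&', or '\' characters.
fill_placeholders() {
  cat "$@" | sed \
    -e "s|METRICS_HOST|${METRICS_HOST}|g" \
    -e "s|METRICS_USERNAME|${METRICS_USERNAME}|g" \
    -e "s|METRICS_PASSWORD|${METRICS_PASSWORD}|g" \
    -e "s|LOGS_HOST|${LOGS_HOST}|g" \
    -e "s|LOGS_USERNAME|${LOGS_USERNAME}|g" \
    -e "s|LOGS_PASSWORD|${LOGS_PASSWORD}|g" \
    -e "s|CLUSTER_NAME|${CLUSTER_NAME}|g" \
    -e "s|NAMESPACE|${NAMESPACE}|g"
}

# Example values, all made up for illustration.
NAMESPACE=monitoring
CLUSTER_NAME=dev-cluster
METRICS_HOST=https://prometheus.example.com
METRICS_USERNAME=12345
METRICS_PASSWORD=example-token
LOGS_HOST=https://loki.example.com
LOGS_USERNAME=67890
LOGS_PASSWORD=example-token

# Demo on a two-line sample instead of the real manifest.
printf 'namespace: NAMESPACE\nurl: METRICS_HOST/api/prom/push\n' | fill_placeholders

# With a saved manifest (file names are placeholders):
#   fill_placeholders agent-configmap.yaml > agent-configmap.filled.yaml
```

Writing the result to a separate file keeps the token out of the template, which makes the template safe to commit or share.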
Deploy the file using:
kubectl apply -f <file.yaml>
In the following command, replace NAMESPACE with the namespace for your Grafana Agent, and deploy the Agent by running the command:
MANIFEST_URL=https://raw.githubusercontent.com/grafana/agent/v0.35.3/production/kubernetes/agent-bare.yaml NAMESPACE="NAMESPACE" /bin/sh -c "$(curl -fsSL https://raw.githubusercontent.com/grafana/agent/v0.35.3/production/kubernetes/install-bare.sh)" | kubectl apply -f -
3. Deploy kube-state-metrics
kube-state-metrics is a service that listens to the Kubernetes API server and generates metrics on the state of the objects. The metrics are exported on the HTTP endpoint /metrics, on the listening port.
Replace NAMESPACE with the namespace for your Grafana Agent in the following command, and run it to deploy kube-state-metrics:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && helm repo update && helm install ksm prometheus-community/kube-state-metrics -n "NAMESPACE"
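To confirm that kube-state-metrics is serving data, one option is to fetch its /metrics endpoint and count the samples for a metric prefix. This is only a sketch: `count_samples` is a hypothetical helper, and the service name `ksm-kube-state-metrics` and port `8080` in the commented commands are assumptions about the chart defaults for a release named ksm. Check `kubectl get svc -n NAMESPACE` for the actual values.

```shell
#!/bin/sh
# Hypothetical helper: count samples in Prometheus text exposition (stdin)
# whose metric name starts with PREFIX. Comment lines (# HELP, # TYPE) are skipped.
count_samples() {
  grep -v '^#' | grep -c "^$1"
}

# Demo on a small inline sample of exposition text.
count_samples kube_pod_ <<'EOF'
# HELP kube_pod_info Information about pod.
# TYPE kube_pod_info gauge
kube_pod_info{namespace="default",pod="web-0"} 1
kube_pod_info{namespace="default",pod="web-1"} 1
kube_node_info{node="node-a"} 1
EOF

# Against a live cluster (service name and port are assumptions):
#   kubectl port-forward -n NAMESPACE svc/ksm-kube-state-metrics 8080:8080 &
#   curl -fsS http://localhost:8080/metrics | count_samples kube_pod_
```

The demo prints 2, one for each kube_pod_info sample. A count of 0 against a live endpoint would suggest the exporter is not yet producing that metric family.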
4. Deploy node_exporter for resource monitoring
Node Exporter is required for resource efficiency monitoring.
Replace NAMESPACE with the namespace for your Grafana Agent in the following command, and run it to deploy Node Exporter:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && helm repo update && helm install nodeexporter prometheus-community/prometheus-node-exporter -n "NAMESPACE"
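The node_exporter scrape job in the Agent ConfigMap only discovers pods in NAMESPACE, and it keeps those whose app.kubernetes.io/name label matches prometheus-node-exporter.*. A quick way to check that the deployed pods satisfy both conditions is to list them by label, as sketched below. `pods_with_app_name` is a hypothetical helper, and the exact label value is an assumption about what the chart sets when no name override is used.

```shell
#!/bin/sh
# Hypothetical helper: list pods in a namespace that carry the given
# app.kubernetes.io/name label value.
# Usage: pods_with_app_name NAMESPACE LABEL_VALUE
pods_with_app_name() {
  kubectl get pods -n "$1" -l "app.kubernetes.io/name=$2" -o name
}

# Example (namespace is a placeholder):
#   pods_with_app_name NAMESPACE prometheus-node-exporter
```

An empty result usually means the chart was installed into a different namespace than the Agent, which is the situation the note in "Before you start" warns about.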
5. Deploy OpenCost for cost monitoring
OpenCost provides estimates for the Cost view.
Run the following command, replacing within the command the following:
- NAMESPACE with the namespace for the Grafana Agent for metrics
- CLUSTER_NAME with the name of your cluster
- METRICS_HOST with the hostname for your Prometheus instance
- METRICS_USERNAME with the username for your Prometheus instance
- METRICS_PASSWORD with your Access Policy Token from a previous step
helm repo add opencost https://opencost.github.io/opencost-helm-chart && helm repo update && \
helm install opencost opencost/opencost -n "NAMESPACE" -f - <<EOF
fullnameOverride: opencost
opencost:
  exporter:
    defaultClusterId: "CLUSTER_NAME"
    extraEnv:
      CLOUD_PROVIDER_API_KEY: AIzaSyD29bGxmHAVEOBYtgd8sYM2gM2ekfxQX4U
      EMIT_KSM_V1_METRICS: "false"
      EMIT_KSM_V1_METRICS_ONLY: "true"
      PROM_CLUSTER_ID_LABEL: cluster
  prometheus:
    password: "METRICS_PASSWORD"
    username: "METRICS_USERNAME"
    external:
      enabled: true
      url: "METRICS_HOST/api/prom"
    internal: { enabled: false }
  ui: { enabled: false }
EOF
6. Deploy Agent ConfigMap & DaemonSet for logs
The grafana-agent-logs ConfigMap configures the Grafana Agent DaemonSet to tail container logs and send them to Grafana Cloud.
Save the following to a file, and replace within the file the following:
- NAMESPACE with the namespace for the Grafana Agent for metrics
- CLUSTER_NAME with the name of your cluster
- METRICS_HOST with the hostname for your Prometheus instance
- METRICS_USERNAME with the username for your Prometheus instance
- METRICS_PASSWORD with your Access Policy Token from earlier
- LOGS_HOST with the hostname for your Loki instance
- LOGS_USERNAME with the username for your Loki instance
- LOGS_PASSWORD with your Access Policy Token from earlier
If your Kubernetes cluster does not use Docker, copy the following ConfigMap to an editor and replace docker: {} with cri: {}.
kind: ConfigMap
metadata:
  name: grafana-agent-logs
  namespace: NAMESPACE
apiVersion: v1
data:
  agent.yaml: |
    metrics:
      wal_directory: /tmp/grafana-agent-wal
      global:
        scrape_interval: 60s
        external_labels:
          cluster: CLUSTER_NAME
      configs:
      - name: integrations
        remote_write:
        - url: METRICS_HOST/api/prom/push
          basic_auth:
            username: METRICS_USERNAME
            password: METRICS_PASSWORD
    integrations:
      prometheus_remote_write:
      - url: METRICS_HOST/api/prom/push
        basic_auth:
          username: METRICS_USERNAME
          password: METRICS_PASSWORD
    logs:
      configs:
      - name: integrations
        clients:
        - url: LOGS_HOST/loki/api/v1/push
          basic_auth:
            username: LOGS_USERNAME
            password: LOGS_PASSWORD
          external_labels:
            cluster: CLUSTER_NAME
        positions:
          filename: /tmp/positions.yaml
        target_config:
          sync_period: 10s
        scrape_configs:
        - job_name: integrations/kubernetes/pod-logs
          kubernetes_sd_configs:
          - role: pod
          pipeline_stages:
          - docker: {}
          relabel_configs:
          - source_labels:
            - __meta_kubernetes_pod_node_name
            target_label: __host__
          - action: replace
            replacement: $1
            separator: /
            source_labels:
            - __meta_kubernetes_namespace
            - __meta_kubernetes_pod_name
            target_label: job
          - action: replace
            source_labels:
            - __meta_kubernetes_namespace
            target_label: namespace
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_name
            target_label: pod
          - action: replace
            source_labels:
            - __meta_kubernetes_pod_container_name
            target_label: container
          - replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
            - __meta_kubernetes_pod_uid
            - __meta_kubernetes_pod_container_name
            target_label: __path__
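To decide between docker: {} and cri: {}, you can look at the container runtime each node reports. The sketch below maps a runtime string to a stage name. `stage_for_runtime` is a hypothetical helper, the version numbers are illustrative, and the mapping assumes that any runtime other than Docker writes CRI-format logs.

```shell
#!/bin/sh
# Hypothetical helper: map a node's containerRuntimeVersion string
# (for example "containerd://1.6.9") to the pipeline stage to use.
stage_for_runtime() {
  case "$1" in
    docker://*) echo "docker" ;;
    *)          echo "cri" ;;
  esac
}

stage_for_runtime "containerd://1.6.9"   # prints "cri"
stage_for_runtime "docker://20.10.21"    # prints "docker"

# Against a live cluster, the runtime strings can be read with:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
```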
Deploy the file using:
kubectl apply -f <file.yaml>
Replace NAMESPACE with the namespace for your Grafana Agent in the following command, and deploy the Agent by running it:
MANIFEST_URL=https://raw.githubusercontent.com/grafana/agent/v0.35.3/production/kubernetes/agent-loki.yaml NAMESPACE="NAMESPACE" /bin/sh -c "$(curl -fsSL https://raw.githubusercontent.com/grafana/agent/v0.35.3/production/kubernetes/install-bare.sh)" | kubectl apply -f -
7. Done with setup
To finish up:
Navigate to the Kubernetes Monitoring home page to see that Grafana Agent has begun collecting your Kubernetes telemetry data.
Explore your Kubernetes infrastructure to view the monitoring data:
- Click Cluster in the menu, then click your namespace to view the grafana-agent StatefulSet, the grafana-agent-logs DaemonSet, and the ksm-kube-state-metrics deployment. Click the kube-system namespace to see Kubernetes-specific resources.
- Click Nodes in the menu, then click the nodes of your cluster to view their condition, utilization, and pod density.
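The same workloads can also be checked from the command line. The sketch below waits for the grafana-agent StatefulSet, the grafana-agent-logs DaemonSet, and the ksm-kube-state-metrics Deployment to finish rolling out. `check_rollouts` is a hypothetical helper, and the 120-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: wait for the three workloads to finish rolling out.
# Returns non-zero as soon as one of them fails or times out.
# Usage: check_rollouts NAMESPACE
check_rollouts() {
  ns=$1
  for workload in \
    statefulset/grafana-agent \
    daemonset/grafana-agent-logs \
    deployment/ksm-kube-state-metrics
  do
    kubectl rollout status "$workload" -n "$ns" --timeout=120s || return 1
  done
}

# Example (namespace is a placeholder):
#   check_rollouts NAMESPACE
```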
Next steps
Refer to the Monitor an app with Kubernetes Monitoring tutorial to learn how to:
- Deploy an instrumented three-tier (data layer, app logic layer, load-balancing layer) web application into a Kubernetes cluster.
- Use Kubernetes Monitoring to monitor the application.