Manage your configuration
Kubernetes Monitoring gathers metrics, logs, and events, and calculates costs for your infrastructure. It also provides recording rules, alerting rules, and allowlists.
Metrics control and management
The following are ways to control and manage metrics:
- Reduce usage
- Identify unnecessary or duplicate metrics
- Analyze usage
- Use allowlists
Reduce usage
The best way to control and manage your metrics is to use the techniques detailed in Reduce Kubernetes Metrics usage.
Identify unnecessary or duplicate metrics
To identify unnecessary or duplicate metrics that could come from within your cluster, you can analyze:
- The distribution of your metrics and labels with cardinality management dashboards.
- Current metrics usage and associated costs from the billing and usage dashboard located in your Grafana instance.
Analyze usage
Refer to Analyzing metrics usage with Grafana Explore for more techniques.
Use allowlists
By default, Kubernetes Monitoring configures allowlists using Prometheus metric_relabel_configs blocks. To learn more about metric_relabel_configs, refer to Reduce Prometheus metrics usage with relabeling.
These allowlists trim the collected metrics down to a useful set. To omit or modify the allowlists, modify the corresponding metric_relabel_configs blocks in your Agent configuration. To learn more about analyzing and controlling active series usage, refer to Control Prometheus metrics usage.
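As an example of the shape such a block takes, the following sketch shows a keep-style allowlist in a static-mode Agent configuration. The metric names here are illustrative, not the full default allowlist shipped by Kubernetes Monitoring:

```yaml
# Illustrative allowlist: keep only the named metrics and drop everything else.
# The regex below is an example, not the default set.
metric_relabel_configs:
  - source_labels: [__name__]
    action: keep   # series whose metric name does not match are dropped
    regex: up|kubelet_running_pods|container_cpu_usage_seconds_total|kube_pod_status_phase
```

Any series whose name does not match the regex is dropped before it is sent to Grafana Cloud, which directly reduces billable active series.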
Grafana Cloud billing is based on billable series. To learn more about the pricing model, refer to Active series and DPM.
Default active series usage varies depending on your Kubernetes cluster size (number of Nodes) and running workloads (number of Pods, containers, Deployments, etc.).
When testing on a Cloud provider’s Kubernetes offering, the following active series usage was observed:
- 3-Node cluster, 17 running Pods, 31 running containers: 3.8k active series
  - The only Pods deployed into the cluster were Grafana Agent and kube-state-metrics. The rest were running in the kube-system Namespace and managed by the cloud provider.
- From this baseline, active series usage roughly increased by:
  - 1,000 active series per additional Node
  - 75 active series per additional Pod (vanilla Nginx Pods were deployed into the cluster)
These are rough guidelines, and results may vary depending on your Cloud provider and Kubernetes version. Note also that these figures are based on the default scrape targets described in this document, and not additional targets such as application metrics, API server metrics, and scheduler metrics.
Logs management
To analyze, customize, and deduplicate logs, refer to Logs in Explore.
How Kubernetes Monitoring works
We use kube-state-metrics to generate metrics from Kubernetes objects without modifying them, and send these metrics to Grafana Cloud. We also collect logs from Kubernetes objects and send them to Loki in Grafana Cloud.
We are heavily indebted to the open source kubernetes-mixin project, from which the dashboards, recording rules, and alerting rules have been derived. We will continue to contribute bug fixes and new features upstream.
Metrics
Kubernetes Monitoring scrapes the following items to provide metrics:
- cAdvisor: Daemon that provides information about running containers. Provides metrics on container resource usage (CPU, memory, and disk).
- kubelet: Primary “node agent” that runs on each Node in the cluster and ensures containers are running. Provides metrics on Pods and their containers.
- kube-state-metrics: Service that generates metrics from Kubernetes objects without modification. Provides metrics on the state of objects in your cluster (Pods, Deployments, DaemonSets). Required for the cluster navigation feature.
- node-exporter: Prometheus exporter that gathers hardware and OS metrics for Linux Nodes in the cluster.
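As a sketch of how one of these targets is scraped, a static-mode Agent scrape job for kube-state-metrics might look like the following. The job name and Service name are illustrative and may differ in your setup:

```yaml
scrape_configs:
  - job_name: integrations/kubernetes/kube-state-metrics
    kubernetes_sd_configs:
      - role: service   # discover Services in the cluster
    relabel_configs:
      # Keep only the kube-state-metrics Service; drop all other discovered targets.
      - source_labels: [__meta_kubernetes_service_name]
        regex: kube-state-metrics
        action: keep
```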
kube-state-metrics
The following metrics are required to use the Kubernetes Monitoring cluster navigation feature:
- kube_namespace_status_phase
- container_cpu_usage_seconds_total
- kube_pod_status_phase
- kube_pod_start_time
- kube_pod_container_status_restarts_total
- kube_pod_container_info
- kube_pod_container_status_waiting_reason
- kube_daemonset.*
- kube_replicaset.*
- kube_statefulset.*
- kube_job.*
- kube_node*
- kube_cluster*
- node_cpu_seconds_total
- node_memory_MemAvailable_bytes
- node_filesystem_size_bytes
- node_namespace_pod_container
- container_memory_working_set_bytes
- job="integrations/kubernetes/eventhandler" (for event logs; included by default with Grafana Agent)
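Taken together, the required metrics above can be expressed as a single keep-style allowlist. This is a sketch assuming the metric_relabel_configs approach described earlier; adjust the regex if your setup differs:

```yaml
# Allowlist covering the metrics required by cluster navigation,
# built from the list above.
metric_relabel_configs:
  - source_labels: [__name__]
    action: keep
    regex: kube_namespace_status_phase|container_cpu_usage_seconds_total|kube_pod_status_phase|kube_pod_start_time|kube_pod_container_status_restarts_total|kube_pod_container_info|kube_pod_container_status_waiting_reason|kube_daemonset.*|kube_replicaset.*|kube_statefulset.*|kube_job.*|kube_node.*|kube_cluster.*|node_cpu_seconds_total|node_memory_MemAvailable_bytes|node_filesystem_size_bytes|node_namespace_pod_container.*|container_memory_working_set_bytes
```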
Note: Logs are not required for Kubernetes Monitoring to work, but they provide additional context in some views of the Cluster navigation tab. Log entries must be sent to a Loki data source with cluster, namespace, and pod labels.
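As a sketch, a static-mode Agent logs configuration that attaches these labels could look like the following. The Loki endpoint and cluster name are placeholders:

```yaml
logs:
  configs:
    - name: default
      clients:
        - url: https://<loki-endpoint>/loki/api/v1/push   # placeholder endpoint
          external_labels:
            cluster: my-cluster   # placeholder cluster name
      scrape_configs:
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            # Map Kubernetes metadata onto the labels the dashboards expect.
            - source_labels: [__meta_kubernetes_namespace]
              target_label: namespace
            - source_labels: [__meta_kubernetes_pod_name]
              target_label: pod
```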
Logs
Kubernetes Monitoring uses Agent Flow mode to collect logs from all Pods running in your cluster and send them to Loki in Grafana Cloud.
Events
Kubernetes events provide helpful logging information emitted by Kubernetes cluster controllers. Agent Flow mode contains an embedded integration that watches for event objects in your clusters, and sends them to Grafana Cloud for long-term storage and analysis.
An Eventhandler deployed by Kubernetes Monitoring watches for Kubernetes events in your clusters.
Cost calculations
Kubernetes Monitoring uses OpenCost and Grafana’s experience in managing costs related to Kubernetes. For more details, refer to Manage costs.
Recording rules
Kubernetes Monitoring includes the following recording rules to speed up dashboard queries and alerting rule evaluation:
node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
node_namespace_pod_container:container_memory_working_set_bytes
node_namespace_pod_container:container_memory_rss
node_namespace_pod_container:container_memory_cache
node_namespace_pod_container:container_memory_swap
cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
namespace_memory:kube_pod_container_resource_requests:sum
cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests
namespace_cpu:kube_pod_container_resource_requests:sum
cluster:namespace:pod_memory:active:kube_pod_container_resource_limits
namespace_memory:kube_pod_container_resource_limits:sum
cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits
namespace_cpu:kube_pod_container_resource_limits:sum
namespace_workload_pod:kube_pod_owner:relabel
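As an illustration of what these rules look like, a simplified version of the first rule in the list could be written as follows. The actual upstream kubernetes-mixin definition is more involved (it also joins in node information), so treat this as a sketch:

```yaml
groups:
  - name: k8s.rules
    rules:
      - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
        # Pre-aggregate per-container CPU usage so dashboards query the
        # recorded series instead of raw cAdvisor data on every refresh.
        expr: |
          sum by (cluster, namespace, pod, container) (
            irate(container_cpu_usage_seconds_total{job="integrations/kubernetes/cadvisor", image!=""}[5m])
          )
```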
Note: Recording rules may emit time series with the same metric name, but different labels. To modify these programmatically, refer to Set up Alerting for Cloud.
Alerting rules
Kubernetes Monitoring comes with preconfigured alerting rules that notify you when issues arise with your clusters and their workloads, such as Pods crash looping or Pods getting stuck in a not-ready state.
Kubelet alerts
KubeNodeNotReady
KubeNodeUnreachable
KubeletTooManyPods
KubeNodeReadinessFlapping
KubeletPlegDurationHigh
KubeletPodStartUpLatencyHigh
KubeletClientCertificateExpiration
KubeletClientCertificateExpiration
KubeletServerCertificateExpiration
KubeletServerCertificateExpiration
KubeletClientCertificateRenewalErrors
KubeletServerCertificateRenewalErrors
KubeletDown
Kubernetes system alerts
KubeVersionMismatch
KubeClientErrors
Kubernetes resource usage alerts
KubeCPUOvercommit
KubeMemoryOvercommit
KubeCPUQuotaOvercommit
KubeMemoryQuotaOvercommit
KubeQuotaAlmostFull
KubeQuotaFullyUsed
KubeQuotaExceeded
CPUThrottlingHigh
Kubernetes alerts
KubePodCrashLooping
KubePodNotReady
KubeDeploymentGenerationMismatch
KubeDeploymentReplicasMismatch
KubeStatefulSetReplicasMismatch
KubeStatefulSetGenerationMismatch
KubeStatefulSetUpdateNotRolledOut
KubeDaemonSetRolloutStuck
KubeContainerWaiting
KubeDaemonSetNotScheduled
KubeDaemonSetMisScheduled
KubeJobCompletion
KubeJobFailed
KubeHpaReplicasMismatch
KubeHpaMaxedOut
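For reference, the following is a simplified sketch of one of these rules, based on the upstream kubernetes-mixin definition of KubePodCrashLooping. Thresholds and labels may differ from what Kubernetes Monitoring ships:

```yaml
groups:
  - name: kubernetes-apps
    rules:
      - alert: KubePodCrashLooping
        # Fire when a container has been in CrashLoopBackOff for a sustained period.
        expr: |
          max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", job="integrations/kubernetes/kube-state-metrics"}[5m]) >= 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Pod is crash looping.
```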
To learn more, refer to the upstream Kubernetes-Mixin's Kubernetes Alert Runbooks page. You can programmatically update the alerting rule links in these preconfigured alerts to point to your own runbooks, using a tool like cortex-tools or grizzly.
Get support
To open a support ticket, navigate to your Grafana Cloud Portal, and click Open a Support Ticket.
Related resources from Grafana Labs