
About Kubernetes Monitoring

Grafana Kubernetes Monitoring lets you view all of your Kubernetes data in one place. By shipping kube-state-metrics to Grafana Cloud, you can inspect the health of your clusters, containers, and pods with little or no required configuration. You can also access preconfigured dashboards, alert rules, and recording rules.

Get started

Select a question to learn more about Kubernetes Monitoring.

How do I use Kubernetes Monitoring?

With Kubernetes Monitoring, you can explore your infrastructure by navigating through the object model. For a tour, see Navigate Kubernetes Monitoring.

What’s the easiest way to get started?

To get started monitoring Kubernetes quickly, use the Kubernetes Monitoring interface. See Configure Kubernetes Monitoring to get started.

Note: You should have only one job scraping kube-state-metrics. If you have multiple scrape jobs running at the same time, you might see an error similar to the following when you try to view objects in Cluster navigation: execution: found duplicate series for the match group...

What data is visible to me with Kubernetes Monitoring?

Kubernetes Monitoring provides logs and metrics related to your Kubernetes infrastructure and resource usage. We use kube-state-metrics to generate metrics from Kubernetes objects without modifying them, and Loki to collect logs from those objects.

Specifically, Kubernetes Monitoring gives you access to the following:

  • Manifests: Preconfigured manifests for deploying Grafana Agent, Grafana's telemetry collector, and kube-state-metrics to your clusters. See kube-state-metrics to learn which kube-state-metrics are scraped by default with Kubernetes Monitoring.
  • Dashboards: Nine Grafana dashboards to drill into resource usage and cluster operations, from the multi-cluster level down to individual containers and Pods.
  • Recording rules: A set of recording rules to speed up dashboard queries.
  • Alerting rules: A set of alerting rules to alert on conditions, for example, Pods crash looping and Pods getting stuck in a "not ready" status.
  • Allowlist: An optional preconfigured allowlist of metrics referenced in the above dashboards, recording rules, and alerting rules, which reduces your active series usage while still giving you visibility into core cluster metrics.
  • Events: Grafana Agent can be configured with an eventhandler integration to watch for Kubernetes events in your clusters.

We are heavily indebted to the open source kubernetes-mixin project, from which the dashboards, recording rules, and alerting rules have been derived. We will continue to contribute bug fixes and new features upstream.

kube-state-metrics

The following metrics are required to use the Kubernetes Monitoring Cluster navigation feature:

- kube_namespace_status_phase
- container_cpu_usage_seconds_total
- kube_pod_status_phase
- kube_pod_start_time
- kube_pod_container_status_restarts_total
- kube_pod_container_info
- kube_pod_container_status_waiting_reason
- kube_daemonset.*
- kube_replicaset.*
- kube_statefulset.*
- kube_job.*
- kube_node*
- kube_cluster*
- node_cpu_seconds_total
- node_memory_MemAvailable_bytes
- node_filesystem_size_bytes
- node_namespace_pod_container
- container_memory_working_set_bytes
- job="integrations/kubernetes/eventhandler" (for event logs; included by default with Grafana Agent)

NOTE: Logs are not required for Kubernetes Monitoring to work, but they provide additional context in some views of the Cluster Navigation tab. Log entries must be shipped to a Loki data source with cluster, namespace, and pod labels.
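For reference, here is a minimal sketch of how those labels can be attached in a Grafana Agent (static mode) logs configuration. The endpoint, cluster name, and job name are placeholders, not values from the default setup, and log file discovery is omitted for brevity:

    # Sketch only: attach the cluster, namespace, and pod labels that
    # Cluster navigation expects on log entries.
    logs:
      configs:
        - name: default
          clients:
            - url: https://<your-loki-endpoint>/loki/api/v1/push
              external_labels:
                cluster: <your-cluster-name>   # "cluster" label
          scrape_configs:
            - job_name: kubernetes-pods
              kubernetes_sd_configs:
                - role: pod
              relabel_configs:
                # (log file discovery via __path__ omitted for brevity)
                - source_labels: [__meta_kubernetes_namespace]
                  target_label: namespace      # "namespace" label
                - source_labels: [__meta_kubernetes_pod_name]
                  target_label: pod            # "pod" label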

Alerting Rules

The following alerting rules are preconfigured to help you get up and running with Grafana Cloud alerts. You will be notified when issues arise with your clusters and their workloads.

Kubelet alerts

  • KubeNodeNotReady
  • KubeNodeUnreachable
  • KubeletTooManyPods
  • KubeNodeReadinessFlapping
  • KubeletPlegDurationHigh
  • KubeletPodStartUpLatencyHigh
  • KubeletClientCertificateExpiration
  • KubeletClientCertificateExpiration
  • KubeletServerCertificateExpiration
  • KubeletServerCertificateExpiration
  • KubeletClientCertificateRenewalErrors
  • KubeletServerCertificateRenewalErrors
  • KubeletDown

Kubernetes system alerts

  • KubeVersionMismatch
  • KubeClientErrors

Kubernetes resource usage alerts

  • KubeCPUOvercommit
  • KubeMemoryOvercommit
  • KubeCPUQuotaOvercommit
  • KubeMemoryQuotaOvercommit
  • KubeQuotaAlmostFull
  • KubeQuotaFullyUsed
  • KubeQuotaExceeded
  • CPUThrottlingHigh

Kubernetes alerts

  • KubePodCrashLooping
  • KubePodNotReady
  • KubeDeploymentGenerationMismatch
  • KubeDeploymentReplicasMismatch
  • KubeStatefulSetReplicasMismatch
  • KubeStatefulSetGenerationMismatch
  • KubeStatefulSetUpdateNotRolledOut
  • KubeDaemonSetRolloutStuck
  • KubeContainerWaiting
  • KubeDaemonSetNotScheduled
  • KubeDaemonSetMisScheduled
  • KubeJobCompletion
  • KubeJobFailed
  • KubeHpaReplicasMismatch
  • KubeHpaMaxedOut

To learn more, see the upstream Kubernetes-Mixin's Kubernetes Alert Runbooks page. You can programmatically update the runbook links in these preconfigured alerts to point to your own runbooks, using a tool like cortex-tools or grizzly. To learn more, see Prometheus and Loki rules with mimirtool and Alerts.
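For a sense of what these rules look like, here is an illustrative Prometheus rule-file sketch of a crash-loop alert. It is not the exact preconfigured definition, which comes from kubernetes-mixin and varies by version:

    # Illustrative only; the shipped KubePodCrashLooping rule may use a
    # different expression, duration, and severity.
    groups:
      - name: kubernetes-apps
        rules:
          - alert: KubePodCrashLooping
            expr: |
              max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}[5m]) >= 1
            for: 15m
            labels:
              severity: warning
            annotations:
              summary: Pod is crash looping.
              description: Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) is in waiting state (reason CrashLoopBackOff).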

Recording Rules

Kubernetes Monitoring includes the following recording rules to speed up dashboard queries and alerting rule evaluation:

  • node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
  • node_namespace_pod_container:container_memory_working_set_bytes
  • node_namespace_pod_container:container_memory_rss
  • node_namespace_pod_container:container_memory_cache
  • node_namespace_pod_container:container_memory_swap
  • cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
  • namespace_memory:kube_pod_container_resource_requests:sum
  • cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests
  • namespace_cpu:kube_pod_container_resource_requests:sum
  • cluster:namespace:pod_memory:active:kube_pod_container_resource_limits
  • namespace_memory:kube_pod_container_resource_limits:sum
  • cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits
  • namespace_cpu:kube_pod_container_resource_limits:sum
  • namespace_workload_pod:kube_pod_owner:relabel
  • namespace_workload_pod:kube_pod_owner:relabel
  • namespace_workload_pod:kube_pod_owner:relabel

Note that recording rules may emit time series with the same metric name, but different labels.
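To make that concrete, the repeated namespace_workload_pod:kube_pod_owner:relabel entries are defined once per workload type and distinguished by a workload_type label. A simplified, hedged sketch for the DaemonSet case (the shipped expression may differ):

    # Simplified sketch; the preconfigured rule from kubernetes-mixin is
    # defined separately for each workload type with a workload_type label.
    groups:
      - name: k8s.rules
        rules:
          - record: namespace_workload_pod:kube_pod_owner:relabel
            expr: |
              max by (cluster, namespace, workload, pod) (
                label_replace(
                  kube_pod_owner{owner_kind="DaemonSet"},
                  "workload", "$1", "owner_name", "(.*)"
                )
              )
            labels:
              workload_type: daemonset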

To learn how to modify these programmatically, see Prometheus and Loki rules with mimirtool.

Logs

The default Kubernetes Monitoring setup instructions roll out a Grafana Agent DaemonSet to collect logs from all pods running in your cluster and ship these to Grafana Cloud Loki.

If you want to collect just logs, follow the Ship Kubernetes logs using Grafana Agent guide.

Traces

Kubernetes Monitoring does not yet support out-of-the-box configuration for shipping traces to your hosted Tempo endpoint. However, you can get started shipping traces to Grafana Cloud by following the Ship Kubernetes traces using Grafana Agent guide. This rolls out a single-replica Agent Deployment that receives traces and remote_writes them to Grafana Cloud.
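Such a Deployment is configured through the Agent's traces block; a minimal sketch, assuming an OTLP receiver and with placeholder endpoint and credentials rather than the guide's exact values:

    # Minimal sketch of an Agent (static mode) traces config; the endpoint
    # and credentials are placeholders.
    traces:
      configs:
        - name: default
          receivers:
            otlp:
              protocols:
                grpc:
          remote_write:
            - endpoint: <your-tempo-endpoint>:443
              basic_auth:
                username: <your Tempo instance ID>
                password: <your Grafana Cloud API key>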

How can I manage and track metrics and logs coming through Kubernetes Monitoring?

The best way to start controlling and managing your metrics is with the Reducing Kubernetes Metrics usage guide.

You can also analyze the distribution of your metrics and labels with cardinality management dashboards (included with Pro and Advanced) and analyze your current metrics usage and associated costs from the billing and usage dashboard located in your Grafana instance. These will help pinpoint unnecessary or duplicate metrics that may be coming from within your cluster. Another helpful resource is the Analyzing metrics usage with Grafana Explore guide. For logs, see the Explore for Logs guide.

Allowlists for managing metrics

Another method for managing metrics is to use allowlists. By default, Kubernetes Monitoring configures allowlists using Prometheus relabel_config blocks. To learn more about relabel_configs, metric_relabel_configs and write_relabel_configs, see Reducing Prometheus metrics usage with relabeling.

These allowlists drop any metrics not referenced in the dashboards, rules, and alerts. To omit or modify the allowlists, modify the corresponding metric_relabel_configs blocks in your Agent configuration. To learn more about analyzing and controlling active series usage, see Control Prometheus metrics usage.
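In practice an allowlist is a keep action on the metric name; a trimmed sketch of the pattern (the preconfigured regex is much longer and covers every metric the dashboards and rules reference):

    # Trimmed sketch of an allowlist keep rule; the shipped regex lists many
    # more metric names.
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: kube_pod_status_phase|kube_pod_container_status_restarts_total|container_cpu_usage_seconds_total|node_cpu_seconds_total
        action: keep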

Grafana Cloud billing is based on billable series. To learn more about the pricing model, see Active series and DPM.

Default active series usage varies depending on your Kubernetes cluster size (number of nodes) and running workloads (number of Pods, containers, Deployments, etc.).

When testing on a Cloud provider’s Kubernetes offering, the following active series usage was observed:

  • 3 node cluster, 17 running pods, 31 running containers: 3.8k active series
    • The only Pods deployed into the cluster were Grafana Agent and kube-state-metrics. The rest were running in the kube-system Namespace and managed by the cloud provider
  • From this baseline, active series usage roughly increased by:
    • 1000 active series per additional node
    • 75 active series per additional pod (vanilla Nginx Pods were deployed into the cluster)
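As a rough worked example under those same assumptions, a 6-node cluster running 40 application Pods beyond that baseline would land somewhere near 3.8k + (3 × 1,000) + (40 × 75) ≈ 9.8k active series.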

These are very rough guidelines, and results may vary depending on your Cloud provider or Kubernetes version. Note also that these figures are based on the scrape targets configured above and do not include additional targets such as application metrics, API server metrics, and scheduler metrics.

I’m already sending kube-state-metrics to Prometheus - do I still need to use Grafana Agent?

You do not need to use Grafana Agent if you are already sending kube-state-metrics. Kubernetes Monitoring shows data in the Cluster navigation tab as long as you are using kube-state-metrics version 2.1 or greater. However, some features (such as Pod logs, Kubernetes events, and resource management) rely on specific recording rules and metrics that ship with Grafana Agent, and may not display properly. To switch to Grafana Agent, make sure to first remove the existing job that is sending metrics.

How can I migrate to Kubernetes Monitoring from an older integration?

If you are already sending kube-state-metrics to Grafana Cloud with Grafana Agent through the Kubernetes integration, no action is needed—Kubernetes Monitoring uses the existing metrics and you can start using the Kubernetes Monitoring preconfigured dashboards and alerting immediately.

If it has been a while since you deployed Grafana Agent into your cluster, you might want to redeploy by following the instructions in the Configuration section of Kubernetes Monitoring. See how to configure Kubernetes Monitoring using Grafana Agent.

Does Grafana Cloud support integrations on Kubernetes?

Grafana Cloud does not currently support integrations on Kubernetes as a platform, like the Linux Node integration (node-exporter), Redis integration, MySQL integration, and others. Until this support is available, use embedded Agent exporters and integrations by configuring them manually. To learn how, see integrations_config.
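As a hedged sketch of that manual approach, embedded integrations are enabled in the Agent's static-mode configuration; the exact fields for each exporter are documented in integrations_config, and the remote_write URL below is a placeholder:

    # Sketch only: enable an embedded integration in the Agent config.
    # Fields beyond "enabled" vary per integration; see integrations_config.
    integrations:
      node_exporter:
        enabled: true
      prometheus_remote_write:
        - url: https://<your-prometheus-endpoint>/api/prom/push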

Node-exporter metrics

For node-exporter or host system metrics, you can roll out the node-exporter Helm Chart and add the following Agent scrape config job:

    ...
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: integrations/node-exporter
      kubernetes_sd_configs:
        - namespaces:
            names:
              - NODE_EXPORTER_NAMESPACE_HERE
          role: pod
      relabel_configs:
        - action: keep
          regex: prometheus-node-exporter.*
          source_labels:
            - __meta_kubernetes_pod_label_app
        - action: replace
          source_labels:
            - __meta_kubernetes_pod_node_name
          target_label: instance
        - action: replace
          source_labels:
            - __meta_kubernetes_namespace
          target_label: namespace
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: false

This instructs the Agent to scrape any Pod with the label app=prometheus-node-exporter.* (the value is a regular expression). The Helm chart configures this label by default, but if you modify the Chart's values.yaml file or override other defaults, you may have to adjust this scrape job accordingly. To learn more, see this set of examples.

Can I correlate data across metrics, logs, and traces?

Because Prometheus and Grafana Loki share the same label metadata for your Kubernetes cluster, you can access correlated Kubernetes metrics and logs.

Documentation for configuring correlation across metrics, logs, and traces specifically for Kubernetes workloads is forthcoming. In the interim, see Intro to monitoring Kubernetes with Grafana Cloud. Note that this video was published before the release of Kubernetes Monitoring, so some concepts may differ slightly.

Can I monitor Kubernetes events?

Kubernetes events provide helpful logging information emitted by K8s cluster controllers. Grafana Agent contains an embedded integration that watches for event objects in your clusters, and ships them to Grafana Cloud for long-term storage and analysis. To enable this beta feature, see the Set up Kubernetes event monitoring guide. The setup instructions enable this feature by default in the Grafana Agent StatefulSet.
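For reference, enabling the integration in the Agent's configuration is roughly of this shape; treat the field names beyond enabled as assumptions to verify against the Agent documentation for your version:

    # Rough sketch of enabling the eventhandler integration (beta); verify
    # field names against the Grafana Agent docs for your Agent version.
    integrations:
      eventhandler:
        enabled: true
        cache_path: /etc/eventhandler/eventhandler.cache   # assumed path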

Where can I get help with installing Grafana Agent Operator?

If you need help installing Grafana Agent Operator, open a Support ticket by visiting your Grafana Cloud Portal and clicking Open a Support Ticket.

Where can I get help with Kubernetes Monitoring?

You can open a Support ticket by visiting your Grafana Cloud Portal and clicking Open a Support Ticket.