Kubernetes integration for Grafana Cloud

Kubernetes is an open-source container orchestration system that automates software container deployment, scaling, and management. The Kubernetes integration allows you to monitor and alert on resource usage and cluster operations.

This integration includes:

  • Dashboards to visualize individual and aggregate resource usage for containers, Pods, K8s workloads, and more.
  • Dashboards to visualize Kubelet metrics.
  • A set of default alerting rules to monitor core cluster metrics.
  • A set of default recording rules to cache frequent queries and improve dashboard performance.
  • Pre-configured Agent manifests to scrape cAdvisor, kubelet, and kube-state-metrics endpoints and ship these to Grafana Cloud.
  • Pre-configured Agent manifests to tail container logs and Kubernetes events and ship these to Grafana Cloud.

Install Kubernetes integration for Grafana Cloud

  1. In your Grafana Cloud instance, click Integrations and Connections (lightning bolt icon).
  2. Navigate to the Kubernetes Monitoring tile and review the prerequisites. Then click Install integration.
  3. Once the integration is installed, follow the steps on the Configuration Details page to set up Grafana Agent and start sending Kubernetes metrics to your Grafana Cloud instance.

Dashboards

The Kubernetes integration installs the following dashboards in your Grafana Cloud instance to help monitor your metrics.

  • (Home) Kubernetes Integration
  • Kubernetes / Compute Resources / Multi-Cluster
  • Kubernetes / Compute Resources / Cluster
  • Kubernetes / Compute Resources / Namespace (Pods)
  • Kubernetes / Compute Resources / Namespace (Workloads)
  • Kubernetes / Compute Resources / Node (Pods)
  • Kubernetes / Compute Resources / Pod
  • Kubernetes / Compute Resources / Workload
  • Kubernetes / Kubelet
  • Kubernetes / Persistent Volumes

[Screenshot: Home]

[Screenshot: Multi Cluster Dashboard]

[Screenshot: Cluster Dashboard]

[Screenshot: Pods by Namespace Dashboard]

[Screenshot: Workloads by Namespace Dashboard]

[Screenshot: Node Dashboard]

[Screenshot: Pod Dashboard]

[Screenshot: Workload Dashboard]

[Screenshot: Kubelet Dashboard]

[Screenshot: Persistent Disk Dashboard]

Alerts

The Kubernetes integration includes the following useful alerts:

Group: kubernetes-apps

| Alert | Description |
| --- | --- |
| KubePodCrashLooping | Warning: Pod is crash looping. |
| KubePodNotReady | Warning: Pod has been in a non-ready state for more than 15 minutes. |
| KubeDeploymentGenerationMismatch | Warning: Deployment generation mismatch due to possible roll-back. |
| KubeDeploymentReplicasMismatch | Warning: Deployment has not matched the expected number of replicas. |
| KubeStatefulSetReplicasMismatch | Warning: StatefulSet has not matched the expected number of replicas. |
| KubeStatefulSetGenerationMismatch | Warning: StatefulSet generation mismatch due to possible roll-back. |
| KubeStatefulSetUpdateNotRolledOut | Warning: StatefulSet update has not been rolled out. |
| KubeDaemonSetRolloutStuck | Warning: DaemonSet rollout is stuck. |
| KubeContainerWaiting | Warning: Pod container waiting longer than 1 hour. |
| KubeDaemonSetNotScheduled | Warning: DaemonSet pods are not scheduled. |
| KubeDaemonSetMisScheduled | Warning: DaemonSet pods are misscheduled. |
| KubeJobNotCompleted | Warning: Job did not complete in time. |
| KubeJobFailed | Warning: Job failed to complete. |
| KubeHpaReplicasMismatch | Warning: HPA has not matched desired number of replicas. |
| KubeHpaMaxedOut | Warning: HPA is running at max replicas. |
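
Several of these alerts fire only after a condition has persisted, for example a Pod that has been in a non-ready state for more than 15 minutes. The following Python sketch illustrates that kind of sustained-condition check. The function name `sustained_for` and the sample format are hypothetical; the actual rules are defined upstream and evaluated by the rule engine, not by code like this.

```python
from datetime import datetime, timedelta


def sustained_for(samples, hold):
    """Return True if the condition has been continuously true for at least `hold`.

    `samples` is a chronological list of (timestamp, condition_is_true)
    tuples, one per rule evaluation (a hypothetical format for illustration).
    """
    pending_since = None
    for timestamp, is_true in samples:
        if is_true:
            # Remember when the current unbroken run of "true" started.
            if pending_since is None:
                pending_since = timestamp
        else:
            # Any "false" evaluation resets the run.
            pending_since = None
    if pending_since is None:
        return False
    return samples[-1][0] - pending_since >= hold


# Example: a Pod reported as not ready at every one-minute evaluation.
start = datetime(2022, 9, 1, 12, 0)
not_ready = [(start + timedelta(minutes=i), True) for i in range(20)]
print(sustained_for(not_ready, timedelta(minutes=15)))
```

A single evaluation where the condition is false restarts the clock, which is why a briefly recovering Pod does not trigger this style of alert.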

Group: kubernetes-resources

| Alert | Description |
| --- | --- |
| KubeCPUOvercommit | Warning: Cluster has overcommitted CPU resource requests. |
| KubeMemoryOvercommit | Warning: Cluster has overcommitted memory resource requests. |
| KubeCPUQuotaOvercommit | Warning: Cluster has overcommitted CPU resource requests. |
| KubeMemoryQuotaOvercommit | Warning: Cluster has overcommitted memory resource requests. |
| KubeQuotaAlmostFull | Info: Namespace quota is going to be full. |
| KubeQuotaFullyUsed | Info: Namespace quota is fully used. |
| KubeQuotaExceeded | Warning: Namespace quota has exceeded the limits. |
| CPUThrottlingHigh | Info: Processes experience elevated CPU throttling. |

Group: kubernetes-system

| Alert | Description |
| --- | --- |
| KubeVersionMismatch | Warning: Different semantic versions of Kubernetes components running. |
| KubeClientErrors | Warning: Kubernetes API server client is experiencing errors. |

Group: kubernetes-system-kubelet

| Alert | Description |
| --- | --- |
| KubeNodeNotReady | Warning: Node is not ready. |
| KubeNodeUnreachable | Warning: Node is unreachable. |
| KubeletTooManyPods | Info: Kubelet is running at capacity. |
| KubeNodeReadinessFlapping | Warning: Node readiness status is flapping. |
| KubeletPlegDurationHigh | Warning: Kubelet Pod Lifecycle Event Generator is taking too long to relist. |
| KubeletPodStartUpLatencyHigh | Warning: Kubelet Pod startup latency is too high. |
| KubeletClientCertificateExpiration | Warning: Kubelet client certificate is about to expire. |
| KubeletClientCertificateExpiration | Critical: Kubelet client certificate is about to expire. |
| KubeletServerCertificateExpiration | Warning: Kubelet server certificate is about to expire. |
| KubeletServerCertificateExpiration | Critical: Kubelet server certificate is about to expire. |
| KubeletClientCertificateRenewalErrors | Warning: Kubelet has failed to renew its client certificate. |
| KubeletServerCertificateRenewalErrors | Warning: Kubelet has failed to renew its server certificate. |
| KubeletDown | Critical: Target disappeared from Prometheus target discovery. |
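
The certificate expiration alerts appear twice, once as a warning and once as critical, with the same description. One plausible reading is that both severities watch the same remaining-lifetime value with different thresholds. The Python sketch below illustrates that idea only: the function name and the 7-day and 1-day thresholds are assumptions chosen for illustration, not values stated on this page.

```python
DAY = 24 * 60 * 60  # seconds in one day


def certificate_severity(ttl_seconds, warning_below=7 * DAY, critical_below=1 * DAY):
    """Map a certificate's remaining lifetime to an alert severity.

    The default thresholds are illustrative assumptions, not the
    integration's documented values.
    """
    if ttl_seconds < critical_below:
        return "critical"
    if ttl_seconds < warning_below:
        return "warning"
    return None  # Certificate is not close to expiry; no alert.


for ttl in (30 * DAY, 3 * DAY, 3600):
    print(ttl, certificate_severity(ttl))
```

Checking the critical threshold first ensures a certificate that satisfies both conditions is reported at the higher severity.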

Metrics

The following metrics are automatically written to your Grafana Cloud instance when you connect your Kubernetes cluster through this integration:

  • cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits
  • cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests
  • cluster:namespace:pod_memory:active:kube_pod_container_resource_limits
  • cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
  • container_cpu_cfs_periods_total
  • container_cpu_cfs_throttled_periods_total
  • container_cpu_usage_seconds_total
  • container_fs_reads_bytes_total
  • container_fs_reads_total
  • container_fs_writes_bytes_total
  • container_fs_writes_total
  • container_memory_cache
  • container_memory_rss
  • container_memory_swap
  • container_memory_working_set_bytes
  • container_network_receive_bytes_total
  • container_network_receive_packets_dropped_total
  • container_network_receive_packets_total
  • container_network_transmit_bytes_total
  • container_network_transmit_packets_dropped_total
  • container_network_transmit_packets_total
  • go_goroutines
  • kube_daemonset_status_current_number_scheduled
  • kube_daemonset_status_desired_number_scheduled
  • kube_daemonset_status_number_available
  • kube_daemonset_status_number_misscheduled
  • kube_daemonset_status_updated_number_scheduled
  • kube_deployment_metadata_generation
  • kube_deployment_spec_replicas
  • kube_deployment_status_observed_generation
  • kube_deployment_status_replicas_available
  • kube_deployment_status_replicas_updated
  • kube_horizontalpodautoscaler_spec_max_replicas
  • kube_horizontalpodautoscaler_spec_min_replicas
  • kube_horizontalpodautoscaler_status_current_replicas
  • kube_horizontalpodautoscaler_status_desired_replicas
  • kube_job_failed
  • kube_job_status_active
  • kube_job_status_start_time
  • kube_namespace_status_phase
  • kube_node_info
  • kube_node_spec_taint
  • kube_node_status_allocatable
  • kube_node_status_capacity
  • kube_node_status_condition
  • kube_pod_container_resource_limits
  • kube_pod_container_resource_requests
  • kube_pod_container_status_waiting_reason
  • kube_pod_info
  • kube_pod_owner
  • kube_pod_status_phase
  • kube_replicaset_owner
  • kube_resourcequota
  • kube_statefulset_metadata_generation
  • kube_statefulset_replicas
  • kube_statefulset_status_current_revision
  • kube_statefulset_status_observed_generation
  • kube_statefulset_status_replicas
  • kube_statefulset_status_replicas_ready
  • kube_statefulset_status_replicas_updated
  • kube_statefulset_status_update_revision
  • kubelet_certificate_manager_client_expiration_renew_errors
  • kubelet_certificate_manager_client_ttl_seconds
  • kubelet_certificate_manager_server_ttl_seconds
  • kubelet_cgroup_manager_duration_seconds_bucket
  • kubelet_cgroup_manager_duration_seconds_count
  • kubelet_node_config_error
  • kubelet_node_name
  • kubelet_pleg_relist_duration_seconds_bucket
  • kubelet_pleg_relist_duration_seconds_count
  • kubelet_pleg_relist_interval_seconds_bucket
  • kubelet_pod_start_duration_seconds_bucket
  • kubelet_pod_start_duration_seconds_count
  • kubelet_pod_worker_duration_seconds_bucket
  • kubelet_pod_worker_duration_seconds_count
  • kubelet_running_container_count
  • kubelet_running_containers
  • kubelet_running_pod_count
  • kubelet_running_pods
  • kubelet_runtime_operations_errors_total
  • kubelet_runtime_operations_total
  • kubelet_server_expiration_renew_errors
  • kubelet_volume_stats_available_bytes
  • kubelet_volume_stats_capacity_bytes
  • kubelet_volume_stats_inodes
  • kubelet_volume_stats_inodes_used
  • kubernetes_build_info
  • machine_memory_bytes
  • namespace_cpu:kube_pod_container_resource_limits:sum
  • namespace_cpu:kube_pod_container_resource_requests:sum
  • namespace_memory:kube_pod_container_resource_limits:sum
  • namespace_memory:kube_pod_container_resource_requests:sum
  • namespace_workload_pod
  • namespace_workload_pod:kube_pod_owner:relabel
  • node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
  • node_namespace_pod_container:container_memory_cache
  • node_namespace_pod_container:container_memory_rss
  • node_namespace_pod_container:container_memory_swap
  • node_namespace_pod_container:container_memory_working_set_bytes
  • node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile
  • process_cpu_seconds_total
  • process_resident_memory_bytes
  • rest_client_requests_total
  • storage_operation_duration_seconds_count
  • storage_operation_errors_total
  • volume_manager_total_volumes
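
The changelog notes that the Agent configuration allowlists only the metrics used by dashboards, alerts, and rules. The Python sketch below shows how a name-based allowlist of this kind can be expressed as a single regular expression. The helper names and the shortened `ALLOWED_METRICS` list are hypothetical, and the full-string match mirrors how Prometheus-style relabeling regexes are anchored; the generated Agent manifest remains the authoritative configuration.

```python
import re

# A small subset of the metric names listed above, for illustration only.
ALLOWED_METRICS = [
    "container_cpu_usage_seconds_total",
    "kube_pod_info",
    "kubelet_running_pods",
    "namespace_workload_pod:kube_pod_owner:relabel",
]


def build_allowlist_regex(names):
    """Join metric names into one alternation, escaping any regex metacharacters."""
    return "|".join(re.escape(name) for name in names)


def is_kept(metric_name, pattern):
    """Return True if the whole metric name matches the allowlist pattern."""
    return re.fullmatch(pattern, metric_name) is not None


pattern = build_allowlist_regex(ALLOWED_METRICS)
for name in ("kube_pod_info", "kube_pod_info_extra", "node_load1"):
    print(name, is_kept(name, pattern))
```

Matching the full name matters here: a prefix match would also keep unrelated series such as the hypothetical `kube_pod_info_extra`.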

Changelog

# 0.0.8 - September 2022

* Use a more efficient query for homepage 'Configuration status' panel
* Remove opinionated regex for log datasource name, allowing easier use outside of Grafana Cloud
* Update upstream K8s mixin: [1ddc6f6f739cc9fe4b8ac6a0fbb23cb09fe53bc3](https://github.com/kubernetes-monitoring/kubernetes-mixin/commit/1ddc6f6f739cc9fe4b8ac6a0fbb23cb09fe53bc3)
* Fix storage queries and overview panels [kubernetes-monitoring/kubernetes-mixin#789](https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/789)

# 0.0.7 - July 2022

* Update upstream K8s mixin: [b8f44bb7be728423836bef0e904ec7166895a34b](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/b8f44bb7be728423836bef0e904ec7166895a34b)
* Add cluster label to aggregations in more alert queries

# 0.0.6 - May 2022

* Update to upstream k8s mixin - Commit [62ad10fe9ceb53c6b846871997abbfe8e0bd7cf5](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/62ad10fe9ceb53c6b846871997abbfe8e0bd7cf5)
* Add cluster label to aggregations in alert queries

# 0.0.5 - February 2022

* Add Grafana Agent experimental event logging and annotations
* Update to upstream k8s mixin - Commit [177bc8ec789fa049a9585713d232035b159f8c92](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/177bc8ec789fa049a9585713d232035b159f8c92)

# 0.0.4 - January 2022

* Show CPU utilization as CPU seconds, rather than percent, on the cluster and multi-cluster dashboards

# 0.0.3 - December 2021

* Reorder template variables to logically represent broadest to narrowest from left to right

# 0.0.2 - October 2021

* Update to upstream Kubernetes mixin
* Fix 404 on links between dashboards
* Add a top-level dashboard/homepage
* Add allowlisting of only metrics used by dashboards, alerts, and rules to agent config

# 0.0.1 - April 2021

* Initial release

Cost

Connecting your Kubernetes cluster to Grafana Cloud might incur charges. For information on the number of active series that your Grafana Cloud account uses and the metrics included in each Cloud tier, see "Active series and dpm usage" and "Cloud tier pricing".
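
As a rough aid for reasoning about usage, the Python sketch below estimates data points per minute (dpm) from an active series count and a scrape interval. The function name and the example numbers are hypothetical, and the sketch deliberately includes no prices; refer to the pricing pages for actual billing rules.

```python
def data_points_per_minute(active_series, scrape_interval_seconds):
    """Estimate total data points per minute for a set of active series.

    Each series produces 60 / scrape_interval_seconds points per minute,
    assuming every series is scraped at the same fixed interval.
    """
    if scrape_interval_seconds <= 0:
        raise ValueError("scrape_interval_seconds must be positive")
    return active_series * 60 / scrape_interval_seconds


# Example with made-up numbers: the same series count at two scrape intervals.
for interval in (60, 15):
    print(interval, data_points_per_minute(1000, interval))
```

Shortening the scrape interval increases data points per minute without changing the active series count, so the two figures are worth tracking separately.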