Kubernetes integration for Grafana Cloud
Kubernetes is an open-source container orchestration system that automates software container deployment, scaling, and management. The Kubernetes integration allows you to monitor and alert on resource usage and cluster operations.
This integration includes:
- Dashboards to visualize individual and aggregate resource usage for containers, Pods, K8s workloads, and more.
- Dashboards to visualize Kubelet metrics.
- A set of default alerting rules to monitor core cluster metrics.
- A set of default recording rules to cache frequent queries and improve dashboard performance.
- Pre-configured Agent manifests to scrape cAdvisor, kubelet, and kube-state-metrics endpoints and ship these to Grafana Cloud.
- Pre-configured Agent manifests to tail container logs and Kubernetes events and ship these to Grafana Cloud.
Install Kubernetes integration for Grafana Cloud
- In your Grafana Cloud instance, click Integrations and Connections (lightning bolt icon).
- Navigate to the Kubernetes Monitoring tile and review the prerequisites. Then click Install integration.
- After the integration is installed, follow the steps on the Configuration Details page to set up Grafana Agent and start sending Kubernetes metrics to your Grafana Cloud instance.
Dashboards
The Kubernetes integration installs the following dashboards in your Grafana Cloud instance to help monitor your metrics.
- (Home) Kubernetes Integration
- Kubernetes / Compute Resources / Multi-Cluster
- Kubernetes / Compute Resources / Cluster
- Kubernetes / Compute Resources / Namespace (Pods)
- Kubernetes / Compute Resources / Namespace (Workloads)
- Kubernetes / Compute Resources / Node (Pods)
- Kubernetes / Compute Resources / Pod
- Kubernetes / Compute Resources / Workload
- Kubernetes / Kubelet
- Kubernetes / Persistent Volumes
Alerts
The Kubernetes integration includes the following alerts, organized by rule group:
Group: kubernetes-apps
Alert | Description |
---|---|
KubePodCrashLooping | Warning: Pod is crash looping. |
KubePodNotReady | Warning: Pod has been in a non-ready state for more than 15 minutes. |
KubeDeploymentGenerationMismatch | Warning: Deployment generation mismatch due to possible roll-back. |
KubeDeploymentReplicasMismatch | Warning: Deployment has not matched the expected number of replicas. |
KubeStatefulSetReplicasMismatch | Warning: StatefulSet has not matched the expected number of replicas. |
KubeStatefulSetGenerationMismatch | Warning: StatefulSet generation mismatch due to possible roll-back. |
KubeStatefulSetUpdateNotRolledOut | Warning: StatefulSet update has not been rolled out. |
KubeDaemonSetRolloutStuck | Warning: DaemonSet rollout is stuck. |
KubeContainerWaiting | Warning: Pod container waiting longer than 1 hour. |
KubeDaemonSetNotScheduled | Warning: DaemonSet pods are not scheduled. |
KubeDaemonSetMisScheduled | Warning: DaemonSet pods are misscheduled. |
KubeJobNotCompleted | Warning: Job did not complete in time. |
KubeJobFailed | Warning: Job failed to complete. |
KubeHpaReplicasMismatch | Warning: HPA has not matched desired number of replicas. |
KubeHpaMaxedOut | Warning: HPA is running at max replicas. |
Group: kubernetes-resources
Alert | Description |
---|---|
KubeCPUOvercommit | Warning: Cluster has overcommitted CPU resource requests. |
KubeMemoryOvercommit | Warning: Cluster has overcommitted memory resource requests. |
KubeCPUQuotaOvercommit | Warning: Cluster has overcommitted CPU resource requests. |
KubeMemoryQuotaOvercommit | Warning: Cluster has overcommitted memory resource requests. |
KubeQuotaAlmostFull | Info: Namespace quota is going to be full. |
KubeQuotaFullyUsed | Info: Namespace quota is fully used. |
KubeQuotaExceeded | Warning: Namespace quota has exceeded the limits. |
CPUThrottlingHigh | Info: Processes experience elevated CPU throttling. |
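To make the CPUThrottlingHigh entry more concrete, the following Python sketch computes a throttling ratio from the two CFS counters that appear in this integration's metric list, `container_cpu_cfs_throttled_periods_total` and `container_cpu_cfs_periods_total`. It assumes the alert is derived from the ratio of these counters over a time window. The function names and the 25% threshold are illustrative assumptions, not values taken from the integration's rule definitions.

```python
# Hypothetical threshold, chosen only for illustration.
THROTTLING_THRESHOLD = 0.25


def throttling_ratio(throttled_periods_delta: float, periods_delta: float) -> float:
    """Return the fraction of CFS periods in which the container was throttled.

    Both arguments are increases of the respective counters over the same
    time window (for example, the last five minutes).
    """
    if periods_delta <= 0:
        # No scheduling periods elapsed, so there is nothing to report.
        return 0.0
    return throttled_periods_delta / periods_delta


def is_throttled(throttled_periods_delta: float, periods_delta: float,
                 threshold: float = THROTTLING_THRESHOLD) -> bool:
    """Return True when the throttling ratio exceeds the (assumed) threshold."""
    return throttling_ratio(throttled_periods_delta, periods_delta) > threshold
```

For example, a container throttled in 30 of 100 periods has a ratio of 0.3, which exceeds the assumed threshold.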
Group: kubernetes-system
Alert | Description |
---|---|
KubeVersionMismatch | Warning: Different semantic versions of Kubernetes components running. |
KubeClientErrors | Warning: Kubernetes API server client is experiencing errors. |
Group: kubernetes-system-kubelet
Alert | Description |
---|---|
KubeNodeNotReady | Warning: Node is not ready. |
KubeNodeUnreachable | Warning: Node is unreachable. |
KubeletTooManyPods | Info: Kubelet is running at capacity. |
KubeNodeReadinessFlapping | Warning: Node readiness status is flapping. |
KubeletPlegDurationHigh | Warning: Kubelet Pod Lifecycle Event Generator is taking too long to relist. |
KubeletPodStartUpLatencyHigh | Warning: Kubelet Pod startup latency is too high. |
KubeletClientCertificateExpiration | Warning: Kubelet client certificate is about to expire. |
KubeletClientCertificateExpiration | Critical: Kubelet client certificate is about to expire. |
KubeletServerCertificateExpiration | Warning: Kubelet server certificate is about to expire. |
KubeletServerCertificateExpiration | Critical: Kubelet server certificate is about to expire. |
KubeletClientCertificateRenewalErrors | Warning: Kubelet has failed to renew its client certificate. |
KubeletServerCertificateRenewalErrors | Warning: Kubelet has failed to renew its server certificate. |
KubeletDown | Critical: Target disappeared from Prometheus target discovery. |
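The certificate expiration alerts appear twice in the table above, once per severity. A plausible reading is that each severity corresponds to a different remaining-lifetime threshold on metrics such as `kubelet_certificate_manager_client_ttl_seconds`. The Python sketch below illustrates that idea. The threshold values (seven days and one day) and the function name are assumptions for illustration and may not match the integration's actual rules.

```python
from typing import Optional

# Assumed thresholds, for illustration only.
WARNING_TTL_SECONDS = 7 * 24 * 3600   # seven days
CRITICAL_TTL_SECONDS = 24 * 3600      # one day


def certificate_severity(ttl_seconds: float) -> Optional[str]:
    """Return the most severe alert level for a certificate's remaining lifetime.

    Returns None when the certificate is not close to expiring.
    """
    if ttl_seconds < CRITICAL_TTL_SECONDS:
        return "critical"
    if ttl_seconds < WARNING_TTL_SECONDS:
        return "warning"
    return None
```

This sketch reports only the highest applicable severity. In a rule-based alerting system, the warning and critical rules are evaluated independently, so both can be active at the same time.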
Metrics
When you connect your Kubernetes cluster through this integration, the following metrics are automatically written to your Grafana Cloud instance:
- cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits
- cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests
- cluster:namespace:pod_memory:active:kube_pod_container_resource_limits
- cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
- container_cpu_cfs_periods_total
- container_cpu_cfs_throttled_periods_total
- container_cpu_usage_seconds_total
- container_fs_reads_bytes_total
- container_fs_reads_total
- container_fs_writes_bytes_total
- container_fs_writes_total
- container_memory_cache
- container_memory_rss
- container_memory_swap
- container_memory_working_set_bytes
- container_network_receive_bytes_total
- container_network_receive_packets_dropped_total
- container_network_receive_packets_total
- container_network_transmit_bytes_total
- container_network_transmit_packets_dropped_total
- container_network_transmit_packets_total
- go_goroutines
- kube_daemonset_status_current_number_scheduled
- kube_daemonset_status_desired_number_scheduled
- kube_daemonset_status_number_available
- kube_daemonset_status_number_misscheduled
- kube_daemonset_status_updated_number_scheduled
- kube_deployment_metadata_generation
- kube_deployment_spec_replicas
- kube_deployment_status_observed_generation
- kube_deployment_status_replicas_available
- kube_deployment_status_replicas_updated
- kube_horizontalpodautoscaler_spec_max_replicas
- kube_horizontalpodautoscaler_spec_min_replicas
- kube_horizontalpodautoscaler_status_current_replicas
- kube_horizontalpodautoscaler_status_desired_replicas
- kube_job_failed
- kube_job_status_active
- kube_job_status_start_time
- kube_namespace_status_phase
- kube_node_info
- kube_node_spec_taint
- kube_node_status_allocatable
- kube_node_status_capacity
- kube_node_status_condition
- kube_pod_container_resource_limits
- kube_pod_container_resource_requests
- kube_pod_container_status_waiting_reason
- kube_pod_info
- kube_pod_owner
- kube_pod_status_phase
- kube_replicaset_owner
- kube_resourcequota
- kube_statefulset_metadata_generation
- kube_statefulset_replicas
- kube_statefulset_status_current_revision
- kube_statefulset_status_observed_generation
- kube_statefulset_status_replicas
- kube_statefulset_status_replicas_ready
- kube_statefulset_status_replicas_updated
- kube_statefulset_status_update_revision
- kubelet_certificate_manager_client_expiration_renew_errors
- kubelet_certificate_manager_client_ttl_seconds
- kubelet_certificate_manager_server_ttl_seconds
- kubelet_cgroup_manager_duration_seconds_bucket
- kubelet_cgroup_manager_duration_seconds_count
- kubelet_node_config_error
- kubelet_node_name
- kubelet_pleg_relist_duration_seconds_bucket
- kubelet_pleg_relist_duration_seconds_count
- kubelet_pleg_relist_interval_seconds_bucket
- kubelet_pod_start_duration_seconds_bucket
- kubelet_pod_start_duration_seconds_count
- kubelet_pod_worker_duration_seconds_bucket
- kubelet_pod_worker_duration_seconds_count
- kubelet_running_container_count
- kubelet_running_containers
- kubelet_running_pod_count
- kubelet_running_pods
- kubelet_runtime_operations_errors_total
- kubelet_runtime_operations_total
- kubelet_server_expiration_renew_errors
- kubelet_volume_stats_available_bytes
- kubelet_volume_stats_capacity_bytes
- kubelet_volume_stats_inodes
- kubelet_volume_stats_inodes_used
- kubernetes_build_info
- machine_memory_bytes
- namespace_cpu:kube_pod_container_resource_limits:sum
- namespace_cpu:kube_pod_container_resource_requests:sum
- namespace_memory:kube_pod_container_resource_limits:sum
- namespace_memory:kube_pod_container_resource_requests:sum
- namespace_workload_pod
- namespace_workload_pod:kube_pod_owner:relabel
- node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
- node_namespace_pod_container:container_memory_cache
- node_namespace_pod_container:container_memory_rss
- node_namespace_pod_container:container_memory_swap
- node_namespace_pod_container:container_memory_working_set_bytes
- node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile
- process_cpu_seconds_total
- process_resident_memory_bytes
- rest_client_requests_total
- storage_operation_duration_seconds_count
- storage_operation_errors_total
- volume_manager_total_volumes
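The list above mixes two kinds of names. Names that contain a colon follow the Prometheus naming convention for recording-rule outputs, so they are presumably produced by the integration's recording rules rather than scraped from the cluster. The Python sketch below separates the two kinds and joins the scraped names into a single alternation, the form typically used in a Prometheus-style `metric_relabel_configs` rule with `action: keep` on the `__name__` label. The helper names are hypothetical, and the Agent manifests shipped with the integration may build their allowlist differently.

```python
from typing import Iterable, List, Tuple


def split_metric_names(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split names into (scraped, recorded), each sorted alphabetically.

    A name containing ':' is treated as a recording-rule output.
    """
    names = list(names)
    scraped = sorted(n for n in names if ":" not in n)
    recorded = sorted(n for n in names if ":" in n)
    return scraped, recorded


def keep_regex(scraped_names: Iterable[str]) -> str:
    """Join metric names into one alternation for an allowlist regex.

    Metric names contain only letters, digits, underscores, and colons,
    so no regex escaping is needed.
    """
    return "|".join(scraped_names)


# Example with three names taken from the list above.
scraped, recorded = split_metric_names([
    "kube_pod_info",
    "namespace_workload_pod:kube_pod_owner:relabel",
    "container_memory_rss",
])
allowlist = keep_regex(scraped)
```

Because Prometheus relabeling anchors its regular expressions at both ends, an alternation like this matches only the exact metric names listed.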
Changelog
# 0.0.8 - September 2022
* Use a more efficient query for homepage 'Configuration status' panel
* Remove opinionated regex for log datasource name, allowing easier use outside of Grafana Cloud
* Update upstream K8s mixin: [1ddc6f6f739cc9fe4b8ac6a0fbb23cb09fe53bc3](https://github.com/kubernetes-monitoring/kubernetes-mixin/commit/1ddc6f6f739cc9fe4b8ac6a0fbb23cb09fe53bc3)
* Fix storage queries and overview panels [kubernetes-monitoring/kubernetes-mixin#789](https://github.com/kubernetes-monitoring/kubernetes-mixin/pull/789)
# 0.0.7 - July 2022
* Update upstream K8s mixin: [b8f44bb7be728423836bef0e904ec7166895a34b](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/b8f44bb7be728423836bef0e904ec7166895a34b)
* Add cluster label to aggregations in more alert queries
# 0.0.6 - May 2022
* Update to upstream k8s mixin - Commit [62ad10fe9ceb53c6b846871997abbfe8e0bd7cf5](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/62ad10fe9ceb53c6b846871997abbfe8e0bd7cf5)
* Add cluster label to aggregations in alert queries
# 0.0.5 - February 2022
* Add Grafana Agent experimental event logging and annotations
* Update to upstream k8s mixin - Commit [177bc8ec789fa049a9585713d232035b159f8c92](https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/177bc8ec789fa049a9585713d232035b159f8c92)
# 0.0.4 - January 2022
* Show CPU utilization as CPU seconds rather than percent on the cluster and multi-cluster dashboards
# 0.0.3 - December 2021
* Reorder template variables to logically represent broadest to narrowest from left to right
# 0.0.2 - October 2021
* Update to upstream Kubernetes mixin
* Fix 404 on links between dashboards
* Add a top-level dashboard/homepage
* Add allowlisting of only metrics used by dashboards, alerts, and rules to agent config
# 0.0.1 - April 2021
* Initial release
Cost
Connecting your Kubernetes cluster to Grafana Cloud might incur charges. To view the number of active series that your Grafana Cloud account uses, and the metrics usage included in each Cloud tier, see "Active series and dpm usage" and "Cloud tier pricing".