Kubernetes Monitoring

Visualize and alert on your Kubernetes cluster in minutes, not days.

Why use Kubernetes Monitoring in Grafana Cloud?

Accelerate time to value

Reduce deployment, setup, and troubleshooting time with this ready-to-use monitoring tool. Getting started only requires running a few CLI commands or making a few small changes to your Helm chart.

Identify root causes faster

Drill down through your infrastructure with the cluster navigation view to identify and resolve issues, without the hassle of switching between different windows and monitoring tools.

Reduce costs

Efficiency and cost monitoring visualizations deliver comprehensive insights into your spending, enabling data-driven decisions about resource allocation, scaling strategies, and tech investments.

Kubernetes Monitoring in Grafana Cloud: Getting started

Cluster navigation view

Full visibility, from Kubernetes clusters to Kubernetes pods

Quickly move throughout your Kubernetes setup in a single UI. Start with a cluster view and drill all the way down to specific Kubernetes pods with just a few clicks.

  • High-level monitoring provides infrastructure visibility
  • Color-coded health visuals and icons lead to faster issue identification and resolution
Node observability

Understand your nodes at a glance

Gain a bird’s-eye view of the health, utilization, and configuration of your nodes.

  • Discover all nodes in a cluster – their condition and pod density
  • Resource usage forecasts
  • Color-coded indicators guide node resource management
  • See inside a node to check the health of its pods
Resource utilization efficiency

Better insight into your resource usage

Dial in on cloud resource usage to optimize efficiency.

  • Discover unused and stranded resources
  • Easily determine better pod limits and improved placement
  • Identify gaps in your resource management policies
  • Decrease your carbon footprint
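
As a rough illustration of what "unused resources" can mean, the sketch below compares the CPU a pod requests with the CPU it actually uses. The pod names, numbers, and the 20% threshold are hypothetical. In practice the inputs would come from metrics such as kube_pod_container_resource_requests and container_cpu_usage_seconds_total, and this is not necessarily how Grafana Cloud computes its efficiency views.

```python
from dataclasses import dataclass


@dataclass
class PodCpu:
    name: str
    requested_cores: float  # e.g. from kube_pod_container_resource_requests
    used_cores: float       # e.g. a rate over container_cpu_usage_seconds_total


def cpu_utilization(pod: PodCpu) -> float:
    """Fraction of the requested CPU that the pod actually uses."""
    if pod.requested_cores <= 0:
        return 0.0
    return pod.used_cores / pod.requested_cores


def overprovisioned(pods, threshold=0.2):
    """Names of pods using less than `threshold` of their CPU request."""
    return [
        p.name
        for p in pods
        if p.requested_cores > 0 and cpu_utilization(p) < threshold
    ]


# Hypothetical sample data for illustration only.
pods = [
    PodCpu("api", requested_cores=2.0, used_cores=0.2),
    PodCpu("worker", requested_cores=1.0, used_cores=0.8),
]
print(overprovisioned(pods))
```

Pods flagged this way are candidates for lower requests or limits, which frees capacity for better placement.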
Metrics and alerts

Opinionated metrics and alerts

Access the kube-state-metrics metrics and alerting rules needed to effectively monitor Kubernetes clusters.

  • A curated set of metrics to avoid cardinality explosion
  • Community-built alerting standards
Pod logs

Instant Prometheus-correlated logs

Prometheus and Grafana Loki’s shared metadata keeps the exact same labels for your Kubernetes cluster, so accessing correlated Kubernetes metrics and logs couldn’t be easier.
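
To make the shared-label idea concrete, here is a minimal sketch that builds a metrics query and a logs query from one label set. The label values are hypothetical, and the label names (cluster, namespace, pod) are an assumption based on common Kubernetes conventions; your setup may use different ones.

```python
def selector(labels: dict) -> str:
    """Render a label set as a {key="value", ...} selector.

    PromQL and LogQL both accept this selector syntax.
    """
    pairs = ", ".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return "{" + pairs + "}"


# Hypothetical label values for illustration only.
labels = {"cluster": "prod", "namespace": "checkout", "pod": "cart-7d9f"}

# The same selector narrows both the metric series and the log stream.
metrics_query = "container_memory_working_set_bytes" + selector(labels)
logs_query = selector(labels)

print(metrics_query)
print(logs_query)
```

Because both queries carry identical labels, a spike seen in the metric can be followed straight to the logs of the same pod.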

Prebuilt dashboards

Preconfigured dashboards

Kubernetes Monitoring in Grafana Cloud provides out-of-the-box dashboards covering Kubernetes clusters and their workloads. These dashboards monitor:

  • Resource usage
  • Cluster operations
Monitor costs

Cost monitoring

Gain better insight into your Kubernetes costs, spending trends, and potential savings with the cost monitoring feature, which is based on the open source project OpenCost.

  • Break down costs and resource allocation by cloud providers
  • Organize Kubernetes costs by resource types
  • Visualize savings vs. cost trends
  • Get savings suggestions based on your resource usage

It’s easy to get started

For full implementation details and best practices, see the Kubernetes Monitoring documentation.

1

Sign up

Create your free Grafana Cloud account.

2

Connect your data

With a few clicks, set up default configurations for prebuilt dashboards and alerting rules.

3

Deploy

Data will stream from your cluster into Grafana Cloud.

Before we started using Grafana Loki, searching for logs was a challenge. The one-stop-shop experience with Grafana Cloud gives us the ability to cross-reference data with application workload and infrastructure metrics, which saves us time and makes our search for relevant logs much easier.
James Wojewoda
Lead Site Reliability Engineer | Beeswax
The caliber of dashboards and alerts from Grafana Cloud’s Kubernetes Monitoring solution would take weeks to fully implement on their own. Operationally, our largest value-add is that teams can speak the same language and view things through the same lens. It has been pivotal to our observability journey as we work towards making more data-driven decisions. And with the in-product cost monitoring feature, we’re excited to see how we can actually assign those dollar amounts going forward.
Nick Adolf
Site Reliability Engineer | OpenGov

Kubernetes metrics and alerting rules

The Kubernetes Monitoring solution in Grafana Cloud ingests a set of default metrics at a 60-second scrape interval. The set of alerting rules helps with setting up and running alerts for clusters and their workloads.

Read more about Kubernetes metrics and alerting rules

Key alerting rules included

KubeNodeNotReady
KubeNodeUnreachable
KubeletTooManyPods
KubeNodeReadinessFlapping
KubeletPlegDurationHigh
KubeletPodStartUpLatencyHigh
KubeletClientCertificateExpiration
KubeletServerCertificateExpiration
KubeletClientCertificateRenewalErrors
KubeletServerCertificateRenewalErrors
KubeletDown
KubeVersionMismatch
KubeClientErrors
KubeCPUOvercommit
KubeMemoryOvercommit
KubeCPUQuotaOvercommit
KubeMemoryQuotaOvercommit
KubeQuotaAlmostFull
KubeQuotaFullyUsed
KubeQuotaExceeded
CPUThrottlingHigh
KubePodCrashLooping
KubePodNotReady
KubeDeploymentGenerationMismatch
KubeDeploymentReplicasMismatch
KubeStatefulSetReplicasMismatch
KubeStatefulSetGenerationMismatch
KubeStatefulSetUpdateNotRolledOut
KubeDaemonSetRolloutStuck
KubeContainerWaiting
KubeDaemonSetNotScheduled
KubeDaemonSetMisScheduled
KubeJobCompletion
KubeJobFailed
KubeHpaReplicasMismatch
KubeHpaMaxedOut

Key metrics included

cluster:namespace:pod_cpu:active:kube_pod_container_resource_limits
cluster:namespace:pod_cpu:active:kube_pod_container_resource_requests
cluster:namespace:pod_memory:active:kube_pod_container_resource_limits
cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
container_cpu_cfs_periods_total
container_cpu_cfs_throttled_periods_total
container_cpu_usage_seconds_total
container_fs_reads_bytes_total
container_fs_reads_total
container_fs_writes_bytes_total
container_fs_writes_total
container_memory_cache
container_memory_rss
container_memory_swap
container_memory_working_set_bytes
container_network_receive_bytes_total
container_network_receive_packets_dropped_total
container_network_receive_packets_total
container_network_transmit_bytes_total
container_network_transmit_packets_dropped_total
container_network_transmit_packets_total
go_goroutines
kube_daemonset_status_current_number_scheduled
kube_daemonset_status_desired_number_scheduled
kube_daemonset_status_number_available
kube_daemonset_status_number_misscheduled
kube_daemonset_updated_number_scheduled
kube_deployment_metadata_generation
kube_deployment_spec_replicas
kube_deployment_status_observed_generation
kube_deployment_status_replicas_available
kube_deployment_status_replicas_updated
kube_horizontalpodautoscaler_spec_max_replicas
kube_horizontalpodautoscaler_spec_min_replicas
kube_horizontalpodautoscaler_status_current_replicas
kube_horizontalpodautoscaler_status_desired_replicas
kube_job_failed
kube_job_spec_completions
kube_job_status_succeeded
kube_namespace_created
kube_node_info
kube_node_spec_taint
kube_node_status_allocatable
kube_node_status_capacity
kube_node_status_condition
kube_pod_container_resource_limits
kube_pod_container_resource_requests
kube_pod_container_status_waiting_reason
kube_pod_info
kube_pod_owner
kube_pod_status_phase
kube_replicaset_owner
kube_resourcequota
kube_statefulset_metadata_generation
kube_statefulset_replicas
kube_statefulset_status_current_revision
kube_statefulset_status_observed_generation
kube_statefulset_status_replicas
kube_statefulset_status_replicas_ready
kube_statefulset_status_replicas_updated
kube_statefulset_status_update_revision
kubelet_certificate_manager_client_expiration_renew_errors
kubelet_certificate_manager_client_ttl_seconds
kubelet_certificate_manager_server_ttl_seconds
kubelet_cgroup_manager_duration_seconds_bucket
kubelet_cgroup_manager_duration_seconds_count
kubelet_node_config_error
kubelet_node_name
kubelet_pleg_relist_duration_seconds_bucket
kubelet_pleg_relist_duration_seconds_count
kubelet_pleg_relist_interval_seconds_bucket
kubelet_pod_start_duration_seconds_count
kubelet_pod_worker_duration_seconds_bucket
kubelet_pod_worker_duration_seconds_count
kubelet_running_container_count
kubelet_running_containers
kubelet_running_pod_count
kubelet_running_pods
kubelet_runtime_operations_duration_seconds_bucket
kubelet_runtime_operations_errors_total
kubelet_runtime_operations_total
kubelet_server_expiration_renew_errors
kubelet_volume_stats_available_bytes
kubelet_volume_stats_capacity_bytes
kubelet_volume_stats_inodes
kubelet_volume_stats_inodes_used
kubernetes_build_info
machine_memory_bytes
namespace_cpu:kube_pod_container_resource_limits:sum
namespace_cpu:kube_pod_container_resource_requests:sum
namespace_memory:kube_pod_container_resource_limits:sum
namespace_memory:kube_pod_container_resource_requests:sum
namespace_workload_pod
namespace_workload_pod:kube_pod_owner:relabel
node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
node_namespace_pod_container:container_memory_cache
node_namespace_pod_container:container_memory_rss
node_namespace_pod_container:container_memory_swap
node_namespace_pod_container:container_memory_working_set_bytes
node_quantile:kubelet_pleg_relist_duration_seconds:histogram_quantile
process_cpu_seconds_total
process_resident_memory_bytes
rest_client_request_duration_seconds_bucket
rest_client_requests_total
storage_operation_duration_seconds_bucket
storage_operation_duration_seconds_count
storage_operation_errors_total
up
volume_manager_total_volumes
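
To give a sense of how counters among the metric names listed above are typically used, the sketch below derives a CPU throttling ratio from two readings each of container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total. The sample values are hypothetical, and the exact expressions used by the bundled alerting rules may differ.

```python
def throttling_ratio(periods_start, periods_end, throttled_start, throttled_end):
    """Share of CFS periods in which the container was throttled over a window.

    Each argument is a counter reading taken at the start or end of the window.
    """
    periods = periods_end - periods_start
    throttled = throttled_end - throttled_start
    if periods <= 0:
        return 0.0
    return throttled / periods


# Hypothetical counter readings taken a few minutes apart.
ratio = throttling_ratio(
    periods_start=1000,
    periods_end=1600,
    throttled_start=50,
    throttled_end=200,
)
print(f"{ratio:.0%}")
```

A persistently high ratio usually suggests that a container's CPU limit is too low for its workload.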

Helpful resources

Ready to get started with Kubernetes Monitoring?

To use Kubernetes Monitoring, you have three options in Grafana Cloud. All plans come with prebuilt dashboards plus metrics and alerting rules.

Cloud Free

Perfect for early stage and small teams. Free forever.

10k metrics, 50GB logs, 50GB traces, 50GB profiles, 500 VUh of k6 testing, and up to 3 active users

Features include:

  • 14-day retention
  • Grafana OnCall
  • Grafana Incident
  • Grafana k6 testing
  • Synthetic Monitoring
  • Grafana Alerting
  • Infra integration catalog

Cloud Pro

Perfect for growing teams at only $29/mo + usage.

Start with 20k metrics, 100GB logs, 100GB traces, 1000 VUh of k6 testing, and up to 5 active users

Includes all features in Free, plus:

  • Retention: 13 months for metrics; 30 days for logs, traces, and k6 test results
  • Grafana Machine Learning
  • SSO/SAML/LDAP
  • Data source permissions
  • Cloud SLA and support
  • Query caching
  • Reporting and export
  • Optional add-on Enterprise plugin

Cloud Advanced

Perfect for global teams. Custom pricing.

Includes all features in Pro, plus:

  • Customized retention
  • Access to all Enterprise plugins
  • Audit logging
  • Enhanced LDAP
  • Team sync
  • Custom branding
  • Dedicated technical account management
  • Role-based access control