Navigate Kubernetes Monitoring
After you configure and begin monitoring telemetry data in Kubernetes Monitoring, you can navigate through your nodes and pods to evaluate their health. This topic shows how to do so. If you have not yet set up Kubernetes Monitoring, see Configure Kubernetes Monitoring.
Cluster navigation helps you explore your infrastructure by navigating through the object model.
On the Namespaces page, you can:
- Select a data source to view its namespaces and clusters.
- Filter by namespace and cluster.
- View the status phase of a namespace and cluster.
- View the alerts for a namespace and cluster in Grafana Alerting by clicking its number of alerts in the Alerts firing column.
- Click a namespace to drill down to view its workloads and pods.
- Click an Explore link, for example, Explore namespace or Explore alerts, to work directly with the Prometheus queries used to generate the metrics for dashboards and alerts.
Click a namespace from the Namespaces page to view its workloads and the health of the pods inside each workload.
Click a workload in the namespace view to see a list of associated pods and general information, such as the date the workload was created, replica information, and the observed generation. Click a pod to view detailed pod health information.
When you click a pod in the workload list, you can view its status. If the pod is healthy, the health bar is green. If not, the health bar is red. The Pod view also displays a quick graph of CPU usage from the past hour. If your stack is configured for logs and events, you can also view the latest 100 log lines and the latest events.
The Nodes view displays all of the nodes in your clusters, along with their condition and current resource usage. This information can help you decide when to increase or decrease the number and size of nodes to improve performance.
To view the status of pods and workloads on each of your nodes, click Nodes under Kubernetes in the left side menu, and select a node. You can filter by cluster and by node condition. Node monitoring gives you an immediate view of node capacity and general information.
The Node capacity section displays how much of a node’s resources you’re using. CPU, memory, and disk space all have usage bars that flag, with color, the following usage thresholds:
- Green: usage is 40-75% of maximum. This is the ideal state of resource usage.
- Yellow: usage is below 40%, or between 75% and 90%. Low usage percentages indicate that the node might be over-provisioned; higher percentages are approaching maximum capacity.
- Red: usage is 90-100%. Your node resource is dangerously close to maximum capacity.
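The color bands above can be expressed as a small classification function. The following is a minimal sketch, not how Kubernetes Monitoring computes the colors internally. The function name `node_usage_color` is hypothetical, and the handling of the exact boundary values (75% counted as green, 90% counted as red) is an assumption, because the list does not say which band owns a boundary.

```python
def node_usage_color(usage_percent: float) -> str:
    """Map a resource usage percentage to the color band described above.

    Assumption: 40 and 75 are treated as green, 90 as red. The source
    list does not specify which band a boundary value belongs to.
    """
    if not 0 <= usage_percent <= 100:
        raise ValueError("usage_percent must be between 0 and 100")
    if usage_percent >= 90:
        return "red"      # dangerously close to maximum capacity
    if 40 <= usage_percent <= 75:
        return "green"    # ideal range
    return "yellow"       # below 40 (possibly over-provisioned) or above 75


if __name__ == "__main__":
    for value in (20, 55, 80, 95):
        print(value, node_usage_color(value))
```

A check like this could be useful when you reproduce the same thresholds in your own scripts or reports, for example to flag nodes whose CPU, memory, or disk usage falls outside the green band.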
The Node information section displays general node data, including the node’s labels, and the current version of Kubernetes.
The Pods section lists information for all of the pods in the node.
You can quickly determine the health of the pod according to the color of the vertical boundary to the left of the pod name:
- Green: The pod is running.
- Red: The pod is not running.
Click the pod name to drill down to pod-level information, including the latest logs and events.
NOTE: You can only view events if you’ve installed the default dashboards and alerts (this deploys Grafana Agent), or used Grafana Agent or Grafana Agent Operator to ship your
Resource utilization efficiency
Resource utilization efficiency helps you minimize the deviation between resource allocation and actual resource utilization. This can help reduce the costs of cloud computing. The visualization displays the overall state of your fleet’s CPU usage at a glance.
Cluster resource utilization
Cluster resource utilization gives you a correlative view between CPU and memory, so you can discover any stranded resources in your fleet.
The thresholds of resource utilization efficiency are shown in the following table:
| Status | Color | Average utilization |
|---|---|---|
| Well utilized | Green | Between 60% and 90% |
| Overprovisioned | Red | Above 90% |
| Underprovisioned | Yellow | Below 60% |
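To make the table concrete, the sketch below maps an average utilization percentage to the status and color listed above. It is an illustration under stated assumptions, not the product's implementation. The function name `efficiency_status` is hypothetical, the status labels are copied from the table as written, and treating exactly 60% and exactly 90% as "Well utilized" is an assumption.

```python
def efficiency_status(avg_utilization_percent: float) -> tuple[str, str]:
    """Return (status, color) for an average utilization percentage.

    Labels and thresholds are taken from the table above.
    Assumption: the boundary values 60 and 90 count as "Well utilized".
    """
    if avg_utilization_percent < 0:
        raise ValueError("avg_utilization_percent must not be negative")
    if avg_utilization_percent > 90:
        return ("Overprovisioned", "Red")
    if avg_utilization_percent < 60:
        return ("Underprovisioned", "Yellow")
    return ("Well utilized", "Green")


if __name__ == "__main__":
    for value in (35.0, 72.5, 96.0):
        status, color = efficiency_status(value)
        print(f"{value}% -> {status} ({color})")
```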
You can correlate average and maximum resource usage for performance and stability troubleshooting, and observe resource usage per cluster and per cloud provider.
Note: If no data appears in the Efficiency view, Node Exporter metrics might be missing. See how to resolve the issue of missing Kubernetes Efficiency data.
Workload and pod view
The workload and pod view lets you drill down to determine root causes.
Click on a Namespace from Cluster navigation and select the desired workload. You’ll see average CPU and memory usage for the last hour. This can help you identify root causes of resource utilization issues.
Kubernetes Monitoring includes nine dashboards out of the box to help you get started with observing and monitoring your Kubernetes clusters and their workloads. This set includes the following:
(Home) Kubernetes Overview, the principal dashboard that displays high-level cluster resource usage and configuration status.
Kubernetes / Compute Resources (7 dashboards), a set of dashboards to drill down into resource usage by the following levels:
- Namespace (by Pods)
- Namespace (by workloads, like Deployments or DaemonSets)
- Pods and containers
- Workloads (Deployments, DaemonSets, StatefulSets, etc.)
These dashboards contain links to sub-objects, so you can jump from cluster, to Namespace, to Pod, etc.
Kubernetes / Kubelet, a dashboard that helps you understand Kubelet performance on your Nodes, and provides useful summary metrics such as the number of running Pods, Containers, and Volumes on a given Node.
Kubernetes / Persistent Volumes, a dashboard that helps you understand usage of your configured PersistentVolumes.
Kubernetes Monitoring includes preconfigured alerting rules that trigger alerts. You can view and investigate the alerts in the Alerts view by clicking an alert.
Drill down to investigate fired alerts.
A useful strategy when investigating alerts is to silence some default alerts temporarily. See how to manage silences. To learn more about alerts in general, see View alert rules.
Users with the Admin role can use the Configuration tab to install or uninstall preconfigured dashboards and alerts. You can also view instructions to configure and deploy Grafana Agent to keep it up-to-date. For more information on updating your dashboards and rules, see Update Grafana Kubernetes components.
Note: In order to see the Grafana Agent configuration instructions, you must have the preconfigured dashboards and alerts installed.