1 Kubernetes cluster overview(कुबरनेटेस)

This dashboard can help troubleshooting issue in k8s cluster at cluster, node and namespace level.

Prometheus helm chart used: stable/prometheus-operator@8.13.8. Grafana version recommended: 7.0.0 or higher Latest k8s version tested upon: v1.20.2

For latest dashboard please visit the git repo: dguyhasnoname

Special Plugin dependencies:

Status dot
single stat

values.yaml for operator helm chart:

prometheusOperator: createCustomResource: true alertmanager: ingress: enabled: true hosts: [alertmanager.abc.com] grafana: image: repository: grafana/grafana tag: 7.0.3 ingress: enabled: true hosts: [grafana.abc.com] plugins: btplc-status-dot-panel prometheus: ingress: enabled: true hosts: [prometheus.abc.com] prometheusSpec: replicas: 1 podAntiAffinity: hard podAntiAffinityTopologyKey: failure-domain.beta.kubernetes.io/zone storageSpec: volumeClaimTemplate: spec: storageClassName: default-storage-class accessModes: ["ReadWriteOnce"] resources: requests: storage: 70Gi resources: requests: cpu: 200m memory: 1024Mi limits: cpu: 1000m memory: 1024Mi Exporters kubeApiServer: enabled: true kubelet: enabled: true kubeControllerManager: enabled: true coreDns: enabled: true kubeDns: enabled: true kubeEtcd: enabled: true kubeScheduler: enabled: true kubeProxy: enabled: true kubeStateMetrics: enabled: true nodeExporter: enabled: true

This dashboard show SLO and error budget for over all cluster/namespace and can help troubleshooting issue in k8s cluster at cluster, node and namespace level.

Cluster SLO and error budget has been calculated based on control plane pods. Namespace SLO and error budget is based on all pods running in the namespace.

At cluster level you can find below details:

Node readiness state
No. of pods in cluster
memory/CPU usage in cluster: total, node-wise and namespace wise.
PVCs in cluster and read only PVCs
Cluster age
Waiting/Teminated pods count
cluster node details

At node level you can find below details:

Uptime
Node readiness
CPU, memory and load on node.
Kubelet errors which can be related to PLEG
pod count on node by namespace
Memory/Disk/PID pressure
Top 5 memory guzzling pods
NTP time deviation
Kubelet eviction stats
Node evictions

At namespace level you can find below details:

pod readiness
ready/waiting/terminated pod count
No. of deployments in namespace
Pod-node relation over period of time
node wise pod count in the namespace
pod restarts in namespace
pod state over a period of time
memory/cpu utilisation by pod
resource quota in namespace

Revisions

Revision	Description	Created
			Download

Kubernetes

Grafana Labs solution

Monitor your Kubernetes deployment with prebuilt visualizations that allow you to drill down from a high-level cluster overview to pod-specific details in minutes.

Get this dashboard

Import the dashboard template

Download JSON

Datasource

Dependencies

Resources

Docs: Importing dashboards Webinar: Getting started with Grafana dashboard design Webinar: Building advanced Grafana dashboards

1 Kubernetes cluster overview(कुबरनेटेस)

Exporters

Data source config

Collector config:

Get this dashboard

Still have questions?

Get every update