1 Kubernetes cluster overview(कुबरनेटेस)

This dashboard helps troubleshoot issues in a k8s cluster at the cluster, node, and namespace levels.


Prometheus Helm chart used: stable/prometheus-operator@8.13.8
Recommended Grafana version: 7.0.0 or higher
Latest k8s version tested: v1.20.2

For the latest version of this dashboard, please visit the git repo: dguyhasnoname

Special plugin dependencies:

  1. Status Dot
  2. Singlestat

values.yaml for the prometheus-operator Helm chart:

prometheusOperator:
  createCustomResource: true

alertmanager:
  ingress:
    enabled: true
    hosts: [alertmanager.abc.com]

grafana:
  image:
    repository: grafana/grafana
    tag: 7.0.3
  ingress:
    enabled: true
    hosts: [grafana.abc.com]
  plugins:
  - btplc-status-dot-panel

prometheus:
  ingress:
    enabled: true
    hosts: [prometheus.abc.com]
  prometheusSpec:
    replicas: 1
    podAntiAffinity: hard
    podAntiAffinityTopologyKey: failure-domain.beta.kubernetes.io/zone
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: default-storage-class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 70Gi
    resources:
      requests:
        cpu: 200m
        memory: 1024Mi
      limits:
        cpu: 1000m
        memory: 1024Mi

# Exporters
kubeApiServer:
  enabled: true

kubelet:
  enabled: true

kubeControllerManager:
  enabled: true

coreDns:
  enabled: true

kubeDns:
  enabled: true

kubeEtcd:
  enabled: true

kubeScheduler:
  enabled: true

kubeProxy:
  enabled: true

kubeStateMetrics:
  enabled: true

nodeExporter:
  enabled: true

This dashboard shows the SLO and error budget for the overall cluster and for each namespace, and helps troubleshoot issues in a k8s cluster at the cluster, node, and namespace levels.

The cluster SLO and error budget are calculated from the control plane pods. The namespace SLO and error budget are based on all pods running in the namespace.
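As a rough illustration of how an error budget relates to an SLO, here is a minimal Python sketch. It assumes availability is measured as the fraction of pod readiness samples that were "ready", and that the SLO target is a fraction such as 0.99. The function name, its parameters, and the formula are assumptions for illustration only; the dashboard's actual Prometheus queries may compute these values differently.

```python
from typing import Dict


def error_budget(slo_target: float, ready_samples: int, total_samples: int) -> Dict[str, float]:
    """Hypothetical helper: availability and error budget as fractions.

    slo_target     -- assumed SLO target, e.g. 0.99 for 99%
    ready_samples  -- number of samples in which the pods were ready
    total_samples  -- total number of samples observed
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    if not 0 <= ready_samples <= total_samples:
        raise ValueError("ready_samples must be between 0 and total_samples")

    availability = ready_samples / total_samples
    # The error budget is the share of unavailability the SLO tolerates.
    allowed_unavailability = 1.0 - slo_target
    # Fraction of that budget already used up by observed unavailability.
    consumed = (1.0 - availability) / allowed_unavailability
    return {
        "availability": availability,
        "budget_consumed": consumed,
        "budget_remaining": 1.0 - consumed,
    }


if __name__ == "__main__":
    # Control plane pods ready in 995 of 1000 samples against a 99% target.
    print(error_budget(0.99, 995, 1000))
```

A negative budget_remaining value means the SLO was violated over the measured window.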

At the cluster level, the dashboard shows:

  1. Node readiness state
  2. No. of pods in cluster
  3. Memory/CPU usage in cluster: total, node-wise, and namespace-wise
  4. PVCs in cluster and read-only PVCs
  5. Cluster age
  6. Waiting/terminated pod count
  7. Cluster node details

At the node level, the dashboard shows:

  1. Uptime
  2. Node readiness
  3. CPU, memory, and load on node
  4. Kubelet errors, which can be related to PLEG
  5. Pod count on node by namespace
  6. Memory/Disk/PID pressure
  7. Top 5 memory guzzling pods
  8. NTP time deviation
  9. Kubelet eviction stats
  10. Node evictions
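
The "top 5" ranking in the node-level list can be sketched in Python as follows. This is only an illustration of the idea: the function name and the sample pod names and byte counts are hypothetical, and the dashboard itself gets this ranking from Prometheus rather than from code like this.

```python
import heapq
from typing import Dict, List, Tuple


def top_memory_pods(usage_bytes: Dict[str, int], n: int = 5) -> List[Tuple[str, int]]:
    """Hypothetical helper: the n pods with the highest memory usage, largest first."""
    return heapq.nlargest(n, usage_bytes.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up sample data: pod name -> memory usage in bytes.
    sample = {
        "pod-a": 120_000_000,
        "pod-b": 950_000_000,
        "pod-c": 40_000_000,
        "pod-d": 610_000_000,
        "pod-e": 300_000_000,
        "pod-f": 75_000_000,
    }
    for name, used in top_memory_pods(sample):
        print(f"{name}: {used / 1_000_000:.0f} MB")
```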

At the namespace level, the dashboard shows:

  1. Pod readiness
  2. Ready/waiting/terminated pod count
  3. No. of deployments in namespace
  4. Pod-node relation over a period of time
  5. Node-wise pod count in the namespace
  6. Pod restarts in namespace
  7. Pod state over a period of time
  8. Memory/CPU utilisation by pod
  9. Resource quota in namespace