1 Kubernetes cluster overview(कुबरनेटेस)

Dashboard

This dashboard shows the SLO and error budget and can help troubleshoot issues in a Kubernetes cluster at the cluster, node, and namespace levels.
Last updated: 6 days ago

Downloads: 3596

Reviews: 0

Prometheus Helm chart used: stable/prometheus-operator@8.13.8. Recommended Grafana version: 7.0.0

For the latest version of the dashboard, please visit the Git repo: dguyhasnoname

Required plugin dependencies:

  1. Status Dot
  2. Singlestat

values.yaml for the prometheus-operator Helm chart:

prometheusOperator:
  createCustomResource: true

alertmanager:
  ingress:
    enabled: true
    hosts: [alertmanager.abc.com]

grafana:
  ingress:
    enabled: true
    hosts: [grafana.abc.com]
  plugins:
  - btplc-status-dot-panel

prometheus:
  ingress:
    enabled: true
    hosts: [prometheus.abc.com]
  prometheusSpec:
    replicas: 1
    podAntiAffinity: hard
    podAntiAffinityTopologyKey: failure-domain.beta.kubernetes.io/zone
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: default-storage-class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 70Gi
    resources:
      requests:
        cpu: 200m
        memory: 1024Mi
      limits:
        cpu: 1000m
        memory: 1024Mi

# Exporters
kubeApiServer:
  enabled: true

kubelet:
  enabled: true

kubeControllerManager:
  enabled: true

coreDns:
  enabled: true

kubeDns:
  enabled: true

kubeEtcd:
  enabled: true

kubeScheduler:
  enabled: true

kubeProxy:
  enabled: true

kubeStateMetrics:
  enabled: true

nodeExporter:
  enabled: true

This dashboard shows the SLO and error budget for the overall cluster and for each namespace, and can help troubleshoot issues in a Kubernetes cluster at the cluster, node, and namespace levels.

The cluster SLO and error budget are calculated from the control plane pods. The namespace SLO and error budget are based on all pods running in the namespace.
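
The dashboard description does not spell out the formula behind these panels. As a rough illustration only, the error budget arithmetic can be sketched in Python, assuming availability is measured as ready pod samples divided by total pod samples and the SLO target is a fraction such as 0.999. The function name, its parameters, and the sample numbers are hypothetical and are not taken from the dashboard itself.

```python
def error_budget_remaining(ready_samples: int, total_samples: int, slo_target: float) -> float:
    """Return the unspent fraction of the error budget (negative when overspent).

    Assumption (not from the dashboard): availability is the share of pod
    samples that were ready during the evaluation window.
    """
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    # Failure rate the SLO allows, e.g. 0.001 for a 99.9% target.
    allowed_failure = 1.0 - slo_target
    # Failure rate actually observed in the window.
    actual_failure = (total_samples - ready_samples) / total_samples
    # 1.0 means the budget is untouched, 0.0 means fully spent.
    return 1.0 - actual_failure / allowed_failure


if __name__ == "__main__":
    # Hypothetical window: 9,995 of 10,000 control plane pod samples were ready.
    remaining = error_budget_remaining(9995, 10000, slo_target=0.999)
    print(f"Error budget remaining: {remaining:.1%}")  # about half of the budget
```

With a 99.9% target, 5 unready samples out of 10,000 spend roughly half of the allowed failures; 20 unready samples would overspend the budget and return a negative value.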

At the cluster level, the dashboard shows:

  1. Node readiness state
  2. Number of pods in the cluster
  3. Memory/CPU usage in the cluster: total, per node, and per namespace
  4. PVCs in the cluster, including read-only PVCs
  5. Cluster age
  6. Waiting/Terminated pod count
  7. Cluster node details

At the node level, the dashboard shows:

  1. Uptime
  2. Node readiness
  3. CPU, memory, and load on the node
  4. Kubelet errors that may be related to PLEG (Pod Lifecycle Event Generator)
  5. Pod count on the node by namespace
  6. Memory/Disk/PID pressure
  7. Top 5 memory-guzzling pods
  8. NTP time deviation
  9. Kubelet eviction stats
  10. Node evictions

At the namespace level, the dashboard shows:

  1. Pod readiness
  2. Ready/Waiting/Terminated pod count
  3. Number of deployments in the namespace
  4. Pod-node relation over a period of time
  5. Pod count per node in the namespace
  6. Pod restarts in the namespace
  7. Pod state over a period of time
  8. Memory/CPU utilisation by pod
  9. Resource quota in the namespace