Kubernetes cluster monitoring (via Prometheus)

Monitor a Kubernetes cluster using Prometheus TSDB. Shows overall cluster CPU / Memory / Disk usage as well as individual pod statistics.

Requirements

UPDATE: An example config for Prometheus 1.0.0 is also included below.

Prometheus must be configured to collect data from the following sources inside the Kubernetes cluster:

  • node-exporter
  • cadvisor

Node-exporter should be run as a DaemonSet on every node (minion). cAdvisor is built into the kubelet and only needs to be scraped via the Prometheus config.
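As a rough illustration of the DaemonSet requirement, a minimal manifest might look like the sketch below. The names, labels, and untagged `prom/node-exporter` image are assumptions for illustration, not taken from this dashboard's author. The `extensions/v1beta1` API version mirrors the Grafana Deployment further down this page; newer clusters would use `apps/v1`, which also requires a `spec.selector`.

```yaml
# Hypothetical node-exporter DaemonSet (names, labels, and image are assumptions).
apiVersion: extensions/v1beta1   # same API group as the Grafana Deployment on this page
kind: DaemonSet
metadata:
  name: node-exporter
  labels:
    app: node-exporter
spec:
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      # Share the node's network namespace so the exporter answers on the
      # node's own address, which is what the scrape configs here target.
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter   # pin a specific tag in practice
        ports:
        - containerPort: 9100       # node-exporter's default port
          hostPort: 9100
```

Host networking is used in this sketch because the example scrape configs rewrite each node's address to port 9100, so the exporter has to be reachable on the node itself rather than on a pod IP.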

Example Prometheus Config (pre 1.0.0)

scrape_configs:
- job_name: 'kubernetes-cluster'

  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

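  relabel_configs:
  # Keep only the apiserver and node targets; drop everything else discovered.
  - source_labels: [__meta_kubernetes_role]
    action: keep
    regex: (?:apiserver|node)
  # Copy Kubernetes node labels onto the scraped series.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Expose the discovered role as a kubernetes_role label.
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  # Nodes are discovered on kubelet port 10250; rewrite the target to the
  # kubelet's read-only port 10255, where the cAdvisor metrics are scraped.
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__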

- job_name: 'kubernetes-node-exporter'

  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

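  relabel_configs:
  # Keep only node targets.
  - source_labels: [__meta_kubernetes_role]
    action: keep
    regex: (node)
  # Copy Kubernetes node labels onto the scraped series.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Expose the discovered role as a kubernetes_role label.
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  # Rewrite the kubelet address (port 10250) to node-exporter's port 9100.
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__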

Example Prometheus Config (1.0.0)

global:
  scrape_interval: 10s
  scrape_timeout: 10s
  evaluation_interval: 10s

scrape_configs:
- job_name: 'kubernetes-nodes-cadvisor'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
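  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    # role selects node targets here, taking the place of the keep rule on
    # __meta_kubernetes_role used in the pre-1.0.0 example.
    role: node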
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__

- job_name: 'kubernetes-apiserver-cadvisor'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: apiserver
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__

- job_name: 'kubernetes-node-exporter'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__

Suggested Grafana manifest for Kubernetes (if you aren’t using the built-in monitoring addon)

apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    app: grafana
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
spec:
  replicas: 1
  revisionHistoryLimit: 2
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - image: grafana/grafana:3.1.0
        name: grafana
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        env:
          - name: GF_AUTH_BASIC_ENABLED
            value: "false"
          - name: GF_AUTH_ANONYMOUS_ENABLED
            value: "true"
          - name: GF_AUTH_ANONYMOUS_ORG_ROLE
            value: Admin
          - name: GF_SERVER_ROOT_URL
            value: /api/v1/proxy/namespaces/default/services/grafana/

Additional support

Feel free to tweet @iamnayr, or join the Kubernetes Slack channel and ping @ryan_sf.
