Kubernetes cluster monitoring (via Prometheus)

Monitor a Kubernetes cluster using Prometheus TSDB. Shows overall cluster CPU / Memory / Disk usage as well as individual pod statistics.


Requirements

UPDATE

I’ve also added an example config for Prometheus 1.0.0 below.

Prometheus, configured to collect data from the following inside a Kubernetes cluster:

  • node-exporter
  • cAdvisor

node-exporter should be run as a DaemonSet on every minion (worker node). cAdvisor is built into the kubelet, so it only needs to be scraped via the Prometheus config.
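The DaemonSet requirement above can be met with a manifest along the lines of the sketch below. It assumes the `prom/node-exporter` image (pin a version tag you have tested) and uses `hostNetwork: true` so the exporter listens on each node's own IP at port 9100, which is the address the `kubernetes-node-exporter` scrape job rewrites targets to. It is written against the current `apps/v1` API; clusters of the Prometheus 1.0 era served DaemonSets from `extensions/v1beta1`. The object name and labels are illustrative, and no host filesystems are mounted, so disk metrics may reflect the container's view rather than the node's disks.

```yaml
# Sketch only: name, labels and image tag are assumptions, not part of the original setup.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      # Share the node's network namespace so the exporter is reachable at <node-ip>:9100,
      # matching the '${1}:9100' rewrite in the scrape config.
      hostNetwork: true
      # Let the exporter see the host's processes.
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter   # assumption: pin a tested version tag in practice
        ports:
        - name: metrics
          containerPort: 9100
          hostPort: 9100
```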

Example Prometheus Config (pre-1.0.0)

scrape_configs:
- job_name: 'kubernetes-cluster'

  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

  relabel_configs:
  - source_labels: [__meta_kubernetes_role]
    action: keep
    regex: (?:apiserver|node)
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  # Send kubelet targets (port 10250) to the kubelet read-only port 10255 instead.
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__

- job_name: 'kubernetes-node-exporter'

  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

  relabel_configs:
  - source_labels: [__meta_kubernetes_role]
    action: keep
    regex: (node)
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  # Keep the node IP, but scrape node-exporter's port 9100 instead of the kubelet.
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__
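To make the final relabeling rule of each job concrete, here is its effect on a single node target, assuming a hypothetical node at `10.0.0.5` discovered on the kubelet port 10250. Prometheus anchors relabel regexes, so `(.*):10250` captures `10.0.0.5` as `${1}`, and a rule without an explicit `action` defaults to `replace`. Targets whose address does not end in `:10250` are left unchanged.

```yaml
# Illustration only; 10.0.0.5 is a hypothetical node IP.
#
# Discovered target before relabeling:
#   __address__="10.0.0.5:10250"      (kubelet API port)
#
# Rule from the 'kubernetes-cluster' job (action defaults to replace):
- source_labels: [__address__]
  regex: '(.*):10250'
  replacement: '${1}:10255'
  target_label: __address__
#
# Result:
#   __address__="10.0.0.5:10255"      (kubelet read-only port)
#
# The 'kubernetes-node-exporter' job applies the same pattern with
# replacement '${1}:9100', giving __address__="10.0.0.5:9100".
```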

Example Prometheus Config (1.0.0)

global:
  scrape_interval: 10s
  scrape_timeout: 10s
  evaluation_interval: 10s

scrape_configs:
- job_name: 'kubernetes-nodes-cadvisor'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  # Send kubelet targets (port 10250) to the kubelet read-only port 10255 instead.
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__

- job_name: 'kubernetes-apiserver-cadvisor'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: apiserver
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__

- job_name: 'kubernetes-node-exporter'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  # Keep the node IP, but scrape node-exporter's port 9100 instead of the kubelet.
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__
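Both configs read the service account CA and token from `/var/run/secrets/kubernetes.io/serviceaccount/`, which Kubernetes mounts into pods by default, so Prometheus is expected to run inside the cluster. One way to do that is sketched below: a ConfigMap holding `prometheus.yml` and a single-replica Deployment that mounts it. The object names, the `prom/prometheus:v1.0.0` image tag and the `/etc/prometheus/prometheus.yml` path are assumptions; newer Prometheus releases use a different `kubernetes_sd_configs` format and are expected to reject this one. Only the node-exporter job is repeated to keep the sketch short, and on RBAC-enabled clusters the pod's service account also needs read access to nodes, which is not shown.

```yaml
# Sketch only: object names, image tag and config path are assumptions.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 10s
      scrape_timeout: 10s
      evaluation_interval: 10s
    scrape_configs:
    # Add the two cAdvisor jobs from the config above here in the same way.
    - job_name: 'kubernetes-node-exporter'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - api_servers:
        - 'https://kubernetes.default.svc'
        in_cluster: true
        role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__meta_kubernetes_role]
        action: replace
        target_label: kubernetes_role
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:v1.0.0   # assumption: a 1.0.x tag that accepts the config above
        ports:
        - containerPort: 9090
        volumeMounts:
        # Overlay only prometheus.yml so the rest of /etc/prometheus in the image stays intact;
        # assumes the image's default command reads /etc/prometheus/prometheus.yml.
        - name: config
          mountPath: /etc/prometheus/prometheus.yml
          subPath: prometheus.yml
      volumes:
      - name: config
        configMap:
          name: prometheus-config
```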

Suggested Grafana manifest for Kubernetes (if you aren’t using the built-in monitoring addon)

apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    app: grafana
---
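# extensions/v1beta1 Deployments were removed in Kubernetes 1.16; on newer clusters
# use apiVersion: apps/v1 and add a spec.selector matching the pod labels.
apiVersion: extensions/v1beta1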
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
spec:
  replicas: 1
  revisionHistoryLimit: 2
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - image: grafana/grafana:3.1.0
        name: grafana
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        env:
          - name: GF_AUTH_BASIC_ENABLED
            value: "false"
          - name: GF_AUTH_ANONYMOUS_ENABLED
            value: "true"
          - name: GF_AUTH_ANONYMOUS_ORG_ROLE
            value: Admin
          - name: GF_SERVER_ROOT_URL
            value: /api/v1/proxy/namespaces/default/services/grafana/

Additional support

Feel free to tweet @iamnayr, or join the Kubernetes Slack and ping @ryan_sf.
