Kubernetes cluster monitoring (via Prometheus)

Monitor a Kubernetes cluster using Prometheus TSDB. Shows overall cluster CPU / Memory / Disk usage as well as individual pod statistics.

Requirements

UPDATE

I've added an example config for Prometheus 1.0.0 below as well.

Prometheus must be configured to collect data from the following sources inside a Kubernetes cluster:

  • node-exporter
  • cadvisor

Node-exporter should be run as a DaemonSet on every minion (node). cAdvisor is built into the kubelet and only needs to be scraped via the Prometheus config.
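A minimal sketch of such a DaemonSet follows, for readers who do not already run node-exporter. The image name, labels, and the use of hostNetwork are assumptions, not part of the original dashboard instructions. The example configs below rewrite each node address to port 9100, so the exporter has to be reachable on the node's own IP at that port.

```yaml
# Sketch only: image, labels, and hostNetwork are assumptions.
# extensions/v1beta1 matches the API group used by the Grafana manifest in
# this document; newer clusters need apiVersion: apps/v1 instead.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      # Listen on the node's own address so <node-ip>:9100 is scrapeable.
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter   # assumed image; pin a tag you have tested
        ports:
        - containerPort: 9100
          hostPort: 9100
```

One pod is scheduled per node, which is what the node-based service discovery in the Prometheus configs expects.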

Example Prometheus Config (pre 1.0.0)

scrape_configs:
- job_name: 'kubernetes-cluster'

  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

  relabel_configs:
  - source_labels: [__meta_kubernetes_role]
    action: keep
    regex: (?:apiserver|node)
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__

- job_name: 'kubernetes-node-exporter'

  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true

  relabel_configs:
  - source_labels: [__meta_kubernetes_role]
    action: keep
    regex: (node)
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__
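The last relabel rule in each job is what points Prometheus at the right port. Node discovery reports the kubelet port (10250) in the `__address__` label. The rule swaps it for 10255 (the kubelet read-only port, which serves the cAdvisor metrics) or 9100 (node-exporter). The annotated fragment below walks through the node-exporter case. The node IP in it is a made-up example.

```yaml
# Worked example for one hypothetical node at 10.0.0.12.
# Before relabeling, discovery reports the kubelet port:
#   __address__ = "10.0.0.12:10250"
relabel_configs:
- source_labels: [__address__]
  regex: '(.*):10250'        # capture everything before the kubelet port
  replacement: '${1}:9100'   # re-emit the host with the node-exporter port
  target_label: __address__
# After relabeling, Prometheus scrapes:
#   __address__ = "10.0.0.12:9100"
```

If the regex does not match (for example, the kubelet listens on a non-default port), the label is left unchanged and the target keeps its original address.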

Example Prometheus Config (1.0.0)

global:
  scrape_interval: 10s
  scrape_timeout: 10s
  evaluation_interval: 10s

scrape_configs:
- job_name: 'kubernetes-nodes-cadvisor'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__
- job_name: 'kubernetes-apiserver-cadvisor'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: apiserver
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:10255'
    target_label: __address__
- job_name: 'kubernetes-node-exporter'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes.default.svc'
    in_cluster: true
    role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - source_labels: [__meta_kubernetes_role]
    action: replace
    target_label: kubernetes_role
  - source_labels: [__address__]
    regex: '(.*):10250'
    replacement: '${1}:9100'
    target_label: __address__

Suggested Grafana manifest for Kubernetes (if you aren't using the built-in monitoring addon)

apiVersion: v1
kind: Service
metadata:
  name: grafana
spec:
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    app: grafana
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
spec:
  replicas: 1
  revisionHistoryLimit: 2
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - image: grafana/grafana:3.1.0
        name: grafana
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        env:
          - name: GF_AUTH_BASIC_ENABLED
            value: "false"
          - name: GF_AUTH_ANONYMOUS_ENABLED
            value: "true"
          - name: GF_AUTH_ANONYMOUS_ORG_ROLE
            value: Admin
          - name: GF_SERVER_ROOT_URL
            value: /api/v1/proxy/namespaces/default/services/grafana/

Additional support

Feel free to tweet @iamnayr, or join the Kubernetes Slack channel and ping @ryan_sf.
