Docker Swarm & Container Overview

Overview of the most important Docker swarm and container metrics. (cAdvisor/NodeExporter/Prometheus)

It slightly modifies the original dashboard to adapt the graphs to a Docker swarm cluster that is running cAdvisor and Node Exporter on each node.

To use it you need:

  • A Docker swarm mode cluster.
  • Launch some services. Swarm will automatically propagate some labels that are used by the dashboard.
  • If you don't launch the services using the "docker stack deploy XXX" command, there's an extra label you'll need to set per service (--container-label com.docker.stack.namespace=XXX) so those services can be identified by their intended usage (see the example after the Prometheus configuration below).
  • You'll need to launch cAdvisor and node-exporter as global services in the cluster, and use the same network for them and the Prometheus instance.
  • Node exporter can export extra metrics; in this case I found it useful to export the hostname, so that node metrics can be split by it. An example of this can be found in the repo: https://github.com/bvis/docker-node-exporter
  • Prometheus needs to be launched with the DNS-based auto-discovery configuration.
  • It uses Elasticsearch to search for errors in the logs generated by Logstash.
  • It uses Elasticsearch, in another index, to check for alerts fired and resolved.
  • It assumes that in your cluster you are using a proxy for public traffic. You can decide from your services list which one is your proxy. I find it useful to split out the traffic of this service, because it can distort the traffic metrics. This configuration should work:
global:
  scrape_interval:     30s
  evaluation_interval: 30s

  external_labels:
    cluster: swarm
    replica: "1"

scrape_configs:
  - job_name: 'cadvisor'
    dns_sd_configs:
    - names:
      - 'tasks.cadvisor'
      type: 'A'
      port: 8080

  - job_name: 'node-exporter'
    dns_sd_configs:
    - names:
      - 'tasks.node-exporter'
      type: 'A'
      port: 9100
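
An example of the extra label mentioned in the list above: a service launched by hand (outside "docker stack deploy") could be started like this, where the service name, namespace and image are only placeholders:

docker \
  service create --name my-web \
  --label com.docker.stack.namespace=frontend \
  --container-label com.docker.stack.namespace=frontend \
  nginx:alpine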

Based on Docker Host & Container Overview by uschtwill (https://grafana.net/dashboards/395).

If you are interested in deploying the full monitoring stack, you can use the following commands:

docker \
  network create --driver overlay monitoring
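
All the monitoring services below, including Prometheus, attach to this overlay network, so that names such as tasks.cadvisor and tasks.node-exporter resolve through the swarm DNS. A quick check that the network was created:

docker network inspect monitoring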

docker \
  service create --name cadvisor \
  --mode global \
  --network monitoring \
  --label com.docker.stack.namespace=monitoring \
  --container-label com.docker.stack.namespace=monitoring \
  --mount type=bind,src=/,dst=/rootfs:ro \
  --mount type=bind,src=/var/run,dst=/var/run:rw \
  --mount type=bind,src=/sys,dst=/sys:ro \
  --mount type=bind,src=/var/lib/docker/,dst=/var/lib/docker:ro \
  google/cadvisor:v0.24.1

docker \
  service create --name node-exporter \
  --mode global \
  --network monitoring \
  --label com.docker.stack.namespace=monitoring \
  --container-label com.docker.stack.namespace=monitoring \
  --mount type=bind,source=/proc,target=/host/proc \
  --mount type=bind,source=/sys,target=/host/sys \
  --mount type=bind,source=/,target=/rootfs \
  --mount type=bind,source=/etc/hostname,target=/etc/host_hostname \
  -e HOST_HOSTNAME=/etc/host_hostname \
  basi/node-exporter:v0.1.1 \
  -collector.procfs /host/proc \
  -collector.sysfs /host/sys \
  -collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)" \
  -collector.textfile.directory /etc/node-exporter/ \
  -collectors.enabled="conntrack,diskstats,entropy,filefd,filesystem,loadavg,mdadm,meminfo,netdev,netstat,stat,textfile,time,vmstat,ipvs"
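
Once both global services are running, a quick sanity check from a manager node is to confirm that each of them has one task running per node:

docker service ps cadvisor
docker service ps node-exporter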

docker \
  service create --name alertmanager \
  --network monitoring \
  --label com.docker.stack.namespace=monitoring \
  --container-label com.docker.stack.namespace=monitoring \
  --publish 9093:9093 \
  -e "SLACK_API=https://hooks.slack.com/services/TOKEN-HERE" \
  -e "LOGSTASH_URL=http://logstash:8080/" \
  basi/alertmanager:v0.1.0 \
    -config.file=/etc/alertmanager/config.yml
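
To check the Slack integration end to end, one option is to push a throwaway alert straight to Alertmanager, assuming the bundled Alertmanager exposes the standard v1 push API; the alert name and labels here are arbitrary:

curl -s -XPOST http://localhost:9093/api/v1/alerts -d '[
  {
    "labels": {"alertname": "ManualTestAlert", "severity": "warning"},
    "annotations": {"summary": "Test alert fired by hand"}
  }
]'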

docker \
  service create \
  --name prometheus \
  --network monitoring \
  --label com.docker.stack.namespace=monitoring \
  --container-label com.docker.stack.namespace=monitoring \
  --publish 9090:9090 \
  basi/prometheus-swarm:v0.3.1 \
    -config.file=/etc/prometheus/prometheus.yml \
    -storage.local.path=/prometheus \
    -web.console.libraries=/etc/prometheus/console_libraries \
    -web.console.templates=/etc/prometheus/consoles \
    -alertmanager.url=http://alertmanager:9093
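
Since Prometheus is published on port 9090 through the routing mesh, the DNS-based discovery can be verified from any node by querying the "up" metric (assuming the image is based on a Prometheus release with the v1 HTTP API); every cadvisor and node-exporter task should report a value of 1:

curl -s 'http://localhost:9090/api/v1/query?query=up'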

docker \
  service create \
  --name grafana \
  --network monitoring \
  --label com.docker.stack.namespace=monitoring \
  --container-label com.docker.stack.namespace=monitoring \
  --publish 3000:3000 \
  -e "GF_SERVER_ROOT_URL=http://grafana.${CLUSTER_DOMAIN}" \
  -e "GF_SECURITY_ADMIN_PASSWORD=$GF_PASSWORD" \
  -e "PROMETHEUS_ENDPOINT=http://prometheus:9090" \
  -e "ELASTICSEARCH_ENDPOINT=$ES_ADDRESS" \
  -e "ELASTICSEARCH_USER=$ES_USERNAME" \
  -e "ELASTICSEARCH_PASSWORD=$ES_PASSWORD" \
  basi/grafana:v4.1.1
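
The Grafana command above expects a few environment variables to be defined beforehand; the values shown here are only placeholders:

export CLUSTER_DOMAIN=example.com
export GF_PASSWORD=changeme
export ES_ADDRESS=https://elasticsearch.example.com:9200
export ES_USERNAME=elastic
export ES_PASSWORD=changeme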

In this example some components are missing, like the Logstash processor for the alerts and the Elasticsearch instance responsible for storing the alerts sent by Logstash as well as the error logs.

More info about the usage of this dashboard can be found in this repo.

If you liked it, you may buy me a coffee.