Mesos Observability Metrics


Based on
Last updated: 6 months ago

This dashboard is based on the data in the source documentation page; the descriptions are copied verbatim from there.

This document describes the observability metrics provided by Mesos master and agent nodes. It also gives initial guidance on which metrics to monitor to detect abnormal situations in your cluster.

Mesos master and agent nodes report a set of statistics and metrics that enable cluster operators to monitor resource usage and detect abnormal situations early. The information reported by Mesos includes details about available resources, used resources, registered frameworks, active agents, and task state. You can use this information to create automated alerts and to plot different metrics over time inside a monitoring dashboard.
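As a rough illustration of the automated alerts mentioned above, the sketch below evaluates one metrics snapshot against a few simple rules. It assumes the snapshot is the flat JSON object served by the master's `/metrics/snapshot` endpoint. The function names, the choice of metrics (`master/elected`, `master/slaves_active`, `master/cpus_percent`), and the thresholds are illustrative assumptions, not recommendations from the source.

```python
import json
from urllib.request import urlopen


def fetch_snapshot(master_url, timeout=5.0):
    """Fetch a metrics snapshot from a Mesos master, e.g. "http://host:5050"."""
    with urlopen(f"{master_url}/metrics/snapshot", timeout=timeout) as resp:
        return json.load(resp)


def check_alerts(snapshot, min_active_agents=1, max_cpu_fraction=0.9):
    """Return a list of alert strings for one snapshot (a dict of metric -> value).

    The thresholds are hypothetical defaults; tune them for your cluster.
    """
    alerts = []
    # Assumption: only evaluate cluster-wide rules on the leading master.
    if snapshot.get("master/elected", 0) != 1:
        return alerts
    if snapshot.get("master/slaves_active", 0) < min_active_agents:
        alerts.append("too few active agents")
    if snapshot.get("master/cpus_percent", 0.0) > max_cpu_fraction:
        alerts.append("CPU allocation above threshold")
    return alerts
```

In practice `check_alerts(fetch_snapshot("http://$IP:5050"))` would be run on a schedule, with the returned strings forwarded to whatever notification channel you use.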

Metric information is not persisted to disk at either master or agent nodes, which means that metrics will be reset when masters and agents are restarted. Similarly, if the current leading master fails and a new leading master is elected, metrics at the new master will be reset.
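Because counters start again from zero after a restart or a leader failover, anything that computes per-interval increases has to tolerate a sudden drop in value. The following is a minimal sketch of one common convention: treat a decrease as a reset and count the new value from zero. The helper name `counter_delta` is hypothetical and is not part of Mesos or Telegraf.

```python
def counter_delta(previous, current):
    """Return the increase of a counter between two consecutive samples.

    If the current value is lower than the previous one, assume the process
    restarted (or a new leading master was elected) and the counter began
    again at zero.
    """
    if previous is None:
        return 0  # first sample: no baseline to compare against
    if current < previous:
        return current  # reset detected: count everything since the restart
    return current - previous


# Example: the counter resets between the second and third samples.
samples = [10, 15, 3, 7]
deltas = []
prev = None
for value in samples:
    deltas.append(counter_delta(prev, value))
    prev = value
print(deltas)  # [0, 5, 3, 4]
```

Events that occurred between the last sample before the restart and the restart itself are lost under this convention, which is unavoidable when metrics are not persisted.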

Collector Configuration Details

# Telegraf plugin for gathering metrics from N Mesos masters
[[inputs.mesos]]
   ## Timeout, in ms.
   timeout = 100
   ## A list of Mesos masters.
   masters = ["$IP:5050"]
   ## Master metrics groups to be collected, by default, all enabled.
   master_collections = [