Consul Server Monitoring Dashboard
Maintained by the Consul team at HashiCorp. Displays the critical health metrics for Consul servers, which are key to understanding server behavior and stability in production. Also offers pre-built sections and panels for understanding Consul usage by feature, such as the KV store, DNS, the Catalog, and ACLs.
Critical metrics are based on the "key metrics" section of Consul's telemetry docs (https://www.consul.io/docs/agent/telemetry.html); see those docs for more information on individual stats. If you have any questions, please reach out on our community Discuss forum: https://discuss.hashicorp.com/c/consul/29
Due to Consul's architecture, some metrics are emitted by both server and client agents. Typical deployments run many more clients than servers, which can add noise when monitoring Consul server health. To filter that noise out, we recommend adding a label in Prometheus's scrape_config based on the role of the Consul agent on each host, e.g. role="server" for Consul servers and role="client" for Consul client agents. You can then adapt the panel queries to filter on role="server", so that they show only the time series emitted by servers. For details on scrape_config, see: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
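
As a rough illustration of the role labeling described above, a Prometheus configuration might look something like the sketch below. The job names and target hostnames are hypothetical, and static_configs is only one way to attach the label (service discovery with relabeling would also work). It assumes each agent serves metrics on Consul's default HTTP port (8500) at /v1/agent/metrics in Prometheus format, which requires Prometheus retention to be enabled in the agent's telemetry settings.

```yaml
# Sketch of a prometheus.yml fragment. Job names and hostnames are
# hypothetical; only the role="server" / role="client" label values
# come from the recommendation above.
scrape_configs:
  - job_name: consul-servers
    metrics_path: /v1/agent/metrics
    params:
      format: [prometheus]
    static_configs:
      - targets:
          - consul-server-1.example.internal:8500
          - consul-server-2.example.internal:8500
          - consul-server-3.example.internal:8500
        labels:
          role: server   # attached to every series scraped from these targets

  - job_name: consul-clients
    metrics_path: /v1/agent/metrics
    params:
      format: [prometheus]
    static_configs:
      - targets:
          - consul-client-1.example.internal:8500
        labels:
          role: client
```

With labels like these in place, a panel query can be narrowed to servers by adding the matcher to its selector, for example <metric_name>{role="server"}, where <metric_name> stands for whichever metric the panel already queries.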