Metrictank exposes many metrics to aid with operating the software in production. As the metrictank team (the primary on-call team for metrictank at Grafana Labs) grows and onboards new people, and more customers deploy the software on their premises, we need to solve a few problems regarding the metrics exposed by metrictank:
- Public documentation should describe which metrics exist and what they mean.
- The code should have comments describing what each metric means.
- The metric descriptions in the documentation and code should never be out of date.
The solution we came up with was to create a tool, aptly named metrics2docs, which generates metrics documentation from annotations in Go code. It incentivizes documenting metrics at the code level, right alongside their declarations (which makes it easy to keep everything up to date), and generates the documentation from those annotations.
It supports inline comments:
```go
statUpdate = stats.NewCounter32("idx.memory.ops.update") // metric idx.memory.ops.update is the number of updates to the memory idx
```
As well as block comments:
```go
// metric tank.metrics_reordered is the number of points received that are going back in time, but are still
// within the reorder window. in such a case they will be inserted in the correct order.
// E.g. if the reorder window is 60 (datapoints) then points may be inserted at random order as long as their
// timestamp is not older than the 60th datapoint counting from the newest.
metricsReordered = stats.NewCounter32("tank.metrics_reordered")
```
We then have a script that developers run to regenerate all of our docs: it rebuilds the configuration documentation from the current config files and, quite simply, runs `metrics2docs . > docs/metrics.md` to refresh the metrics documentation. Another script verifies that all auto-generated docs (the metrics docs from metrics2docs, the configuration docs, the docs covering each binary and its options, etc.) are still up to date.
The end result lives under /docs/metrics.md.
Of course, the next logical question is: How do we expose this information in Grafana? How do we make sure users can see documentation for each metric as they need it, or find metrics by searching through their documentation? We have a few ideas brewing on this topic and expect to start working on this in the near future.
For more info on best practices for operating metrictank (in particular, which metrics you may want to alert on), the operations guide is also a useful resource.
Finally, I’d like to mention how neat it is that Go provides packages such as go/parser and go/ast, which help significantly when trying to write a tool that parses Go code and does cool things with it.