Graphite Carbon Metrics (obfuscurity)

Dashboard

Obfuscurity's "extended remix" of the Graphite Carbon Metrics dashboard
Last updated: 7 months ago

Downloads: 478

  • graphite-carbon-metrics-obfuscurity_rev7.png

This is a more exhaustive take on the original Graphite Carbon Metrics dashboard. Aside from some minor metric fixes, it adds panels for Carbon's memory footprint and cache details (metric keys and datapoints in cache, average number of datapoints per key, and so on).

Note that while the first half of the dashboard relies on Carbon's predictable naming format (carbon.agents.*), the latter half uses collectd metrics whose names vary with the Graphite server's hostname. The dashboard currently assumes a hostname of "graphite" (i.e. collectd.graphite.*), although I'm hoping to find time to leverage Grafana's new templating support to make this more dynamic. If your naming schema doesn't match mine, you'll need to update the query definitions of the affected panels.
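
If you would rather not edit each panel by hand, the renaming can be scripted against an exported copy of the dashboard. The following is a minimal sketch, not part of the dashboard itself. It assumes the export is JSON and that each query is stored as a string under a "target" key; the function name `retarget`, the sample fragment, and the hostname "metrics01" are hypothetical.

```python
import json


def retarget(node, old_prefix, new_prefix):
    """Return a copy of `node` with `old_prefix` replaced by `new_prefix`
    in every string stored under a "target" key (assumed query field)."""
    if isinstance(node, dict):
        return {
            key: (
                value.replace(old_prefix, new_prefix)
                if key == "target" and isinstance(value, str)
                else retarget(value, old_prefix, new_prefix)
            )
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [retarget(item, old_prefix, new_prefix) for item in node]
    return node  # numbers, booleans, None and non-target strings pass through


# Hypothetical fragment standing in for an exported dashboard.
sample = {
    "rows": [
        {
            "panels": [
                {"title": "Load",
                 "targets": [{"target": "collectd.graphite.load.load.shortterm"}]},
                {"title": "Updates/sec",
                 "targets": [{"target": "carbon.agents.*.updateOperations"}]},
            ]
        }
    ]
}

updated = retarget(sample, "collectd.graphite.", "collectd.metrics01.")
print(json.dumps(updated, indent=2))
```

Only the collectd-based queries change; the carbon.agents.* queries are left alone because they do not contain the old prefix.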

  • Carbon Metrics
    • Updates/sec (write ops)
    • Metrics received/sec (ingress)
    • Committed points/sec (datapoints written to disk)
    • Creates/sec (new Whisper files)
    • CPU (avg across Cache instances)
    • Points per Update (avg across Cache instances)
  • Carbon Memory & CPU
  • Metric keys in Cache (unique metric names)
  • Datapoints in Cache
  • Relay Destination Queue Length (datapoints queued per destination)
  • Average datapoints per key in Cache
  • Datapoints received during full cache
  • Average Relay Batch Size (average number of datapoints per relay transmission)
  • Disk write ops and write time (requires collectd disk plugin)
  • Metric retrieval time (requires collectd tail plugin and the tail configuration under Collector Configuration Details below)
  • Network traffic (requires collectd interface plugin)
  • Load (requires collectd load plugin)
  • CPU (by state, requires collectd cpu plugin)
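
To check that your metric names line up with what the collectd-based panels expect, it can help to work out the Graphite path a given tail match should produce. This is a rough sketch under stated assumptions: metrics are shipped with collectd's write_graphite plugin using a prefix of "collectd.", the default "-" join between type and instance, and dots in the hostname escaped to "_". The helper name `tail_metric_path` is hypothetical.

```python
def tail_metric_path(type_name, type_instance, host="graphite",
                     plugin_instance="graphite_web", prefix="collectd."):
    """Build the Graphite path for one tail-plugin match.

    Assumed layout (write_graphite defaults plus a "collectd." prefix):
        <prefix><host>.tail-<plugin_instance>.<type>-<type_instance>
    """
    safe_host = host.replace(".", "_")  # assumed escaping of dots in hostnames
    return f"{prefix}{safe_host}.tail-{plugin_instance}.{type_name}-{type_instance}"


# Type and Instance values taken from the tail configuration on this page.
print(tail_metric_path("response_time", "retrieval_time_avg"))
print(tail_metric_path("counter", "retrieval_count", host="graphite.example.com"))
```

If the paths printed for your hostname differ from the ones used in the panel queries, the queries are what need updating.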

Collector Configuration Details

<Plugin "tail">

# cache performance
<File "/opt/graphite/storage/log/webapp/cache.log">
    Instance "graphite_web"
    <Match>
        Regex "Request-Cache hit "
        DSType "CounterInc"
        Type "counter"
        Instance "request_cache_hit"
    </Match>
    <Match>
        Regex "Request-Cache miss "
        DSType "CounterInc"
        Type "counter"
        Instance "request_cache_miss"
    </Match>
    <Match>
        Regex "CarbonLink creating a new socket "
        DSType "CounterInc"
        Type "counter"
        Instance "socket_create_count"
    </Match>
    <Match>
        Regex "CarbonLink cache-query request "
        DSType "CounterInc"
        Type "counter"
        Instance "query_count"
    </Match>
    <Match>
        Regex "CarbonLink set-metadata request "
        DSType "CounterInc"
        Type "counter"
        Instance "set-metadata_count"
    </Match>
    <Match>
        Regex "Data-Cache hit "
        DSType "CounterInc"
        Type "counter"
        Instance "data_cache_hit"
    </Match>
    <Match>
        Regex "Data-Cache miss "
        DSType "CounterInc"
        Type "counter"
        Instance "data_cache_miss"
    </Match>
</File>

# rendering performance
<File "/opt/graphite/storage/log/webapp/rendering.log">
    Instance "graphite_web"

    # PNGs
    <Match>
        Regex "Rendered PNG in ([0-9\.]+) seconds"
        DSType "CounterInc"
        Type "counter"
        Instance "render_png_count"
    </Match>
    <Match>
        Regex "Rendered PNG in ([0-9\.]+) seconds"
        DSType "GaugeMin"
        Type "response_time"
        Instance "render_png_time_min"
    </Match>
    <Match>
        Regex "Rendered PNG in ([0-9\.]+) seconds"
        DSType "GaugeMax"
        Type "response_time"
        Instance "render_png_time_max"
    </Match>
    <Match>
        Regex "Rendered PNG in ([0-9\.]+) seconds"
        DSType "GaugeAverage"
        Type "response_time"
        Instance "render_png_time_avg"
    </Match>

    # pickle (carbonlink)
    <Match>
        Regex "Total pickle rendering time ([0-9\.]+)"
        DSType "CounterInc"
        Type "counter"
        Instance "render_pickle_count"
    </Match>
    <Match>
        Regex "Total pickle rendering time ([0-9\.]+)"
        DSType "GaugeMin"
        Type "response_time"
        Instance "render_pickle_time_min"
    </Match>
    <Match>
        Regex "Total pickle rendering time ([0-9\.]+)"
        DSType "GaugeMax"
        Type "response_time"
        Instance "render_pickle_time_max"
    </Match>
    <Match>
        Regex "Total pickle rendering time ([0-9\.]+)"
        DSType "GaugeAverage"
        Type "response_time"
        Instance "render_pickle_time_avg"
    </Match>

    # rawData (json, csv, etc)
    <Match>
        Regex "Total rawData rendering time ([0-9\.]+)"
        DSType "CounterInc"
        Type "counter"
        Instance "render_rawdata_count"
    </Match>
    <Match>
        Regex "Total rawData rendering time ([0-9\.]+)"
        DSType "GaugeMin"
        Type "response_time"
        Instance "render_rawdata_time_min"
    </Match>
    <Match>
        Regex "Total rawData rendering time ([0-9\.]+)"
        DSType "GaugeMax"
        Type "response_time"
        Instance "render_rawdata_time_max"
    </Match>
    <Match>
        Regex "Total rawData rendering time ([0-9\.]+)"
        DSType "GaugeAverage"
        Type "response_time"
        Instance "render_rawdata_time_avg"
    </Match>

    # total render time
    <Match>
        Regex "Total rendering time ([0-9\.]+) seconds"
        DSType "CounterInc"
        Type "counter"
        Instance "total_render_count"
    </Match>
    <Match>
        Regex "Total rendering time ([0-9\.]+) seconds"
        DSType "GaugeMin"
        Type "response_time"
        Instance "total_render_time_min"
    </Match>
    <Match>
        Regex "Total rendering time ([0-9\.]+) seconds"
        DSType "GaugeMax"
        Type "response_time"
        Instance "total_render_time_max"
    </Match>
    <Match>
        Regex "Total rendering time ([0-9\.]+) seconds"
        DSType "GaugeAverage"
        Type "response_time"
        Instance "total_render_time_avg"
    </Match>

    # cached response time
    <Match>
        Regex "Returned cached response in ([0-9\.]+) seconds"
        DSType "CounterInc"
        Type "counter"
        Instance "cached_response_time_count"
    </Match>
    <Match>
        Regex "Returned cached response in ([0-9\.]+) seconds"
        DSType "GaugeMin"
        Type "response_time"
        Instance "cached_response_time_min"
    </Match>
    <Match>
        Regex "Returned cached response in ([0-9\.]+) seconds"
        DSType "GaugeMax"
        Type "response_time"
        Instance "cached_response_time_max"
    </Match>
    <Match>
        Regex "Returned cached response in ([0-9\.]+) seconds"
        DSType "GaugeAverage"
        Type "response_time"
        Instance "cached_response_time_avg"
    </Match>

    # data retrieval time
    <Match>
        Regex "Retrieval of [^ ]+ took ([0-9\.]+)"
        DSType "CounterInc"
        Type "counter"
        Instance "retrieval_count"
    </Match>
    <Match>
        Regex "Retrieval of [^ ]+ took ([0-9\.]+)"
        DSType "GaugeMin"
        Type "response_time"
        Instance "retrieval_time_min"
    </Match>
    <Match>
        Regex "Retrieval of [^ ]+ took ([0-9\.]+)"
        DSType "GaugeMax"
        Type "response_time"
        Instance "retrieval_time_max"
    </Match>
    <Match>
        Regex "Retrieval of [^ ]+ took ([0-9\.]+)"
        DSType "GaugeAverage"
        Type "response_time"
        Instance "retrieval_time_avg"
    </Match>
</File>

</Plugin>
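
Each timing pattern above is matched four times: once with CounterInc to count matching lines, and once each with GaugeMin, GaugeMax and GaugeAverage to summarize the captured number. The sketch below imitates that behavior in Python to make the idea concrete; it is an illustration, not how collectd is implemented. The regex is copied from the rendering.log block, while the log lines and the function name `summarize` are hypothetical. collectd uses POSIX extended regular expressions, which agree with Python's `re` for a pattern this simple.

```python
import re

# Pattern copied from the rendering.log section of the tail configuration.
PATTERN = re.compile(r"Rendered PNG in ([0-9\.]+) seconds")


def summarize(lines, pattern=PATTERN):
    """Count matching lines and take min/max/average of the first capture group."""
    values = [float(m.group(1)) for m in map(pattern.search, lines) if m]
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None}
    return {
        "count": len(values),              # CounterInc: +1 per matching line
        "min": min(values),                # GaugeMin
        "max": max(values),                # GaugeMax
        "avg": sum(values) / len(values),  # GaugeAverage
    }


# Hypothetical log lines written to fit the patterns above.
sample_lines = [
    "Rendered PNG in 0.25 seconds",
    "Rendered PNG in 0.75 seconds",
    "Total rendering time 0.5 seconds",  # ignored by this pattern
    "Rendered PNG in 0.5 seconds",
]

print(summarize(sample_lines))
```

One difference to keep in mind: in collectd the CounterInc value is a running counter that keeps growing, so panels typically graph its rate, whereas the count here covers only the sample lines.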