Better NVIDIA DCGM Dashboard

This dashboard displays metrics from a DCGM Exporter on a Kubernetes cluster.


This dashboard is based on the original DCGM-Exporter dashboard by NVIDIA, but comes with an improved layout and a few additional visualizations.

Changes over upstream dashboard

  • Better layout, thinner lines
  • Uses the Hostname label for the host variable instead of instance
  • Legend labels are prefixed with hostnames
  • Displays cumulative energy consumption over the last 1h and the last 24h
  • Larger range on the total GPU power gauge (adjust this to the combined maximum wattage of your GPUs)
  • Displays GPU memory usage as a percentage in addition to absolute values
  • Power, GPU, and memory utilization graphs use stacked y axes
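The energy and memory-percentage panels come down to simple arithmetic on DCGM Exporter metrics. The Python sketch below shows that arithmetic. It assumes the default dcgm-exporter field set, in which DCGM_FI_DEV_POWER_USAGE is reported in watts, DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION is a counter in millijoules, and DCGM_FI_DEV_FB_USED / DCGM_FI_DEV_FB_FREE are in MiB. The dashboard's actual PromQL queries may compute these values differently. The function names are hypothetical and only illustrate the unit conversions.

```python
from typing import Sequence, Tuple

# Illustrative helpers only (hypothetical names); the dashboard itself uses PromQL.

J_PER_KWH = 3_600_000.0  # 1 kWh = 3.6e6 J


def energy_kwh_from_power(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (unix_seconds, watts) samples, e.g. DCGM_FI_DEV_POWER_USAGE,
    into kWh using the trapezoidal rule."""
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += (p0 + p1) / 2.0 * (t1 - t0)
    return joules / J_PER_KWH


def energy_kwh_from_counter(start_mj: float, end_mj: float) -> float:
    """Convert the increase of an energy counter in millijoules
    (assumed: DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION) into kWh."""
    return (end_mj - start_mj) / 1000.0 / J_PER_KWH


def memory_used_percent(fb_used_mib: float, fb_free_mib: float) -> float:
    """GPU framebuffer usage as a percentage of used + free memory."""
    total = fb_used_mib + fb_free_mib
    if total <= 0:
        raise ValueError("total framebuffer memory must be positive")
    return 100.0 * fb_used_mib / total


if __name__ == "__main__":
    # One GPU drawing a constant 300 W for one hour.
    print(energy_kwh_from_power([(0.0, 300.0), (3600.0, 300.0)]))
    print(energy_kwh_from_counter(0.0, 1.08e9))
    print(memory_used_percent(20480.0, 61440.0))
```

A GPU drawing a steady 300 W for an hour consumes 0.3 kWh, and either the power-sample integral or the counter delta should agree on that. Summing these per-GPU values gives a cluster-wide total. The same reasoning applies to the total power gauge: its maximum should be roughly the number of GPUs times the per-GPU power limit.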