Better NVIDIA DCGM Dashboard
This dashboard displays metrics from a DCGM Exporter on a Kubernetes cluster.
This dashboard is based on the original DCGM-Exporter dashboard by NVIDIA, but comes with an improved layout and a few additional visualizations.
Changes from the upstream dashboard
- Better layout, thinner lines
- Uses the Hostname label instead of instance for the host variable
- Legend labels are prefixed with hostnames
- Displays cumulative energy consumption over the last 1h and the last 24h
- Larger range on the total GPU power gauge (set its maximum to the combined maximum wattage of your GPUs)
- Displays GPU memory usage as percentage in addition to absolute values
- Power, GPU, and memory utilization graphs use stacked y axes
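The cumulative energy panels turn power readings into energy. The sketch below shows the arithmetic under stated assumptions and is not the dashboard's actual query. It assumes either a series of (timestamp, watts) power samples, integrated with the trapezoidal rule, or a cumulative energy counter in millijoules such as DCGM's total energy consumption field. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def energy_wh(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (unix_seconds, watts) power samples into watt-hours.

    Hypothetical helper: uses the trapezoidal rule between consecutive
    samples, so sparse or irregular scrapes are approximated linearly.
    """
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        # Average power over the interval times its length gives joules.
        joules += (p0 + p1) / 2.0 * (t1 - t0)
    return joules / 3600.0  # 1 Wh = 3600 J


def energy_from_counter_wh(start_mj: float, end_mj: float) -> float:
    """Convert the delta of a cumulative millijoule counter to watt-hours.

    Assumes the counter did not reset (e.g. driver reload) in the window.
    """
    return (end_mj - start_mj) / 1000.0 / 3600.0


if __name__ == "__main__":
    # One GPU at a constant 300 W for 1 h, scraped every 15 s.
    samples = [(float(t), 300.0) for t in range(0, 3601, 15)]
    print(energy_wh(samples))
```

The per-GPU results can then be summed across hosts for the cluster-wide totals. The counter-based approach is usually more accurate because it does not depend on the scrape interval.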
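The memory percentage view can be derived from absolute framebuffer values. This is a minimal sketch that assumes total memory is used plus free, as reported by the exporter's DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE gauges (in MiB). The helper name is hypothetical. Some DCGM versions report reserved memory separately, in which case the denominator would need adjusting.

```python
def fb_used_percent(used_mib: float, free_mib: float) -> float:
    """Return GPU framebuffer usage as a percentage of used + free.

    Hypothetical helper; assumes used + free approximates total memory.
    """
    total = used_mib + free_mib
    if total <= 0:
        raise ValueError("used + free must be positive")
    return 100.0 * used_mib / total


if __name__ == "__main__":
    # An 80 GiB card with half its framebuffer in use.
    print(fb_used_percent(40960, 40960))
```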