GPU Cluster Monitoring

Dashboard for monitoring a GPU cluster, featuring DCGM and Slurm metrics.

The GPU Cluster Monitoring dashboard uses the Loki and Prometheus data sources to build a Grafana dashboard with alertlist, logs, stat, table, text, and timeseries panels.
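
As a rough illustration of the kind of Prometheus query a GPU utilization panel might issue, the sketch below builds an instant-query URL for the Prometheus HTTP API. The server address, the choice of the `DCGM_FI_DEV_GPU_UTIL` metric, the grouping by the `instance` label, and both helper function names are assumptions made for this example; the dashboard's actual queries are not shown on this page.

```python
from urllib.parse import urlencode

# Assumed values for illustration only; not taken from the dashboard itself.
PROMETHEUS_URL = "http://localhost:9090"   # hypothetical Prometheus address
GPU_UTIL_METRIC = "DCGM_FI_DEV_GPU_UTIL"   # GPU utilization metric from dcgm-exporter


def avg_gpu_util_by_instance(metric: str = GPU_UTIL_METRIC) -> str:
    """Return a PromQL expression for mean GPU utilization per scrape target."""
    return f"avg by (instance) ({metric})"


def build_query_url(base_url: str, promql: str) -> str:
    """Return an instant-query URL (/api/v1/query) for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Prints the URL; no network request is made.
    print(build_query_url(PROMETHEUS_URL, avg_gpu_util_by_instance()))
```

The URL can be fetched with any HTTP client to check that the metric is being scraped before importing the dashboard.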