Slurm Cgroups

Metrics collected by the cgroups_exporter.

This dashboard visualizes data collected by the Prometheus cgroups_exporter running in Slurm mode on an High Performance Computing (HPC) cluster that utilizes the Slurm scheduler configured with TaskPlugin=task/cgroup. The dashboard visualizes per-node metrics at the job-level for jobs that run across multiple nodes in a cluster.

Metrics include:

Per-CPU usage: When there are multiple jobs running on a single node, this dashboard visualizes only CPUs utilized by a given job on that node.
Total CPU usage: Shows the total CPU utilization for only those CPUs scheduled to your job on each node.
Memory usage: Likewise, this dashboard visualizes the memory utilized by only a given job, even if multiple jobs ran on the same node and utilized memory.

Revisions

Revision	Description	Created
			Download

Get this dashboard

Import the dashboard template

Download JSON

Resources

Docs: Importing dashboards Webinar: Getting started with Grafana dashboard design Webinar: Building advanced Grafana dashboards

Datasource

Dependencies

Feedback

Slurm Cgroups

Data source config

Collector config:

Get this dashboard