Slurm Cgroups

Metrics collected by the cgroups_exporter.

This dashboard visualizes data collected by the Prometheus cgroups_exporter running in Slurm mode on a High Performance Computing (HPC) cluster that uses the Slurm scheduler configured with TaskPlugin=task/cgroup. It shows per-node metrics at the job level, including for jobs that run across multiple nodes in the cluster.
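As a rough illustration of how per-job attribution can work, the sketch below extracts a job ID from a cgroup path. It assumes the cgroup v1 hierarchy that Slurm's task/cgroup plugin typically creates (`.../slurm/uid_<uid>/job_<jobid>/...`). The function name `parse_job_cgroup` is hypothetical and is not taken from the cgroups_exporter itself.

```python
import re
from typing import Optional

# Assumed layout (Slurm, cgroup v1): .../slurm/uid_<uid>/job_<jobid>[/step_<step>/...]
_JOB_PATH = re.compile(r"/slurm/uid_(?P<uid>\d+)/job_(?P<job>\d+)(?:/|$)")


def parse_job_cgroup(path: str) -> Optional[dict]:
    """Return the uid and job ID encoded in a Slurm cgroup path, or None."""
    match = _JOB_PATH.search(path)
    if match is None:
        # Not a Slurm job cgroup (for example, a systemd slice).
        return None
    return {"uid": int(match.group("uid")), "job_id": int(match.group("job"))}
```

For example, `parse_job_cgroup("/slurm/uid_1000/job_12345/step_0")` yields the uid 1000 and job ID 12345, which is the kind of label a per-job dashboard can filter on.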

Metrics include:

Per-CPU usage: When multiple jobs share a single node, the dashboard shows only the CPUs used by the selected job on that node.
Total CPU usage: The total CPU utilization across only the CPUs allocated to the selected job on each node.
Memory usage: Likewise, the memory used by the selected job alone, even when other jobs on the same node are also using memory.
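The per-CPU and total CPU figures can be thought of as rates over cumulative CPU-time counters. The sketch below is a minimal illustration, assuming two samples of cumulative CPU seconds keyed by CPU index for the CPUs allocated to one job. The function names and the input shape are hypothetical and do not reflect the exporter's actual metric names.

```python
from typing import Dict


def per_cpu_utilization(
    prev: Dict[int, float], curr: Dict[int, float], interval_s: float
) -> Dict[int, float]:
    """Fraction of each CPU used between two cumulative CPU-seconds samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Only CPUs present in both samples can be given a rate.
    return {
        cpu: (curr[cpu] - prev[cpu]) / interval_s
        for cpu in curr
        if cpu in prev
    }


def total_utilization(per_cpu: Dict[int, float]) -> float:
    """Mean utilization across only the CPUs allocated to the job."""
    if not per_cpu:
        return 0.0
    return sum(per_cpu.values()) / len(per_cpu)
```

With samples taken 10 seconds apart for a job holding CPUs 2 and 3, `per_cpu_utilization({2: 10.0, 3: 20.0}, {2: 15.0, 3: 30.0}, 10.0)` gives 0.5 for CPU 2 and 1.0 for CPU 3, and `total_utilization` of that result is 0.75. CPUs belonging to other jobs on the node never enter the calculation.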
