Gridware Cluster Scheduler

Demo Dashboard for Gridware Cluster Scheduler (formerly Sun Grid Engine / SGE)


This Grafana dashboard provides comprehensive, real-time insights into HPC and AI environments managed by the Gridware Cluster Scheduler (formerly known as Sun Grid Engine). The Gridware Cluster Scheduler is a robust, high-performance scheduler designed to manage complex computing workloads at scale, supporting both single-node and multi-node jobs, as well as diverse resource types such as CPUs, GPUs, and customizable numerical resources.

Leveraging the integrated qtelemetry tool, cluster administrators gain access to a wide range of performance metrics that are exported in a Prometheus-compatible format. These include:

  • Job Overview: Detailed statistics of current workloads, number of jobs per state, and workload distribution.
  • Compute Node Overview: Essential node-specific metrics such as memory usage, CPU load, active jobs, and resource allocation statuses.
  • GPU Resource Overview: Real-time monitoring of GPU resources covering aspects like temperature, memory utilization, performance metrics, and error rates.
  • Cluster Queue Details: Insights into job waiting times, queue lengths, and overall workload management efficiency.
  • Custom Numerical Resources (Complexes): Tracking and evaluation of user-defined numeric resources relevant to particular workloads or configurations.
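
Because the metrics are exported in a Prometheus-compatible format, they can also be read outside Grafana with a few lines of code. The following is a minimal sketch of parsing the Prometheus text exposition format and summarizing jobs per state. The metric names (`example_jobs`, `example_node_load`) and the labels `state` and `host` are hypothetical placeholders chosen for illustration; they are not the names qtelemetry actually exports, so substitute the names from your own scrape output. The parser is deliberately simplified and does not handle escaped quotes or braces inside label values.

```python
import re

# Hypothetical sample in Prometheus text exposition format. The metric and
# label names are illustrative assumptions, not actual qtelemetry output.
SAMPLE = """\
# HELP example_jobs Number of jobs per state (hypothetical metric).
# TYPE example_jobs gauge
example_jobs{state="running"} 42
example_jobs{state="pending"} 17
example_node_load{host="node01"} 3.5
"""

# metric_name{label="value",...} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def jobs_per_state(samples, metric="example_jobs"):
    """Map each 'state' label to its value for the given (hypothetical) metric."""
    return {
        labels["state"]: value
        for name, labels, value in samples
        if name == metric and "state" in labels
    }


if __name__ == "__main__":
    print(jobs_per_state(parse_metrics(SAMPLE)))
```

In practice the sample text would be replaced by the body fetched from whichever endpoint your exporter is configured to serve, and a Prometheus server would normally scrape that endpoint directly for Grafana to query.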

The provided Grafana dashboard is a fully functional example intended as a customizable starting point. Cluster administrators can adapt and extend it to fit their specific monitoring and organizational needs.

This dashboard showcases functionality provided by the Gridware Cluster Scheduler, presented by HPC Gridware.

Please note: The freely available Open Cluster Scheduler (SGE-compatible) version does not include the qtelemetry tool.
