GPU monitoring

Monitors Kubernetes cluster using Prometheus. Shows overall cluster CPU / Memory / Filesystem usage as well as individual pod, containers, systemd services statistics. Uses cAdvisor metrics only.

cadvisor collects the usage information of GPU If you want to collect the GPU temperature or power information, please call the nvidia nvml libraray with node-exporter additionally

Revisions
RevisionDescriptionCreated

Get this dashboard

Import the dashboard template

or

Download JSON

Datasource
Dependencies