GPU monitoring
Monitors Kubernetes cluster using Prometheus. Shows overall cluster CPU / Memory / Filesystem usage as well as individual pod, containers, systemd services statistics. Uses cAdvisor metrics only.
cadvisor collects the usage information of GPU If you want to collect the GPU temperature or power information, please call the nvidia nvml libraray with node-exporter additionally
Data source config
Collector config:
Upload an updated version of an exported dashboard.json file from Grafana
Revision | Description | Created | |
---|---|---|---|
Download |