hami-vgpu-dashboard

This dashboard shows GPU metrics based on NVIDIA DCGM Exporter and HAMi/k8s-vgpu-scheduler.


HAMi vGPU dashboard

bash
# This dashboard also includes some NVIDIA DCGM metrics
kubectl create -f https://raw.githubusercontent.com/NVIDIA/dcgm-exporter/master/dcgm-exporter.yaml
  • Add the following scrape jobs under scrape_configs in the Prometheus configuration:
yaml
- job_name: 'kubernetes-hami-exporter'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Keep only endpoints whose name starts with "hami-" (replacement is not used by keep).
  - source_labels: [__meta_kubernetes_endpoints_name]
    regex: hami-.*
    action: keep
  # Copy the node name of the backing pod into the node_name label.
  - source_labels: [__meta_kubernetes_pod_node_name]
    regex: (.*)
    target_label: node_name
    replacement: $1
    action: replace
  # Copy the host IP of the backing pod into the ip label.
  - source_labels: [__meta_kubernetes_pod_host_ip]
    regex: (.*)
    target_label: ip
    replacement: $1
    action: replace
- job_name: 'kubernetes-dcgm-exporter'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Keep only the endpoints named exactly "dcgm-exporter".
  - source_labels: [__meta_kubernetes_endpoints_name]
    regex: dcgm-exporter
    action: keep
  - source_labels: [__meta_kubernetes_pod_node_name]
    regex: (.*)
    target_label: node_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_host_ip]
    regex: (.*)
    target_label: ip
    replacement: $1
    action: replace
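The keep rules above only retain endpoints whose name matches the whole regex, because Prometheus anchors relabel regexes at both ends. The following is a rough sketch for checking which names would pass. It approximates the match with grep -Ex, which is close enough for simple patterns like these but is not the RE2 engine Prometheus uses. The function name matches_keep_rule and the sample endpoint names are hypothetical and used only for illustration.

```shell
# Sketch: approximate a Prometheus "keep" rule with a whole-line grep match.
# matches_keep_rule is a hypothetical helper, not part of Prometheus or HAMi.
matches_keep_rule() {
  # $1: regex from the relabel rule, $2: endpoints name to test
  printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Sample endpoint names (assumed for illustration, adjust to your cluster).
for name in hami-device-plugin-monitor my-hami-scheduler dcgm-exporter; do
  if matches_keep_rule 'hami-.*' "$name"; then
    echo "hami job keeps: $name"
  elif matches_keep_rule 'dcgm-exporter' "$name"; then
    echo "dcgm job keeps: $name"
  else
    echo "dropped by both jobs: $name"
  fi
done
```

If the HAMi services in a cluster use a different name prefix, for example because of a custom Helm release name, the regex in the first job would need to be adjusted accordingly.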
  • Reload Prometheus:
bash
# The reload endpoint is only available when Prometheus runs with --web.enable-lifecycle
curl -X POST http://{prometheusServer}:{port}/-/reload
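After a reload it can help to confirm that both scrape jobs actually discovered targets. The sketch below wraps the reload call and queries the Prometheus HTTP API endpoint /api/v1/targets for the two job names used in the configuration. PROM_HOST and PROM_PORT are hypothetical placeholders, the function names are made up for this example, and jq is assumed to be installed.

```shell
# Sketch: reload Prometheus and list the health of the HAMi and DCGM scrape targets.
# PROM_HOST / PROM_PORT are placeholders (assumptions); override them for your setup.
PROM_HOST="${PROM_HOST:-localhost}"
PROM_PORT="${PROM_PORT:-9090}"

# Build the base URL of the Prometheus server.
prom_url() {
  printf 'http://%s:%s' "$PROM_HOST" "$PROM_PORT"
}

# Trigger a configuration reload (needs --web.enable-lifecycle on the server).
reload_prometheus() {
  curl -fsS -X POST "$(prom_url)/-/reload"
}

# Print "job health scrapeUrl" for every active target of the two jobs.
list_gpu_targets() {
  curl -fsS "$(prom_url)/api/v1/targets" |
    jq -r '.data.activeTargets[]
           | select(.labels.job | test("^kubernetes-(hami|dcgm)-exporter$"))
           | "\(.labels.job) \(.health) \(.scrapeUrl)"'
}
```

With these functions defined, running reload_prometheus followed by list_gpu_targets should print one line per discovered target. An empty result suggests that the keep rules matched no endpoints or that the exporters are not yet running.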