hami-vgpu-dashboard
This is a GPU metrics dashboard based on NVIDIA DCGM Exporter and HAMi/k8s-vgpu-scheduler.
HAMI vgpu dashboard
- deploy hami
- deploy dcgm-exporter
# This dashboard also includes some NVIDIA DCGM metrics
kubectl create -f https://raw.githubusercontent.com/NVIDIA/dcgm-exporter/master/dcgm-exporter.yaml
- add the custom scrape configuration to prometheus
    - job_name: 'kubernetes-hami-exporter'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: hami-.*
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_pod_node_name]
        regex: (.*)
        target_label: node_name
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_pod_host_ip]
        regex: (.*)
        target_label: ip
        replacement: $1
        action: replace
    - job_name: 'kubernetes-dcgm-exporter'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: dcgm-exporter
        replacement: $1
        action: keep
      - source_labels: [__meta_kubernetes_pod_node_name]
        regex: (.*)
        target_label: node_name
        replacement: ${1}
        action: replace
      - source_labels: [__meta_kubernetes_pod_host_ip]
        regex: (.*)
        target_label: ip
        replacement: $1
        action: replace
- reload prometheus (the server must be started with `--web.enable-lifecycle` for this endpoint to work):
curl -XPOST http://{prometheusServer}:{port}/-/reload
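
After the reload, it can help to confirm that both scrape jobs were actually picked up. The following is a minimal sketch, not part of the original setup: it assumes the Prometheus HTTP API is reachable at a hypothetical `PROM_HOST`/`PROM_PORT` (defaults `localhost:9090`) and that at least one target per job has been discovered, since a job with no endpoints will not appear among the active targets. The helper names `prom_url`, `list_jobs` and `has_dashboard_jobs` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical defaults; point these at your own Prometheus server.
PROM_HOST="${PROM_HOST:-localhost}"
PROM_PORT="${PROM_PORT:-9090}"

# Build a Prometheus URL for the given path.
prom_url() {
  printf 'http://%s:%s%s\n' "$PROM_HOST" "$PROM_PORT" "$1"
}

# Read /api/v1/targets JSON on stdin and print the distinct job names found.
list_jobs() {
  grep -o '"job":"[^"]*"' | cut -d'"' -f4 | sort -u
}

# Succeed only if both jobs from the scrape configuration appear in the input.
has_dashboard_jobs() {
  jobs=$(list_jobs)
  printf '%s\n' "$jobs" | grep -qx 'kubernetes-hami-exporter' &&
    printf '%s\n' "$jobs" | grep -qx 'kubernetes-dcgm-exporter'
}

# Usage against a live server (commented out so the sketch has no side effects):
# curl -fsS -X POST "$(prom_url /-/reload)"
# curl -fsS "$(prom_url /api/v1/targets)" | has_dashboard_jobs && echo "both jobs loaded"
```

Parsing the JSON with `grep` keeps the sketch free of extra dependencies; if `jq` is available, it is the more robust choice.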
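
The `keep` rules in the scrape configuration match on the endpoints name, and Prometheus anchors relabel regexes at both ends. So `hami-.*` only keeps names that start with `hami-`, and `dcgm-exporter` only keeps that exact name. The sketch below approximates this with `grep -Ex` (whole-line match). Prometheus uses RE2 syntax, so this only holds for simple patterns like these two. The endpoint names in the loop are illustrative rather than taken from a real cluster, and `keep_or_drop` is a hypothetical helper.

```shell
#!/bin/sh
# Approximate Prometheus' fully anchored regex match with grep -x.
# $1 = endpoints name, $2 = relabel regex
keep_or_drop() {
  if printf '%s\n' "$1" | grep -Eqx -- "$2"; then
    echo keep
  else
    echo drop
  fi
}

# Illustrative endpoint names (assumptions, not read from a cluster).
for name in hami-scheduler hami-device-plugin-monitor my-hami-svc dcgm-exporter dcgm-exporter-2; do
  printf '%-28s hami job: %-5s dcgm job: %s\n' "$name" \
    "$(keep_or_drop "$name" 'hami-.*')" \
    "$(keep_or_drop "$name" 'dcgm-exporter')"
done
```

If your Services are named differently, for example with a Helm release prefix, the regexes in the `keep` rules need to be adjusted to match.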