OnDemand Clusters

Open OnDemand Clusters dashboard

This dashboard pulls data provided by several Prometheus exporters.

Every metric is expected to carry a host label. The following example relabel configuration in Prometheus assigns the host label to all instances:

    relabel_configs:
      - source_labels: [__address__]
        regex: '([^.]+)\..*'
        replacement: '$1'
        target_label: host
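
For context, here is a minimal sketch of where this relabel rule might sit inside a scrape job. The job name, target hostnames, and the cluster and role label values are hypothetical; port 9100 is the node_exporter default. Setting cluster and role as static target labels is only one assumed way of supplying the labels that the recording rules on this page aggregate by.

```yaml
scrape_configs:
  - job_name: node                        # hypothetical job name
    static_configs:
      - targets:
          - compute01.example.com:9100    # hypothetical hosts
          - compute02.example.com:9100
        labels:
          cluster: example                # assumed label values
          role: compute
    relabel_configs:
      # Keep everything before the first dot of the scrape address as host
      - source_labels: [__address__]
        regex: '([^.]+)\..*'
        replacement: '$1'
        target_label: host
```

With this sketch, a target scraped at compute01.example.com:9100 would receive host="compute01".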

Recording rules used for the CPU and network panels:

    groups:
    - name: node
      rules:
      - record: node:cpus:count
        expr: count by(host,cluster,role) (node_cpu_info)
      - record: node:cpu_load_user:avg5m
        expr: avg by (host,cluster,role)(irate(node_cpu_seconds_total{mode="user"}[5m]))
      - record: node:cpu_load_system:avg5m
        expr: avg by (host,cluster,role)(irate(node_cpu_seconds_total{mode="system"}[5m]))
      - record: node:cpu_load_iowait:avg5m
        expr: avg by (host,cluster,role)(irate(node_cpu_seconds_total{mode="iowait"}[5m]))
      - record: node:cpu_load_total:avg5m
        expr: 1 - avg by (host,cluster,role)(irate(node_cpu_seconds_total{mode="idle"}[5m]))
      - record: node:network_received_rate_bytes
        expr: irate(node_network_receive_bytes_total[5m])
      - record: node:network_transmit_rate_bytes
        expr: irate(node_network_transmit_bytes_total[5m])
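
Recording rules are loaded through the rule_files setting in the main Prometheus configuration. A minimal sketch, assuming the rule groups on this page are saved to files at the hypothetical paths below:

```yaml
# prometheus.yml (fragment); both file paths are hypothetical
rule_files:
  - /etc/prometheus/rules/node.rules.yml
  - /etc/prometheus/rules/cgroup.rules.yml
```

Each file can be validated with promtool check rules followed by the file path before Prometheus is reloaded.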

Recording rules for the cgroup-related panels:

    groups:
    - name: cgroup
      rules:
      - record: cgroup:cpu_user_seconds:irate5m
        expr: (irate(cgroup_cpu_user_seconds[5m]) / cgroup_cpus) * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:cpu_system_seconds:irate5m
        expr: (irate(cgroup_cpu_system_seconds[5m]) / cgroup_cpus) * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:cpu_total_seconds:irate5m
        expr: (irate(cgroup_cpu_total_seconds[5m]) / cgroup_cpus) * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:memory_used_bytes
        expr: cgroup_memory_used_bytes * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:memory_total_bytes
        expr: cgroup_memory_total_bytes * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:memory_rss_bytes
        expr: cgroup_memory_rss_bytes * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:memory_cache_bytes
        expr: cgroup_memory_cache_bytes * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:swap_used_bytes
        expr: (cgroup_memsw_used_bytes - cgroup_memory_used_bytes) * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
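
Each cgroup rule multiplies the raw metric by cgroup_info, matched on the cgroup and host labels, so that the jobid, uid, and username labels are copied onto the recorded series. This leaves the metric value unchanged only if cgroup_info has the value 1, which is the usual convention for info metrics but is an assumption here. As a hypothetical illustration that is not part of the dashboard, the recorded series could then be aggregated per user:

```yaml
groups:
- name: cgroup_examples    # hypothetical group, not part of the dashboard
  rules:
  # Sum the recorded per-cgroup memory usage across all of a user's jobs
  - record: username:cgroup_memory_used_bytes:sum    # hypothetical rule name
    expr: sum by (username) (cgroup:memory_used_bytes)
```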