OnDemand Clusters

Open OnDemand Clusters dashboard
Last updated: 10 months ago

Downloads: 60

Reviews: 1

    This dashboard pulls data provided by several Prometheus exporters:

    All metrics are expected to carry a host label. The following example relabel configuration in Prometheus assigns the host label to all instances:

        relabel_configs:
          # Copy the first dot-separated component of the target address into "host".
          - source_labels: [__address__]
            regex: '([^.]+)\..*'
            replacement: '$1'
            target_label: host
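
For context, here is a minimal sketch of where that relabel block could sit in a complete scrape job. The job name, target address, and the `cluster` and `role` values are hypothetical placeholders. The two static labels are included only because the recording rules on this page aggregate by `host`, `cluster`, and `role`. The dot in the regex is escaped so that it matches a literal dot only.

```yaml
# prometheus.yml (fragment) - illustrative only; names and values are assumptions
scrape_configs:
  - job_name: node                       # hypothetical job name
    static_configs:
      - targets:
          - node01.example.com:9100      # hypothetical node_exporter target
        labels:
          cluster: example               # hypothetical; the rules group by cluster
          role: compute                  # hypothetical; the rules group by role
    relabel_configs:
      # __address__ is "node01.example.com:9100", so host becomes "node01"
      - source_labels: [__address__]
        regex: '([^.]+)\..*'
        replacement: '$1'
        target_label: host
```

Prometheus anchors relabel regexes at both ends, so a target address without any dot would not match this pattern and would be left without a host label.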
    

    Recording rules used for the CPU and network panels:

    groups:
    - name: node
      rules:
      - record: node:cpus:count
        expr: count by(host,cluster,role) (node_cpu_info)
      - record: node:cpu_load_user:avg5m
        expr: avg by (host,cluster,role)(irate(node_cpu_seconds_total{mode="user"}[5m]))
      - record: node:cpu_load_system:avg5m
        expr: avg by (host,cluster,role)(irate(node_cpu_seconds_total{mode="system"}[5m]))
      - record: node:cpu_load_iowait:avg5m
        expr: avg by (host,cluster,role)(irate(node_cpu_seconds_total{mode="iowait"}[5m]))
      - record: node:cpu_load_total:avg5m
        expr: 1 - avg by (host,cluster,role)(irate(node_cpu_seconds_total{mode="idle"}[5m]))
      - record: node:network_received_rate_bytes
        expr: irate(node_network_receive_bytes_total[5m])
      - record: node:network_transmit_rate_bytes
        expr: irate(node_network_transmit_bytes_total[5m])
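
Recording rules take effect only once Prometheus loads the file that contains them. The fragment below is a sketch of how that might look, assuming each rule group on this page is saved to its own file. Both paths are hypothetical.

```yaml
# prometheus.yml (fragment) - file paths are assumptions, adjust to your layout
rule_files:
  - /etc/prometheus/rules/node.rules.yml     # the "node" group
  - /etc/prometheus/rules/cgroup.rules.yml   # the "cgroup" group
```

Running `promtool check rules` on each file before reloading Prometheus catches syntax errors early.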
    

    Recording rules for the cgroup-related panels:

    groups:
    - name: cgroup
      rules:
      - record: cgroup:cpu_user_seconds:irate5m
        expr: (irate(cgroup_cpu_user_seconds[5m]) / cgroup_cpus) * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:cpu_system_seconds:irate5m
        expr: (irate(cgroup_cpu_system_seconds[5m]) / cgroup_cpus) * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:cpu_total_seconds:irate5m
        expr: (irate(cgroup_cpu_total_seconds[5m]) / cgroup_cpus) * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:memory_used_bytes
        expr: cgroup_memory_used_bytes * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:memory_total_bytes
        expr: cgroup_memory_total_bytes * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:memory_rss_bytes
        expr: cgroup_memory_rss_bytes * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:memory_cache_bytes
        expr: cgroup_memory_cache_bytes * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
      - record: cgroup:swap_used_bytes
        expr: (cgroup_memsw_used_bytes - cgroup_memory_used_bytes) * on(cgroup, host) group_left(jobid,uid,username) cgroup_info
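
Each cgroup rule multiplies a metric by `cgroup_info` to copy the `jobid`, `uid`, and `username` labels onto the result. One way to see that join in action is a small `promtool` unit test, sketched below. It assumes the cgroup rules are saved as `cgroup.rules.yml` next to the test file and that `cgroup_info` has the value 1. The file name and all series, label values, and sample values are made up for illustration.

```yaml
# cgroup.rules.test.yml - hypothetical test file, run with:
#   promtool test rules cgroup.rules.test.yml
rule_files:
  - cgroup.rules.yml            # assumed file name for the cgroup rule group

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Made-up usage series carrying only the cgroup and host labels
      - series: 'cgroup_memory_used_bytes{cgroup="/example", host="node01"}'
        values: '1024 1024 1024'
      # Made-up info series carrying the job labels, with constant value 1
      - series: 'cgroup_info{cgroup="/example", host="node01", jobid="42", uid="1000", username="alice"}'
        values: '1 1 1'
    promql_expr_test:
      - expr: cgroup:memory_used_bytes
        eval_time: 2m
        exp_samples:
          # The recorded series keeps the usage value and gains the job labels
          - labels: 'cgroup:memory_used_bytes{cgroup="/example", host="node01", jobid="42", uid="1000", username="alice"}'
            value: 1024
```

Because the join matches on `cgroup` and `host`, a usage series with no corresponding `cgroup_info` series produces no recorded sample.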
    
    Get this dashboard: 12093
    Dependencies: