
Lustre overview dashboard that supports more than one Lustre filesystem. Its main purpose is to give a general overview of Lustre that should be enough for most admins. We have also uploaded more advanced dashboards with additional metrics and detailed jobstats. The following information is provided:

  • Number of jobs that submitted any IO within the interval configured in job_cleanup_interval
  • Nodes connected per server as seen by connected NIDs on /exports
  • Metadata operations per filesystem
  • Metadata operations per MDS & CPU usage
  • IO bandwidth per filesystem
  • IO bandwidth per server
  • Data IOPS per filesystem
  • Data IOPS per OSS & CPU usage
  • Available capacity per target
  • Available inodes per target
  • Top10 jobstats metadata rate
  • Top10 hourly aggregated metadata
  • Top10 jobstats data rate
  • Top10 hourly aggregated data

Notes:

  • Depending on the number of job IDs and the processing capacity of Prometheus, the jobstats panels might time out. If this happens, try reducing the granularity or pre-aggregating the jobstats data before it is queried.
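
The pre-aggregation mentioned in the note above could look roughly like the following Python sketch. It is only an illustration of the idea, not part of the dashboard or of either exporter: the function name `top_jobs`, the sample layout (job ID, operations per second) and the sample values are all hypothetical. It sums the per-target values of each job and keeps only the largest ones, so the panel has far fewer series to rank.

```python
from collections import defaultdict
from heapq import nlargest


def top_jobs(samples, n=10):
    """Sum the values of each job and return the n largest as (job_id, total)."""
    totals = defaultdict(float)
    for job_id, value in samples:
        totals[job_id] += value
    # nlargest returns the pairs sorted from largest to smallest total.
    return nlargest(n, totals.items(), key=lambda item: item[1])


# Hypothetical per-target samples: (job ID, operations per second).
samples = [
    ("1001", 120.0),
    ("1002", 30.0),
    ("1001", 80.0),
    ("1003", 55.0),
]

print(top_jobs(samples, n=2))
```

In a real deployment the same reduction would more likely live on the Prometheus side (for example as a pre-computed series), but the logic is the same: aggregate per job first, then rank.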

Collector Configuration Details

  1. Almost all stats are obtained with the Lustre exporter provided by HPE: https://github.com/HewlettPackard/lustre_exporter
  2. The remaining metrics (load, CPU, etc.) are provided by the standard node_exporter.