iDRAC - Host Stats


SNMP Based Dashboard to Monitor Dell Hosts via iDRAC - by ilovepancakes95
Last updated: 6 months ago


Ultimate iDRAC Grafana Dashboard (Telegraf SNMP Based)

GitHub Repository: LINK
SNMP-Based Dashboard to Monitor Dell Hosts via iDRAC
Grafana Dashboard ID: 12106

Screenshot 1

How To Use

Enable SNMPv1 on each iDRAC you want to monitor. Install and set up Telegraf, InfluxDB, and Grafana so they work with each other. Use the provided idrac-input.conf file and replace the "idracURLx" values in the "agents" list with your own iDRAC IPs or hostnames, then restart Telegraf. Finally, import the dashboard JSON file (or use the Grafana Dashboard ID) to add the dashboard and panels to Grafana, selecting your own InfluxDB database after clicking "Import". Data may take up to 2 minutes to fully populate the first time. Enjoy!
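As a minimal sketch of the "replace idracURLx" step, assuming three iDRACs at the hypothetical addresses 192.0.2.10, 192.0.2.11, and idrac3.example.lan, only the agents line of the [[inputs.snmp]] block in idrac-input.conf changes:

```toml
[[inputs.snmp]]
  # Hypothetical iDRAC addresses; substitute your own IPs or hostnames.
  # 161 is the standard SNMP port.
  agents = [ "192.0.2.10:161" , "192.0.2.11:161" , "idrac3.example.lan:161" ]
  version = 1
  community = "public"
  name = "idrac-hosts"
  # ...all of the field and table entries stay exactly as provided...
```

Before restarting the service, running `telegraf --config idrac-input.conf --test` should gather one round of metrics and print them to the console, which is a quick way to confirm each iDRAC answers SNMP.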

Features

  • Uses Grafana variables to dynamically pull in all iDRACs listed in the Telegraf config file, and draw a new "row" section for each iDRAC that gets added.
  • Displays summary table and global status "heat" map of all iDRACs being polled.
  • Summary table pulls in each iDRAC URL so clicking System Name in the table brings you directly to that iDRAC's logon page.
  • Panels and table cells change color to indicate failures or other status messages.
  • Variable selection box allows fine-tuning of which systems are displayed on the dashboard. (Default is "All").
  • Each system's section on the dashboard is dynamically drawn based on variable selection to show the following data for each host:
    • Uptime, Global Status, Power State, PSU Status, CMOS Battery Status, RAID Battery Status, Storage Status, RAM Status, & Thermal Status
    • Service Tag, BIOS Version, and Intrusion Sensor Status
    • OS Name and OS Version Table
    • System Power (in watts) Graph
    • CPU Temp Graph
    • System Air Temp Graph
    • Fan Speed Graph
    • Physical Disk Status Table (Disk Name, Capacity, Status, Predictive Fail Alarm)
    • System Log Entries Table
    • Network Interfaces Table (NIC Name, Vendor, Status, MAC Address)
  • Adding more data is as simple as adding the appropriate iDRAC OID to the Telegraf config file, and adding a panel to display the new data on the dashboard.

Full Screenshot

This screenshot shows the full dashboard with 3 systems being monitored and displayed.

Screenshot 2

Build Environment

  • Grafana for visualization of data
  • Telegraf for data collection with SNMP Input Plugin
  • InfluxDB for time series and table data storage
  • iDRAC with SNMP enabled (v1). Tested with iDRAC 7 and iDRAC 8 on Dell PowerEdge R720xd and R730xd servers.

Adding More Data and Panels

iDRAC can expose a TON of data through SNMP, and it's easy to expand this dashboard to add more of it, per your collection and monitoring needs. I used the Dell MIB files with a MIB browser and the Dell EMC OpenManage SNMP Reference Guide to figure out the OIDs.
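For example, here is a sketch of adding one more value, assuming you want the standard MIB-II sysLocation object (.1.3.6.1.2.1.1.6.0). It is not Dell-specific and is not part of the provided config, and the field name "system-location" is a hypothetical choice. Append it inside the existing [[inputs.snmp]] block in idrac-input.conf:

```toml
  # Hypothetical extra field: MIB-II sysLocation (a scalar, hence the trailing .0).
  # Place it alongside the other [[inputs.snmp.field]] entries.
  [[inputs.snmp.field]]
     name = "system-location"
     oid  = ".1.3.6.1.2.1.1.6.0"
```

After restarting Telegraf, the value is written to the same idrac-hosts measurement and can be picked up by a new Grafana panel. Per-index data (disks, NICs, log entries) belongs under [[inputs.snmp.table]] as [[inputs.snmp.table.field]] entries instead, using the column OID without a trailing index, as the existing table fields do.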

Known Issues / Needs Work

One of the only major data-related problems I could not figure out was the proper display of date and time for system log entries. Dell outputs a timestamp for each log entry in the format 20200420173454.000000-300. That raw string is what the system log table panel displays, because Grafana apparently can't parse and re-format timestamps in this format into something readable such as YYYY-MM-DD HH:MM:SS. I have heard that Telegraf/InfluxDB may have a way to transform this data into a better structure before it reaches Grafana, although I have come up empty on easy or even semi-easy ways to do this. Hoping somebody else knows a fix that isn't extremely involved.
Fix implemented: per @krystiancharubin, a regex processor in idrac-input.conf now reformats these timestamps.
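The timestamp appears to follow the DMTF CIM datetime layout: yyyymmddHHMMSS.ffffff followed by a signed UTC offset in minutes, so -300 would be UTC-5. The regex processor in the sample config captures each part with named groups and rebuilds only the date and time, dropping the fractional seconds and the offset, so the result is the iDRAC's local time with no zone information. A minimal sketch of the same processor, with a namepass filter added as an assumption so it only touches the idrac-hosts measurement:

```toml
[[processors.regex]]
  # Assumed addition: only process metrics from the iDRAC SNMP input.
  # It must appear before the [[processors.regex.fields]] sub-table.
  namepass = ["idrac-hosts"]

  [[processors.regex.fields]]
    key = "log-dates"
    # Input:  20200420173454.000000-300
    # Output: 2020-04-20 17:34:54
    pattern = "^(?P<YYYY>\\d{4})(?P<MM>\\d{2})(?P<DD>\\d{2})(?P<HH>\\d{2})(?P<mm>\\d{2})(?P<ss>\\d{2})\\.(?P<SSSSSS>\\d{6})(?P<ZZ>[-+]\\d{3,4})$"
    replacement = "${YYYY}-${MM}-${DD} ${HH}:${mm}:${ss}"
```

Values that don't match the pattern pass through unchanged, so an unexpected timestamp format shows up raw in the table rather than being dropped.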

  • Sometimes, for a few seconds, the data in the tables repeats itself and falls out of alignment, even though the "group by" in the query has a limit of 1.

Collector Configuration Details

Telegraf Sample Config File

Change the "idracURL1", "idracURL2", etc. values to match your iDRAC IPs or hostnames. Make sure SNMPv1 is enabled on each iDRAC first, and that the community string below ("public") matches the one configured on your iDRACs.

# Rewrite the iDRAC "log-dates" field into "YYYY-MM-DD HH:MM:SS" before it is stored.
[[processors.regex]]
  [[processors.regex.fields]]
    key = "log-dates"
    pattern = "^(?P<YYYY>\\d{4})(?P<MM>\\d{2})(?P<DD>\\d{2})(?P<HH>\\d{2})(?P<mm>\\d{2})(?P<ss>\\d{2})\\.(?P<SSSSSS>\\d{6})(?P<ZZ>[-+]\\d{3,4})$"
    replacement = "${YYYY}-${MM}-${DD} ${HH}:${mm}:${ss}"

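[[inputs.snmp]]
  # One entry per iDRAC (IP or hostname); 161 is the standard SNMP port.
  agents = [ "idracURL1:161" , "idracURL2:161" , "idracURL3:161" ]
  # SNMPv1, which must be enabled on each iDRAC.
  version = 1
  # Must match the SNMP community string configured on the iDRACs.
  community = "public"
  # Measurement name written to InfluxDB.
  name = "idrac-hosts"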

  [[inputs.snmp.field]]
     name = "system-name"
     oid  = ".1.3.6.1.2.1.1.5.0"
     is_tag = true

  [[inputs.snmp.field]]
     name = "system-osname"
     oid  = ".1.3.6.1.4.1.674.10892.5.1.3.6.0"

  [[inputs.snmp.field]]
     name = "system-osversion"
     oid  = ".1.3.6.1.4.1.674.10892.5.1.3.14.0"

  [[inputs.snmp.field]]
     name = "system-model"
     oid  = ".1.3.6.1.4.1.674.10892.5.1.3.12.0"

  [[inputs.snmp.field]]
     name = "idrac-url"
     oid  = ".1.3.6.1.4.1.674.10892.5.1.1.6.0"

  [[inputs.snmp.field]]
     name = "power-state"
     oid  = ".1.3.6.1.4.1.674.10892.5.2.4.0"

  [[inputs.snmp.field]]
     name = "system-uptime"
     oid  = ".1.3.6.1.4.1.674.10892.5.2.5.0"

  [[inputs.snmp.field]]
     name = "system-servicetag"
     oid  = ".1.3.6.1.4.1.674.10892.5.1.3.2.0"

  [[inputs.snmp.field]]
     name = "system-globalstatus"
     oid  = ".1.3.6.1.4.1.674.10892.5.2.1.0"

  [[inputs.snmp.table]]
     name = "idrac-hosts"
     inherit_tags = [ "system-name" , "disks-name" ]

    [[inputs.snmp.table.field]]
       name = "bios-version"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.50.1.8"

    [[inputs.snmp.table.field]]
       name = "raid-batterystate"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.15.1.4"

    [[inputs.snmp.table.field]]
       name = "intrusion-sensor"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.70.1.6"

    [[inputs.snmp.table.field]]
       name = "disks-mediatype"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.35"

    [[inputs.snmp.table.field]]
       name = "disks-state"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.4"

    [[inputs.snmp.table.field]]
       name = "disks-predictivefail"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.31"

    [[inputs.snmp.table.field]]
       name = "disks-capacity"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.11"

    [[inputs.snmp.table.field]]
       name = "disks-name"
       oid = ".1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.2"
       is_tag = true

    [[inputs.snmp.table.field]]
       name = "memory-status"
       oid = ".1.3.6.1.4.1.674.10892.5.4.200.10.1.27"

    [[inputs.snmp.table.field]]
       name = "storage-status"
       oid = ".1.3.6.1.4.1.674.10892.5.2.3"

    [[inputs.snmp.table.field]]
       name = "temp-status"
       oid = ".1.3.6.1.4.1.674.10892.5.4.200.10.1.63"

    [[inputs.snmp.table.field]]
       name = "psu-status"
       oid = ".1.3.6.1.4.1.674.10892.5.4.200.10.1.9"

    [[inputs.snmp.table.field]]
       name = "log-dates"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.40.1.8"

    [[inputs.snmp.table.field]]
       name = "log-entry"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.40.1.5"

    [[inputs.snmp.table.field]]
       name = "log-severity"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.40.1.7"

    [[inputs.snmp.table.field]]
       name = "log-number"
       oid = ".1.3.6.1.4.1.674.10892.5.4.300.40.1.2"
       is_tag = true

    [[inputs.snmp.table.field]]
       name = "nic-name"
       oid = ".1.3.6.1.4.1.674.10892.5.4.1100.90.1.30"
       is_tag = true

    [[inputs.snmp.table.field]]
       name = "nic-vendor"
       oid = ".1.3.6.1.4.1.674.10892.5.4.1100.90.1.7"

    [[inputs.snmp.table.field]]
       name = "nic-status"
       oid = ".1.3.6.1.4.1.674.10892.5.4.1100.90.1.4"

    [[inputs.snmp.table.field]]
       name = "nic-current_mac"
       oid = ".1.3.6.1.4.1.674.10892.5.4.1100.90.1.15"
       conversion = "hwaddr"

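  # Fixed sensor indexes on chassis 1: fans 1-6, then inlet/exhaust/CPU temperature probes.
  # Servers with a different fan count or a single CPU may need these entries adjusted.
  [[inputs.snmp.field]]
     name = "fan1-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.1"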

  [[inputs.snmp.field]]
     name = "fan2-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.2"

  [[inputs.snmp.field]]
     name = "fan3-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.3"

  [[inputs.snmp.field]]
     name = "fan4-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.4"

  [[inputs.snmp.field]]
     name = "fan5-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.5"

  [[inputs.snmp.field]]
     name = "fan6-speed"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.12.1.6.1.6"
 
  [[inputs.snmp.field]]
     name = "inlet-temp"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.20.1.6.1.1"

  [[inputs.snmp.field]]
     name = "exhaust-temp"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.20.1.6.1.2"

  [[inputs.snmp.field]]
     name = "cpu1-temp"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.20.1.6.1.3"

  [[inputs.snmp.field]]
     name = "cpu2-temp"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.700.20.1.6.1.4"

  [[inputs.snmp.field]]
     name = "cmos-batterystate"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.600.50.1.6.1.1"

  [[inputs.snmp.field]]
     name = "system-watts"
     oid  = ".1.3.6.1.4.1.674.10892.5.4.600.30.1.6.1.3"