[Bug 1926299] Re: incorrect sensor is read on Lenovo server BMC

Dan Streetman 1926299 at bugs.launchpad.net
Fri May 21 14:46:33 UTC 2021


uploaded to b/f/g/h/i thanks!

** Description changed:

  [impact]
  
  ipmi-sensors reads the wrong sensor on a Lenovo BMC (although the
  problem may happen on other BMCs as well)
  
  [test case]
  
  run ipmi-sensors over the network using the IPMI 1.5 or 2.0 protocol,
  and check the sensor output. Specifically for this case, the server is
  a Lenovo SR665 with only 1 cpu socket populated; dimms 1-16 are
  connected to cpu 1 while dimms 17-32 are connected to cpu 2, so dimms
  17-32 should not return any values.
  
  $ ipmi-sensors -u $USERNAME -p $PASSWD -D LAN_2_0 -l USER -h $BMCADDR | grep DIMM | grep Temp
  110 | DIMM 1 Temp | Temperature | N/A | C | N/A
  113 | DIMM 2 Temp | Temperature | N/A | C | N/A
  116 | DIMM 3 Temp | Temperature | N/A | C | N/A
  119 | DIMM 4 Temp | Temperature | N/A | C | N/A
  122 | DIMM 5 Temp | Temperature | N/A | C | N/A
  125 | DIMM 6 Temp | Temperature | N/A | C | N/A
  128 | DIMM 7 Temp | Temperature | N/A | C | N/A
  131 | DIMM 8 Temp | Temperature | N/A | C | N/A
  134 | DIMM 9 Temp | Temperature | 52.00 | C | 'OK'
  137 | DIMM 10 Temp | Temperature | 42.00 | C | 'OK'
  140 | DIMM 11 Temp | Temperature | N/A | C | N/A
  143 | DIMM 12 Temp | Temperature | N/A | C | N/A
  146 | DIMM 13 Temp | Temperature | 27.00 | C | 'OK'
  149 | DIMM 14 Temp | Temperature | 26.00 | C | 'OK'
  152 | DIMM 15 Temp | Temperature | 24.00 | C | 'OK'
  155 | DIMM 16 Temp | Temperature | 24.00 | C | 'OK'
  158 | DIMM 17 Temp | Temperature | 218.00 | C | 'OK'
  161 | DIMM 18 Temp | Temperature | 212.00 | C | 'OK'
  165 | DIMM 19 Temp | Temperature | 212.00 | C | 'OK'
  173 | DIMM 20 Temp | Temperature | N/A | C | N/A
  176 | DIMM 21 Temp | Temperature | N/A | C | N/A
  179 | DIMM 22 Temp | Temperature | N/A | C | N/A
  182 | DIMM 23 Temp | Temperature | N/A | C | N/A
  185 | DIMM 24 Temp | Temperature | N/A | C | N/A
  188 | DIMM 25 Temp | Temperature | 0.00 | C | 'At or Below (<=) Lower Non-Critical Threshold'
  191 | DIMM 26 Temp | Temperature | 0.00 | C | 'At or Below (<=) Lower Non-Critical Threshold'
  194 | DIMM 27 Temp | Temperature | 0.00 | C | 'At or Below (<=) Lower Non-Critical Threshold'
  197 | DIMM 28 Temp | Temperature | 0.00 | C | 'At or Below (<=) Lower Non-Critical Threshold'
  200 | DIMM 29 Temp | Temperature | 0.00 | C | 'At or Below (<=) Lower Non-Critical Threshold'
  203 | DIMM 30 Temp | Temperature | 0.00 | C | 'At or Below (<=) Lower Non-Critical Threshold'
  206 | DIMM 31 Temp | Temperature | 0.00 | C | 'At or Below (<=) Lower Non-Critical Threshold'
  209 | DIMM 32 Temp | Temperature | N/A | C | N/A
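
The expectation in the test case (dimms 17-32 should report N/A) can be
checked mechanically. The following is only a minimal sketch, assuming the
pipe-separated layout shown in the output above; the function name
flagged_dimm_readings and the default range of unpopulated slots are
made up for illustration and are not part of freeipmi.

```python
import re


def flagged_dimm_readings(output, unpopulated=range(17, 33)):
    """Return (dimm_number, reading) pairs for DIMM temperature sensors
    that belong to unpopulated slots but still report a value."""
    flagged = []
    for line in output.splitlines():
        # Columns: ID | Name | Type | Reading | Units | Event
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 6:
            continue
        match = re.fullmatch(r"DIMM (\d+) Temp", fields[1])
        if match is None:
            continue
        dimm = int(match.group(1))
        reading = fields[3]
        if dimm in unpopulated and reading != "N/A":
            flagged.append((dimm, reading))
    return flagged


# A few lines copied from the output above.
sample = """\
155 | DIMM 16 Temp | Temperature | 24.00 | C | 'OK'
158 | DIMM 17 Temp | Temperature | 218.00 | C | 'OK'
173 | DIMM 20 Temp | Temperature | N/A | C | N/A
188 | DIMM 25 Temp | Temperature | 0.00 | C | 'At or Below (<=) Lower Non-Critical Threshold'
"""

for dimm, reading in flagged_dimm_readings(sample):
    print(f"DIMM {dimm} should be N/A but reports {reading}")
```

An empty result would be the expected outcome on a fixed freeipmi, given
the socket population described above.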
  
  [regression potential]
  
  any regression would likely result in incorrect values shown for
  some/all sensor readings
  
  [scope]
  
- this is not fixed upstream, so it needed in all releases
+ this was recently fixed upstream, so it is needed in all releases
+ https://github.com/chu11/freeipmi-mirror/pull/43
+ 
+ a new upstream release was made including the fix, so debian will pick
+ it up in their next devel cycle
  
  [other info]
  
  the problem is that this particular BMC has chosen to use a non-zero
  'LUN' number for these sensors; almost all BMCs use the standard LUN
  number 0, as the spec does state in section 19.3:
  
  "Unless otherwise specified, commands that are listed as mandatory must
  be accessed via LUN 00b"
  
  However, the spec does allow implementations to use alternate LUN
  numbers. Section 5.4 describes how the BMC reports the LUN number to
  the requesting software, and section 7.2 clarifies that while LUN 00b
  is reserved for devices belonging to the BMC itself and 10b is
  reserved for SMS messages, LUN numbers 01b and 11b are reserved for
  'OEM' use. So it does appear valid for the BMC implementation to place
  sensor(s) under LUN 01b instead of the default 00b.
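
To make the 2-bit LUN values concrete: in an IPMI message the LUN shares
one byte with the network function (NetFn), with the NetFn in the upper
six bits and the LUN in the lower two. The sketch below only illustrates
that packing and the LUN roles listed above; the helper names are
hypothetical and this is not how freeipmi structures its code.

```python
# LUN roles as described in the text above.
LUN_ROLES = {
    0b00: "BMC",
    0b01: "OEM",
    0b10: "SMS",
    0b11: "OEM",
}

NETFN_SENSOR_EVENT = 0x04  # Sensor/Event request network function


def pack_netfn_lun(netfn, lun):
    """Combine a 6-bit NetFn and a 2-bit LUN into a single byte."""
    if not 0 <= netfn <= 0x3F:
        raise ValueError("NetFn must fit in 6 bits")
    if not 0 <= lun <= 0b11:
        raise ValueError("LUN must fit in 2 bits")
    return (netfn << 2) | lun


def unpack_netfn_lun(value):
    """Split a NetFn/LUN byte back into (netfn, lun)."""
    return value >> 2, value & 0b11


default_request = pack_netfn_lun(NETFN_SENSOR_EVENT, 0b00)
oem_request = pack_netfn_lun(NETFN_SENSOR_EVENT, 0b01)
print(f"LUN 00b request byte: 0x{default_request:02x}")
print(f"LUN 01b request byte: 0x{oem_request:02x}")
```

The two request bytes differ only in the low two bits, which is why a
client that always sends 00b addresses a different sensor on this BMC.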
  
  The invalid sensor readings in this particular case occur because the
  BMC also defines sensors with the exact same sensor id number, but
  with LUN 00b. So when freeipmi attempts to read e.g. the 'DIMM 17
  Temp' sensor, it queries LUN 00b and so reads the current value of a
  completely different sensor, which (in this case) isn't even a
  temperature sensor. The result is an invalid temperature reading for
  the sensor, instead of N/A (since the DIMM slot isn't populated).
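
The collision can be modelled in a few lines. This is a toy sketch, not
freeipmi code: the sensor number 0x30, the stored values and all function
names are invented for illustration. It only shows why a lookup that
ignores the LUN returns another sensor's value.

```python
# Hypothetical BMC state keyed by (LUN, sensor number).
# None stands for "no reading available".
BMC_READINGS = {
    (0b00, 0x30): 218.0,  # unrelated, non-temperature sensor on LUN 00b
    (0b01, 0x30): None,   # 'DIMM 17 Temp' on LUN 01b, slot not populated
}


def read_sensor(lun, number):
    """Stand-in for a sensor reading request addressed to one LUN."""
    return BMC_READINGS[(lun, number)]


def read_ignoring_lun(record):
    """Behaviour described in this bug: always address LUN 00b."""
    return read_sensor(0b00, record["number"])


def read_with_lun(record):
    """Address the LUN that the sensor record itself reports."""
    return read_sensor(record["lun"], record["number"])


def show(value):
    return "N/A" if value is None else f"{value:.2f}"


dimm17 = {"name": "DIMM 17 Temp", "lun": 0b01, "number": 0x30}
print("ignoring LUN:", show(read_ignoring_lun(dimm17)))
print("using LUN:   ", show(read_with_lun(dimm17)))
```

Keying on the (LUN, sensor number) pair rather than the number alone is
what keeps the two sensors apart in this model.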
  
  Note that ipmitool correctly uses the alternate LUN number, so this bug
  does not exist with that tool; it affects only freeipmi tooling.

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1926299

Title:
  incorrect sensor is read on Lenovo server BMC

Status in freeipmi package in Ubuntu:
  In Progress
Status in freeipmi source package in Bionic:
  In Progress
Status in freeipmi source package in Focal:
  In Progress
Status in freeipmi source package in Groovy:
  In Progress
Status in freeipmi source package in Hirsute:
  In Progress
Status in freeipmi source package in Impish:
  In Progress



