[Bug 1943765] Re: ipmitool "timing" flags are not working as expected causing failure to manage power of baremetal nodes

Hemanth Nakkina 1943765 at bugs.launchpad.net
Thu Sep 23 05:11:10 UTC 2021


I have verified the behaviour of the ironic-conductor ipmitool commands
with the PPA in #2 above (on focal-ussuri).


With use_ipmitool_retries = False configured, ironic-conductor repeatedly
runs the command below until the 60-second timeout expires.
Command: ipmitool -I lanplus -H 10.5.0.5:9999 -L ADMINISTRATOR -U test -R 1 -N 5 -f /tmp/tmpmt5292he power status
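
The behaviour observed above, where a single-attempt ipmitool command is
re-run until the timeout expires, can be sketched roughly as follows. This
is an illustration only and not ironic's code: run_until_deadline, run_once
and the injectable clock/sleep parameters are hypothetical names, and the
60-second timeout and 5-second interval are simply the values quoted in
this comment.

```python
import time
from typing import Callable, Sequence


def run_until_deadline(
    cmd: Sequence[str],
    run_once: Callable[[Sequence[str]], bool],
    timeout: float = 60.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-run cmd until run_once reports success or the timeout elapses.

    Returns the number of attempts made; raises TimeoutError on failure.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if run_once(cmd):
            return attempts
        # Give up if waiting another interval would pass the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"{cmd[0]} failed after {attempts} attempts")
        sleep(interval)
```

Here run_once stands in for whatever actually executes the command (for
example a subprocess call) and returns True on success. Because every
attempt is a fresh invocation, each one opens a new session to the BMC.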

OpenStack commands used for testing:
/snap/bin/openstack baremetal node create --driver ipmi --driver-info ipmi_address=10.5.0.5:9999 --driver-info ipmi_username=test --driver-info ipmi_password=test
/snap/bin/openstack baremetal node list
/snap/bin/openstack baremetal node power on <node id>

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ironic in Ubuntu.
https://bugs.launchpad.net/bugs/1943765

Title:
  ipmitool "timing" flags are not working as expected causing failure to
  manage power of baremetal nodes

Status in charm-ironic-conductor:
  New
Status in ironic package in Ubuntu:
  New

Bug description:
  In a focal-ussuri cloud environment where there is some amount of
  packet loss between the ironic-conductor and the BMC network, I'm
  experiencing random timeout issues with ipmitool failures.

  The root issue I'm having is that using:

  ipmitool -R 12 -N 5 <command>

  is resulting in ipmitool hanging for 60 seconds (12 commands are sent
  even though the session is never properly started) and then timing out
  within the ironic-conductor application, causing "clean failed" state
  when transitioning a node from 'manage' to 'provide' status.

  Ultimately, it appears that ussuri runs the code linked below, which
  determines that ipmitool accepts the -R and -N flags and, instead of
  performing retries of ipmitool within the ironic code, relies on
  ipmitool to perform all of the retries.

  https://opendev.org/openstack/ironic/src/branch/stable/ussuri/ironic/drivers/modules/ipmitool.py#L538-L546

  This has been addressed in the mainline code by the addition of an
  operator-configurable option, 'use_ipmitool_retries', which either lets
  ipmitool perform the retries via the -R flag or falls back to letting
  ironic execute ipmitool multiple separate times.

  https://opendev.org/openstack/ironic/src/branch/master/ironic/drivers/modules/ipmitool.py#L494
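
  As a rough sketch of how such an option could select the timing flags,
  the function below reproduces the two command lines seen in this bug
  ("-R 12 -N 5" and "-R 1 -N 5"). It is not copied from ironic:
  timing_args is a hypothetical name, and the parameter names and their
  defaults of 60 and 5 are assumptions chosen to match those values.

```python
from typing import List


def timing_args(
    use_ipmitool_retries: bool,
    command_retry_timeout: int = 60,
    min_command_interval: int = 5,
) -> List[str]:
    """Build the -R/-N arguments for one ipmitool invocation (illustrative)."""
    if use_ipmitool_retries:
        # ipmitool retries internally: enough tries to fill the timeout.
        tries = max(command_retry_timeout // min_command_interval, 1)
    else:
        # One try per invocation; the caller re-runs ipmitool itself.
        tries = 1
    return ["-R", str(tries), "-N", str(min_command_interval)]
```

  With the assumed defaults, the first branch yields 60 // 5 = 12 tries,
  which matches the 60-second hang described above.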

  In my environment, I need ipmitool to be re-run as multiple separate
  invocations to avoid failure.

  Can we please backport this functionality into focal-ussuri?
  https://opendev.org/openstack/ironic/commit/1de3db3b16f3e0475e506e540ca5d5ed6edb4cbf

  Also, please expose a charm configuration option that allows the
  operator to set "[ipmi] use_ipmitool_retries" = False.
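
  For reference, the requested setting would presumably end up in
  ironic.conf as an [ipmi] section like the fragment embedded below. The
  snippet parses it with Python's standard configparser purely as a
  stand-in to show how the value reads back; ironic itself does not load
  its configuration this way, and IRONIC_CONF_FRAGMENT is a hypothetical
  name.

```python
import configparser

# Hypothetical rendered ironic.conf fragment with the requested setting.
IRONIC_CONF_FRAGMENT = """
[ipmi]
use_ipmitool_retries = False
"""

parser = configparser.ConfigParser()
parser.read_string(IRONIC_CONF_FRAGMENT)

# getboolean accepts true/false, yes/no, on/off and 1/0, case-insensitively.
use_retries = parser.getboolean("ipmi", "use_ipmitool_retries")
print(use_retries)
```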

To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-ironic-conductor/+bug/1943765/+subscriptions



