[Bug 1696434] Please test proposed package
Brian Murray
brian at ubuntu.com
Tue Jun 20 00:18:51 UTC 2017
Hello bugproxy, or anyone else affected,
Accepted powerpc-utils into xenial-proposed. The package will build now
and be available at https://launchpad.net/ubuntu/+source/powerpc-utils/1.3.1-2ubuntu0.3
in a few hours, and then in the -proposed
repository.
Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us in getting this
update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, and change the tag
from verification-needed to verification-done. If it does not fix the
bug for you, please add a comment stating that, and change the tag to
verification-failed. In either case, details of your testing will help
us make a better decision.
Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance!
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to powerpc-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1696434
Title:
drmgr command fails during the scale-up test on Novalink System
(Brazos)
Status in The Ubuntu-power-systems project:
Confirmed
Status in powerpc-utils package in Ubuntu:
Fix Released
Status in powerpc-utils source package in Xenial:
Fix Committed
Status in powerpc-utils source package in Yakkety:
Fix Committed
Status in powerpc-utils source package in Zesty:
Fix Committed
Bug description:
[SRU Justification]
drmgr fails intermittently when adding devices to the system.
[Test case]
To be completed by IBM, who have access to the hardware.
1. Run a scale test that launches 1000 VMs on a Novalink system.
2. Observe that some of the deployments fail with the following error:
kernel I/O op failed, rc = 26 len = 26.
3. Install powerpc-utils from -proposed.
4. Run the scale test again.
5. Observe that all the deployments succeed.
[Regression potential]
This change, cherry-picked from upstream, corrects faulty handling of a 0 return code from syscalls. Regression potential appears to be minimal.
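The upstream patch itself is not reproduced in this report. As a rough illustration of this class of bug, and not the actual drmgr code, the C sketch below writes a DRC name to a sysfs-style control file and decides success from the syscall's return value alone: `errno` is consulted only when `write()` returns -1, because after a successful call its value is unspecified and may still hold a stale error such as `ENODEV`. The helper name `kernel_io_op` is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not the upstream drmgr code: write `name` to a
 * control file such as /sys/bus/pci/slots/control/add_slot.
 * Returns 0 on success and -1 on failure. */
static int kernel_io_op(const char *path, const char *name)
{
	size_t len = strlen(name);
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		fprintf(stderr, "open %s: %s\n", path, strerror(errno));
		return -1;
	}

	ssize_t rc = write(fd, name, len);
	int saved_errno = errno; /* capture before close() can change it */
	close(fd);

	if (rc < 0) {
		/* Only here is errno meaningful: write() reported failure. */
		fprintf(stderr, "kernel I/O op failed: %s\n", strerror(saved_errno));
		return -1;
	}
	if ((size_t)rc != len) {
		/* Short write: no errno is set, so do not print one. */
		fprintf(stderr, "kernel I/O op short write, rc = %zd len = %zu\n",
			rc, len);
		return -1;
	}
	return 0; /* Full write: success, whatever errno happens to contain. */
}
```

On a real system the path would be /sys/bus/pci/slots/control/add_slot, as in the logs below; taking the path as a parameter lets the sketch be exercised against an ordinary file.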
Problem:
During the scale-up test to 1000 VMs, I saw 20 deploys fail due
to the following command failure:
Command /usr/sbin/pvmdrmgr drmgr -c slot -s 'U9119.MHE.1085B07-V1-C1030' -a -w 3 returned 19. Additional messages: /usr/sbin/pvmdrmgr drmgr -c slot -s 'U9119.MHE.1085B07-V1-C1030' -a -w 3
Validating I/O DLPAR capability...yes.
kernel I/O op failed, rc = 26 len = 26.
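One detail in this output is worth checking: `len = 26` is exactly the length of the DRC name `U9119.MHE.1085B07-V1-C1030` passed with `-s`, and `rc` equals `len`. If `rc` is the byte count returned by the write to the control file (an assumption; the message does not say), that would ordinarily mean the whole name was accepted. A minimal C sketch for pulling both numbers out of such a line, assuming the message format shown above (`parse_kernel_op_failure` is hypothetical, not part of drmgr):

```c
#include <stdio.h>

/* Hypothetical helper: extract rc and len from a drmgr message such as
 *   "kernel I/O op failed, rc = 26 len = 26."
 * Returns 1 if both values were parsed, 0 otherwise. */
static int parse_kernel_op_failure(const char *line, int *rc, int *len)
{
	return sscanf(line, "kernel I/O op failed, rc = %d len = %d",
		      rc, len) == 2;
}
```

For the line above this yields rc = 26 and len = 26, and `strlen("U9119.MHE.1085B07-V1-C1030")` is also 26.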
I have been looking through the logs on this system to piece together
what is happening when the dlpar add failures occur. From what I am
seeing, we are trying to dlpar add a virtual network device and getting
an error when trying to add the device to the system.
> ########## May 17 05:18:00 2017 ##########
> drmgr: -c slot -s U9119.MHE.1085B07-V1-C1030 -a -w 3
> Validating I/O DLPAR capability...yes.
> Getting node types 0x00000003
> Could not find DRC property group in path: /proc/device-tree/ibm,serial.
> Acquiring drc index 0x30000406
> get-sensor for 30000406: 0, 2
> Setting allocation state to 'alloc usable'
> Setting indicator state to 'unisolate'
> Configuring connector for drc index 30000406
> Adding device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ofdt update: add_node /vdevice/l-lan@30000406 ibm,loc-code 30 U9119.MHE.1085B07-V1-C1030-T1
> Getting node types 0x00000003
> performing kernel op for U9119.MHE.1085B07-V1-C1030, file is /sys/bus/pci/slots/control/add_slot
> kernel I/O op failed, rc = 26 len = 26.
> No such device
> Releasing drc index 0x30000406
> get-sensor for 30000406: 0, 1
> Setting isolation state to 'isolate'
> Setting allocation state to 'alloc unusable'
> get-sensor for 30000406: 0, 2
> drc_index 30000406 sensor-state: 2
> Resource is not available to the partition.
> Removing device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ########## May 17 05:20:11 2017 ##########
From the drmgr log, you can see that we get an ENODEV return code when
performing the kernel operation to add the device to the system.
> performing kernel op for U9119.MHE.1085B07-V1-C1030, file is /sys/bus/pci/slots/control/add_slot
> kernel I/O op failed, rc = 26 len = 26.
> No such device
This indicates that the rpadlpar_io kernel module was unable to find
the device in the device tree. This does not seem right, because
earlier in the drmgr logs we add the device to the device tree.
Additionally, the drmgr code validates that the add succeeded by
retrieving the newly added device node from the device tree as a
sanity check. There are no failures reported for this.
> Adding device-tree node /proc/device-tree/vdevice/l-lan@30000406
> ofdt update: add_node /vdevice/l-lan@30000406 ibm,loc-code 30 U9119.MHE.1085B07-V1-C1030-T1
> Getting node types 0x00000003
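The log does not show how drmgr performs its post-add sanity check. Assuming it amounts to confirming that the new node's directory is visible under `/proc/device-tree`, a minimal C sketch could use `stat()`; the helper name `dt_node_exists` is hypothetical.

```c
#include <sys/stat.h>

/* Hypothetical check: returns nonzero if `path` exists and is a directory,
 * as a device-tree node is under /proc/device-tree, and 0 otherwise. */
static int dt_node_exists(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return 0; /* missing node, or path not accessible */
	return S_ISDIR(st.st_mode);
}
```

For example, `dt_node_exists("/proc/device-tree/vdevice/l-lan@30000406")` would be expected to succeed right after the `add_node` update shown above, which is why an ENODEV from rpadlpar_io immediately afterwards looks inconsistent.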
I started scale-up testing and the deploys are going fine so far.
I will post a comment here if I see further drmgr failures.
Patches have been submitted upstream:
https://groups.google.com/forum/#!topic/powerpc-utils-devel/GNEi65WBwkQ
and
https://groups.google.com/forum/#!topic/powerpc-utils-devel/hJfUb5wYPsE