[Bug 1692837] Re: Ubuntu 16.04.02: powerpc-ibm-utils: drmgr does not scale with large number of virtual adapters

Launchpad Bug Tracker 1692837 at bugs.launchpad.net
Tue Jul 4 12:12:17 UTC 2017


This bug was fixed in the package powerpc-utils - 1.3.1-2ubuntu0.3

---------------
powerpc-utils (1.3.1-2ubuntu0.3) xenial; urgency=medium

  * d/p/Improve-perf-of-drmgr-lsslot-with-large-num-of-virt.patch:
    Fix scaling with large number of virtual adapters.  LP: #1692837
  * d/p/drmgr-Stale-errno-usage-corrections.patch,
    d/p/drmgr-Correct-errno-usage-use-in-validate_paltform.patch,
    d/p/drmgr-Correct-errno-usage-in-init_cpu_info.patch:
    Fix failures during scale-up test on Novalink System.  LP: #1696434

 -- Breno Leitao <leitao at debian.org>  Fri, 09 Jun 2017 10:39:15 -0400

** Changed in: powerpc-utils (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to powerpc-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1692837

Title:
  Ubuntu 16.04.02: powerpc-ibm-utils: drmgr does not scale with large
  number of virtual adapters

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in powerpc-utils package in Ubuntu:
  Fix Released
Status in powerpc-utils source package in Xenial:
  Fix Released
Status in powerpc-utils source package in Yakkety:
  Fix Committed
Status in powerpc-utils source package in Zesty:
  Fix Committed

Bug description:
  [SRU Justification]
  On a NovaLink system, the time drmgr takes to complete increases linearly with the number of virtual adapters.  This is unreasonable.

  [Test case]
  To be completed by IBM, who have access to the hardware.
  1. Add 200 virtual adapters to an LPAR
  2. Run time /usr/sbin/drmgr -c slot -s <slot_name> -Q
  3. Confirm that this takes multiple seconds to return.
  4. Install powerpc-utils from -proposed.
  5. Run time /usr/sbin/drmgr -c slot -s <slot_name> -Q again
  6. Confirm that this takes less than a second to return.
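
  The timing steps above could be scripted roughly as follows.  This is a
  minimal Python sketch, not part of powerpc-utils: time_command and
  check_drmgr_query are hypothetical helper names, the one-second limit
  mirrors step 6, and the slot name has to be supplied by the tester.

```python
import subprocess
import time
from typing import List


def time_command(argv: List[str]) -> float:
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # Output is discarded; only the duration matters for this check.
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def check_drmgr_query(slot_name: str, limit_seconds: float = 1.0) -> bool:
    """Time the drmgr query from the test case; True if under the limit."""
    # Same invocation as steps 2 and 5 of the test case.
    argv = ["/usr/sbin/drmgr", "-c", "slot", "-s", slot_name, "-Q"]
    elapsed = time_command(argv)
    print(f"drmgr -Q on {slot_name}: {elapsed:.2f}s")
    return elapsed < limit_seconds
```

  Run as root with a real slot name, check_drmgr_query would be expected
  to return False before the update and True after it.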

  [Regression potential]
  Any bugs introduced in this code could cause drmgr/lsslot to fail to operate correctly on the slots at all.  However, the code is reasonably generic and the risk of it failing intermittently is low: if it passes verification, it is reasonable to expect it to work everywhere.

  == Comment: #0 - Jeremy A. Arnold
  ---Problem Description---
  The time to run commands such as "drmgr -c slot -s U8247.22L.211E15A-V1-C210 -a -w 1" increases linearly with the number of slots on the system.  For cloud environments with a large number of VMs hosted by a single NovaLink partition (or a small number of VIOS partitions), the number of virtual slots in the NovaLink partition can grow large, and the long drmgr time can be a major factor in the time to deploy new VMs.

  In one recent test, the call above took about 29 seconds to complete
  on a system with around 100 VMs.  Earlier in the run (when there was
  fewer than 10 VMs) it took only about 2 seconds.

  I'm not at all an expert in this area, but it would appear that drmgr
  is iterating through all of the slots in order to find the one that
  was requested.  Some evidence for this is provided by running:

  sudo time /usr/sbin/drmgr -c slot -s U8247.22L.211E15A-V1-C210 -Q

  This is taking about 14 seconds elapsed time (it may have been slower
  during the actual run due to concurrent executions of drmgr) on a
  system with about 232 virtual adapter slots.  The time is similar if I
  make a request for a slot that does not exist (e.g. change C210 to
  C250), so it would appear that nearly all of the runtime is for
  looking up the slot and not for actually retrieving information about
  it.

  Adding -d20 to the above command provides additional debug data.  This
  shows that the majority of the time is between these two lines of
  output:

  ---
  Could not find DRC property group in path: /proc/device-tree/pci@80000002000001f.
  DR nodes list
  ---

  For reference, the following command can identify the correct entry in
  the device tree in about 0.02 seconds.  Obviously drmgr has more to do
  than just this, but this suggests that there is no fundamental reason
  the time has to scale with the number of slots:

  time (find /sys/firmware/devicetree/base/vdevice -name "ibm\,loc-code" | xargs grep "^U8247.22L.211E15A-V1-C210-T1$")
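
  The same single-pass lookup can be written in Python.  This is only a
  sketch of the idea behind the find/grep command above, not drmgr code:
  find_loc_code is a hypothetical helper, and it assumes each
  ibm,loc-code property file holds one NUL-terminated string.

```python
import os
from typing import List


def find_loc_code(root: str, loc_code: str) -> List[str]:
    """Return the directories under root whose ibm,loc-code equals loc_code."""
    hits = []
    # One walk over the tree, regardless of how many slots exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        if "ibm,loc-code" not in filenames:
            continue
        with open(os.path.join(dirpath, "ibm,loc-code"), "rb") as fh:
            # Strip the trailing NUL that terminates string properties.
            value = fh.read().rstrip(b"\0").decode("ascii", "replace")
        if value == loc_code:
            hits.append(dirpath)
    return hits
```

  For example, find_loc_code("/sys/firmware/devicetree/base/vdevice",
  "U8247.22L.211E15A-V1-C210-T1") would list the matching node, if any.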

  ---uname output---
  Linux cs-tul10-neo 4.4.21-customv1.29 #6 SMP Wed Apr 12 14:40:02 CDT 2017 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = 8247-22L

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  I used PowerVC to deploy 100 VMs to a NovaLink system and viewed /var/log/drmgr to observe how long the drmgr calls took during the test.

  I believe it would be sufficient to add 200 virtual adapters to an
  LPAR and then run "/usr/sbin/drmgr -c slot -s <slot_name> -Q"

  I'm happy to collect additional data in my environment if it would be
  helpful.

  Userspace tool common name: /usr/sbin/drmgr

  The userspace tool has the following bit modes: 64-bit

  Userspace rpm: powerpc-ibm-utils

  Userspace tool obtained from project website:  na

  == Comment: #4 - Amartey S. Pearson
  I have a proposed fix for this in a GitHub fork.  In short, the algorithm used to populate the dr_nodes needs to be fixed.  It currently walks the entire bus for every theoretical DRC (thousands of times).  The fix is to walk the bus once.

  https://github.com/apearson-ibm/powerpc-utils/commit/6fefb6acb6fb302c97d71faef75a12674a50209a

  This addresses both drmgr and lsslot as the change is in common code.
  An example of the improvement we see:

  Here we have a system with 196 populated virtual slots.  An lsslot
  takes 6.5 seconds.

  root@neo33-2:/usr/sbin# time /usr/sbin/lsslot -c slot | wc -l
  196
  real    0m6.495s
  user    0m1.108s
  sys     0m5.384s

  With the fix, the lsslot now takes 0.18 seconds, and scales well as
  more slots are added.

  root@neo33-2:~/powerpc-utils# time /usr/local/sbin/lsslot -c slot | wc -l
  196
  real    0m0.186s
  user    0m0.028s
  sys     0m0.156s
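
  The difference between the two approaches described in this comment can
  be illustrated with a small Python model.  This is a sketch under
  stated assumptions, not the powerpc-utils C code: the bus is modelled
  as a list of dicts, and match_per_drc, match_single_walk and the
  drc_index/path keys are hypothetical names chosen for the example.

```python
from typing import Dict, List, Optional, Tuple

Node = Dict[str, object]


def match_per_drc(drcs: List[int], bus: List[Node]) -> Tuple[Dict[int, Optional[str]], int]:
    """Rescan the bus for every DRC; returns (matches, nodes visited)."""
    visits = 0
    matches: Dict[int, Optional[str]] = {}
    for drc in drcs:
        matches[drc] = None
        for node in bus:  # a fresh walk of the bus per DRC
            visits += 1
            if node["drc_index"] == drc:
                matches[drc] = str(node["path"])
                break
    return matches, visits


def match_single_walk(drcs: List[int], bus: List[Node]) -> Tuple[Dict[int, Optional[str]], int]:
    """Walk the bus once, index it, then look each DRC up in the index."""
    visits = 0
    by_index: Dict[object, str] = {}
    for node in bus:  # the only walk of the bus
        visits += 1
        by_index[node["drc_index"]] = str(node["path"])
    return {drc: by_index.get(drc) for drc in drcs}, visits


# Illustrative sizes only: 196 populated slots, 1000 theoretical DRCs.
bus = [{"drc_index": i, "path": f"/vdevice/slot@{i}"} for i in range(196)]
drcs = list(range(1000))
slow, slow_visits = match_per_drc(drcs, bus)
fast, fast_visits = match_single_walk(drcs, bus)
print(slow == fast, slow_visits, fast_visits)
```

  Both functions produce the same matches; the single walk visits each
  bus node once, while the per-DRC version repeats the walk for every
  DRC, which is where the scaling problem comes from in this model.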

  == Comment: #7 - Anna A. Sortland
  We tried the patch in our test environment and it worked great.

  == Comment: #11 - Nathan D. Fontenot <nfonteno at us.ibm.com> - 2017-05-18 13:30:42 ==
  Patch submitted upstream.

  https://groups.google.com/forum/#!topic/powerpc-utils-devel/sd1gdvbQp0w

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1692837/+subscriptions
