[Bug 1915811] Re: Empty NUMA topology in machines with high number of CPUs

Victor Tapia 1915811 at bugs.launchpad.net
Wed Mar 10 16:47:27 UTC 2021


#VERIFICATION USSURI

Using the test case from the bug description (a VM with 128 vCPUs
assigned), the version in -updates does not list the topology:

$ dpkg -l |grep libvirt
ii  libvirt-clients                      6.0.0-0ubuntu8.7~cloud0                     amd64        Programs for the libvirt library
ii  libvirt-daemon                       6.0.0-0ubuntu8.7~cloud0                     amd64        Virtualization daemon
ii  libvirt-daemon-driver-qemu           6.0.0-0ubuntu8.7~cloud0                     amd64        Virtualization daemon QEMU connection driver
ii  libvirt-daemon-driver-storage-rbd    6.0.0-0ubuntu8.7~cloud0                     amd64        Virtualization daemon RBD storage driver
ii  libvirt-daemon-system                6.0.0-0ubuntu8.7~cloud0                     amd64        Libvirt daemon configuration files
ii  libvirt-daemon-system-systemd        6.0.0-0ubuntu8.7~cloud0                     amd64        Libvirt daemon configuration files (systemd)
ii  libvirt0:amd64                       6.0.0-0ubuntu8.7~cloud0                     amd64        library for interfacing with different virtualization systems

$ virsh capabilities | xmllint --xpath '/capabilities/host/topology' -
<topology>
      <cells num="0">
      </cells>
    </topology>

The package in -proposed fixes the issue (output shortened):

$ dpkg -l |grep libvirt
ii  libvirt-clients                      6.0.0-0ubuntu8.8~cloud0                     amd64        Programs for the libvirt library
ii  libvirt-daemon                       6.0.0-0ubuntu8.8~cloud0                     amd64        Virtualization daemon
ii  libvirt-daemon-driver-qemu           6.0.0-0ubuntu8.8~cloud0                     amd64        Virtualization daemon QEMU connection driver
ii  libvirt-daemon-driver-storage-rbd    6.0.0-0ubuntu8.8~cloud0                     amd64        Virtualization daemon RBD storage driver
ii  libvirt-daemon-system                6.0.0-0ubuntu8.8~cloud0                     amd64        Libvirt daemon configuration files
ii  libvirt-daemon-system-systemd        6.0.0-0ubuntu8.8~cloud0                     amd64        Libvirt daemon configuration files (systemd)
ii  libvirt0:amd64                       6.0.0-0ubuntu8.8~cloud0                     amd64        library for interfacing with different virtualization systems

$ virsh capabilities | xmllint --xpath '/capabilities/host/topology' -
<topology>
      <cells num="1"> 
        <cell id="0">
          <memory unit="KiB">5047560</memory>
          <pages unit="KiB" size="4">1261890</pages>
          <pages unit="KiB" size="2048">0</pages>
          <distances>
            <sibling id="0" value="10"/>
          </distances>
          <cpus num="128">
            <cpu id="0" socket_id="0" core_id="0" siblings="0"/>
            ...
            <cpu id="127" socket_id="127" core_id="0" siblings="127"/>
          </cpus>
        </cell>
      </cells>
    </topology>


** Tags removed: verification-stein-needed verification-train-needed verification-ussuri-needed
** Tags added: verification-stein-done verification-train-done verification-ussuri-done

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1915811

Title:
  Empty NUMA topology in machines with high number of CPUs

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive stein series:
  Fix Committed
Status in Ubuntu Cloud Archive train series:
  Fix Committed
Status in Ubuntu Cloud Archive ussuri series:
  Fix Committed
Status in libvirt package in Ubuntu:
  Fix Released
Status in libvirt source package in Xenial:
  Fix Committed
Status in libvirt source package in Bionic:
  Fix Committed
Status in libvirt source package in Focal:
  Fix Committed
Status in libvirt source package in Groovy:
  Fix Committed

Bug description:
  [impact]

  libvirt fails to populate its NUMA topology when the machine has a
  large number of CPUs assigned to a single node. This happens when the
  CPUs fill the node's CPU bitmask (every bit set to one), which trips a
  workaround introduced to build the NUMA topology on machines that
  have non-contiguous node IDs. This has already been fixed upstream in
  the commits listed below.
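
  As a rough illustration of why a full bitmask is a problem, here is a
  minimal Python model. It assumes the workaround treats an all-ones CPU
  mask as "this node does not exist", which is how the description above
  reads. It is not the libvirt code; node_cpus_with_workaround and
  MASK_WIDTH are hypothetical names chosen for this sketch.

```python
MASK_WIDTH = 128  # assumed mask width, for illustration only


def node_cpus_with_workaround(mask_bits):
    """Return the CPU ids set in mask_bits, or None if the node is skipped.

    Hypothetical model: a mask with every bit set is taken to mean the
    node id is a gap in a non-contiguous numbering, so the node is skipped.
    """
    if all(mask_bits):
        return None
    return [cpu for cpu, bit in enumerate(mask_bits) if bit]


# A real node that owns only half of the CPUs is reported normally.
half_node = [True] * 64 + [False] * 64
# A real node that owns every CPU looks exactly like a missing node.
full_node = [True] * MASK_WIDTH

if __name__ == "__main__":
    print(len(node_cpus_with_workaround(half_node)))
    print(node_cpus_with_workaround(full_node))
```

  Under this assumption, a single node holding all 128 CPUs cannot be
  told apart from a missing node, so it is skipped and the topology
  comes out empty.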

  [scope]

  The fix is needed for Xenial, Bionic, Focal and Groovy.

  It's fixed upstream by commits 24d7d85208 and 551fb778f5, which are
  included in v6.8, so both are already in Hirsute.

  [test case]

  On a machine like the EPYC 7702P with the NUMA config set to NPS1
  (single node per processor), or simply on a VM with 128 CPUs, "virsh
  capabilities" does not show the NUMA topology:

  # virsh capabilities | xmllint --xpath '/capabilities/host/topology' -

  <topology>
        <cells num="0">
        </cells>
      </topology>

  It should instead show (edited to shorten the description):

  <topology>
        <cells num="1">
          <cell id="0">
            <memory unit="KiB">5027820</memory>
            <pages unit="KiB" size="4">1256955</pages>
            <pages unit="KiB" size="2048">0</pages>
            <distances>
              <sibling id="0" value="10"/>
            </distances>
            <cpus num="128">
              <cpu id="0" socket_id="0" core_id="0" siblings="0"/>
              ....
              <cpu id="127" socket_id="127" core_id="0" siblings="127"/>
            </cpus>
          </cell>
        </cells>
      </topology>
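
  The same check can be scripted instead of read by eye. The sketch
  below parses a <topology> fragment like the ones above with Python's
  standard xml.etree.ElementTree and reports the declared cell count and
  the declared CPU count per cell. The function name summarize_topology
  and the two shortened sample strings are illustrative assumptions, not
  part of libvirt or of the original test case.

```python
import xml.etree.ElementTree as ET


def summarize_topology(xml_text):
    """Return (declared_cells, {cell_id: declared_cpu_count})."""
    root = ET.fromstring(xml_text)
    # Accept a bare <topology> fragment or a full <capabilities> document.
    topology = root if root.tag == "topology" else root.find("./host/topology")
    if topology is None:
        raise ValueError("no <topology> element found")
    cells = topology.find("cells")
    if cells is None:
        return 0, {}
    declared = int(cells.get("num", "0"))
    cpus_per_cell = {}
    for cell in cells.findall("cell"):
        cpus = cell.find("cpus")
        count = int(cpus.get("num", "0")) if cpus is not None else 0
        cpus_per_cell[int(cell.get("id"))] = count
    return declared, cpus_per_cell


# Shortened samples modelled on the outputs in this test case.
BROKEN = '<topology><cells num="0"></cells></topology>'
FIXED = """<topology>
  <cells num="1">
    <cell id="0">
      <cpus num="128">
        <cpu id="0" socket_id="0" core_id="0" siblings="0"/>
      </cpus>
    </cell>
  </cells>
</topology>"""

if __name__ == "__main__":
    print(summarize_topology(BROKEN))  # (0, {})
    print(summarize_topology(FIXED))   # (1, {0: 128})
```

  A declared cell count of 0 on a host that has at least one NUMA node
  corresponds to the failure described here.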

  
  [Where problems could occur]

  Any regression would most likely show up as an incorrectly built NUMA
  topology, in particular on machines with non-contiguous node IDs.



