[Bug 1915811] Re: Empty NUMA topology in machines with high number of CPUs
Victor Tapia
1915811 at bugs.launchpad.net
Fri Mar 12 10:07:49 UTC 2021
#VERIFICATION XENIAL
Using the test case from the bug description (a VM with 128 vCPUs
assigned), the version in -updates does not list the topology:
$ dpkg -l | grep libvirt
ii libvirt-bin 1.3.1-1ubuntu10.30 amd64 programs for the libvirt library
ii libvirt0:amd64 1.3.1-1ubuntu10.30 amd64 library for interfacing with different virtualization systems
$ virsh capabilities | xmllint --xpath '/capabilities/host/topology' -
XPath set is empty
The package in -proposed fixes the issue (output shortened):
$ dpkg -l | grep libvirt
ii libvirt-bin 1.3.1-1ubuntu10.31 amd64 programs for the libvirt library
ii libvirt0:amd64 1.3.1-1ubuntu10.31 amd64 library for interfacing with different virtualization systems
$ virsh capabilities | xmllint --xpath '/capabilities/host/topology' -
<topology>
<cells num="1">
<cell id="0">
<memory unit="KiB">4998464</memory>
<pages unit="KiB" size="4">1249616</pages>
<pages unit="KiB" size="2048">0</pages>
<pages unit="KiB" size="1048576">0</pages>
<distances>
<sibling id="0" value="10"/>
</distances>
<cpus num="128">
<cpu id="0" socket_id="0" core_id="0" siblings="0"/>
...
<cpu id="127" socket_id="127" core_id="0" siblings="127"/>
</cpus>
</cell>
</cells>
</topology>
NOTE: on a machine running a 4.4 kernel, numa_all_cpus_ptr->size
(used to set max_n_cpus in libvirt) is 512 instead of 128, so the issue
cannot be triggered: filling the mask would take 512 CPUs, and libvirt's
max vcpu is 255. Any newer kernel, such as HWE, sets the value to 128,
which triggers the issue.
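To illustrate the note above, here is a simplified model of why the mask size matters. It is only a sketch: node_mask and treated_as_missing are hypothetical names, not libvirt functions, and the "every bit set means the node is absent" rule is my reading of the workaround described in the bug description, not a copy of the libvirt code.

```python
def node_mask(cpu_ids, mask_bits):
    """Build an integer bitmask of mask_bits bits with the given CPU ids set."""
    mask = 0
    for cpu in cpu_ids:
        if cpu >= mask_bits:
            raise ValueError("CPU id %d does not fit in a %d-bit mask" % (cpu, mask_bits))
        mask |= 1 << cpu
    return mask


def treated_as_missing(mask, mask_bits):
    """Assumed workaround: a mask with every bit set is read as 'node does not exist'."""
    return mask == (1 << mask_bits) - 1


# One NUMA node holding CPUs 0..127, checked against both mask sizes from the note.
cpus = range(128)
for mask_bits in (128, 512):
    mask = node_mask(cpus, mask_bits)
    print(mask_bits, treated_as_missing(mask, mask_bits))
```

With a 128-bit mask the 128 CPUs fill it completely and the node is dropped; with a 512-bit mask the upper bits stay clear and the node is kept.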
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1915811
Title:
Empty NUMA topology in machines with high number of CPUs
Status in Ubuntu Cloud Archive:
New
Status in Ubuntu Cloud Archive stein series:
Fix Committed
Status in Ubuntu Cloud Archive train series:
Fix Committed
Status in Ubuntu Cloud Archive ussuri series:
Fix Committed
Status in libvirt package in Ubuntu:
Fix Released
Status in libvirt source package in Xenial:
Fix Committed
Status in libvirt source package in Bionic:
Fix Committed
Status in libvirt source package in Focal:
Fix Committed
Status in libvirt source package in Groovy:
Fix Committed
Bug description:
[impact]
libvirt fails to populate its NUMA topology when the machine has a
large number of CPUs assigned to a single node. This happens when the
number of CPUs fills the bitmask (all bits set to one), hitting a
workaround introduced to build the NUMA topology on machines that have
non-contiguous node ids. This has already been fixed upstream in the
commits listed below.
[scope]
The fix is needed for Xenial, Bionic, Focal and Groovy.
It's fixed upstream with commits 24d7d85208 and 551fb778f5 which are
included in v6.8, so both are already in hirsute.
[test case]
On a machine like the EPYC 7702P, after setting the NUMA config to
NPS1 (single node per processor), or just a VM with 128 CPUs, "virsh
capabilities" does not show the NUMA topology:
# virsh capabilities | xmllint --xpath '/capabilities/host/topology' -
<topology>
<cells num="0">
</cells>
</topology>
When it should show (edited to shorten the description):
<topology>
<cells num="1">
<cell id="0">
<memory unit="KiB">5027820</memory>
<pages unit="KiB" size="4">1256955</pages>
<pages unit="KiB" size="2048">0</pages>
<distances>
<sibling id="0" value="10"/>
</distances>
<cpus num="128">
<cpu id="0" socket_id="0" core_id="0" siblings="0"/>
....
<cpu id="127" socket_id="127" core_id="0" siblings="127"/>
</cpus>
</cell>
</cells>
</topology>
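The comparison above can also be scripted. The following is a minimal sketch, assuming the XML has the shape shown in this test case; summarize_topology is a hypothetical helper name, and it counts the <cpu> elements actually present rather than trusting the num attribute.

```python
import xml.etree.ElementTree as ET


def summarize_topology(xml_text):
    """Return (declared_cells, [(cell_id, cpu_count), ...]) for a topology document.

    Accepts either a bare <topology> element or a full <capabilities> document.
    """
    root = ET.fromstring(xml_text)
    topo = root if root.tag == "topology" else root.find("./host/topology")
    if topo is None:
        return 0, []
    cells_el = topo.find("cells")
    if cells_el is None:
        return 0, []
    declared = int(cells_el.get("num", "0"))
    cells = [
        (int(cell.get("id")), len(cell.findall("./cpus/cpu")))
        for cell in cells_el.findall("cell")
    ]
    return declared, cells


# The broken output from this test case: no cells at all.
EMPTY = '<topology><cells num="0"></cells></topology>'
print(summarize_topology(EMPTY))
```

An empty cell list signals the bug; on a fixed package the same call on the real output should report one cell with its CPUs.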
[Where problems could occur]
Any regression would likely involve a misconstruction of the NUMA
topology, in particular for machines with non-contiguous node ids.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1915811/+subscriptions