[Bug 1904730] Re: neutron-agent-sriov fails to create port
Billy Olsen
1904730 at bugs.launchpad.net
Mon Oct 25 22:11:38 UTC 2021
Verified for focal-wallaby using the test script from comment #33.
ubuntu@node-lepaute:~$ dpkg -l | grep pyroute2
ii python3-pyroute2 0.5.14-0ubuntu1~cloud0 all Python3 Netlink library
ubuntu@node-lepaute:~$ echo 63 | sudo tee /sys/class/net/enp3s0f0/device/sriov_numvfs
63
ubuntu@node-lepaute:~$ ./test-lp1904730.sh
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/__init__.py", line 1345, in _ft_decode_generic
    self.decode_nlas(offset)
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/__init__.py", line 1469, in decode_nlas
    (length, base_msg_type) = struct.unpack_from('HH', self.data,
struct.error: unpack_from requires a buffer of at least 19212 bytes for unpacking 4 bytes at offset 19208 (actual buffer size is 16384)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./test-lp1904730.sh", line 7, in <module>
    link = ip.link('get', index=link_idx, ext_mask=1)[0]
  File "/usr/lib/python3/dist-packages/pyroute2/iproute/linux.py", line 1358, in link
    ret = self.nlm_request(msg,
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 376, in nlm_request
    return tuple(self._genlm_request(*argv, **kwarg))
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 867, in nlm_request
    for msg in self.get(msg_seq=msg_seq,
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 379, in get
    return tuple(self._genlm_get(*argv, **kwarg))
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 704, in get
    raise msg['header']['error']
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 177, in parse
    msg.decode()
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/rtnl/ifinfmsg/__init__.py", line 1092, in decode
    nlmsg.decode(self)
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/__init__.py", line 1016, in decode
    self._ft_decode(self, offset)
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/__init__.py", line 1348, in _ft_decode_generic
    raise NetlinkNLADecodeError(e)
pyroute2.netlink.exceptions.NetlinkNLADecodeError: unpack_from requires a buffer of at least 19212 bytes for unpacking 4 bytes at offset 19208 (actual buffer size is 16384)
ubuntu@node-lepaute:~$ sudo add-apt-repository cloud-archive:wallaby-proposed
Ubuntu Cloud Archive for OpenStack Wallaby [proposed]
More info: https://wiki.ubuntu.com/OpenStack/CloudArchive
Press [ENTER] to continue or Ctrl-c to cancel adding it.
Reading package lists...
Building dependency tree...
Reading state information...
ubuntu-cloud-keyring is already the newest version (2020.02.11.4).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Get:1 http://ubuntu-cloud.archive.canonical.com/ubuntu focal-proposed/wallaby InRelease [8771 B]
Hit:2 http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby InRelease
Hit:3 http://archive.ubuntu.com/ubuntu focal InRelease
Get:4 http://ubuntu-cloud.archive.canonical.com/ubuntu focal-proposed/wallaby/main amd64 Packages [155 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Fetched 492 kB in 1s (522 kB/s)
Reading package lists... Done
ubuntu@node-lepaute:~$ sudo apt-get upgrade python3-pyroute2
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
python3-pyroute2
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 269 kB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://ubuntu-cloud.archive.canonical.com/ubuntu focal-proposed/wallaby/main amd64 python3-pyroute2 all 0.5.14-0ubuntu1.1~cloud0 [269 kB]
Fetched 269 kB in 0s (9234 kB/s)
(Reading database ... 71815 files and directories currently installed.)
Preparing to unpack .../python3-pyroute2_0.5.14-0ubuntu1.1~cloud0_all.deb ...
Unpacking python3-pyroute2 (0.5.14-0ubuntu1.1~cloud0) over (0.5.14-0ubuntu1~cloud0) ...
Setting up python3-pyroute2 (0.5.14-0ubuntu1.1~cloud0) ...
ubuntu@node-lepaute:~$
ubuntu@node-lepaute:~$ dpkg -l | grep pyroute2
ii python3-pyroute2 0.5.14-0ubuntu1.1~cloud0 all Python3 Netlink library
ubuntu@node-lepaute:~$ ./test-lp1904730.sh
63
ubuntu@node-lepaute:~$
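The test script itself lives in comment #33 and is not reproduced in this
message. Judging only from the traceback line
link = ip.link('get', index=link_idx, ext_mask=1)[0] and from the output
of 63 (the same number written to sriov_numvfs), it probably does something
along the lines of the sketch below. The function names count_vfs and main,
the use of link_lookup, and the idea that the printed value is the
IFLA_NUM_VF attribute are assumptions, not a copy of the real script.

```python
def count_vfs(ip, ifname):
    """Return the number of VFs the kernel reports for `ifname`.

    `ip` is expected to behave like a pyroute2 IPRoute object
    (assumption: the real script may obtain link_idx differently).
    """
    link_idx = ip.link_lookup(ifname=ifname)[0]
    # ext_mask=1 is RTEXT_FILTER_VF: it asks the kernel to include
    # per-VF details, which is what makes the reply large when many
    # VFs are enabled.
    link = ip.link('get', index=link_idx, ext_mask=1)[0]
    return link.get_attr('IFLA_NUM_VF')


def main(ifname='enp3s0f0'):
    # Imported here so the helper above can be exercised without
    # pyroute2 installed; on the test host it comes from
    # python3-pyroute2.
    from pyroute2 import IPRoute

    ip = IPRoute()
    try:
        print(count_vfs(ip, ifname))
    finally:
        ip.close()
```

Calling main() on a host with the unpatched package and more than 32 VFs
would be expected to fail at the ip.link('get', ...) call, as in the
traceback above.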
** Tags added: verification-done-focal-wallaby
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1904730
Title:
neutron-agent-sriov fails to create port
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive queens series:
Triaged
Status in Ubuntu Cloud Archive stein series:
Triaged
Status in Ubuntu Cloud Archive train series:
Triaged
Status in Ubuntu Cloud Archive ussuri series:
Fix Committed
Status in Ubuntu Cloud Archive wallaby series:
Fix Committed
Status in Ubuntu Cloud Archive xena series:
Fix Released
Status in pyroute2 package in Ubuntu:
Fix Released
Status in pyroute2 source package in Bionic:
Fix Released
Status in pyroute2 source package in Focal:
Fix Released
Status in pyroute2 source package in Hirsute:
Fix Released
Status in pyroute2 source package in Impish:
Fix Released
Bug description:
[Impact]
Netlink responses from the kernel can exceed 16k bytes (newer kernels
can return 32k). The pyroute2 library uses a default buffer size of 16k
and fails to decode the response when the kernel's data overflows this
buffer.
One example of where users encounter this is booting OpenStack
instances with SRIOV when there are more than 32 VFs, as seen in the
original problem description (included below).
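The failure mode can be imitated with the standard library alone. The
sketch below is not pyroute2 code: it only mimics the struct.unpack_from
call shown in the traceback, reading a 4-byte attribute header at offset
19208 from a buffer cut off at 16384 bytes. The 32768-byte reply and the
helper name read_nla_header are made up for illustration.

```python
import struct

DEFAULT_BUFSIZE = 16384  # 16k, the buffer size reported in the traceback
NLA_HEADER = 'HH'        # (length, type) pair, 4 bytes in total


def read_nla_header(data, offset):
    """Unpack the (length, type) header found at `offset` in `data`."""
    return struct.unpack_from(NLA_HEADER, data, offset)


# Pretend the kernel produced a 32k reply but only 16k of it was kept.
full_reply = bytes(32768)
truncated = full_reply[:DEFAULT_BUFSIZE]

error_text = None
try:
    read_nla_header(truncated, 19208)
except struct.error as exc:
    # The offset lies beyond the end of the truncated buffer.
    error_text = str(exc)
    print('decode failed:', error_text)

# With the whole reply available, the same read succeeds.
print(read_nla_header(full_reply, 19208))
```

The point is only that any attribute located past the 16k mark cannot be
decoded once the data has been cut short, whatever the attribute holds.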
[Test Case]
Use an SRIOV capable card and enable more than 32 VFs on a modern
kernel. Attempt to launch an instance using OpenStack as follows:
1. Create example network:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack network create \
    --provider-physical-network sriovfabric \
    --provider-segment 300 \
    --provider-network-type vlan \
    test-sriov
$ openstack subnet create --network test-sriov \
    --no-dhcp \
    --gateway none \
    --subnet-range 192.168.1.0/24 test-sriov
2. Create a port on a virtual function and launch an instance with it:
$ juju switch openstack
$ source ~/deploy/novarc
$ openstack port create \
    --network test-sriov \
    --vnic-type direct \
    sriov-vf1
$ openstack server create \
    --image bionic-kvm \
    --flavor m1.small \
    --network ext-net-300 \
    --port sriov-vf1 \
    --key-name ubuntu-keypair \
    --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
    sriov-vf1
3. The instance stalls in the BUILD state (virsh list shows a paused VM)
and then drops to ERROR.
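Before running the steps above, it can help to confirm that the host
really has more than 32 VFs enabled. A minimal sketch, assuming the
standard sysfs attributes sriov_numvfs and sriov_totalvfs under the
interface's device directory; the function names and the sysfs_root
parameter are hypothetical and exist only to make the helper easy to try
out.

```python
from pathlib import Path


def sriov_vf_counts(ifname, sysfs_root='/sys/class/net'):
    """Return (enabled, supported) VF counts for a physical function."""
    device = Path(sysfs_root) / ifname / 'device'
    enabled = int((device / 'sriov_numvfs').read_text())
    supported = int((device / 'sriov_totalvfs').read_text())
    return enabled, supported


def meets_test_precondition(ifname, threshold=32,
                            sysfs_root='/sys/class/net'):
    """True when more than `threshold` VFs are enabled on `ifname`."""
    enabled, _ = sriov_vf_counts(ifname, sysfs_root)
    return enabled > threshold
```

For example, meets_test_precondition('enp3s0f0') would be expected to
return True on the host used for the verification above, where 63 VFs
were enabled.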
[Where problems could occur]
Problems may affect existing users who already schedule SRIOV instances
with OpenStack, and would most likely show up as failures to build
instances. Increasing the default buffer size also increases the memory
usage of the nova processes. On tightly spec'd systems with little
memory allocated to the host, this could eat into any remaining margin
and push memory usage over the edge.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1904730/+subscriptions