[Bug 1904730] Re: neutron-agent-sriov fails to create port

Billy Olsen 1904730 at bugs.launchpad.net
Mon Oct 25 22:27:54 UTC 2021


Verified for bionic-ussuri using the test script in comment #33

ubuntu@node-lepaute:~$ dpkg -l | grep pyroute2
ii  python3-pyroute2                     0.5.9-0ubuntu1~cloud0                           all          Python3 Netlink library
ubuntu@node-lepaute:~$ echo 63 | sudo tee /sys/class/net/enp3s0f0/device/sriov_numvfs
63
ubuntu@node-lepaute:~$ ./test-lp1904730.sh
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/__init__.py", line 1311, in _ft_decode_generic
    self.decode_nlas(offset)
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/__init__.py", line 1436, in decode_nlas
    offset)
struct.error: unpack_from requires a buffer of at least 4 bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./test-lp1904730.sh", line 7, in <module>
    link = ip.link('get', index=link_idx, ext_mask=1)[0]
  File "/usr/lib/python3/dist-packages/pyroute2/iproute/linux.py", line 1332, in link
    msg_flags=msg_flags)
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 373, in nlm_request
    return tuple(self._genlm_request(*argv, **kwarg))
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 864, in nlm_request
    callback=callback):
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 376, in get
    return tuple(self._genlm_get(*argv, **kwarg))
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 701, in get
    raise msg['header']['error']
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 177, in parse
    msg.decode()
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/rtnl/ifinfmsg/__init__.py", line 1087, in decode
    nlmsg.decode(self)
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/__init__.py", line 982, in decode
    self._ft_decode(self, offset)
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/__init__.py", line 1314, in _ft_decode_generic
    raise NetlinkNLADecodeError(e)
pyroute2.netlink.exceptions.NetlinkNLADecodeError: unpack_from requires a buffer of at least 4 bytes
ubuntu@node-lepaute:~$ sudo add-apt-repository cloud-archive:ussuri-proposed
 Ubuntu Cloud Archive for OpenStack Ussuri [proposed]
 More info: https://wiki.ubuntu.com/OpenStack/CloudArchive
Press [ENTER] to continue or Ctrl-c to cancel adding it.
...
ubuntu@node-lepaute:~$ sudo apt-get upgrade python3-pyroute2
...
ubuntu@node-lepaute:~$ dpkg -l | grep pyroute2
ii  python3-pyroute2                     0.5.9-0ubuntu2~cloud0                           all          Python3 Netlink library
ubuntu@node-lepaute:~$ ./test-lp1904730.sh
63

** Tags added: verification-done-bionic-ussuri

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1904730

Title:
  neutron-agent-sriov fails to create port

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive queens series:
  Triaged
Status in Ubuntu Cloud Archive stein series:
  Triaged
Status in Ubuntu Cloud Archive train series:
  Triaged
Status in Ubuntu Cloud Archive ussuri series:
  Fix Committed
Status in Ubuntu Cloud Archive wallaby series:
  Fix Committed
Status in Ubuntu Cloud Archive xena series:
  Fix Released
Status in pyroute2 package in Ubuntu:
  Fix Released
Status in pyroute2 source package in Bionic:
  Fix Released
Status in pyroute2 source package in Focal:
  Fix Released
Status in pyroute2 source package in Hirsute:
  Fix Released
Status in pyroute2 source package in Impish:
  Fix Released

Bug description:
  [Impact]

  Netlink responses from the kernel can exceed 16k bytes (they can
  reach 32k on newer kernels). The pyroute2 library uses a default
  receive buffer of 16k and fails to decode the response when the
  kernel returns more data than fits in that buffer.

  One example of where users encounter this is booting OpenStack
  instances with SRIOV when there are more than 32 VFs, as seen in the
  original problem description (included below).
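
  The failure mode can be illustrated with a small, self-contained
  sketch. It does not use pyroute2 or a real netlink socket. It assumes
  a toy message layout (a 4-byte total length followed by 4-byte-aligned
  length/type attributes, loosely modelled on netlink attributes) and a
  made-up per-VF payload size of 300 bytes. All names in it
  (build_message, decode, RECV_BUF) are hypothetical. The point is only
  to show how a decoder that trusts the declared length runs off the end
  of a response cut at 16k.

```python
import struct

ATTR_HDR = struct.Struct("HH")  # attribute length (header + payload), type
MSG_HDR = struct.Struct("I")    # total message length, header included
RECV_BUF = 16 * 1024            # the 16k default buffer size described above


def align4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def build_message(num_vfs, vf_payload_size=300):
    """Build a toy response with one attribute per VF (sizes are assumptions)."""
    body = b""
    for vf in range(num_vfs):
        payload = bytes([vf]) * vf_payload_size
        attr = ATTR_HDR.pack(ATTR_HDR.size + len(payload), 1) + payload
        body += attr.ljust(align4(len(attr)), b"\x00")
    return MSG_HDR.pack(MSG_HDR.size + len(body)) + body


def decode(buf):
    """Decode attributes up to the length the message itself declares."""
    (declared_len,) = MSG_HDR.unpack_from(buf, 0)
    offset = MSG_HDR.size
    attrs = []
    while offset < declared_len:
        # Raises struct.error if the buffer ends before the declared length.
        attr_len, attr_type = ATTR_HDR.unpack_from(buf, offset)
        attrs.append((attr_type, buf[offset + ATTR_HDR.size:offset + attr_len]))
        offset += align4(attr_len)
    return attrs


message = build_message(num_vfs=63)
truncated = message[:RECV_BUF]  # what a 16k receive buffer would hold

print("message larger than buffer:", len(message) > RECV_BUF)
print("attributes decoded from full message:", len(decode(message)))
try:
    decode(truncated)
except struct.error as exc:
    print("decode of truncated message failed:", exc)
```

  In this sketch the full message decodes cleanly, while the copy cut at
  16k raises struct.error, the same exception class that appears at the
  bottom of the traceback in the verification log. The exact error text
  depends on the Python version.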

  [Test Case]

  Use an SRIOV-capable card and enable more than 32 VFs on a modern
  kernel. Attempt to launch an instance using OpenStack as follows:

  1. Create example network:
  $ juju switch openstack
  $ source ~/deploy/novarc
  $ openstack network create \
  --provider-physical-network sriovfabric \
  --provider-segment 300 \
  --provider-network-type vlan \
  test-sriov

  $ openstack subnet create --network test-sriov \
    --no-dhcp \
    --gateway none \
    --subnet-range 192.168.1.0/24 test-sriov

  2. Create a port on a virtual function and boot an instance with it:
  $ juju switch openstack
  $ source ~/deploy/novarc
  $ openstack port create \
  --network test-sriov \
  --vnic-type direct \
  sriov-vf1

  $ openstack server create \
  --image bionic-kvm \
  --flavor m1.small \
  --network ext-net-300 \
  --port sriov-vf1 \
  --key-name ubuntu-keypair \
  --availability-zone nova:cmp4az1cz20300kvs.mgt.pst.stg.tlc.example.com \
  sriov-vf1

  3. The instance stalls in the build state (virsh list shows a paused
  VM) and then drops to ERROR.

  [Where problems could occur]

  Problems may occur for existing users who already use OpenStack to
  schedule SRIOV instances, and may show up as failures to build
  instances. Increasing the default buffer size also increases the
  memory usage of the nova processes. On tightly spec'd systems with
  little memory allocated to the host, this could eat into any
  remaining margin and push memory usage over the edge.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1904730/+subscriptions



