[Bug 1937261] Re: python3-msgpack package broken due to outdated cython
Christian Rohmann
1937261 at bugs.launchpad.net
Wed Jul 28 13:40:15 UTC 2021
@James Page:
1) We ran your package for a few days and things seem to be smooth now.
Thanks again for picking this up so quickly.
2) May I suggest keeping this open as an issue against oslo.privsep and
having them add a (unit) test to ensure every supported platform always
has the native extension for pack/unpack available, and throw a warning
at privsep init if it is not. Even refusing to start without an explicit
override flag when msgpack native is not available seems sensible to me.
OpenStack projects exchange vast amounts of data via oslo.privsep (see
https://bugs.launchpad.net/oslo.privsep/+bug/1896734), so the pure
Python fallback for msgpack is simply not an option in any real-world
scenario. Even without very many interfaces in Neutron, just switching
to debug logs could break things as well, leaving an operation without
proper logs.
And a state of intermittent errors or extreme degradation is really the
worst kind of issue to have, as it is very hard to debug, especially
since things seem to work in the beginning and one does not know where
to look, as absolutely no helpful errors are thrown in the logs.
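A minimal sketch of the kind of startup check suggested in 2). This is
not oslo.privsep code: the function names and the strict flag are
hypothetical. It assumes that msgpack binds msgpack.Packer and
msgpack.Unpacker to the classes from msgpack/fallback.py whenever the
compiled extension cannot be imported, so the module that defines those
classes tells which implementation is active.

```python
import importlib
import warnings


def is_pure_python(cls):
    """Return True if cls is defined in a module whose last component is 'fallback'."""
    return cls.__module__.split(".")[-1] == "fallback"


def check_msgpack_native(strict=False):
    """Warn (or raise, if strict) when msgpack runs on its pure-Python fallback.

    Returns True for the native extension, False for the fallback,
    and None if msgpack is not installed at all.
    """
    try:
        msgpack = importlib.import_module("msgpack")
    except ImportError:
        return None
    if not (is_pure_python(msgpack.Packer) or is_pure_python(msgpack.Unpacker)):
        return True
    message = "msgpack is using msgpack/fallback.py; the native extension is missing"
    if strict:
        # Hypothetical policy: refuse to start unless explicitly overridden.
        raise RuntimeError(message)
    warnings.warn(message, RuntimeWarning)
    return False
```

Calling check_msgpack_native(strict=True) at service start would turn
the silent slowdown into an immediate, visible failure.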
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1937261
Title:
python3-msgpack package broken due to outdated cython
Status in Ubuntu Cloud Archive:
Invalid
Status in Ubuntu Cloud Archive ussuri series:
Fix Committed
Status in neutron:
New
Status in oslo.privsep:
New
Bug description:
After a successful upgrade of the control plane from Train -> Ussuri
on Ubuntu Bionic, we upgraded a first compute / network node and
immediately ran into issues with Neutron:
We noticed that Neutron was extremely slow in setting up and wiring the
network ports, so slow that it would never finish and threw all sorts of
errors (RabbitMQ connection timeouts, full sync required, ...).
We were now able to reproduce the error on our Ussuri DEV cloud as
well:
1) First we used strace -ffff -p $PID_OF_NEUTRON_LINUXBRIDGE_AGENT and
noticed that the data exchange on the unix socket between the
rootwrap-daemon and the main process was extremely slow. One could
actually follow the read calls on the fd of the socket line by line.
2) We then (after adding lots of log lines and other intensive manual
debugging) used py-spy (https://github.com/benfred/py-spy) via "py-spy
top --pid $PID" on the running neutron-linuxbridge-agent process and
noticed all the CPU time (process was at 100% most of the time) was
spent in msgpack/fallback.py
3) Since the issue was not observed in Train we compared the msgpack
version used and noticed that Train was using version 0.5.6 while
Ussuri upgraded this dependency to 0.6.2.
4) We then downgraded to version 0.5.6 of msgpack (ignoring the actual
dependencies)
--- cut ---
apt policy python3-msgpack
python3-msgpack:
Installed: 0.6.2-1~cloud0
Candidate: 0.6.2-1~cloud0
Version table:
*** 0.6.2-1~cloud0 500
500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/ussuri/main amd64 Packages
0.5.6-1 500
500 http://de.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
100 /var/lib/dpkg/status
--- cut ---
vs.
--- cut ---
apt policy python3-msgpack
python3-msgpack:
Installed: 0.5.6-1
Candidate: 0.6.2-1~cloud0
Version table:
0.6.2-1~cloud0 500
500 http://ubuntu-cloud.archive.canonical.com/ubuntu bionic-updates/ussuri/main amd64 Packages
*** 0.5.6-1 500
500 http://de.archive.ubuntu.com/ubuntu bionic/main amd64 Packages
100 /var/lib/dpkg/status
--- cut ---
Et voilà: the neutron-linuxbridge-agent worked just like before
(building one port every few seconds) and all network ports eventually
converged to ACTIVE.
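The difference between the two installations could also be put into
numbers with a small timing script, run once per installed package
version. This is only a sketch: the payload below is made up and is not
real privsep traffic, and time_roundtrip is a hypothetical helper. It
uses msgpack.packb and msgpack.unpackb and prints the module that
provides msgpack.Packer, so the output shows whether
msgpack/fallback.py was in use.

```python
import time


def time_roundtrip(pack, unpack, payload, repeat=5):
    """Return the best wall-clock time in seconds for one pack+unpack round trip."""
    best = float("inf")
    result = payload
    for _ in range(repeat):
        start = time.perf_counter()
        result = unpack(pack(payload))
        best = min(best, time.perf_counter() - start)
    if result != payload:
        raise ValueError("round trip changed the payload")
    return best


# Made-up payload, loosely shaped like a list of interface records.
payload = [{"name": "tap%05d" % i, "mtu": 1500, "up": True} for i in range(5000)]

try:
    import msgpack
except ImportError:
    msgpack = None  # nothing to measure on this machine

if msgpack is not None:
    seconds = time_roundtrip(
        lambda obj: msgpack.packb(obj, use_bin_type=True),
        lambda data: msgpack.unpackb(data, raw=False),
        payload,
    )
    print("%s: %.4f s per round trip" % (msgpack.Packer.__module__, seconds))
```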
I could not yet spot which of the msgpack changes between the two
versions
(https://github.com/msgpack/msgpack-python/compare/0.5.6...v0.6.2)
might have caused this issue, but I am really certain that this is a
major issue for Ussuri on Ubuntu Bionic.
There are "similar" issues with
* https://bugs.launchpad.net/oslo.privsep/+bug/1844822
* https://bugs.launchpad.net/oslo.privsep/+bug/1896734
both related to msgpack or the size of messages exchanged.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1937261/+subscriptions