+1 maintenance report
Christian Ehrhardt
christian.ehrhardt at canonical.com
Fri Nov 13 12:15:29 UTC 2020
Hi,
I tried this week to look at various things to get the archive healthy
and proposed migrations to complete. I won't mention every silly "trigger
test re-run" since there are usually far too many. Those consume quite a
lot of time though, as one also needs to prepare to recheck results later and
dig into why a retry still failed and whether we need fixes.
The rest might be more interesting for anyone involved in (or blocked by) those
packages, and also for whoever is on duty next week to know what happened
last week.
## 1 The Perl is all around you ...
I wonder why there is a perl transition ongoing whenever I have +1 duty?
This one came in via the auto-sync of 5.32 from Debian [1].
Obviously a bunch of combined triggers were needed, but the test queue is
rather full anyway, and to some extent it is better to wait a few days until
everything needed has appeared in -proposed.
I've seen enough other people working on this transition (as well as boost)
already. To avoid doing "just the same" I decided to get more uncommon
things unstuck from the queue.
## 2 usual suspect - i386 dependency fails
The odd bit on php7.4 is that it seems to have an i386 dependency-fail that
it didn't have before. We once had a britney rule like this:
# probably fixable with Multi-Arch: foreign annotation on php-common, but needs investigation
force-badtest php7.3/all/i386
But in 7.4 things worked - up until recently.
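To make the structure of such a hint explicit, here is a minimal Python sketch that splits the line quoted above into its parts. It only handles the single `source/version/arch` form shown here; the names `Hint` and `parse_hint` are made up for illustration and are not part of britney.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hint:
    kind: str      # hint type, e.g. "force-badtest"
    source: str    # source package, e.g. "php7.3"
    version: str   # version the hint applies to, or "all"
    arch: str      # architecture, e.g. "i386"


def parse_hint(line: str) -> Optional[Hint]:
    """Parse one hint line; return None for blank lines and comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Assumption: exactly one "source/version/arch" spec follows the hint type.
    kind, spec = line.split(None, 1)
    source, version, arch = spec.split("/")
    return Hint(kind, source, version, arch)


print(parse_hint("force-badtest php7.3/all/i386"))
```

Read this way, the old rule masked php7.3 test failures on i386 for every version, which is why it no longer applies to php7.4.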
Since I've been asked "how to debug this" a few times, I have added a section
"Test for i386 dependency issues" to the i386 wiki page [3].
I'll leave the further handling to the server team, but documenting how to
test this seemed like a +1 task that is worthwhile for everyone.
## 3 gdb fails apport test
On
test_add_gdb_info_damaged (__main__.T)
add_gdb_info() with damaged core dump ... warning: Memory read failed for
corefile section, 4096 bytes at 0xffffffffff600000.
WARNING: Please install gdb-multiarch for processing reports from foreign
architectures. Results with "gdb" will be very poor.
WARNING: Please install gdb-multiarch for processing reports from foreign
architectures. Results with "gdb" will be very poor.
FAIL
And on:
test_add_gdb_info_short_core_file
These warning messages were already present with the former version in tests
that worked fine, so the messages are a red herring. But the FAIL is new.
Nevertheless it seemed reasonable that a new gdb might have problems with old
test files.
As an interesting side note, the behavior of the tests was odd.
They got scheduled, then ran for ~5h but always showed sub-10 minute durations.
After another day they seemed to be cancelled, but were back in the queue as if
I had triggered them again (which I didn't).
I suspect that someone had to scrap and restart a bunch of tests, but I
can't prove it :-)
In a local autopkgtest VM the issue was reproducible, but when I got there
I asked around if this is being worked on already and it seems bdmurray is
already on it.
## 4 mumps related builds entangle several transitions
It seems doko has done a great cleanup on rebuilds for soname changes.
In that regard a bunch of packages were built but very interdependent
in excuses. Furthermore a few syncs were incoming from Debian which also tie
into the same related set of packages. Overall I see this is about packages:
scotch, coinor-ipopt, getfem++, petsc, sdpa, trilinos, dolfin, mumps, syrthes,
superlu, deal.ii, slepc, getdp, petsc4py, slepc4py, sundials
This overall set of packages has various issues:
a) fail to build
a1) dolfin: FTBFS all arch
a2) getfem++: FTBFS arm64
a3) deal.ii: FTBFS on ppc64
a4) deal.ii: build dep on armhf and riscv64
a5) slepc: FTBFS on riscv64
a6) sundials: FTBFS all arch
b) unsatisfiable dependencies
b1) trilinos: libtrilinos-amesos12/arm64 has unsatisfiable dependency
b2) sdpa: sdpa/sdpam has unsatisfiable dependency
c) uninstallabilities
c1) trilinos: makes libdeal.ii-9.2.0/9.2.0-2/arm64 uninstallable
c2) scotch: makes libtrilinos-ifpack (and others) uninstallable on arm64&s390x
c3) mumps: makes libtrilinos-amesos uninstallable on s390x
d) test regressions
d1) superlu: i386 autopkgtest regression
d2) petsc4py: arm64&ppc64 autopkgtest regression
Details:
(a1) is a self test fail at build time, fixed by [7] which is
currently building and already has three architectures fine.
(a2) was an odd failure (broken but no build log); I restarted the build
and it resolved on retry.
(a3) was a now resolved build dependency - fixed.
(a4) never built on those architectures (ok)
(a5) This was a riscv64 issue due to perl not being installable there.
That should be resolved by now (we are actually already moving to
the next one). So a rebuild now (triggered) or later (once perl 5.32 is
in) will fix this.
Now fixed by a rebuild that I triggered.
(a6) farknullmatrix.c:33: multiple definition of `F2C_ARKODE_matrix';
This is https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=957847
which will be fixed in 4.x which is in -experimental
(b1) libtrilinos was built when mumps wasn't ready on arm64 yet.
Due to that amd64 has libmumps-5.3 (>= 5.3.5) [4] but arm64 has
libmumps-5.3.3 (>= 5.3.3) [5]. At least this isn't affected by
bug 973825. A rebuild of trilinos should resolve quite a lot of the
things blocked in this set.
I submitted a rebuild to pick this up properly and it worked fine.
(b2) The dependencies are just bad, depending on non-existent packages.
I found that this is a known Debian bug [6] filed 4 days ago.
Once that is fixed and synced we will need rebuilds of sdpa (and more?)
(c1)+(c2)+(c3)
Those all seem to be the same trilinos build that missed the new mumps
version mentioned in (b1).
(d1) dependency issue on i386; too many deps are in flight right now to try to
resolve it, but it could as well just be an !i386 test override
(d2) had https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=969715 which was
meant to be fixed. The new test fails are different. On debci 3.14 tests
didn't run at all. We seem to need a test-reset or delta to ignore/fix
this again.
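The mismatch described in (b1) can be spotted mechanically by comparing the dependency strings of the same binary package across two architectures. The following is only a sketch under simple assumptions: it handles plain `name (relation version)` clauses separated by commas, ignores alternatives with `|`, and the helper names `parse_depends` and `diff_depends` are hypothetical.

```python
import re

# One dependency clause such as "libmumps-5.3 (>= 5.3.5)"; the constraint is optional.
CLAUSE = re.compile(
    r"^\s*(?P<name>[^\s(]+)\s*(?:\((?P<rel>[<>=]+)\s*(?P<ver>[^)]+)\))?\s*$"
)


def parse_depends(field: str) -> dict:
    """Map package name -> (relation, version) for a comma-separated Depends field."""
    deps = {}
    for clause in field.split(","):
        m = CLAUSE.match(clause)
        if m:
            deps[m.group("name")] = (m.group("rel"), m.group("ver"))
    return deps


def diff_depends(a: str, b: str) -> dict:
    """Return the packages whose presence or constraint differs between two fields."""
    da, db = parse_depends(a), parse_depends(b)
    return {
        name: (da.get(name), db.get(name))
        for name in sorted(set(da) | set(db))
        if da.get(name) != db.get(name)
    }


# The two mumps dependencies quoted in (b1).
amd64 = "libmumps-5.3 (>= 5.3.5)"
arm64 = "libmumps-5.3.3 (>= 5.3.3)"
print(diff_depends(amd64, arm64))
```

Each entry in the result names a package that only one of the two builds depends on, which is the signature of a build that ran before its dependency was ready on that architecture.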
Overall most of the things above got resolved through my work, but the overall
set of packages didn't migrate yet. There are too many open issues left, like
the gcc-10 bug 957847 in sundials (and more), that still need to be resolved.
In general these components have been in flux in Debian in recent days. So I
touched a few which had issues on "our side". But the overall context needs
to be revisited later on.
## 5 libftdi1 issues on i386
Issue1:
autopkgtest [16:42:09]: test test-libftdi1
Package libftdi1 was not found in the pkg-config search path.
Perhaps you should add the directory containing `libftdi1.pc'
to the PKG_CONFIG_PATH environment variable
Issue2:
CMake Error at CMakeLists.txt:18 (include_directories):
include_directories given empty-string as include directory.
This affects only the i386 tests, other architectures are good.
This was broken throughout all of groovy [8] with the same error.
It then had two good runs in hirsute and now fails again.
Locally reproducible via:
$ sudo ~/work/autopkgtest/autopkgtest/runner/autopkgtest \
  --no-built-binaries --apt-upgrade --apt-pocket=proposed=src:libconfuse \
  --setup-commands="dpkg --add-architecture i386; apt-get update" \
  --shell-fail --architecture i386 libftdi1_1.5-5.dsc -- qemu \
  --qemu-options='-cpu host' --ram-size=2048 --cpus 2 \
  ~/work/autopkgtest-hirsute-amd64.img
This passes in Debian [9], so it is likely an Ubuntu-i386 specific issue.
It comes down to the following failing:
$ pkg-config --cflags libftdi1
The -dev package is installed:
ii libftdi1-dev:i386 1.5-5 i386 Development files for libftdi1
And the file is there:
libftdi1-dev:i386: /usr/lib/i386-linux-gnu/pkgconfig/libftdi1.pc
Yet in our i386-but-not-really environment it fails to work.
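One way to check whether the search path is what goes wrong is to point pkg-config at the multiarch directory explicitly via the PKG_CONFIG_PATH variable that the error message mentions. The sketch below is a diagnostic aid under that assumption, not the fix for the test; the helper names are hypothetical and the directory layout is taken from the file listing above.

```python
import os
import subprocess


def multiarch_pkgconfig_dir(triplet: str) -> str:
    """Directory where a multiarch -dev package installs its .pc files."""
    return f"/usr/lib/{triplet}/pkgconfig"


def env_with_pkgconfig_path(triplet: str, base: dict) -> dict:
    """Copy of `base` with the multiarch directory prepended to PKG_CONFIG_PATH."""
    env = dict(base)
    extra = multiarch_pkgconfig_dir(triplet)
    current = env.get("PKG_CONFIG_PATH")
    env["PKG_CONFIG_PATH"] = f"{extra}:{current}" if current else extra
    return env


def cflags(package: str, triplet: str) -> subprocess.CompletedProcess:
    """Run `pkg-config --cflags` with the extended search path."""
    return subprocess.run(
        ["pkg-config", "--cflags", package],
        env=env_with_pkgconfig_path(triplet, dict(os.environ)),
        capture_output=True,
        text=True,
        check=False,
    )
```

Calling `cflags("libftdi1", "i386-linux-gnu")` inside the test bed and looking at the return code would show whether the native pkg-config simply does not search the i386 directory by default.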
I proposed ignoring the test for now [10] but also got hints on #ubuntu-devel
when I asked for the pattern and wanted to open a bug like [11] for libftdi1.
I had a little TIL with CMake and got several things fixed. But eventually
my time-boxing exploded (thanks CMake) and I've given up for now [12].
It feels like it is 95% done; if a CMake+cross-test god could extend on that,
then thanks in advance. The test override has to do for now ...
## 6 ruby2.7
It has test issues on i386. Marisa seems to need a bump to an existing test
override [13].
Mecab OTOH seems like a dependency issue on i386 for which I could not yet
track down where best to resolve it (or decide to just mask the test for now).
I talked with Lucas and he will give it a closer look.
## 7 libcpupower missing
I've hit this on one of my past +1 duties [14] and it seems others did so as
well. Turned out that the discussion is even older [15]. I marked the bugs
accordingly and made the newer a dup of the older one. But the TODO is on
the kernel team here.
## 8 Netgen
This has been FTBFS for a while and thereby hangs around in proposed, being
retriggered every now and then depending on who comes by.
The issue is due to upstream breaking non-x86. After tracking down the details
I realized that Ubuntu users will likely use the upstream provided PPA
instead anyway.
But on that trip I found the root cause and a proposed fix upstream, so I have
filed Debian bug [16] to make the maintainer aware, as well as [17] on Launchpad
to spare another +1 member from re-debugging this.
## 9 libsmitwatermelon FTBFS
This is a case of the generally odd C++ symbols breaking on dh_makeshlibs,
with Ubuntu s390x/ppc64 builds being somewhat different for no obvious reason.
I filed [18] with a fix that I verified to work, but IMHO it is not worth
an Ubuntu delta upload. If the Debian maintainer accepts it, the next auto-sync
will resolve this.
## 10 boost 1.71 blocked on shapeit4
The shapeit4 test on s390x had a one-off success and has otherwise always failed:
https://autopkgtest.ubuntu.com/packages/s/shapeit4/hirsute/s390x
https://autopkgtest.ubuntu.com/packages/s/shapeit4/groovy/s390x
https://autopkgtest.ubuntu.com/packages/s/shapeit4/focal/s390x
We should reset-test this to get things moving, MP for that [20].
## 11 Clustalo FTBFS on s390x
Clustalo 1.2.4-4build1 is happy but 1.2.4-6 added a test that fails.
Error:
# Run additional test from python-biopython package to verify that
# this will work as well
src/clustalo -i debian/tests/biopython_testdata/f002 --guidetree-out
temp_test.dnd -o temp_test.aln --outfmt clustal --force
make[1]: *** [debian/rules:36: override_dh_auto_test-arch] Segmentation
fault (core dumped)
In Debian the build works just fine [21] but Ubuntu reproducibly fails [22].
This is reproducible in an s390x LXD container in a built tree (hirsute).
Debugging showed that it already uses -O0 for mipsel, and with gcc-10.2
s390x needs the same treatment to not segfault.
I analyzed and reported this (no tracker, just mail) and proposed extending
that -O0 to s390x as well [23]. To make it visible in update excuses there is
also a tracker bug [24].
This could eventually be an issue with s390x gcc-10.2 optimization, so I
have got this bug mirrored to IBM for evaluation.
I verified that a merge of 3.23 would, as expected, fix it (as well as a single
patch backport).
## 12 tgt test fails blocking fio sync
This looked odd at first as tgt only failed on s390x,
while there should be no obvious reason for things being different on those
platforms.
Since I like those platforms I gave those tests a look.
Both come down to the following (which is not arch-specific either):
fio: io_u error on file datafile.tmp: No space left on device: write
offset=95158272, buflen=65536
OK, this most obvious message is a red herring. The disk that is used is
created by the test itself and is 100MB on each arch. And the test is meant to
run until it runs "out of disk", therefore in good cases the message is
present as well.
The bad RC from fio is the breaking factor, and whether it is good or bad is
reproducible at least for the tgt test on s390x/x86.
While the "out of space" is by design, there is another error that seems to be
architecture specific:
verify-phase: you need to specify size=
fio: pid=24570, err=22/file:filesetup.c:1057, func=total_file_size,
error=Invalid argument
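To tell the two errors apart at a glance, the numeric `err=` field can be mapped to its symbolic errno name. This is a small Python sketch that assumes fio reports a plain errno value there, which matches the "Invalid argument" text in the line above; `decode_fio_err` is a hypothetical helper name.

```python
import errno
import os
import re

# Matches the "err=<number>/" part of a fio error line like the one quoted above.
ERR_FIELD = re.compile(r"\berr=(\d+)/")


def decode_fio_err(line: str):
    """Return (number, symbolic name, message) for the err= field, or None if absent."""
    m = ERR_FIELD.search(line)
    if not m:
        return None
    code = int(m.group(1))
    return code, errno.errorcode.get(code, "unknown"), os.strerror(code)


line = ("fio: pid=24570, err=22/file:filesetup.c:1057, "
        "func=total_file_size, error=Invalid argument")
print(decode_fio_err(line))
```

On Linux, 22 decodes to EINVAL, whereas the expected "No space left on device" condition would be ENOSPC, so the failing return code does not come from the disk filling up.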
Switching back from fio 3.21-1 to 3.16-1 from -release fixes the issue, so it
seems indeed to be some sort of regression in the new code.
I checked git and after some time found an existing issue [25] with a fix [26].
Since I tested this with 3.23 (which works fine) I have proposed that to
Debian [28]. The next auto-sync will then get this one resolved.
To track this in excuses I have opened LP bugs [29][30] tagged with
update-excuse.
A day later after an upstream discussion I had a workaround which lowers the
memory pressure and submitted it to Debian in [37].
## 13 multipath test fails blocking fio sync
This was not fixed by the fix of #12 above, but this also only affected
one architecture and seemed to be non-reproducible outside of Launchpad
infrastructure.
It almost seems more like an issue in a different component than fio which
triggered the issues. Lacking a local reproducer I was forced to create PPAs
to debug things on the infrastructure (3.16 vs 3.21 vs 3.23).
The ppc64el issue turned out to be an OOM kill, but one reliably triggered by
the new version of fio. I found that the memory consumption of fio itself more
than doubled for the given workload and ppc64el just was the arch with the
tightest memory.
A hirsute rebuild of 3.23 from git shows the same issue while 3.16 from git is
good, so it should again be bisectable.
I found two changes to the statistical data it gathers which caused the
increase and reported it with a lot of detail upstream [27].
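A comparison like this can be made repeatable by recording the peak resident set size of each run. The following Python sketch shows one generic way to do that on Linux using `getrusage`; it is not the method used for the upstream report, and `peak_rss_kib` is a hypothetical helper name.

```python
import resource
import subprocess
import sys


def peak_rss_kib(cmd: list) -> int:
    """Run cmd to completion and return the peak RSS of waited-for children.

    On Linux ru_maxrss is reported in kibibytes. RUSAGE_CHILDREN is a
    high-water mark over all children this process has waited for (the
    largest single child), so use a fresh interpreter per measurement.
    """
    subprocess.run(cmd, check=False,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Stand-in command; for a real comparison pass the fio invocation of the test.
print(peak_rss_kib([sys.executable, "-c", "pass"]), "KiB")
```

Running the same job file once per fio build (each in its own interpreter) gives numbers that can be compared directly while bisecting.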
For Ubuntu we need to mark these tests as "big_packages", which I proposed
in [31].
A day later after an upstream discussion I had a workaround which lowers the
memory pressure and submitted it to Debian in [38].
## 14 Perl comes back to me for libvirt/postgresql
A few days into the perl transition doko pinged me asking if I could look into
the remaining build failures as they are close to what I usually work on.
First of all, libguestfs is blocked by libsys-virt-perl.
The old one still depends on perlapi-5.30.3 but due to the transition
perlapi-5.32.0 is needed.
And libsys-virt-perl in turn is blocked, missing a newer libvirt.
I had libvirt 6.8 already 80% ready before my +1 week and wanted to work on it
again next week.
The final affected packages are postgresql-12 and postgresql-13; those are
FTBFS and fail in autopkgtests. This will be fixed by the stable uploads that
are released today.
As expected the issue of postgresql-13 is fixed in 13.1-1 [33] and will be for
us in [34] once auto-synced later today.
The only problem is that we might not want to upload another 12.x to Ubuntu
21.04 as we want to go to postgresql-13 eventually. Also in Debian, after
checking with the maintainer, the preferred option seems to be a removal from
-testing [32].
I'm concerned about removing too much of hirsute if we remove postgresql-12 now.
Therefore I'll go ahead of Debian and upload the stable release of v12 today,
which will fix the issue for v12 for now (it is still to be removed later in
the cycle).
So in addition to waiting for 13.1 to show up via auto-sync I prepared all
other stable updates as well, which also cover a bunch of CVEs for the supported
releases [35] and the build/test fail filed by rbalint [36].
13.1-1 as synced from Debian built fine as expected, and so did 12.5 in a PPA,
which I then uploaded. Together with security I'm preparing the same stable
updates for all active releases but that will be next week (not part of the +1
duty).
Finally, plenty of postgresql related tests failed in the past as they need to
be triggered together to work. I have done that over the weekend (once the new
builds were in) to get this closer to migrating.
## 15 ebtables breaking tests
While trying to look at libvirt for perl (see above) I identified early on
an issue with iptables/ebtables that turned out to be broken not only in
hirsute but also in groovy.
An initial quick check confirmed that it was caused neither by my new libvirt
version nor by the hirsute release at all. So it was worth investigating
whether we might have a general issue. A discussion with security showed that
the issue could match the merge of 1.8.5 and/or the whole
iptables/ebtables/nftables move.
I tracked down which component causes this and then filed a bug [39].
Although I have to admit it feels like my ebtables-foo might just be too weak,
but then it will be a TIL moment once explained :-)
While I found the issue on +1 duty, after the initial analysis it became
clear that the task doesn't qualify for +1, so I'll continue on it next
week (in the context of libvirt, which we want to have for perl anyway).
[1]: https://lists.debian.org/debian-devel-announce/2020/11/msg00001.html
[2]: https://people.canonical.com/~ubuntu-archive/germinate-output/i386.hirsute/i386+build-depends
[3]: https://wiki.ubuntu.com/i386
[4]: https://launchpadlibrarian.net/504480967/buildlog_ubuntu-hirsute-amd64.trilinos_12.14.1-5_BUILDING.txt.gz
[5]: https://launchpadlibrarian.net/504427600/buildlog_ubuntu-hirsute-arm64.trilinos_12.14.1-5_BUILDING.txt.gz
[6]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=973825
[7]: https://launchpad.net/ubuntu/+source/dolfin/2019.2.0~git20200629.946dbd3-4
[8]: https://autopkgtest.ubuntu.com/packages/libf/libftdi1/groovy/i386
[9]: https://ci.debian.net/packages/libf/libftdi1/testing/i386/
[10]: https://code.launchpad.net/~paelzer/britney/+git/hints-ubuntu/+merge/393500
[11]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=946577
[12]: https://paste.ubuntu.com/p/6VDFJmSTf6/
[13]: https://code.launchpad.net/~paelzer/britney/+git/hints-ubuntu/+merge/393501
[14]: https://bugs.launchpad.net/ubuntu/+source/gkrellm2-cpufreq/+bug/1891336
[15]: https://bugs.launchpad.net/ubuntu/+source/cpufreqd/+bug/1215411
[16]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=974136
[17]: https://bugs.launchpad.net/debian/+source/netgen/+bug/1903719
[18]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=974137
[19]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=966014
[20]: https://code.launchpad.net/~paelzer/britney/+git/hints-ubuntu/+merge/393558
[21]: https://buildd.debian.org/status/fetch.php?pkg=clustalo&arch=s390x&ver=1.2.4-6&stamp=1589135789&raw=0
[22]: https://launchpadlibrarian.net/498532550/buildlog_ubuntu-groovy-s390x.clustalo_1.2.4-6_BUILDING.txt.gz
[23]: https://salsa.debian.org/med-team/clustalo/-/merge_requests/1
[24]: https://bugs.launchpad.net/ubuntu/+source/clustalo/+bug/1903817
[25]: https://github.com/axboe/fio/issues/1065
[26]: https://github.com/axboe/fio/commit/fd56c235caa42870e6dc33d661514375ea95ffc5
[27]: https://github.com/axboe/fio/issues/1123
[28]: https://salsa.debian.org/debian/fio/-/merge_requests/6
[29]: https://bugs.launchpad.net/ubuntu/+source/fio/+bug/1903963
[30]: https://bugs.launchpad.net/ubuntu/+source/fio/+bug/1903962
[31]: https://code.launchpad.net/~paelzer/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/393641
[32]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=974061
[33]: https://buildd.debian.org/status/fetch.php?pkg=postgresql-13&arch=amd64&ver=13.1-1&stamp=1605173772&raw=0
[34]: https://launchpad.net/ubuntu/+source/postgresql-13/13.1-1
[35]: https://bugs.launchpad.net/ubuntu/focal/+source/postgresql-12/+bug/1903978
[36]: https://bugs.launchpad.net/ubuntu/+source/postgresql-12/+bug/1903573
[37]: https://salsa.debian.org/debian/tgt/-/merge_requests/1
[38]: https://salsa.debian.org/linux-blocks-team/multipath-tools/-/merge_requests/1
[39]: https://bugs.launchpad.net/ubuntu/+source/iptables/+bug/1904192
--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd