+1 Maintenance Report

Mon Nov 25 16:26:49 UTC 2024

Hello,

I did a +1 maintenance shift during the past week. It wasn't planned in
advance because I was sick for most of the engineering sprint, so I actually
started on Monday, near the end of the day.

This report is too long, so here's a ToC; you can search for the titles to
find the sections.

# Re-triggering tests
# Autopkgtest infrastructure
# Openmpi transition
# Coq
# Rust
# Packages now 64-bit only
# Packages without issues but not transitioning
# Smaller items
# Misc
# Thanks

For Rust, I'm wondering whether it's worth working on the packages unless
there's a strong, specific need (you can read the Rust section for details :) ).

The main reason this report is so long is that I want to share what I've
learnt during the week, and that quickly involves details about britney, apt
(you should use apt patterns, especially ~e) or some package ecosystems.

# Re-triggering tests

I started by re-triggering a number of tests to get a clearer view of
proposed-migration. I ended up manually re-triggering a thousand tests over
the past week.

I'm still using my improved excuses page at https://ubuntu.dcln.fr/update_excuses.html
The tabular display gives a much clearer view. There's also a handy feature I
added over the weekend, roughly a week ago: log analysis to tell whether
failures are testbed- or badpkg-related. With these two combined, it's trivial
to re-trigger dozens of relevant tests in a few seconds from the UI.
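
The log analysis itself isn't reproduced here. As a minimal sketch of the
same idea, assuming only the autopkgtest exit status is at hand (autopkgtest(1)
documents 12 as "erroneous package", i.e. badpkg, and 16 as "testbed failure"),
a classification could look like this; the category labels and the
worthPlainRetry() rule are hypothetical choices for illustration.

```javascript
// Hypothetical helper: map an autopkgtest exit status to a coarse category,
// following the EXIT STATUS section of autopkgtest(1).
const EXIT_CATEGORIES = new Map([
  [0, "pass"],
  [2, "pass"],        // at least one test skipped (or flaky)
  [4, "fail"],        // at least one test failed
  [6, "fail"],        // failures and skips
  [8, "neutral"],     // no tests, or all non-superficial tests skipped
  [12, "badpkg"],     // erroneous package, e.g. uninstallable test deps
  [16, "testbed"],    // testbed failure
  [20, "unexpected"], // other unexpected failures
]);

function classifyExitCode(code) {
  return EXIT_CATEGORIES.get(code) ?? "unknown";
}

// Assumption: a plain retry mostly helps when the infrastructure is at fault;
// badpkg results often need extra triggers instead.
function worthPlainRetry(code) {
  return classifyExitCode(code) === "testbed";
}

console.log(classifyExitCode(16), worthPlainRetry(16));
```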

Moreover, the autopkgtest user page
(https://autopkgtest.ubuntu.com/user/adrien/) was very useful to follow up on
retries; I kept it open all week long. Basically, if you are facing
infrastructure troubles, flaky tests or bad luck, it lets you monitor all your
past retries and re-trigger aggressively.

# Autopkgtest infrastructure

I talked with the Release Management team about disabling the amd64/i386
runners in the lcy02 datacenter, since issues there cause failed and heavily
delayed tests. The problem is known but outside the team's control. Since
there is no need for throughput at the moment, we agreed to lower the number
of runners in that datacenter.

It looks like the bos03 datacenter could cause some testbed issues too, but
the occurrence rate seems low (too low to report it properly).

There have also been ongoing issues on ppc64el, which result in a large
number of testbed failures on that architecture.

Remember to also read the discourse post on autopkgtest status:
https://discourse.ubuntu.com/t/autopkgtest-service/34490

# Openmpi transition

Openmpi has been blocking a large number of migrations (not surprising
considering how many packages depend on it).
I found out that the C++ bindings have been dropped from the MPI standard and
now from the library. This isn't reflected in the packaging, so packages end
up failing to find "libmpi_cxx.so.40". This was the most common failure but
not the only one.
Because the uploads in Debian were very recent, the package was still being
worked on in experimental, and doko was following it, I decided to let things
settle down.

# Coq

I've often seen packages from the Coq ecosystem in excuses. The ecosystem has
at least two peculiarities: lots of puns in the package names (many are in
French), and strict dependencies determined at build time.

Consider packages A and B. Package A builds first and gets a
"Provides: A-ux03" in its packaging. When package B builds, it gets a
"Depends: A-ux03" rather than a "Depends: A".
When A and B enter proposed, the testbed for B is not set up with the A from
proposed, and the test fails.
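
To make the failure mode concrete, here is a toy model: a dependency is
satisfiable only if some package in the testbed provides it. The records are
hypothetical, including the "A-ux02" value assumed for the A already in the
release.

```javascript
// Toy model of Coq-style strict dependencies (hypothetical data).
// Returns the dependencies that no available package satisfies.
function missingDeps(deps, available) {
  const provided = new Set();
  for (const pkg of available) {
    provided.add(pkg.name); // a package always satisfies its own name
    for (const virtual of pkg.provides ?? []) provided.add(virtual);
  }
  return deps.filter((dep) => !provided.has(dep));
}

const aInRelease = { name: "A", provides: ["A-ux02"] }; // hypothetical old value
const aInProposed = { name: "A", provides: ["A-ux03"] };
const depsOfB = ["A-ux03"];

// Testbed without A from proposed: B's dependency is missing.
console.log(missingDeps(depsOfB, [aInRelease]));  // → ["A-ux03"]
// With A from proposed in the testbed, nothing is missing.
console.log(missingDeps(depsOfB, [aInProposed])); // → []
```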

The solution is to re-trigger the failing tests with an additional trigger so
that they use A from proposed.
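
Such retries can be expressed as URLs. This sketch assumes the request.cgi
parameters used by the retry links on update_excuses (release, arch, package,
and one trigger=source/version per trigger); the package names and versions in
the example are hypothetical.

```javascript
// Build an autopkgtest retry URL with one or more triggers.
// Assumption: request.cgi accepts repeated "trigger" parameters.
function retryUrl({ release, arch, pkg, triggers }) {
  const params = new URLSearchParams({ release, arch, package: pkg });
  for (const trigger of triggers) params.append("trigger", trigger);
  return `https://autopkgtest.ubuntu.com/request.cgi?${params}`;
}

// Hypothetical example: test B against its own upload plus A from proposed.
const url = retryUrl({
  release: "plucky",
  arch: "amd64",
  pkg: "B",
  triggers: ["B/1.0-1", "A/2.0-1"],
});
console.log(url);
```

URLSearchParams takes care of escaping characters such as "/" and "+" that
appear in triggers and Debian versions.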

However, Launchpad synchronises from Debian every six hours and begins
building everything immediately. This gives two rather large windows of
opportunity for the build order to differ from the original uploads in Debian:
if B gets built against the old A, it depends on the old A-ux value and needs
a no-change rebuild once the new A is available, in addition to crafted test
triggers.

The transition tracker includes a permanent tracker for a number of
ecosystems, including Coq. It can guide no-change rebuilds (which need to be
done in the order shown by the transition tracker).
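
The tracker already computes that order; to illustrate what it amounts to,
here is a topological sort (Kahn's algorithm) over hypothetical in-ecosystem
build dependencies, listing each package after everything it build-depends on.

```javascript
// Kahn's algorithm: list each package after everything it build-depends on.
// buildDeps maps a package to the in-ecosystem packages it build-depends on.
function rebuildOrder(buildDeps) {
  const indegree = new Map();   // number of not-yet-ordered dependencies
  const dependents = new Map(); // reverse edges: dependency -> its users
  for (const [pkg, deps] of buildDeps) {
    if (!indegree.has(pkg)) indegree.set(pkg, 0);
    for (const dep of deps) {
      if (!indegree.has(dep)) indegree.set(dep, 0);
      indegree.set(pkg, indegree.get(pkg) + 1);
      if (!dependents.has(dep)) dependents.set(dep, []);
      dependents.get(dep).push(pkg);
    }
  }

  const queue = [...indegree].filter(([, n]) => n === 0).map(([pkg]) => pkg);
  const order = [];
  while (queue.length > 0) {
    const pkg = queue.shift();
    order.push(pkg);
    for (const next of dependents.get(pkg) ?? []) {
      indegree.set(next, indegree.get(next) - 1);
      if (indegree.get(next) === 0) queue.push(next);
    }
  }
  if (order.length !== indegree.size) throw new Error("dependency cycle");
  return order;
}

// Hypothetical ecosystem: C needs A and B, B needs A.
const buildDeps = new Map([
  ["C", ["A", "B"]],
  ["B", ["A"]],
  ["A", []],
]);
console.log(rebuildOrder(buildDeps)); // → ["A", "B", "C"]
```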

Autopkgtest will try to install the test dependencies with "all proposed",
but somehow that doesn't seem effective. I'm not sure why, since I thought
that would be enough, but in practice it's additional triggers that make the
tests succeed.

I've also learnt that when britney migrates a package, it can trigger new
tests with the package(s) that just migrated. That would explain the tests
with several triggers and no associated requester.
Thinking about it again, though, I'm not sure why it would extend the triggers
with a package it has already migrated, so this may not be entirely correct.
It's also not completely effective: Bryce explained last week that he got
packages to migrate by retrying them after the packages behind some tests got
new uploads (and I repeated the process).

# Rust

Rust does things slightly differently from Coq, but the result is the same.
Very often, tests need to use Rust packages from proposed. They don't need
all-proposed but rather something like "all-rust-proposed".

There is no permanent tracker for Rust, however; would adding one to the
transition tracker be beneficial?

I've tried to get Rust clusters out of proposed and mostly failed. Because of
the strong dependencies between packages, looking recursively at Depends:
isn't enough; you need to look at the reverse dependencies for migrations too.
This ties many, many packages together, all while uploads and
synchronizations keep happening. Very often, in the time it took for britney
to trigger a test, for the test to fail and be reported, for me to notice it
and retry with the proper triggers, for the test to succeed and for britney to
run again, another related Rust upload had landed.
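
A rough way to see why these clusters grow so large: treat Depends as edges
in both directions and take everything reachable from the package you want to
migrate. This is a minimal sketch over hypothetical package names; it
over-approximates, since not every reverse dependency actually has to migrate
together.

```javascript
// Everything reachable from `start` when Depends edges are followed in both
// directions. `depends` maps a package to the packages it depends on.
function migrationCluster(start, depends) {
  const neighbours = new Map();
  const link = (a, b) => {
    if (!neighbours.has(a)) neighbours.set(a, new Set());
    neighbours.get(a).add(b);
  };
  for (const [pkg, deps] of depends) {
    for (const dep of deps) {
      link(pkg, dep); // Depends
      link(dep, pkg); // reverse Depends
    }
  }

  // Breadth-first search over the undirected graph.
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const pkg = queue.shift();
    for (const next of neighbours.get(pkg) ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return [...seen].sort();
}

// Hypothetical data: rust-a and rust-c both depend on rust-b.
const depends = new Map([
  ["rust-a", ["rust-b"]],
  ["rust-c", ["rust-b"]],
  ["rust-d", []],
]);
console.log(migrationCluster("rust-a", depends)); // → ["rust-a", "rust-b", "rust-c"]
```

Following Depends alone from rust-a would stop at rust-b; the reverse edge is
what pulls rust-c into the cluster.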

Is it even worth trying to make Rust packages migrate when we're not in an
import freeze and it's not Christmas Eve? Did I miss something?
Of course, Rust packages sometimes block non-Rust packages, which is
obviously a stronger reason to get them to migrate, but what about the other
times?

BTW, Coq is similar but integrates less with non-Coq libraries and seems to
cause fewer difficulties.

# Packages now 64-bit only

A number of packages in Debian now have "architecture-is-64-bit" in their
Build-Depends, which does exactly what you'd expect: the build dependency can
only be satisfied on 64-bit architectures.

When binaries for these packages are present on i386 and armhf, britney
expects new uploads to also produce binaries on these architectures. However,
these packages no longer build there.
The solution is to ask an Archive Admin to remove the binaries on the affected
architectures. Once that is done, britney notices there are no binaries there
and no longer waits for the new upload to provide them.
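
The check for whether a removal is needed can be sketched as follows,
assuming the raw Build-Depends field of the new upload and the list of
architectures that currently have binaries (e.g. from rmadison or chdist) are
already at hand. The function names, and restricting the check to i386 and
armhf, are assumptions made for this example.

```javascript
// Assumption: only i386 and armhf matter here, as in the cases above.
const ARCHES_32_BIT = ["i386", "armhf"];

// Package names from a raw Build-Depends field, dropping version
// constraints, architecture restrictions, build profiles and qualifiers.
function buildDependsNames(field) {
  return field
    .split(/[,|]/)
    .map((dep) => dep.trim().split(/[\s(\[<:]/)[0])
    .filter((name) => name.length > 0);
}

// Architectures whose existing binaries would block migration forever.
function archesNeedingRemoval(buildDepends, archesWithBinaries) {
  if (!buildDependsNames(buildDepends).includes("architecture-is-64-bit")) {
    return [];
  }
  return archesWithBinaries.filter((arch) => ARCHES_32_BIT.includes(arch));
}

// Hypothetical package: 64-bit only, still has binaries on armhf and i386.
console.log(
  archesNeedingRemoval(
    "debhelper-compat (= 13), architecture-is-64-bit, libfoo-dev [!s390x]",
    ["amd64", "arm64", "armhf", "i386"],
  ),
); // → ["armhf", "i386"]
```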

I filed several bugs related to this new build dependency:
- mayavi2:
  https://bugs.launchpad.net/ubuntu/+source/mayavi2/+bug/2089286
- adios2, fenics-dolfinx, fenicsx-performance-tests, ngspetsc:
  https://bugs.launchpad.net/ubuntu/+source/ngspetsc/+bug/2089285
- I completed Jeremy's list of postgresql packages to remove on 32-bit arches:
  pgpointcloud, pgnodemx, pgmemcache, pglogical, pgfincore, pgextwlist,
  mobilitydb
  https://bugs.launchpad.net/ubuntu/+source/mobilitydb/+bug/2089236

I also noticed "purify", which is a re-introduced package that currently
FTBFS on other architectures anyway, so I haven't filed a removal bug for now.

I had also seen pmix, rocr-runtime and r-bioc-alabaster.base, but their armhf
binaries had already been removed.
You can use rmadison to check whether binaries are present, but I've come to
really like apt + chdist for that, for instance:

    chdist apt plucky-armhf list '~erocr-runtime$'

This uses apt-patterns(7) and '~eREGEX' in particular to select versions where
the source package name matches the specified regular expression (I'm quoting
the manpage).

Apt patterns were also very useful for reverse dependencies, since
reverse-depends doesn't allow referring to -proposed (or PPAs), as far as I
can tell.
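
When no tool fits, reverse dependencies can also be computed directly from a
Packages index for -proposed. The sketch below is a simplification: it only
looks at Depends and Pre-Depends, ignores Provides and version constraints,
and the sample data is hypothetical.

```javascript
// Parse deb822 stanzas (blank-line separated "Field: value" blocks).
function parseStanzas(text) {
  const stanzas = [];
  for (const block of text.split(/\n\s*\n/)) {
    const fields = {};
    let key = null;
    for (const line of block.split("\n")) {
      if (/^\s/.test(line) && key) {
        fields[key] += "\n" + line.trim(); // continuation line
      } else {
        const m = line.match(/^([^:\s]+):\s*(.*)$/);
        if (m) {
          key = m[1];
          fields[key] = m[2];
        }
      }
    }
    if (Object.keys(fields).length > 0) stanzas.push(fields);
  }
  return stanzas;
}

// Names in a dependency field, alternatives included, qualifiers dropped.
function depNames(field) {
  if (!field) return [];
  return field
    .split(/[,|]/)
    .map((dep) => dep.trim().split(/[\s(:\[]/)[0])
    .filter((name) => name.length > 0);
}

function reverseDepends(target, packagesText) {
  return parseStanzas(packagesText)
    .filter((s) => depNames(s.Depends).includes(target) ||
                   depNames(s["Pre-Depends"]).includes(target))
    .map((s) => s.Package);
}

const sample = `Package: librust-foo-dev
Depends: librust-bar-dev (>= 1.0), libgit2-dev (<< 1.8~~)

Package: librust-baz-dev
Depends: libc6 (>= 2.34) | libc6-compat, librust-bar-dev:any

Package: librust-bar-dev
Depends: libc6
`;
console.log(reverseDepends("librust-bar-dev", sample)); // → ["librust-foo-dev", "librust-baz-dev"]
```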

# Packages without issues but not transitioning

A number of packages are not moving even though, according to
update_excuses.html, there is no reason for them to be stuck. update_output.txt
has some more data, but it's not directly usable: at best it gives hints.

I found that apt can help (I haven't yet managed to use dose properly). The
rough steps are:
- open update_output.txt,
- pick the source packages from a "trying" line,
- also pick the binaries listed in the lines that follow,
- use chdist to apt list the binary packages in -proposed (and only
  -proposed) for the source packages,
- use chdist to apt list the binary packages _not_ in -proposed for the
  binaries listed by britney,
- use chdist in a full environment to apt install both sets of binaries at
  once (the ones reported by britney and the ones reported by apt using only
  -proposed).

The installation should fail, but with a better error message. For instance,
update_output.txt currently contains the following:

    trying: libgit2
    skipped: libgit2 (34, 7, 645)
        got: 47+0: a-6:a-4:a-24:i-1:p-4:r-4:s-4
        * armhf: librust-bat-dev, librust-cargo-dev, librust-debcargo-dev, librust-erbium-core-dev, librust-eza-dev, librust-git-absorb-dev, librust-git2+default-dev, librust-git2+https-dev, librust-git2+openssl-probe-dev, librust-git2+openssl-sys-dev, librust-git2+ssh-dev, librust-git2+ssh-key-from-memory-dev, librust-git2-curl-dev, librust-git2-dev, librust-gping-dev, librust-libgit2-sys-dev, librust-ripasso-dev, librust-shadow-rs-dev, librust-vergen-dev
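
The first steps (reading the "trying" line and the binaries after it) are easy
to automate for a single attempt. A minimal sketch, assuming the layout of the
excerpt above is stable; the function name is hypothetical and the binary list
in the example is shortened.

```javascript
// Extract the source package(s) from a "trying:" line and the binaries
// listed per architecture in the "* arch: ..." lines that follow.
function parseBritneyAttempt(text) {
  const attempt = { sources: [], binaries: {} };
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    let m;
    if ((m = line.match(/^trying:\s+(.*)$/))) {
      attempt.sources = m[1].split(/\s+/).filter(Boolean);
    } else if ((m = line.match(/^\*\s+([\w-]+):\s+(.*)$/))) {
      attempt.binaries[m[1]] = m[2]
        .split(",")
        .map((b) => b.trim())
        .filter(Boolean);
    }
  }
  return attempt;
}

// Excerpt of the block above, with the binary list shortened.
const excerpt = `trying: libgit2
skipped: libgit2 (34, 7, 645)
    got: 47+0: a-6:a-4:a-24:i-1:p-4:r-4:s-4
    * armhf: librust-bat-dev, librust-git2-dev, librust-libgit2-sys-dev`;

const attempt = parseBritneyAttempt(excerpt);
console.log(attempt.sources, attempt.binaries.armhf);
```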

With apt, we obtain this error:

    librust-libgit2-sys-dev : Depends: libgit2-dev (< 1.8~~) but 1.8.4+ds-1ubuntu1 is to be installed

With apt --solver 3.0, we obtain a different message but the same error:

    E: Conflict:  -> libgit2-dev:amd64=1.8.4+ds-1ubuntu1 but librust-gping-dev:amd64=1.17.3-1 -> librust-shadow-rs-dev:amd64=0.29.0-1 -> librust-git2-dev:amd64=0.18.2-1 -> librust-libgit2-sys-dev:amd64=0.16.2-1 -> libgit2-dev:amd64=1.7.2+ds-1ubuntu3 -> not libgit2-dev:amd64=1.8.4+ds-1ubuntu1
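
The solver 3.0 message is a dependency path, which is easier to read once split
into steps. This sketch only assumes the layout of the single message above
(apt's wording may change between versions); the function name is
hypothetical.

```javascript
// Split the part after " but " into pkg[:arch]=version steps.
function parseConflictChain(message) {
  const butIndex = message.indexOf(" but ");
  if (butIndex === -1) return [];
  const chain = message.slice(butIndex + " but ".length);
  return chain.split(" -> ").map((step) => {
    const negated = step.startsWith("not ");
    const spec = negated ? step.slice("not ".length) : step;
    const m = spec.match(/^([^:=]+)(?::([^=]+))?=(.+)$/);
    return m
      ? { negated, pkg: m[1], arch: m[2] ?? null, version: m[3] }
      : { negated, raw: spec }; // unknown layout: keep the text as-is
  });
}

const message =
  "E: Conflict:  -> libgit2-dev:amd64=1.8.4+ds-1ubuntu1 but " +
  "librust-gping-dev:amd64=1.17.3-1 -> librust-shadow-rs-dev:amd64=0.29.0-1 -> " +
  "librust-git2-dev:amd64=0.18.2-1 -> librust-libgit2-sys-dev:amd64=0.16.2-1 -> " +
  "libgit2-dev:amd64=1.7.2+ds-1ubuntu3 -> not libgit2-dev:amd64=1.8.4+ds-1ubuntu1";

const steps = parseConflictChain(message);
for (const step of steps) {
  console.log(`${step.negated ? "NOT " : ""}${step.pkg} ${step.version}`);
}
```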

We can take that further and repeat the above, this time also pulling
'~erust-libgit2-sys' (again, apt patterns) from proposed, or we can directly
check the dependencies of librust-libgit2-sys-dev.
Either way, we see that librust-libgit2-sys-dev requires libgit2-dev
(< 1.8~~), which libgit2-dev:amd64=1.7.2+ds-1ubuntu3 satisfies but
libgit2-dev:amd64=1.8.4+ds-1ubuntu1, in proposed, does not. This matches the
fact that rust-libgit2-sys has been in proposed for 24 days while the new
libgit2 1.8.x arrived in proposed only 10 days ago.
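
The rejection comes from Debian version ordering, where "~" sorts before
everything, even the end of the string, so "< 1.8~~" excludes 1.8 and all its
pre-releases. Below is a JavaScript port of the comparison described in
deb-version(7), modelled on dpkg's algorithm; it's a sketch for experimenting,
and dpkg --compare-versions remains the reference.

```javascript
// Sort weight of one character in a non-digit run.
function order(c) {
  if (c === undefined) return 0;                  // end of the part
  if (c >= "0" && c <= "9") return 0;             // digits end the run
  if (/[A-Za-z]/.test(c)) return c.charCodeAt(0); // letters
  if (c === "~") return -1;                       // before anything, even the end
  return c.charCodeAt(0) + 256;                   // other chars after letters
}

const isDigit = (c) => c !== undefined && c >= "0" && c <= "9";

// Compare alternating non-digit and digit runs of two version parts.
function verrevcmp(a, b) {
  let i = 0;
  let j = 0;
  while (i < a.length || j < b.length) {
    let firstDiff = 0;
    while ((i < a.length && !isDigit(a[i])) || (j < b.length && !isDigit(b[j]))) {
      const diff = order(a[i]) - order(b[j]);
      if (diff !== 0) return diff;
      i++;
      j++;
    }
    while (a[i] === "0") i++; // leading zeros don't count
    while (b[j] === "0") j++;
    while (isDigit(a[i]) && isDigit(b[j])) {
      if (!firstDiff) firstDiff = a.charCodeAt(i) - b.charCodeAt(j);
      i++;
      j++;
    }
    if (isDigit(a[i])) return 1; // longer number is bigger
    if (isDigit(b[j])) return -1;
    if (firstDiff) return firstDiff;
  }
  return 0;
}

// [epoch:]upstream[-revision]
function parseVersion(v) {
  const colon = v.indexOf(":");
  const epoch = colon === -1 ? 0 : parseInt(v.slice(0, colon), 10);
  const rest = colon === -1 ? v : v.slice(colon + 1);
  const dash = rest.lastIndexOf("-");
  return {
    epoch,
    upstream: dash === -1 ? rest : rest.slice(0, dash),
    revision: dash === -1 ? "" : rest.slice(dash + 1),
  };
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
  const va = parseVersion(a);
  const vb = parseVersion(b);
  if (va.epoch !== vb.epoch) return va.epoch - vb.epoch;
  return verrevcmp(va.upstream, vb.upstream) || verrevcmp(va.revision, vb.revision);
}

console.log(compareVersions("1.8.4+ds-1ubuntu1", "1.8~~") < 0); // false
console.log(compareVersions("1.7.2+ds-1ubuntu3", "1.8~~") < 0); // true
```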

I've mostly scripted the above (the script is actually very short), but it can
still be confusing to others, so I'll finish it in the coming days and post an
update when it's ready.

# Smaller items

## golang-github-protonmail-go-crypto

I created an MR to include golang-github-protonmail-go-crypto in big_packages
on all arches:
https://code.launchpad.net/~adrien/autopkgtest-cloud/+git/autopkgtest-package-configs/+merge/476785

## Oracular

I took an unplanned detour to oracular and re-triggered the tests failing there.

## Snapd

I re-triggered tests for a snapd SRU across several releases. The people
behind the SRU were asking others to retry the failing tests through a
Launchpad bug. Since the tests required several retries, that felt very
inefficient and slow for them. I guess asking on IRC with retry links would
have been more efficient, but it still involves several people, which leads to
latency and frustration. Short of broadening retry rights, I'm not sure how to
really fix that.

# Misc

Below is the list of packages for which I retried tests and which migrated, or
now have all tests passing, thanks to that.
I know several other people are doing these retries and had an effect too,
but I think it was almost only me for the ones below; that's difficult to
quantify accurately, though.
Some of these packages had been in -proposed for a pretty long time (several
weeks). I'm not sure why the large-scale test retries had not been enough.

7zip, bash, blinker, cockpit, curl, dask-distributed (I then re-triggered its
test for python-lz4, which migrated), db5.3, diffstat, dipy, django-axes,
dnsmasq, dvisvgm, eccodes, genshi, glfw,
golang-github-ccoveille-go-safecast, golang-github-protonmail-go-crypto,
golang-github-protonmail-gopenpgp (unblocked by *-go-crypto),
golang-github-protonmail-gopenpgp-v3 (unblocked by *-go-crypto),
gtksourceview5, golang-github-go-macaroon-bakery-macaroon-bakery (triggered
migration-reference/0, which failed, and the package migrated),
intel-microcode, jupyter-client, kdialog, kf6-kio, khelpcenter,
libdigest-hmac-perl, libfilter-perl, libmodule-corelist-perl,
libmodule-scandeps-perl, libnet-dns-perl, libxml-sax-perl, linux-base,
lua-moses, microsoft-authentication-library-for-python, multiprocess,
needrestart, oxigraph (I was surprised I finally got a Rust package to
migrate!), patroni, phosh, php-fig-link-util, php-league-commonmark,
php-league-flysystem, php-mikey179-vfsstream, php-nyholm-psr7, php-phpseclib,
php-phpseclib3, php8.3, phpab, phpunit, prettytable, pytest-httpx,
python-argcomplete, python-biom-format, python-lupa, python-lz4, python-pip,
python-pomegranate, python-skbio, python-tinycss2, python-trio,
python-watchfiles, pyzmq, qemu, qstylizer, quart, rust-thiserror,
rust-thiserror-impl, rust-wide, shotwell, snappy-java, sphinx-rtd-theme,
starjava-votable, sysprof, tgt, urwid, valgrind-if-available, vim,
wxpython4.0, wxwidgets3.2, xonsh, yakuake (re-triggered for s390x, where a
test related to GitHub authentication was probably flaky), ydiff, zookeeper

I was able to get the list of tests I triggered thanks to the autopkgtest
user page and a small bit of JavaScript to run in the browser's console.
After that, it was easy to check the corresponding migrations twice a day or
so. Code below:

    // Run in the browser console on the autopkgtest user page.
    const lines = [];

    document
      .querySelector("#results-complete ~ table")
      .querySelectorAll("tr")
      .forEach((row, index) => {
        if (index === 0)
          return; // skip the header row

        const tds = row.querySelectorAll("td");
        const pkg = tds[0].textContent;
        const arch = tds[2].textContent;
        const triggers = tds[4].textContent;
        const date = tds[6].textContent;
        const result = tds[9].textContent;

        // One line per test request rather than a single run-on string.
        lines.push(`${date} ${pkg} ${arch} ${triggers} ${result}`);
      });

    console.log(lines.join("\n"));

# Thanks

It was a long e-mail, so I can spare a few more words to thank the people who
assisted and/or bore with me. This time this involved Graham, Jeremy,
Matthias, Paride, Simon Chopin and Quigley, and certainly others. :)

-- 
Adrien