adsys SRU
Didier Roche
didier.roche at canonical.com
Wed Jun 14 07:31:42 UTC 2023
Hey Chris, let me chime in.
On 14/06/2023 at 08:26, Christopher James Halse Rogers wrote:
> There's a Jammy/Lunar adsys SRU¹ in the queue at the moment, and I
> think it needs bringing up to the list for discussion.
>
> The changelog looks like approximately 9 months of normal feature
> development. The diff against Jammy is >3MB in size (due largely to
> significant vendored-dependency churn it seems). The relevant part of
> SRU policy - “Other safe cases”² - allowing feature addition, says “If
> existing software needs to be modified to make use of the new feature,
> it must be demonstrated that these changes are unintrusive, have a
> minimal regression potential, and have been tested properly”. It looks
> like adsys is well tested, but I'm not sure about these being minimal
> changes or with minimal regression potential ☺.
>
> It's true that we've done a wholesale backport of adsys 0.9.2³ to
> Jammy in the past; however, in that case the changes were mostly
> listed bugfixes or FTBFS fixes, and the feature addition was shipping
> a *Windows* binary.
>
> I'm writing this to ubuntu-release@ for two main reasons:
>
> 1. It seems valuable to include adsys updates in LTS releases;
> however, I'm not sure that the scope of changes (and seeming
> criticality of the system - “failures might prevent users from logging
> in” seems pretty bad) falls under the existing delegation of power
> from the Tech Board to the SRU team.
Unfortunately, as in many projects, there is a constant tension between
requests to backport new features (adsys, being an enterprise product,
only really makes sense in an LTS context) and bug fixes. Most of the
new features are driven by industry requirements, which are:
- the evolution of their own security practices (for instance,
certificate support)
- requests to support other platforms (winbind in addition to the
already-existing sssd)
Due to our very limited team capacity, which is already maxed out and
split between many projects on different themes, the only way for us to
support adsys well while meeting the two requirements above is to
maintain a single code base, meaning that we ship the same code in all
supported releases. As most of the dependencies are vendored (apart from
some limited dynamic C linking, or the dependency on samba/sssd for
instance), we are in control of what we ship and know exactly what our
quality baseline is (more details on that in the following paragraphs).
> 2. There's a *lot* of vendored code churn, and from the SRU
> perspective I have no information as to whether that's appropriate. I
> understand that the Go ecosystem does not follow our ideas of stable
> releases and there's a real tension here - it's a huge amount of work
> to vet dependency updates, and such updates are *likely* to include
> bug fixes. I don't think “we just update all our vendored dependencies
> each SRU to whatever upstream is most recently shipping” is an
> appropriate standard, though. I'm not sure what *is* the right
> balance, though.
Right, but you also need to take the following into consideration:
- as we are vendoring dependencies, which was accepted as part of the
MIR process, we, as upstream, take responsibility towards the security
team for handling security fixes inside those dependencies. Most
security fixes in the various dependencies come only with a new upstream
"release" (even if, in the Go ecosystem, this is mostly a tag). FYI, the
Rust ecosystem follows the same pattern and the vendoring exception is
allowed for it too.
- as we took on that responsibility of vendoring dependencies and
updating them, we need to do that work as part of the SRU process too.
- however, due to the very, very limited team capacity mentioned above,
we need to pick our battles, and supporting a "single code base"
(including vendored dependencies) is the only way we can go.
So, with that amount of diff, how do we ensure we can ship something we
trust and that we are not impacted by any kind of regressions?
1. This can only be done by automated tests.
As of today, I count 1557 automated tests in the adsys repository alone.
Those are unit/package/integration tests, using golden files to record
exactly the expected state of the file system for each test:
https://github.com/ubuntu/adsys/tree/main/cmd/adsysd/integration_tests/testdata/TestPolicyUpdate/golden/current_user%2C_first_time.
All of those run on our CI against the exact same versions of the
vendored dependencies and the same Go version that the package will be
built with in the distro, even when we automatically update one of the
vendored dependencies: https://github.com/ubuntu/adsys/actions/runs/5257398861
We run those tests with **and** without the built-in Go race detector.
We also test untrusted inputs (like the Windows Active Directory GPO
UTF-16 little-endian input) with fuzz testing, which has already let us
fix some crashes, like https://github.com/ubuntu/adsys/pull/333.
2. All the changes are reviewed by a peer (or developed in pair
programming sessions), which ensures that everything that lands is
carefully tested and reviewed.
The only gaps I can identify right now are in the end-to-end tests:
- The Windows AD controller may change in a way that impacts us (quite
unlikely, as Active Directory is decades old and doesn't seem to get
major changes anymore).
- Samba/sssd/kerberos can change from one version of Ubuntu to another
and impact us, as we reuse part of their output as fixtures.
We are covering this with (unfortunately) manual end-to-end testing
for every SRU or upload to the current development version. We are
aiming (and have a Jira Epic we drafted this cycle) to start automating
that. It's a complex environment: we need some Windows servers alongside
our Ubuntu machines, and those end-to-end tests need to reboot our
machines multiple times, change some configuration on the Windows side,
check that it is reflected on the Ubuntu one, and so on.
This is why we cover that part with manual testing as a stopgap
solution: it ensures that the third-party, non-vendorable components of
the system still function correctly. However, it doesn't protect against
the opposite case: an upload of samba breaking us. This happened in the
development version, for instance, when 10-year-old vendored heimdal
code in samba was updated in one shot in the lunar development release.
Good luck finding the regression among thousands of commits! We lost
hours on this. So, IMHO, updating vendored dependencies as often as
possible, as we do in adsys, helps reduce this issue, whereas the samba
approach increases it. This is why we need automated end-to-end tests,
to ship with even more confidence and less manual intervention. But this
also requires networking between multiple OSes and machines, and we need
autopkgtest enhancements for this.
I think that should shed some light on how we ensure a high level of
quality. This project is shipped and used in different enterprise
environments, including some big names. If you compare that volume of
usage with the number of bugs reported (most of them are either feature
requests or gardening work opened by us to keep our code base modern:
https://github.com/ubuntu/adsys/issues and
https://bugs.launchpad.net/ubuntu/+source/adsys), even after major SRUs
like the one you mentioned, we don't have to do emergency fixes. This
gives us trust and confidence that our coding practices and processes
support us in delivering high-quality software despite all the
constraints I mentioned above.
As a more general topic, I don't think the SRU team (like the MIR team)
is in a position, in terms of time (not being a full-time team) or even
knowledge, to really understand every diff entering the distribution.
(I have the same opinion about the distro freeze, when the release team
reviews each diff.) So, I see those teams' roles as being more about
assessing the impact/risk of a change, and how much trust there is in
upstream to be proactive in terms of quality and reactive in terms of
any issue that arises.
> So, in summary: I have two questions - does this exceed SRU authority,
> and need Tech Board approval, and what level of justification is there
> for wide-ranging vendored code updates in the SRU?
I think one way forward is for adsys to file a documented special case,
with all the information above, and join the list of packages where we
trust upstream and ensure that it is accountable for the SRU?
https://wiki.ubuntu.com/StableReleaseUpdates#Documentation_for_Special_Cases
Thanks for considering it,
Didier