Key Ubuntu teams should have an open process for new members
Steve Langasek
steve.langasek at ubuntu.com
Mon Dec 18 07:53:59 UTC 2023
On Fri, Dec 15, 2023 at 11:37:33AM -0800, Erich Eickmeyer wrote:
> Additionally, the SRU team, Release Team, and Archive Admin team have not
> done any work on what it means to onboard any team members, which is in
> itself a breach of the Code of Conduct:
Erich, you have a pattern of invoking the Code of Conduct against fellow
developers when they disagree with you, which is inappropriately escalatory
and does not advance your purpose. Please stop.
The teams in question have been asked to document their onboarding
requirements and process. This is a fair ask, which has not yet been
delivered on publicly for any of the teams in question because it must be
balanced against day-to-day responsibilities.
But you are implying with your message that the lack of DOCUMENTATION for
onboarding onto these teams is the cause of problems with the response to
the recent high-impact SRU regression.
There are many things I think can be improved about how this SRU regression
was handled, and I will go into details below. But it is unrealistic to
argue that having this documentation in place would have changed the
composition of the teams at the time and thereby prevented this incident.
In particular, the perceived problem at the time was lack of availability of
an Archive Admin, and a defining principle of membership in the AA team is
this: there are many competent and trustworthy Ubuntu developers who could
do the job of archive admins; but because of the raw control over the
archive that membership in this team confers, the team should be as large as
it needs to be to fulfill its responsibility to the community of Ubuntu
developers, *and no larger*.
So no, writing this down on a wiki page would not have changed the
composition of the Archive Team prior to this event; nor does the fact that
this event happened imply that expanding the archive team is the correct
remedy.
A timeline of events; all times given in US/Pacific to minimize the
possibility of miscalculations on my side.
2023-12-07 13:25: mutter 45.2-0ubuntu1 SRU accepted into mantic-proposed.
2023-12-13 08:03: bug #2046360 opened, reporting a regression in this SRU.
                  uploader of SRU subscribed to bug and bug was tagged
                  regression-proposed.
2023-12-14 04:52: mutter 45.2-0ubuntu1 SRU released into mantic-updates.
2023-12-15 01:22: bug #2046360 re-tagged regression-update.
2023-12-15 05:36: ahasenack asks on a Canonical-internal SRU team chat about
                  stopping phasing for an update.
2023-12-15 07:40: ahasenack pings ubuntu-archive on #ubuntu-release.
2023-12-15 08:54: tsimonq2 responds to the pings on #ubuntu-release.
2023-12-15 09:33: ahasenack asks on a Canonical internal chat for an archive
                  admin but does not highlight AAs by name.
2023-12-15 10:04: aaronprisk (Community Team at Canonical) reaches
                  out to me directly on Canonical internal chat, indicating
                  he had been contacted by tsimonq2. I do not know if he
                  reached out to other AAs.
2023-12-15 11:09: I notice Aaron's message and indicate I will address this
                  with an ETA of an hour (I am out of the office at the time)
2023-12-15 11:37: preceding message is sent to tech board mailing list.
2023-12-15 12:26: I make it to my computer where I'm able to effect the
                  requested change to SRU phasing.
2023-12-17 19:44: I upload a revert of mutter to mantic-proposed.
2023-12-17 21:41: the revert of mutter is accepted to mantic-proposed by
                  another SRU team member.
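
For anyone who wants to check the intervals between these events, here is a
minimal sketch that computes them from the timestamps above. The event names
are labels invented for this illustration; the times are copied from the
timeline. Because everything is in US/Pacific and no DST change falls within
this window, naive datetimes are assumed to be sufficient.

```python
from datetime import datetime

# Timestamps copied from the timeline (US/Pacific). The dictionary keys are
# hypothetical labels chosen for this sketch, not official terminology.
FMT = "%Y-%m-%d %H:%M"
EVENTS = {
    "regression_update_tag": "2023-12-15 01:22",
    "internal_chat_request": "2023-12-15 05:36",
    "irc_ping": "2023-12-15 07:40",
    "direct_escalation": "2023-12-15 10:04",
    "phasing_changed": "2023-12-15 12:26",
    "revert_uploaded": "2023-12-17 19:44",
}


def elapsed(start, end):
    """Return the timedelta between two named events."""
    t0 = datetime.strptime(EVENTS[start], FMT)
    t1 = datetime.strptime(EVENTS[end], FMT)
    return t1 - t0


def fmt(delta):
    """Format a timedelta as total hours and minutes, e.g. '4h28m'."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h{minutes % 60:02d}m"


if __name__ == "__main__":
    pairs = [
        ("internal_chat_request", "direct_escalation"),
        ("irc_ping", "phasing_changed"),
        ("regression_update_tag", "revert_uploaded"),
    ]
    for start, end in pairs:
        print(f"{start} -> {end}: {fmt(elapsed(start, end))}")
```

With these inputs, the gap from the first internal chat request to the direct
escalation is 4h28m, and the gap from the regression-update tag to the revert
upload is 66h22m.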
So there are a number of things that didn't work well here in terms of
process.
- The regression in the SRU was reported by Dan and an appropriate tag was
set. However, he did not mark the corresponding SRU bug
verification-failed, which is part of the process for regression handling
documented on <https://wiki.ubuntu.com/StableReleaseUpdates#Verification>.
So a longstanding member of the Ubuntu Desktop team (but not an Ubuntu
developer?) was unfamiliar with the necessary process for blocking an SRU
when a regression is detected. Do we have gaps in how the existing
process has been communicated?
- I subscribe to the regression-update and regression-proposed bug tags, but
we have not set an expectation that all members of the SRU team subscribe
to these tags. Comparing the "May be notified" lists on the side bar of
sample bugs suggested in fact that I was the only member of the SRU team
subscribed to the regression-proposed tag at the time; and only about half
of the SRU team members appear to be subscribed to the regression-update
tag. Should we require SRU team members to be subscribed to both tags, as
an additional guard against accidental mis-release of regressions?
- Even if everyone were subscribed to the regression-proposed tag, there's no
guarantee they've received/seen/read the email before processing the list
of to-be-released packages on
<https://ubuntu-archive-team.ubuntu.com/pending-sru.html>; and even if
they have, they may overlook the connection between such a mail and an
SRU they are about to release. Should this report flag all
regression-proposed bugs open against a package, regardless of series
targeting?
- Was there agreement about the urgency of the need to disable phasing, and
was this urgency communicated? Dan did not set a severity on the bug when
filing it. Exchange between ahasenack and jbicha (uploader) on IRC
yielded a "yes, let's pause phasing", with no apparent expression of
urgency. There was no effort, visible to me, to escalate to any archive
admins individually using available communications channels until
aaronprisk pinged me, over 4 hours after the initial IRC pings. (At the
time the decision was initially made to request a stop for phasing, it was
still well within European business hours, for any EU-based archive admins
who were not already out for the end of the year.)
- We have a standing policy of not releasing SRUs on Friday, unless there's
an exceptional reason to do so and a member of the SRU team commits to
being available on the weekend to handle any regressions. This SRU was
released on a Thursday, not a Friday; but it was the
Thursday before a company-wide end-of-year shutdown and many folks were
already out on vacation (including myself). Should we have been releasing
SRUs this day without verifying there was appropriate capacity for dealing
with any regressions? Should there have been an explicit conversation
about end-of-year plans for SRU releases among the SRU team? I understand
there was a specific request to release this SRU before the end of year,
but it's not clear that this request should have been honored under the
circumstances.
- The normal process for handling a regression in an SRU is to set phasing
to zero, to minimize the propagation of the bad update to additional
users; AND to immediately begin the process of doing a follow-up SRU to
revert the bad changes so that any users who have already received the bad
update before changes to the phasing are able to get a fix. The first
part of this blocks on availability of an Archive Admin. The second part
of this is entirely within the power of the uploader together with a
member of the SRU team. But a full day later, there had still been no
upload of mutter to mantic-proposed to fix this problem for
affected users. Why is that? Comments from the uploader on IRC:
    12:28 <jbicha> vorlon: unfortunately I'm out of time today to do a new
                   mutter upload. Would we want the new targeted mutter fix
                   to wait for 7 days too?
    12:29 <jbicha> mutter 45.2 fixes important enough issues that I'd rather
                   go forward than backwards
But "rather go forward than backwards" has resulted in neither happening
for over a day. And "out of time today" came 5 hours after the decision
that phasing should be halted.
Robie commented on IRC that there should be a clearer playbook for
handling regressions. Absent that, however, it should still be clear that
turning off phasing for an SRU only prevents it from being delivered to
MORE users; it does not un-break users who have already received a broken
SRU.
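
The question above about having the pending-SRU report flag all
regression-proposed bugs open against a package, regardless of series
targeting, could be sketched roughly as follows. This is only an illustration
of the matching rule: the `Bug` and `PendingSru` classes and the
`flag_regressions` function are hypothetical, and nothing here reflects how
the real report or the Launchpad API is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bug:
    """Hypothetical, simplified view of a bug report."""
    number: int
    package: str
    tags: frozenset
    is_open: bool = True


@dataclass(frozen=True)
class PendingSru:
    """Hypothetical, simplified view of one row of the pending-SRU report."""
    package: str
    version: str
    series: str


def flag_regressions(pending, bugs, tag="regression-proposed"):
    """Map each pending SRU to the open bugs carrying `tag` for its package.

    Matching is on source package name only; the series of the SRU is
    deliberately ignored, per the "regardless of series targeting" idea.
    """
    flagged = {}
    for sru in pending:
        hits = sorted(
            bug.number
            for bug in bugs
            if bug.is_open and bug.package == sru.package and tag in bug.tags
        )
        if hits:
            flagged[sru] = hits
    return flagged


if __name__ == "__main__":
    pending = [
        PendingSru("mutter", "45.2-0ubuntu1", "mantic"),
        PendingSru("examplepkg", "1.0-1ubuntu1", "mantic"),  # made-up package
    ]
    bugs = [Bug(2046360, "mutter", frozenset({"regression-proposed"}))]
    for sru, numbers in flag_regressions(pending, bugs).items():
        print(f"{sru.package} {sru.version}: open regression bugs {numbers}")
```

A report built this way would show a warning next to mutter before anyone
releases it, even if the person processing the list never saw the bug mail.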
In summary: no Ubuntu core-dev involved in this SRU thought the severity of
the bug was high enough to warrant doing the work of uploading a revert for
over 48 hours after it was known to be a regression in mantic-updates; yet
you are accusing the Archive Team of mismanagement because of a 4-hour delay
in response to a non-urgent request for dealing with the same bug.
--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                   https://www.debian.org/
slangasek at ubuntu.com                                 vorlon at debian.org