Release Process concerns (QA) and suggestions

Sat Sep 1 19:34:20 UTC 2012

Hi Gema,

Thanks for starting this thread off. :)

On Fri, 2012-08-31 at 11:38 +0100, Gema Gomez wrote:
> On 30/08/12 19:53, Stéphane Graber wrote:
> > The release team is in charge of releasing a pre-defined set of 
> > images, for a given list of media at a given date. That's how 
> > things are.
> > 
> > When we unfortunately hit a bug at the last minute, like happened 
> > last week, the release team needs to check how critical it is. If 
> > it's considered as a show-stopper, like was the case here, the
> > only action to take is to fix it as soon as possible, re-test and
> > then release.
> > 
> > If we know it's technically impossible to get it re-tested in
> > time, then we need to release a day later, but that's a very last
> > resort as releasing on a Friday brings its own set of problems.
> 
> Which kind of problems do you face when releasing on Friday? I think
> it'd be good for us to know the consequences as well.

The problem with releasing on Friday is that we don't have as good
coverage available to react to problems if they occur.  Avoiding Friday
releases is standard policy for stable release updates as well as
releases, and has become so as a result of lessons learned the hard way.
Exceptions do occur, but they are very special cases, and
contingency/monitoring plans need to be figured out in advance.

> 
> > In the case of 12.04.1, we noticed on release day that an image 
> > didn't actually fit on its target media and apparently no tester 
> > bothered to actually burn it to a standard CD...
> 
> You could use du next time right after the image is built to satisfy
> yourself that the size is good, it could be a standard check that you
> guys do. 

Certain mandatory manual tests can only be run if a CD is burned;
specifically, the AMD64+MAC based systems don't work with USB.

> We have added some static validation tests to jenkins and are
> in the process of publishing them. 

The information was already published on
http://cdimage.ubuntu.com/precise/daily-live/current/
(and related pages).  It's standard practice that an oversize indicator
in bold red is published on the page when an image is over a
predetermined size, as specified by the development teams.

It was a failure of both the QA and release teams that no one looked at
the page before Thursday.  The release team had been looking at these
pages pretty heavily the prior week, and thought we had all the issues
solved.  Based on discussions with Stéphane, there are now plans to add
an indicator to the ISO tracker to make oversize issues more visible in
the future, as that is where some folks are now focusing, rather than
the original publishing pages.
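
For anyone who wants to run the same kind of oversize check locally
right after a build, here is a minimal sketch in Python.  It is not the
code behind the cdimage pages or the ISO tracker.  The capacities, the
names MEDIA_CAPACITY, oversize_by and check_image are all assumptions
made for illustration; the real thresholds are whatever the development
teams specify for each image.

```python
import os

# Usable capacities in bytes. These are assumptions for illustration;
# the real thresholds are whatever the development teams specify.
MEDIA_CAPACITY = {
    "cd": 360_000 * 2048,    # 80-minute CD-R: 360,000 sectors of 2048 bytes
    "dvd": 4_700_000_000,    # nominal single-layer DVD
}


def oversize_by(size_bytes, media):
    """Return how many bytes the image exceeds the media by (0 if it fits)."""
    return max(0, size_bytes - MEDIA_CAPACITY[media])


def check_image(path, media):
    """Report whether the image file at `path` fits on `media`."""
    excess = oversize_by(os.path.getsize(path), media)
    if excess:
        return f"OVERSIZE: {path} exceeds {media} capacity by {excess} bytes"
    return f"ok: {path} fits on {media}"
```

Under these assumed capacities, an image of 800,000,000 bytes would be
flagged as oversize for "cd" but not for "dvd".  A check like this only
covers size; it does not replace burning the image for the tests that
need real media.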

> I don't think we need to burn a CD
> to know if the image is going to fit or not. But if you want us to
> validate things manually, adding a test case to the current set in the
> iso tracker will help track that someone has bothered. Unfortunately I
> don't feel confident enough yet with the admin mode of the iso tracker
> to change anything, so your expertise there would be appreciated.

There is the implication that a CD is burned in some of the test cases
already, so I'm not sure that another test case needs to be added.
Rather, an existing one could be split to make it explicit that a CD,
or when appropriate a DVD, be burned as part of the test.

If you have specific questions on administering the iso tracker, please
feel free to join us in #ubuntu-iso-tracker.  There are multiple folks
available (me, Jean-Baptiste, Nick) who can help as well.

> Anyway, looking forward rather than backwards, for Quantal the size is
> 800MB so, what media do you suggest we test on for size next week?
> 
> https://blueprints.launchpad.net/ubuntu/+spec/desktop-q-one-iso-for-q
> 
For Desktop, please test on both USB "and" DVD.  For Ubuntu Server,
please test on both USB "and" CD.  We want to make sure both paths work
since they are likely to be common based on what hardware folks have
access to, and we'll be manufacturing CDs for Server and DVDs for
Desktop, so making sure there are no significant problems is important.
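
To make the "and" concrete, the media requirement above can be written
down as a small lookup that flags what is still untested.  This is only
an illustrative sketch in Python; REQUIRED_MEDIA and missing_media are
hypothetical names, and nothing like this exists in the iso tracker as
far as this thread says.

```python
# Required media per product, taken from the reply above:
# Desktop on USB and DVD, Server on USB and CD.
REQUIRED_MEDIA = {
    "desktop": {"usb", "dvd"},
    "server": {"usb", "cd"},
}


def missing_media(product, tested):
    """Return the required media for `product` not yet covered by `tested`."""
    return REQUIRED_MEDIA[product] - set(tested)


# Example: Desktop has only been tested from USB so far.
print(sorted(missing_media("desktop", ["usb"])))  # prints ['dvd']
```

An empty result means every required path for that product has had at
least one test run.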

> > We found an obvious way of fixing it (removing a langpack) within 
> > just a couple of hours, got the change reviewed, tested, the image 
> > rebuilt, the content checked and then fully re-tested by 3 testers 
> > in less than 3 hours. Leaving us a good 10-12h before we actually 
> > released the set.
> 
> In my opinion, it is not possible for 3 people to do 10 installs + 3
> upgrades each to a good level of details in less than 3 hours. Yes,
> you can rush through things or split the test cases between the three
> of you, or consider some tests done because one test case is sort of a
> subset of another, and do some risk based testing, but the level of
> risk we are accepting is not clear nor understood by all the parties.

It's a case of testing smart, which we should all be aiming for.  I had
good confidence after Stéphane, Jean-Baptiste, and Nick had exercised
the tests, after discussing their methodology with them.  We understood
the scope of the changes and possible impact.  Possibly we should look
at getting their best practices more widely understood, though, so we
can help the newer QA team members become more efficient?

> > So we had more peer review than required at every step, I'm really 
> > not understanding what you're complaining about.
> 
> See above for explanation. We clearly have different views on what is
> required/acceptable, and we need to reach an agreement, something that
> works for all of us.

The QA contact should always feel free to ask questions in
#ubuntu-release if there are concerns while we're discussing issues that
might motivate a respin.

> > We always respin responsibly, believe it or not, respinning is at 
> > least as much pain for the release team as it is for the testers,
> > we never take such action lightly.
> 
> I believe it, I'd like to reduce the pain for everyone.

We all want that.

> > Critical installation bugs, security bugs, immediate post-install 
> > bugs and CD size problems are usually considered show-stoppers as 
> > these can't be worked around by the user. It'd be wrong not to 
> > respin for these.
> 
> This is good information. What do you mean by usually? do you mean
> always? What would be an example case of those showstoppers that
> doesn't grant a respin?

We discussed the fact that the Chinese 12.04.1 images were oversize with
the PES management responsible for those images.  They decided that it
was better for that market to release them oversize than to have to live
with the bugs that would be present if we didn't.

> > The "corner case" in [4] is a supported upgrade path, used by 
> > governments and other internet-less environments. Not fixing that 
> > bug was resulting in completely broken, unbootable systems, and as 
> > such definitely fits respin criteria. Our alternative would have 
> > been to drop support for these, which we considered and decided
> > not to for 12.04. However 12.04 is going to be the last release
> > where such an upgrade path is supported.
> 
> Useful information, thanks, we will add this test case earlier in the
> testing next time, so that we don't find ourselves in that situation
> again. Does this only apply to LTSs or does it also apply to
> intermediate releases? Are such customers likely to upgrade to
> releases in between LTSs?

Internet-less environments are a reality in a large number of
corporations due to firewalls, etc., as well as for those without a
reliable connection.  The extent to which this is a priority for a
specific release is determined by the development teams and marketing.

> > I suppose we can do that, ultimately it's always going to be up to 
> > the release team to do a go/no-go on case by case basis, but 
> > writing some generic guidelines can't hurt.
> > 
> > At least for me, anything that fits one of the following is release
> > critical:
> > - Security issues affecting the live/install environment
> > - Kernel bugs preventing the boot of the image for commonly
> >   available hardware
> > - Installer bugs leading to installation failure or broken
> >   post-install experience without obvious workaround
> > - Upgrade bugs leading to broken/non-working systems that can't be
> >   fixed post-upgrade through SRU.
> > - Critical bugs affecting common software used immediately
> >   post-installation
> 
> This is a good starting point, and will help us focus our testing
> going forward. Any other show-stopper kind of issue you can think of?
> Or someone else?

I've started off a page at:
https://wiki.ubuntu.com/ReleaseManagement/ImageRespinCriteria

If others spot things missing, please feel free to add.

> 
> >> - Let's improve the static analysis of images so that we don't 
> >> have the image size problem again, we are adding a job for this 
> >> to Jenkins this week.
> > 
> > Can you also make sure someone actually burns the image on the 
> > supported media?
> 
> As I asked before, what media is the preferred one for the 800MB
> Quantal images? We'll be happy to procure those and make sure we burn
> it. I'd like to be able to track this has been done with a test case
> on the tracker.

See the answer above; it depends on the product.
> 
> The static validation is added to jenkins and results will start to be
> published to the public instance today.
> 
> > I'm still amazed that for a whole week, nobody even tried to burn
> > a CD with our image...
> 
> I am amazed that you expect things to happen without actually having
> this documented anywhere, we don't have a test case for this. 

I think there has been some test case drift in certain areas, and they
could benefit from a scrub to make the media a bit more explicit.

For instance: 
http://testcases.qa.ubuntu.com/Install/ServerWhole
clearly references the expectation of it being a CD.

However, 
http://testcases.qa.ubuntu.com/Install/DesktopWhole
permits an "or", i.e. USB or CD.

These should probably become separate tests, as it is important to
check that both cases work.  A case could possibly be added for DVD as
well.  Then, when planning a release, once we know what image size
development is targeting, the appropriate test could be indicated as
mandatory.

> This is
> a problem for us, because the team is growing and not everyone has the
> years of experience that jibel has, nor has seen things fail in so
> many different ways as to use intuition. Jibel is trying to move on to
> do upstream testing, so it is our responsibility as a team to be able
> to do a good job and we need transparency from the release team to
> achieve that.

Transparency on the QA team and effective mechanisms for transferring
institutional knowledge from experienced testers to the new ones are
also important.  If a new tester has questions, I'd expect them to ask
the experienced testers first, and then seek clarification from the
development or release team, as appropriate.  The Friday meeting and
the ubuntu-release mailing list are good forums for asking for
clarification on ambiguities.  We're all working to the same goal here:
making sure the customers of our images are not unpleasantly surprised.

> 
> As I said in a different email, jibel won't be the QA single point of
> contact anymore, plars will be leading the milestone testing efforts
> for the last milestones in Quantal with psivaa and babyface's help,
> and then we will work out a schedule or a plan for R that we will
> communicate in due time, so that we are all clear what is going
> to be tested and can tell us if anything is missing.

If there is no longer going to be a single point of contact, it's going
to be important to track this as well.  I've added a column to:
https://wiki.ubuntu.com/QuantalQuetzal/ReleaseTaskSignup
for the QA contact.  Please update it with the plans for Beta 2 and
Release as they become known, so others know the "goto" QA person to
consult if they see concerns during their testing.

> >> - Let's require more than just one run of the test cases to 
> >> validate an image. What is reasonable in terms of ensuring 
> >> reasonable HW coverage? I'd like to see at least 3 x 100% run 
> >> rate with 100% pass rate on the current test cases, from people 
> >> different from the release engineer.
> > 
> > For final images I usually look for at least 2 people testing the 
> > various code paths. Unfortunately these code paths can't be easily
> >  represented in the UI, so ultimately it's a release team decision 
> > to know whether the threshold has been met.
> 
> I'd like someone from QA at least involved in the decision process,
> even if only as an assessor, to voice our concerns.

We have had someone from QA involved in the past and plan to continue to
do so in the future.  Jean-Baptiste was the contact point for QA for the
last several releases, and his input was part of our decision process,
so I'm a bit confused why you imply that QA has not been involved.

> > Sorry for the rather long e-mail, but I hope it's explaining a bit 
> > more how things work.
> 
> It is very helpful, thanks. This email contains information we can use
> to change things preemptively rather than reactively, like we've done
> in the past.

We do make changes preemptively, based on discussion, so I'm not sure I
agree with this judgment from you.  This is the main purpose of our
feedback sessions at UDS, and the release team is pretty good about
adding to our process pages when changes that impact our processes occur
during the cycle.  See the pages under:
https://wiki.ubuntu.com/ReleaseManagement/#Release_Processes

Is there a similar set for the QA team, that the Release and Development
teams can consult?

It's certainly healthy to question our institutional and historical
assumptions and strive to improve.  Reading the documentation that folks
have taken the time to write, based on historical lessons learned, and
asking for clarification when something is unclear or suggesting
specific improvements is welcome, as is improving transparency.

Thanks again for starting this thread off.  :)

Kate