[Bug 954197] Re: base system installation is not robust against transient network failures

Colin Watson cjwatson at canonical.com
Tue Mar 13 16:49:50 UTC 2012


This is a complex problem that has been known in the Debian installer
since at least 2004.  I'm going to try to break it down here in the hope
of making some progress on it.

1. Download error handling in debootstrap is arranged wrongly

  In particular, it doesn't deal correctly with corrupted files, and
will tend to muddle on until something fails as a consequence of the
corruption.  In some cases it's possible for debootstrap to complete
successfully despite a corrupted download!  There's a patch in
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=618920 that improves
things, although I've been working on a better version of it.
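
  To make the failure mode concrete, here is a minimal Python sketch of
checking each download as soon as it arrives, so corruption is reported
at the retrieval stage rather than surfacing later.  It is not
debootstrap's code (debootstrap is a shell script) and not the patch
from the bug above.  The function name verify_download and the choice of
SHA-256 are assumptions made for illustration only.

```python
import hashlib
import os


def verify_download(path, expected_size, expected_sha256):
    """Return True only if the file at path matches both size and SHA-256.

    Hypothetical helper: the expected values are assumed to come from
    trusted archive metadata that the caller has already parsed.
    """
    # A missing file or a size mismatch is the cheapest corruption to detect.
    try:
        if os.path.getsize(path) != expected_size:
            return False
    except OSError:
        return False

    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large packages are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

  A caller would treat a False result as a failed download and stop or
refetch straight away, rather than carrying on to unpack the file.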

2. No retry option

  As Joey notes in
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=283600, there's only a
fairly limited communication channel between debootstrap (which is a
separate tool invoked by the installer to do the hard work) and the
parts of the installer that can actually interact with the user.  This
means that it's hard to set up a "retry" option that just retries a
single download, because debootstrap doesn't wait for user interaction
on errors and it would be a substantial amount of work to rearrange it
to do so.

  What we might be able to do is as follows: if debootstrap fails at the
retrieval stage before it actually starts unpacking anything, then we
could offer an option that simply tries the whole thing again, keeping
the previous contents of /target (so that would also preserve anything
you'd wgetted by hand, but it would also try to redownload any other
missing files for itself).  This is a little less neat, but would do the
job.  In fact, if we borrowed some ideas from net-retriever, we could
even let you choose a different mirror.
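
  The retry-the-whole-stage idea can be sketched roughly as follows, in
Python for illustration only.  This is not how base-installer or
debootstrap is implemented.  The names retrieve_all, fetch and is_valid
are hypothetical, and the two callbacks stand in for whatever download
and verification steps the real tools would use.

```python
import os


def retrieve_all(filenames, mirrors, cache_dir, fetch, is_valid):
    """Try each mirror in turn until every file is present and valid.

    fetch(mirror, name, dest) and is_valid(path) are hypothetical callbacks
    supplied by the caller; fetch should raise OSError on a failed download.
    Returns the list of files still missing (empty on success).
    """
    missing = list(filenames)
    for mirror in mirrors:
        still_missing = []
        for name in missing:
            dest = os.path.join(cache_dir, name)
            # Keep anything already downloaded, including files fetched by hand.
            if os.path.exists(dest) and is_valid(dest):
                continue
            try:
                fetch(mirror, name, dest)
            except OSError:
                still_missing.append(name)
                continue
            # A download that completes but fails verification counts as missing.
            if not is_valid(dest):
                still_missing.append(name)
        missing = still_missing
        if not missing:
            break
    return missing
```

  Each pass only refetches what is still missing or invalid, so files
already sitting in the cache directory survive a retry, and listing more
than one mirror gives the "choose a different mirror" behaviour.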

** Bug watch added: Debian Bug tracker #618920
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=618920

** Bug watch added: Debian Bug tracker #283600
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=283600

** Also affects: debootstrap (Debian) via
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=618920
   Importance: Unknown
       Status: Unknown

** Also affects: base-installer (Debian) via
   http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=283600
   Importance: Unknown
       Status: Unknown

** Changed in: base-installer (Ubuntu)
       Status: New => Triaged

** Changed in: base-installer (Ubuntu)
   Importance: Undecided => Medium

** Changed in: debootstrap (Ubuntu)
       Status: New => Triaged

** Changed in: debootstrap (Ubuntu)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to base-installer in Ubuntu.
https://bugs.launchpad.net/bugs/954197

Title:
  base system installation is not robust against transient network
  failures

Status in “base-installer” package in Ubuntu:
  Triaged
Status in “debootstrap” package in Ubuntu:
  Triaged
Status in “base-installer” package in Debian:
  Unknown
Status in “debootstrap” package in Debian:
  Unknown

Bug description:
  [This bug was originally reported by Gary Potwin in
  https://answers.launchpad.net/ubuntu/+source/ubiquity/+question/158404]

  I have a Supermicro P6SBA motherboard with a 700 MHz Pentium III, 512
  MB of RAM, 20 GB and 120 GB hard drives, and a DSL Internet connection.

  This system has been running Windows 98 for years, and I wanted to try
  Ubuntu 10.04.

  Using the network kernel and initrd, the system boots OK and
  downloads the rest of the installer OK from the default mirror
  (us.archive.ubuntu.com) except for one small problem (see below).

  Everything seems OK until I try to load the base system.

  After downloading files for about 3.5 minutes (often when it is trying
  to get the file libklibc), I see the network activity stop, and soon
  after I get an error message stating that it has failed to load that
  file (all others up to that point were OK).

  After about 2 more minutes, during which time one or more additional
  files fail to load, the network activity goes back to normal, and all
  the remaining files for the base system download OK.

  Due to the failed files, I get the error message that the base system
  has failed to install.

  Using a console, I did successfully download the failed files with
  wget into /target/var/cache/apt/archives while the automatic download
  was still in progress, but after the above failure.

  The system still thinks that the files were not successfully
  downloaded, and I don't know how to tell the system that they are
  there and OK.  I have tried using many different mirrors at different
  times of the day, and both http and ftp, all fail as above.

  Using a similar technique, I was able to successfully load Debian
  5.0.8, so I think the hardware is OK, but I would really like to try
  Ubuntu.

  To try to rule out any problem with the DSL, I downloaded a very large
  file that took 10 minutes of continuous running under Windows.

  Then I went back to the 10.04 install and did the same thing using
  wget at a console, just after partitioning the hard drive, and just
  before starting the base install, and it worked fine.  While loading
  the installer, one file did fail to load, but I was given the
  opportunity to retry, which took care of the problem.

  I wish the base install allowed retries instead of just "go back" and
  "continue", which don't seem to make any additional attempt to retry.

  Any help would be appreciated.

  Gary
