[Bug 571707] Re: fsck progress stalls at boot, plymouthd/mountall eats CPU

Sitsofe Wheeler sitsofe at yahoo.com
Sun May 9 17:13:15 BST 2010


Hi Michael,

As one of the random people who has posted quite a few of those
technical comments I must apologise. The aim wasn't to make other
people's lives harder...

It took pretty much an entire day of non-stop work last week for me
to get a handle on the problem (I'm not the world's fastest/smartest
programmer) and I was working on a netbook with limited space. I was
doing this on my own time (I have no connection to Canonical) but even
if I were being paid I could not have gone faster. While I was going
along, I was posting notes in the hope it would help others quickly
reproduce what I had done and help people guess at new reasons for the
problem occurring. Arand and others were working on the problem too.

Having spent many hours, I finally came up with an idea I couldn't test
as I lacked the appropriate setup (I was working by reading the code and
thinking about what might help). I posted it here, and only then did I
start searching other bugs and recognise that someone else (Tero!) had
independently posted a tested solution to the problem much earlier, so
I posted yet another comment linking to it. Arand appears to
have quickly turned this into a package people could test and from there
things seem to have moved a lot faster.

Part of the problem with bugs like this is that they take time to
diagnose. As a programmer I can tell you some of the most difficult
problems to fix are the ones you can't reproduce on your own machine on
demand. Is it something everyone will see? Why haven't I seen it myself?
Is there something special about the setup of the people seeing the
problem (e.g. disks that take a short time to check, the speed of the
computer and the graphical splash settings)?

Once you know the cause you then have to come up with an idea for the
fix. If someone presents you with a fix it is often quicker to look at it
and say it is right than to come up with an idea from scratch. Once you've done
that you have to test the fix to make sure it doesn't cause any new
problems (sometimes this leads to the fix being split into two). But who
(else) wants to do testing and risk breaking their system? Somehow we
need more programmers, testers and community liaisons because these can
be thankless tasks. Whatever you do, it all takes extra time and carries
risks...

You also raised some good general issues. This is a long bug (it
affects many people and attracts a lot of comments because people want
to help whichever way they can). The thing is it's hard to know which
comments to show people. Often when I am searching for bugs I need to
see all the comments to be able to find the one I need but this is
clearly not the general case... Perhaps you could file a new bug
explaining this and how it could be improved (perhaps comment voting? I
don't have a good suggestion there :) ). You can use this link
https://bugs.launchpad.net/malone/+filebug .

A further issue as you've pointed out is the Fix Committed -> Fix
Released wording. You are not the first to have this issue. Again
perhaps you could file a new bug on that (if there isn't one already)? I
don't think it's fair to ask for dates (unless I'm paying whoever is
going to fix it vast sums of money) as asking for the ETA of a fix is
often like asking "how long is a piece of string?". But there's bound to
be room for improving the clarity, indicating what stage a bug is in,
and providing easier-to-read information about it for new people.

In the future I'll try to post less to popular bugs so they are more
manageable. Thanks for your comment. I hope you succeed in helping
Launchpad become a clearer tool to use!

-- 
fsck progress stalls at boot, plymouthd/mountall eats CPU
https://bugs.launchpad.net/bugs/571707
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is a direct subscriber.

Status in “mountall” package in Ubuntu: Fix Released
Status in “plymouth” package in Ubuntu: Triaged
Status in “mountall” source package in Lucid: Fix Committed
Status in “plymouth” source package in Lucid: Triaged

Bug description:
PROBLEM

When a disk check is performed, progress stalls at around 70%, and the remaining portion then takes a very long time to finish (10 minutes or more).

PATCH

A patch for mountall has now been accepted into -proposed. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

Tero Mononen has published a patch for Bug #553745 which applies to the issue described here as well (see https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/553745/comments/76 and https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/553745/comments/77 )

I have created corresponding packages which are available through my PPA: https://launchpad.net/~arand/+archive/unstable

!!!Do note that this is an unofficial, untested, preliminary patch!!!
However, testing and feedback are welcome. Please report in particular if ANY (new) problems are seen when using the patched version.

TEST CASE:

(sudo aptitude install bootchart)
sudo touch /forcefsck && sudo reboot

POSSIBLE TEMPORARY WORKAROUNDS

1. Removing "quiet" and "splash" from the kernel boot line

2. When the progress has stalled, switch away from the splash screen using the left arrow key (presumably any arrow key works).

* Both of these approaches speed up the boot process to ~1 minute instead.
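
As a rough sketch of workaround 1, the snippet below removes the two words from a kernel command line string. The helper name strip_quiet_splash is made up for illustration and is not part of any package, and the sample command line is only an example. It assumes a POSIX shell.

```shell
# Hypothetical helper: print a kernel command line with the exact words
# "quiet" and "splash" removed. Runs in a subshell so "set -f" and the
# variables used here do not leak into the caller.
strip_quiet_splash() (
    set -f    # disable globbing; the argument is split on whitespace only
    out=""
    for word in $1; do
        case "$word" in
            quiet|splash) ;;                    # drop these two words
            *) out="${out:+$out }$word" ;;      # keep everything else
        esac
    done
    printf '%s\n' "$out"
)

# Example command line (made up for illustration):
strip_quiet_splash "root=/dev/sda1 ro quiet splash"
# prints: root=/dev/sda1 ro
```

For a one-off test the same edit can be made by hand from the boot menu. On a GRUB 2 install I would expect the persistent setting to be GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by "sudo update-grub", but that is an assumption about your setup.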

OBSERVATIONS

The fsck message "(...) non-contiguous (...)", which I assume indicates the end of the fsck, is printed in the Virtual Terminal ("outside" plymouth) at around 70% + ~10-20 seconds.

There is no disk activity from this point on (presumed end of fsck above).

Bootchart crashes when trying to capture the whole boot at once with plymouth (at least for my 1h boot).

This problem seems to occur in both plymouthd and mountall, semi-simultaneously:
If you are in the plymouth screen, plymouthd is the CPU-gobbler; if you switch away from it using the arrow keys, mountall takes over the CPU-eating instead.
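
One possible way to check which of the two is busy, assuming you can get a shell on the machine while the stall is in progress (which may not be possible on every setup), is to filter ps output. The helper name busiest_boot_process is hypothetical. Note that the pcpu column from ps is averaged over the lifetime of the process, so it is only a rough indicator.

```shell
# Hypothetical helper: read "pcpu comm" lines (as printed by
# `ps -eo pcpu,comm`) on stdin and print whichever of plymouthd and
# mountall has the higher CPU figure. Prints nothing if neither is found.
busiest_boot_process() {
    awk '$2 == "plymouthd" || $2 == "mountall" {
             if (!seen || $1 + 0 > max) { max = $1 + 0; name = $2; seen = 1 }
         }
         END { if (seen) print name, max }'
}

# Usage on the affected machine, while the progress is stalled:
#   ps -eo pcpu,comm | busiest_boot_process
```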

#####

ORIGINAL REPORT

Binary package hint: mountall

On my system, when fsck runs at boot, the plymouth completion percentage climbs quickly (<10 seconds) to about 80% and then slows down considerably: the complete fsck of my 125GB HD, 30% full, takes more than 5 minutes.

While this goes on the text VTs are all completely blank: just a blinking cursor.

An fsck from a recovery disk completes in ~10 seconds so it doesn't look like "fsck just being slow".

This slowdown was *not* happening on 2010-04-14 with the PPA described by this comment: https://bugs.launchpad.net/ubuntu/+source/plymouth/+bug/554737/comments/25

The fix in the PPA is now in mainline lucid, but somewhere between then and today (2010-04-29) something introduced this slowdown.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: mountall 2.14
ProcVersionSignature: Ubuntu 2.6.32-21.32-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-21-generic i686
NonfreeKernelModules: wl
Architecture: i386
Date: Thu Apr 29 15:38:56 2010
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Beta i386 (20100317.1)
ProcEnviron:
 LANGUAGE=en_IE:en_GB:en
 PATH=(custom, user)
 LANG=en_IE.utf8
 SHELL=/bin/bash
SourcePackage: mountall






