[Bug 1443735] Re: recordfail false positive causes headless servers to hang on boot by default
Robie Basak
1443735 at bugs.launchpad.net
Tue May 19 13:29:12 UTC 2015
** Description changed:
+ [Impact]
+
+ On a headless server system, a user who does not have easy access to the
+ console may find the system fails to come up after a power cut because
+ the boot is blocked on a console menu prompt from grub that does not
+ time out.
+
+ [Workaround]
+
+ Set GRUB_RECORDFAIL_TIMEOUT to some positive value (e.g. 30) in
+ /etc/default/grub and then run "sudo update-grub". However, this must be
+ done before the problem occurs; once it has occurred, the only option a
+ user has is to add a head to the headless system.
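
For illustration, the workaround above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: the function name set_recordfail_timeout is hypothetical, and it assumes GNU sed (for -i) and that any existing setting is an uncommented line beginning with GRUB_RECORDFAIL_TIMEOUT=.

```shell
# Hypothetical helper (not part of grub): set GRUB_RECORDFAIL_TIMEOUT
# in the given grub defaults file, replacing an existing uncommented
# setting or appending a new one if none is present.
set_recordfail_timeout() {
    file="$1"
    timeout="$2"
    if grep -q '^GRUB_RECORDFAIL_TIMEOUT=' "$file"; then
        # Assumes GNU sed, which supports in-place editing with -i.
        sed -i "s/^GRUB_RECORDFAIL_TIMEOUT=.*/GRUB_RECORDFAIL_TIMEOUT=$timeout/" "$file"
    else
        printf 'GRUB_RECORDFAIL_TIMEOUT=%s\n' "$timeout" >> "$file"
    fi
}

# Intended use, as root, on the affected system:
#   set_recordfail_timeout /etc/default/grub 30 && update-grub
```

The change only takes effect once update-grub has regenerated the grub configuration, so the final step cannot be skipped.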
+
+ [Development Fix]
+
+ Default for GRUB_RECORDFAIL_TIMEOUT changed from -1 (indefinite wait) to
+ 30 (proceed anyway after 30 seconds). Accepted in Debian, synced to
+ Ubuntu in Wily; currently held in wily-proposed due to some items in the
+ unapproved queue.
+
+ [Stable Fix]
+
+ Same as development fix.
+
+ [Regression Potential]
+
+ This fix changes user-visible behaviour deliberately because the
+ previous behaviour led to this bug. Users of non-headless systems (e.g.
+ desktops) may miss the boot menu and come back to find that the boot
+ proceeded without them, possibly failing again; but if they try again,
+ the menu prompt will still be shown for 30 seconds.
+
+ [Test Case]
+
Steps to reproduce:
1. Boot a Vivid system installed from the server installer (not a cloud image).
2. Kill the power (or VM) while the kernel is initialising but before it has started init.
3. Power up the system (or start the VM) again.
Expected behaviour: the system should boot without user intervention.
Actual behaviour: the system hangs on the grub prompt.
+
+ [Details]
This was previously raised in bug 669481 but the solution applied then
was just to add the GRUB_RECORDFAIL_TIMEOUT setting defaulted to -1.
This allowed users to work around the problem by tuning
GRUB_RECORDFAIL_TIMEOUT. I'm filing this bug separately as there is
nothing wrong with the previous fix, but it didn't fix the problem for
users by default. This bug is about fixing the default so that users
don't have to discover and work around the issue.
An IRC discussion
(http://irclogs.ubuntu.com/2015/02/27/%23ubuntu-devel.html#t13:54)
concluded that everyone involved in the discussion is happy to change
the timeout from infinity to 30s.
Colin asked for a fix in Debian, so I'll send a patch there and add a
bug link. I'm also filing the bug here in order to track the fix in both
Debian and Ubuntu.
Importance: High because of the impact on users of headless servers -
from their perspective, this causes a system to fail to boot after an
appropriately timed double power cut. I'm prompted to do this today
because it just happened to me on my server, so perhaps it's more likely
than I originally thought.
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1443735
Title:
recordfail false positive causes headless servers to hang on boot by
default
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1443735/+subscriptions