[Bug 324997] Re: System startup fails with degraded RAID + encryption
Gareth
donotspam at fastmail.fm
Thu Jun 18 04:59:22 UTC 2015
Still a problem on 14.04.2. FYI, I stumbled upon a possible workaround here, but I don't know whether it works:
https://feeding.cloud.geek.nz/posts/the-perils-of-raid-and-full-disk-encryption-on-ubuntu/
Any chance of a fix?
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/324997
Title:
System startup fails with degraded RAID + encryption
Status in mdadm package in Ubuntu:
New
Bug description:
Binary package hint: mdadm
Release: Ubuntu 8.04.2, from alternate installation media. Probably
also affects Ubuntu 8.10.
Module: mdadm. Possibly affects others as well.
Version: 2.6.3. Probably also affects 2.6.7, which is used in 8.10.
Situation: I have installed root (/) and swap on RAID 1 partitions with
encryption and LVM. When one of the RAID disks is missing from the
system, a prompt should ask whether I want to start with a degraded
RAID setup or not. Answering Yes should start up the system.
What happens: This depends on whether "quiet splash" boot options are
in use or not, as follows:
- When "quiet splash" is used, the prompt for the LUKS encryption password appears normally, but fails again and again even if the user types the correct password. At this point the user probably thinks that the password is incorrect or that something is wrong with the encryption. Both assumptions are wrong: there is nothing wrong with either the encryption or the password. Nevertheless, the system fails to start. Pressing CTRL+ALT+F1 reveals these messages, which are equally misleading:
Starting up ...
Loading, please wait...
Setting up cryptographic volume md1_crypt (based on /dev/md1)
cryptsetup: cryptsetup failed, bad password or options?
cryptsetup: cryptsetup failed, bad password or options?
- Without the "quiet splash" boot options, the user can observe system messages on screen during the boot process. At a certain point, the missing RAID disk causes a long wait, during which these messages appear:
Command failed: Not a block device
cryptsetup: cryptsetup failed, bad password or options?
... other stuff ...
Command failed: Not a block device
cryptsetup: cryptsetup failed, bad password or options?
Command failed: Not a block device
cryptsetup: cryptsetup failed, bad password or options?
cryptsetup: maximum number of tries exceeded
Done.
Begin: Waiting for root file system... ...
After a few minutes, the system asks whether the user wants to start with a degraded RAID setup. After answering Yes and waiting another few minutes, the system drops to the Busybox command line, i.e. it fails to start. This probably happens because the root file system is encrypted and needs to be opened with a password, but the password prompt already failed during the long RAID wait period.
Note that it is possible to start the system from Busybox by typing
"cryptsetup luksOpen /dev/md1 md1_crypt", then typing LUKS password,
and finally pressing CTRL+D. This proves that the encryption works and
the problem is related to degraded RAID, handled by mdadm. Also the
system starts properly when all RAID disks are present; the problem
only appears with a degraded RAID.
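
The "Not a block device" messages above suggest that cryptsetup ran before /dev/md1 existed as a block device. As a rough sketch of the manual recovery under that assumption, the helper below waits for the device to appear before unlocking it. The function name wait_for_block_device and its timeout argument are hypothetical and not part of mdadm or cryptsetup; only the luksOpen command itself comes from this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm or cryptsetup): poll until DEV
# is a block device, or give up after TIMEOUT seconds (default 30).
wait_for_block_device() {
    dev=$1
    timeout=${2:-30}
    waited=0
    while [ ! -b "$dev" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for $dev" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use at the Busybox prompt, with the device and mapping names
# from this report:
#   wait_for_block_device /dev/md1 60 && cryptsetup luksOpen /dev/md1 md1_crypt
# Then press CTRL+D to continue booting.
```

If the helper times out, the array has probably not been started yet, and unlocking would fail with the same misleading password error.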
More information:
For a long time, it was not possible to boot properly with a degraded RAID setup. This bug was never present in Debian, only in Ubuntu. See bug 120375. A fix was introduced in 8.10 and recently backported to 8.04.2. See bug 290885. Apparently the fix did not take encryption into account, so this bug likely affects all recent Ubuntu releases, including 8.10.
I consider this quite important to fix, as many RAID 1 users also
need encryption to protect their data, especially in the business
world. Although there is a workaround for starting the system with
degraded RAID and encryption, the error messages are clearly
misleading (encryption is not the problem). Even if the user knows
what is going on, it is too complicated to look up the workaround by
googling in a stressful situation where a production server fails to
start.
Suggestions:
- When usplash is used, there should be a notification about a possibly degraded RAID array *before* the LUKS encryption password is prompted for, so that the user can figure out that the failure to boot is related to the RAID disks, not to encryption.
- When a degraded RAID is detected, the LUKS password should be prompted for only *after* the user has answered the question about starting the system with a degraded RAID.
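
Both suggestions depend on noticing a degraded array before the password prompt. A minimal sketch of such a check, assuming the usual /proc/mdstat layout (an "mdN : ..." line followed by a status line such as "[2/1] [U_]", or an "inactive" array that has not been started), might look like this. The function name degraded_arrays is hypothetical, and this is not how the Ubuntu initramfs scripts actually do it.

```shell
#!/bin/sh
# Hypothetical helper: read mdstat-format text on stdin and print the
# name of every array that is inactive or has a missing member.
degraded_arrays() {
    awk '
        # An array that was assembled but not started
        /^md[^ ]* : inactive/ { print $1; next }
        # Remember which array the following status line belongs to
        /^md[^ ]* :/ { name = $1 }
        # A status field such as [U_] has "_" for each missing member
        /\[[U_]*_[U_]*\]/ && name != "" { print name; name = "" }
    '
}

# Possible use before asking for the LUKS password:
#   if [ -n "$(degraded_arrays < /proc/mdstat)" ]; then
#       echo "Warning: degraded RAID array detected" >&2
#   fi
```

A boot script could run this check first and show the degraded-RAID question before any cryptsetup prompt.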
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/324997/+subscriptions