DANGER!!! Problems with 10.04 installer (RAID devices *will* get corrupted)

Alvin Thompson alvin at thompsonlogic.com
Wed Apr 21 04:30:25 UTC 2010


Long story short: the only way to be safe right now is to physically 
remove drives with important data during the install.

I figured out the cause of my RAID problems, and it's a problem with 
Ubuntu's installer.  This will cost people their data if it isn't fixed. 
Sorry about the length of this post, but the problem takes a while to 
explain.

The following scenario is not the only way your partitions can get 
hosed.  I use it because it's a common use case, it illustrates what 
data is where on the hard drives, and it exposes the flaws in the 
installer's logic.  It also doesn't matter whether you touch a 
particular drive, partition, or file system during the install: the 
data on it can still be corrupted.

Suppose you have a hard drive with some partitions on it.  On one of 
those partitions you have a Linux file system that houses your data. 
For the sake of this discussion, say sda2 contains an EXT4 file system 
with your data.  So far, so good.

Because this data is too important to trust to a single drive, you 
decide to buy some more drives and build a RAID 5 array.  You buy 3 more 
drives and create similar partitions on them (say, sdb2, sdc2, and 
sdd2).  You copy the data currently on sda2 somewhere safe, then use 
mdadm to create a RAID 5 array from sda2, sdb2, sdc2, and sdd2.  The new 
RAID device is md0.  You create an XFS file system on md0 and move your 
data to it*.  This is all perfectly fine, but the stage has been set for 
disaster with the Ubuntu installer.

Later, you decide to do a clean install of Ubuntu on sda1 (sda1 is *not* 
part of the RAID array).  You get to the partitioning stage and select 
manual partitioning.  This is where things get really ugly really fast.

The bug is in how the installer detects existing file systems.  It 
simply reads the raw data in a partition to see whether the bits it 
finds match a known file system.  In the above example, the installer 
finds the remnants of the original (non-RAID) file system on sda2 and 
thinks it's a current EXT4 file system.  Even if you use fdisk to mark 
sda2's partition type as 'RAID autodetect' instead of 'Linux' (a step 
that is no longer necessary), the installer still detects the partition 
as holding an EXT4 file system.
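To make the failure concrete, here is a minimal Python sketch of signature-only detection.  It is not the installer's actual code; it only assumes the well-known on-disk constants (the ext2/3/4 magic 0xEF53 at byte 1080, the XFS magic "XFSB" at byte 0), and the helper name `naive_probe` is hypothetical.  Because it looks at nothing but those bytes, a stale ext4 magic left on sda2 is reported as a live file system regardless of the partition type or of any RAID metadata elsewhere on the partition.

```python
import struct
from typing import Optional

# Well-known on-disk constants; everything else here is illustrative.
EXT_MAGIC_OFFSET = 1080  # ext2/3/4 superblock starts at byte 1024; s_magic is 0x38 into it
EXT_MAGIC = 0xEF53       # stored little-endian
XFS_MAGIC = b"XFSB"      # XFS superblock magic at byte 0


def naive_probe(dev: bytes) -> Optional[str]:
    """Guess a file system from signature bytes alone (hypothetical helper)."""
    if dev[:4] == XFS_MAGIC:
        return "xfs"
    if len(dev) >= EXT_MAGIC_OFFSET + 2:
        (magic,) = struct.unpack_from("<H", dev, EXT_MAGIC_OFFSET)
        if magic == EXT_MAGIC:
            return "ext4"  # simplification: ext2, ext3 and ext4 share this magic
    return None


# Simulated sda2 holding a leftover ext4 magic; RAID metadata is never consulted.
sda2 = bytearray(64 * 1024)
struct.pack_into("<H", sda2, EXT_MAGIC_OFFSET, EXT_MAGIC)
print(naive_probe(bytes(sda2)))  # ext4
```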

Once this 'ghost' file system is detected, the installer gets really 
confused about what goes where and will try to write to sda2 during the 
install, even if you told it to leave sda2 alone and install only to 
sda1.  Because sda2 is a member of md0, that write corrupts the XFS file 
system on md0, and you're screwed.
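Why a write to sda2 damages md0 can be shown with a toy RAID 5 stripe.  This is a simplified model, not how md actually lays out data: the chunk size is 4 bytes, parity always sits on the last member instead of rotating, and the names `build_stripe`, `read_stripe` and `parity_ok` are hypothetical.  A write that goes straight to a member partition, bypassing md, changes what the array returns and leaves the parity inconsistent.

```python
from functools import reduce
from typing import List

CHUNK = 4  # toy chunk size in bytes; real arrays use much larger chunks


def xor(*chunks: bytes) -> bytes:
    """Byte-wise XOR of equal-length chunks (RAID 5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def build_stripe(data: bytes, members: int = 4) -> List[bytes]:
    """Split one stripe across members - 1 data chunks and append their parity.
    Toy layout: parity is always on the last member (md rotates it)."""
    chunks = [data[i * CHUNK:(i + 1) * CHUNK] for i in range(members - 1)]
    return chunks + [xor(*chunks)]


def read_stripe(stripe: List[bytes]) -> bytes:
    """What a read through md0 returns for this stripe: the data chunks in order."""
    return b"".join(stripe[:-1])


def parity_ok(stripe: List[bytes]) -> bool:
    """True if the parity chunk still matches the data chunks."""
    return xor(*stripe[:-1]) == stripe[-1]


stripe = build_stripe(b"XFSBdatadata")  # member 0 plays the role of sda2
print(read_stripe(stripe), parity_ok(stripe))
stripe[0] = b"\x00\x00\x00\x00"         # a write straight to sda2, bypassing md
print(read_stripe(stripe), parity_ok(stripe))
```

On a real array the stray write lands in whatever data or parity chunk of md0 maps to that offset on sda2, which is why the damage shows up in the file system on md0 rather than in anything visible on sda2 alone.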

The overall flaw here is in the file system detection: you can't just 
assume that any sequence of bits you find sitting around on a hard drive 
is still current.

A possible solution may be to check for a RAID superblock first and, if 
one is found, let it trump all file system detection.  I imagine 
something similar will have to be done for partitions that are part of 
an LVM volume group as well.
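A minimal sketch of that ordering, again in Python and again hypothetical (the helper names are made up): look for container metadata, meaning an md superblock or an LVM2 physical volume label, before trying any file system signature.  The offsets follow the md superblock formats as I understand them (1.1 at byte 0, 1.2 at 4 KiB, 0.90 in the last 64 KiB-aligned 64 KiB block, 1.0 between 8 and 12 KiB from the end); the magic is read little-endian, which assumes x86 for the 0.90 format; and the returned labels are modeled on what `blkid` reports.  A real probe would also validate checksums and decide what to do when a device carries more than one signature.

```python
import struct
from typing import List, Optional

MD_MAGIC = 0xA92B4EFC    # md superblock magic; read little-endian (assumes x86 for 0.90)
EXT_MAGIC_OFFSET = 1080  # ext2/3/4 s_magic: byte 1024 + 0x38
EXT_MAGIC = 0xEF53
XFS_MAGIC = b"XFSB"


def md_superblock_offsets(size: int) -> List[int]:
    """Candidate md superblock byte offsets for a member partition of `size` bytes."""
    v090 = (size & ~(64 * 1024 - 1)) - 64 * 1024  # 0.90: last 64 KiB-aligned 64 KiB block
    v10 = (size - 8 * 1024) & ~(4 * 1024 - 1)      # 1.0: 8-12 KiB before the end
    candidates = (0, 4096, v090, v10)               # 1.1 at 0, 1.2 at 4 KiB
    return [o for o in candidates if 0 <= o <= size - 4]


def has_md_superblock(dev: bytes) -> bool:
    return any(struct.unpack_from("<I", dev, o)[0] == MD_MAGIC
               for o in md_superblock_offsets(len(dev)))


def has_lvm_label(dev: bytes) -> bool:
    # LVM2 puts "LABELONE" at the start of one of the first four 512-byte sectors.
    return any(dev[s * 512:s * 512 + 8] == b"LABELONE" for s in range(4))


def probe(dev: bytes) -> Optional[str]:
    """Container metadata trumps file system signatures (hypothetical helper)."""
    if has_md_superblock(dev):
        return "linux_raid_member"
    if has_lvm_label(dev):
        return "LVM2_member"
    if dev[:4] == XFS_MAGIC:
        return "xfs"
    if len(dev) >= EXT_MAGIC_OFFSET + 2:
        if struct.unpack_from("<H", dev, EXT_MAGIC_OFFSET)[0] == EXT_MAGIC:
            return "ext4"
    return None


# Simulated 1 MiB sda2 with a stale ext4 magic; a 0.90 md superblock is added second.
sda2 = bytearray(1024 * 1024)
struct.pack_into("<H", sda2, EXT_MAGIC_OFFSET, EXT_MAGIC)
print(probe(bytes(sda2)))  # ext4 (no md superblock yet)
struct.pack_into("<I", sda2, md_superblock_offsets(len(sda2))[2], MD_MAGIC)
print(probe(bytes(sda2)))  # linux_raid_member
```

The point of the ordering is that the same bytes a signature-only check would call ext4 are reported as a RAID member as soon as the md superblock is found.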

-Alvin

* In my case, I took a shortcut: I created a degraded array (missing 
sda2), copied the data from sda2 to the array, added sda2 to the array, 
and let it resync.  I don't think it makes a difference.




More information about the ubuntu-users mailing list