Rant about 10.04's installer on ubuntu-users

Wed Apr 21 20:41:00 UTC 2010

https://lists.ubuntu.com/archives/ubuntu-users/2010-April/216013.html

Since the person who posted does not want to file a bug report, I
thought that I should point this out.

From the link above:

Long story short: the only way to be safe right now is to physically
remove drives with important data during the install.

I figured out the cause of my RAID problems, and it's a problem with
Ubuntu's installer. This will cost people their data if not fixed.
Sorry about the length of this post, but the problem takes a while to
explain.

The following scenario is not the only way your partitions can get
hosed. I simply use it because it's a common use case, it illustrates
what data is where on the hard drives, and it exposes the flaws in the
installer's logic. It also doesn't matter if you don't touch a
particular drive, partition, or file system during the install. The
data on it can still be corrupted.

Suppose you have a hard drive with some partitions on it. On one of
those partitions you have a Linux file system which houses your data.
We'll say for the sake of this discussion that sda2 contains an EXT4
file system with your data. So far, so good.

Because this data is too important to rely on a single drive, you decide
to buy some more drives and make a RAID 5 device. You buy 3 more drives
and create similar partitions on them (say, sdb2, sdc2, and sdd2). You
copy the data currently on sda2 somewhere safe, then you use mdadm to
create a RAID5 array with sda2, sdb2, sdc2, and sdd2. The new RAID
device is md0. You create an XFS file system on md0 and move your data
to it*. This is all perfectly fine, but the stage has been set for
disaster with the Ubuntu installer.

Later, you decide to do a clean install of Ubuntu on sda1 (sda1 is *not*
part of the RAID array), and you get to the partitioning stage and
select manual partitioning. This is where things get really ugly really
fast.

The bug is how the installer detects existing file systems. It simply
reads the raw data in a partition to see if the bits it finds correspond
to a known file system. In the above example, the installer detects the
remnants of the original (non-RAID) file system on sda2 and thinks it's
a current EXT4 file system. Even if you use fdisk to mark sda2's
partition type as 'RAID autodetect' instead of 'linux' (which is no
longer necessary), the installer still detects the partition as having
an EXT4 file system.
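
To illustrate the kind of probing described above, here is a minimal
sketch in Python. It is not the installer's actual code. It only assumes
the documented ext2/3/4 layout (superblock 1024 bytes into the
partition, 16-bit magic 0xEF53 at offset 56 within it); the name
looks_like_ext and the in-memory "partition" are hypothetical.

```python
import struct

EXT_SB_OFFSET = 1024     # ext2/3/4 superblock starts 1024 bytes into the partition
EXT_MAGIC_OFFSET = 56    # s_magic field, relative to the superblock start
EXT_MAGIC = 0xEF53       # little-endian 16-bit magic number


def looks_like_ext(raw: bytes) -> bool:
    """Naive probe (hypothetical helper): True if the ext magic is present."""
    pos = EXT_SB_OFFSET + EXT_MAGIC_OFFSET
    if len(raw) < pos + 2:
        return False
    (magic,) = struct.unpack_from("<H", raw, pos)
    return magic == EXT_MAGIC


# Hypothetical stand-in for the first 4 KiB of sda2: the old ext4 magic
# is still sitting there, although the partition is now a RAID member.
partition = bytearray(4096)
struct.pack_into("<H", partition, EXT_SB_OFFSET + EXT_MAGIC_OFFSET, EXT_MAGIC)

print(looks_like_ext(bytes(partition)))  # True
```

A probe like this cannot tell a live file system from leftover bytes,
because it never asks what the partition is currently used for.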

Once this 'ghost' file system is detected, the installer gets really
confused about what goes where and will try to write to sda2 during the
install, even if you told the installer to ignore sda2 and just install
to sda1. This corrupts the current XFS file system on md0, and you're
screwed.

The overall flaw here is in the file system detection; you can't just
assume that any sequence of bits you find sitting around on a hard drive
is still current.

A possible solution may be to first check for a RAID superblock and, if
one is found, let that trump all file system detection. I imagine
something similar will have to be done with partitions that are part of
an LVM volume as well.
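
A rough sketch of that ordering in Python follows. It is an illustration
under assumptions, not a patch for the installer. The md magic
0xa92b4efc and the superblock positions (0.90 in a 64K-aligned block
that starts at least 64K from the end, 1.1 at the start, 1.2 at 4K from
the start) are taken from the md(4) documentation as I understand it and
should be verified; version 1.0 metadata is left out for brevity. All
function names are hypothetical.

```python
import struct

MD_MAGIC = 0xA92B4EFC          # md superblock magic (assumed from md docs)
MD_RESERVED = 64 * 1024        # 0.90 superblock lives in a 64K-aligned block
EXT_MAGIC = 0xEF53
EXT_MAGIC_POS = 1024 + 56      # ext superblock offset + s_magic offset


def _u32_at(raw: bytes, pos: int, fmt: str):
    """Read a 32-bit value at pos, or None if pos is out of range."""
    if pos < 0 or len(raw) < pos + 4:
        return None
    return struct.unpack_from(fmt, raw, pos)[0]


def has_md_superblock(raw: bytes) -> bool:
    # Version 1.1 (offset 0) and 1.2 (offset 4096) are little-endian.
    for pos in (0, 4096):
        if _u32_at(raw, pos, "<I") == MD_MAGIC:
            return True
    # Version 0.90 sits near the end and is host-endian, so try both orders.
    pos = (len(raw) & ~(MD_RESERVED - 1)) - MD_RESERVED
    return any(_u32_at(raw, pos, fmt) == MD_MAGIC for fmt in ("<I", ">I"))


def has_ext_magic(raw: bytes) -> bool:
    if len(raw) < EXT_MAGIC_POS + 2:
        return False
    return struct.unpack_from("<H", raw, EXT_MAGIC_POS)[0] == EXT_MAGIC


def classify(raw: bytes) -> str:
    """RAID membership is checked first and overrides file system probing."""
    if has_md_superblock(raw):
        return "raid-member"
    if has_ext_magic(raw):
        return "ext"
    return "unknown"


# Hypothetical 256 KiB stand-in for sda2 with a stale ext4 magic.
member = bytearray(256 * 1024)
struct.pack_into("<H", member, EXT_MAGIC_POS, EXT_MAGIC)
print(classify(bytes(member)))   # ext

# Add a 0.90-style md superblock magic near the end of the "partition".
md_pos = (len(member) & ~(MD_RESERVED - 1)) - MD_RESERVED
struct.pack_into("<I", member, md_pos, MD_MAGIC)
print(classify(bytes(member)))   # raid-member
```

The point is only the order of the checks: once a partition is known to
be an array member, whatever file system signature is found inside it
is ignored.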

-Alvin

* In my case, I took a shortcut and created a degraded array (missing
sda2), copied the data from sda2 to the array, added sda2 to the array,
and resynced. I don't think it makes a difference.