DANGER!!! Problems with 10.04 installer (RAID devices *will* get corrupted)
Karl Larsen
klarsen1 at gmail.com
Wed Apr 21 11:58:00 UTC 2010
On 04/20/2010 10:30 PM, Alvin Thompson wrote:
> Long story short: the only way to be safe right now is to physically
> remove drives with important data during the install.
>
> I figured out the cause of my RAID problems, and it's a problem with
> Ubuntu's installer. This will cost people their data if not fixed.
> Sorry about the length of this post, but the problem takes a while to
> explain.
>
> The following scenario is not the only way your partitions can get
> hosed. I simply use it because it's a common use case, it illustrates
> what data is where on the hard drives, and it exposes the flaws in the
> installer's logic. It also doesn't matter whether you touch a
> particular drive, partition, or file system during the install; the
> data on it can still be corrupted.
>
> Suppose you have a hard drive with some partitions on it. On one of
> those partitions you have a Linux file system which houses your data.
> We'll say for the sake of this discussion that sda2 contains an EXT4
> file system with your data. So far, so good.
>
> Because this data is too important to rely on a single drive, you decide
> to buy some more drives and make a RAID 5 device. You buy 3 more drives
> and create similar partitions on them (say, sdb2, sdc2, and sdd2). You
> copy the data currently on sda2 somewhere safe, then you use mdadm to
> create a RAID 5 array with sda2, sdb2, sdc2, and sdd2. The new RAID
> device is md0. You create an XFS file system on md0 and move your data
> to it*. This is all perfectly fine, but the stage has been set for
> disaster with the Ubuntu installer.
>
> Later, you decide to do a clean install of Ubuntu on sda1 (sda1 is *not*
> part of the RAID array), and you get to the partitioning stage and
> select manual partitioning. This is where things get really ugly really
> fast.
>
> The bug is how the installer detects existing file systems. It simply
> reads the raw data in a partition to see if the bits it finds correspond
> to a known file system. In the above example, the installer detects the
> remnants of the original (non-RAID) file system on sda2 and thinks it's
> a current EXT4 file system. Even if you use fdisk to mark sda2's
> partition type as 'RAID autodetect' instead of 'Linux' (which is no
> longer necessary), the installer still detects the partition as having
> an EXT4 file system.
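
The signature-only detection described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the installer's actual code. It assumes the standard on-disk layouts: an ext2/3/4 superblock starts 1024 bytes into the partition and stores the magic number 0xEF53 (little-endian) at offset 0x38 within it, and an XFS superblock begins with the bytes "XFSB". The partition is modelled as a plain byte buffer, and the name naive_probe is made up for the example.

```python
import struct

# Assumed standard on-disk constants (not taken from the installer):
EXT_SB_OFFSET = 1024                      # ext2/3/4 superblock starts 1 KiB in
EXT_MAGIC_OFFSET = EXT_SB_OFFSET + 0x38   # s_magic field inside that superblock
EXT_MAGIC = 0xEF53                        # stored little-endian
XFS_MAGIC = b"XFSB"                       # first four bytes of an XFS superblock


def naive_probe(part: bytes) -> list[str]:
    """Hypothetical signature-only probe: report every file system magic
    number found in the raw partition bytes, with no way to tell which
    one (if any) is still in use."""
    found = []
    if part[:4] == XFS_MAGIC:
        found.append("xfs")
    if len(part) >= EXT_MAGIC_OFFSET + 2:
        (magic,) = struct.unpack_from("<H", part, EXT_MAGIC_OFFSET)
        if magic == EXT_MAGIC:
            found.append("ext2/3/4")
    return found


if __name__ == "__main__":
    # Synthetic 64 KiB "partition" that still carries an old ext magic number.
    part = bytearray(64 * 1024)
    struct.pack_into("<H", part, EXT_MAGIC_OFFSET, EXT_MAGIC)
    print(naive_probe(bytes(part)))  # ['ext2/3/4']
```

Nothing in a check like this records whether the partition has since become an array member, so leftover bytes from the old file system are reported as if they were current.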
>
> Once this 'ghost' file system is detected, the installer gets really
> confused about what goes where and will try to write to sda2 during the
> install, even if you told the installer to ignore sda2 and just install
> to sda1. This corrupts the current XFS file system on md0, and you're
> screwed.
>
> The overall flaw here is in the file system detection; you can't just
> assume that any sequence of bits you find sitting around on a hard drive
> is still current.
>
> A possible solution may be to first check for a RAID superblock and, if
> one is found, let it trump all file system detection. I imagine something
> similar will have to be done with partitions that are part of an LVM
> volume as well.
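
The proposed ordering (RAID superblock first, file system signatures second) might look something like this. It is again a hypothetical Python sketch, not Ubuntu's code. It assumes the Linux md superblock magic 0xa92b4efc and the superblock placements documented for mdadm: version 1.1 at the start of the device, 1.2 at 4 KiB, 1.0 between 8 and 12 KiB from the end, and 0.90 in the last 64 KiB-aligned block. Because 0.90 superblocks are written in host byte order, the sketch also assumes a little-endian (x86) machine. The names probe, has_md_superblock and md_superblock_offsets are made up.

```python
import struct

MD_MAGIC = 0xA92B4EFC                # Linux md (mdadm) superblock magic
EXT_MAGIC_OFFSET = 1024 + 0x38       # s_magic inside the ext2/3/4 superblock
EXT_MAGIC = 0xEF53


def md_superblock_offsets(size: int) -> list[int]:
    """Byte offsets where an md superblock may live on a device of `size` bytes."""
    offsets = [0, 4096]                             # v1.1 at start, v1.2 at 4 KiB
    if size >= 8192:
        offsets.append((size - 8192) & ~4095)       # v1.0: 8-12 KiB from end, 4 KiB aligned
    if size >= 0x10000:
        offsets.append((size & ~0xFFFF) - 0x10000)  # v0.90: last 64 KiB-aligned block
    return offsets


def has_md_superblock(part: bytes) -> bool:
    for off in md_superblock_offsets(len(part)):
        # "<I" assumes little-endian; v0.90 uses host byte order (x86 assumed).
        if off + 4 <= len(part) and struct.unpack_from("<I", part, off)[0] == MD_MAGIC:
            return True
    return False


def probe(part: bytes) -> str:
    """Hypothetical RAID-first probe: an md superblock trumps any
    file system signature found in the same partition."""
    if has_md_superblock(part):
        return "linux_raid_member"
    if len(part) >= EXT_MAGIC_OFFSET + 2:
        (magic,) = struct.unpack_from("<H", part, EXT_MAGIC_OFFSET)
        if magic == EXT_MAGIC:
            return "ext2/3/4"
    return "unknown"


if __name__ == "__main__":
    part = bytearray(1024 * 1024)                              # synthetic 1 MiB partition
    struct.pack_into("<H", part, EXT_MAGIC_OFFSET, EXT_MAGIC)  # stale ext magic
    print(probe(bytes(part)))  # ext2/3/4
    sb = (len(part) & ~0xFFFF) - 0x10000                       # v0.90 location
    struct.pack_into("<I", part, sb, MD_MAGIC)
    print(probe(bytes(part)))  # linux_raid_member
```

In the second call the stale ext magic is still on the "disk", but it is never consulted because the RAID check wins. The LVM case mentioned above would need a similar check and is not covered here.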
>
> -Alvin
>
> * In my case, I took a shortcut and created a degraded array (missing
> sda2), copied the data from sda2 to the array, added sda2 to the array,
> and resynced. I don't think it makes a difference.
>
>
I don't think this is a bug. You have just changed from a standard
single hard drive to a RAID system because your data is so important.
Let me suggest this:
1. Go back to one hard drive.
2. Back up your important data. I use rsync and it works fine.
3. Now load 10.04 and it should be fine.
4. Your RAID 5 problems are typical.
73 Karl
--
Karl F. Larsen, AKA K5DI
Linux User
#450462 http://counter.li.org.
Key ID = 3951B48D