Reliable file systems

Jan Morén jan.moren at lucs.lu.se
Sat Nov 6 16:58:17 UTC 2004


On Sat, 2004-11-06 at 16:25 +0000, Benjamin Roe wrote:
> I just had to hit the reset button during boot because of a hard lockup
> (due to athcool, I think). Anyway, I've always used XFS for my systems
> as it's been very reliable. However, this reboot seems to have
> completely killed the filesystem. I had to run xfs_repair to get the
> system to let me login, and even now half of /usr, most of apt's package
> lists and various other things are either in lost+found or just gone.
> 
> I've always thought the point of journalled file systems was to avoid
> this sort of thing. I was especially surprised as few of the files were
> even open at the time of the reboot - it was quite early in the boot
> sequence.

Well, journaling protects you from a lot of things that can go wrong
_within_ normal filesystem operation. After a crash, for example, an
interrupted metadata update is either completed or rolled back, so the
filesystem structure is never left half-written.

However, journaling does nothing for you if some lower part of the
system starts overwriting stuff on the disk beyond the control of the
filesystem itself. And yes, that can happen as a result of a system
fault, flaky hardware, or a hard drive crash. To protect against stuff
like that, you need to first look into RAID solutions (with a focus on
reliability, not speed), and - if you're really paranoid - also into
some kind of delayed mirroring/backup system.
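
To make the distinction concrete, here is a minimal Python sketch of the
idea. It is a toy model, not how XFS, ext3 or Reiserfs actually work: the
class ToyJournaledDisk, its in-memory "blocks" and "journal", and the
crash_after parameter are all hypothetical names made up for illustration.
It shows that replaying the journal repairs an interrupted transaction, but
does nothing about a block that was overwritten behind the filesystem's back.

```python
class ToyJournaledDisk:
    """Toy model of write-ahead journaling (illustrative only)."""

    def __init__(self):
        self.blocks = {}   # "on-disk" data: block number -> contents
        self.journal = []  # "on-disk" journal: list of records

    def write(self, updates, crash_after=None):
        """Apply several block updates as one transaction.

        crash_after simulates a reset at a given point:
          "log"  - intent logged, but no commit record written
          "half" - committed, but only the first block updated in place
        """
        # 1. Log the intended changes before touching any block.
        self.journal.append(("begin", dict(updates)))
        if crash_after == "log":
            return
        # 2. The commit record makes the transaction durable.
        self.journal.append(("commit",))
        # 3. Update the blocks in place.
        if crash_after == "half":
            first = next(iter(updates))
            self.blocks[first] = updates[first]
            return
        self.blocks.update(updates)
        # 4. The transaction is fully applied; drop it from the journal.
        self.journal.clear()

    def recover(self):
        """Replay committed transactions and discard uncommitted ones."""
        pending = None
        for record in self.journal:
            if record[0] == "begin":
                pending = record[1]
            elif record[0] == "commit" and pending is not None:
                self.blocks.update(pending)
                pending = None
        self.journal.clear()


disk = ToyJournaledDisk()
disk.write({1: "a", 2: "b"})

# A reset in the middle of a committed transaction leaves block 2 stale...
disk.write({1: "x", 2: "y"}, crash_after="half")
# ...and replaying the journal finishes the job.
disk.recover()

# Something below the filesystem scribbles on block 2. The journal has no
# record of this, so recovery cannot notice it, let alone undo it.
disk.blocks[2] = "garbage"
disk.recover()
```

The last two lines are the case described above: the damage never went
through the journal, so only redundancy outside the filesystem (RAID,
mirroring, backups) can bring the data back.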

As a classic example, for an early DEC system (old minicomputers),
they implemented a disk head parking feature at power loss. The idea was
(and is today - all modern drives do this) that you'd use some of the
last juice to park the disk heads and thereby avoid any risk of a disk
crash. It turned out that under some conditions, if the disk was in the
middle of writing a block of data, it would continue to write as the
head moved across the disk, effectively scribbling a spiral of garbage
across the entire filesystem. No journaled filesystem could have helped
with that bug.



> I can find benchmarks for Linux file systems, but nothing on their
> reliability. I've had this happen with Reiserfs and ext3, but had hoped
> that XFS would be better.

Again, there are some aspects of filesystem operation that are outside
the control of the filesystem itself. That said, ext3 is probably the
single most reliable journaled filesystem you are likely to find (as it
is in heavy use everywhere, and builds directly on ext2, which has had
many years of use to shake the bugs out of it). In my humble view,
Reiserfs tends to lean a little too much towards features as opposed to
stability (for me, there is no such thing as too much stability when it
comes to filesystems).




-- 
Trust the Computer. The Computer is your friend.
 
Tel. (Japan) 090-3622 8920            Dr. Jan Morén (mr)
                                      Dept. of Cognitive Science
http://lucs.lu.se/people/jan.moren    Lund, Sweden




