[Bug 37435] Re: XFS leaves garbage in file if app does write-new-then-rename without f(data)sync

Bruno Rocha Coutinho bruno.r.coutinho at gmail.com
Mon Jun 16 14:57:41 UTC 2008

This is not a bug. It is a feature. :-)

A write in XFS immediately grows the file, and if a crash occurs between
the write and the data being flushed to disk, the extra space is filled
with zeroes. This is explained in this text from
http://madduck.net/blog/2006.08.11:xfs-zeroes/, pasted below:

This must be the most misunderstood feature of XFS. What happens is that
XFS logs all metadata changes to the journal, except for the inode size,
which gets flushed to disk immediately for performance reasons [*]_. At
this point, the file will actually be a sparse file, which is nothing
more than a file whose metadata lists a file as being of a size
different than it currently is (I realise the “sparse” does not really
apply when the file is “overfull”, i.e. when the metadata lists it as
smaller than it really is, but I am lacking a good word for that). The
disk extents get allocated only when the data actually hits the disk
(that’s XFS’s famous delayed allocation mechanism). If the power fails
before the data was flushed to disk and the journal entry cleared, XFS
will serve zeroes, rather than the potentially random or sensitive data
that is actually on disk. This is a good thing.

.. [*] since almost *every* write() changes the file size, it would be a
massive performance hit if every size change was logged. However, XFS
actually violates its own journaling rules by doing this.
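
To make the "recorded size is ahead of the data" idea concrete, here is
a small Python sketch. It does not simulate a crash or touch XFS
internals; it only shows, on any POSIX filesystem, that when a file's
size is grown without writing data, the unwritten range reads back as
zeroes. The helper name read_past_written_data and the 4096-byte size
are illustrative assumptions, not anything from the quoted text.

```python
import os
import tempfile


def read_past_written_data(size=4096):
    """Write 5 bytes, grow the recorded size to `size`, and read it all back."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        # Grow only the file size; no data is written for the new range.
        os.truncate(f.name, size)
        with open(f.name, "rb") as g:
            return g.read()


data = read_past_written_data()
# The written prefix survives; the rest of the file reads as zero bytes.
print(data[:5], data[5:] == bytes(len(data) - 5))
```

This is only an analogy for what a reader sees after the crash described
above: the size says the data should be there, and zeroes are what comes
back.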

You can run into more or less the same problem with any journaling
filesystem; the others just don’t serve zeroes. Instead, they give you
the data that’s physically on the medium. Imagine the situation when the
corrupt /etc/motd suddenly becomes a window to your previous /etc/shadow
contents… I really prefer how XFS handles that. Sometimes you do get the
old data back with the other filesystems, but this is because the
filesystems may reuse the blocks of the old file. So it’s a trade-off,
and your choice between security and, uh, convenience.

The only way to protect against this is to use “physical-block
journaling” (as opposed to “logical journaling”), which is only
supported by ext3 as far as I know (option data=journal), at a massive
performance loss. See this mailing list post
(http://linuxmafia.com/faq/Filesystems/reiserfs.html) by Theodore Ts’o
for more info.
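
For the write-new-then-rename pattern named in the bug title, the
missing piece is the f(data)sync before the rename. A minimal Python
sketch of that sequence follows. The function name atomic_replace and
the ".tmp" suffix are my own assumptions for illustration, and the
directory fsync at the end is an extra precaution as it behaves on
Linux; whether all of this is sufficient on a given setup depends on the
filesystem and its mount options.

```python
import os


def atomic_replace(path, data):
    """Write `data` to a temporary file, flush it to disk, then rename it over `path`."""
    tmp = path + ".tmp"  # hypothetical naming convention for the new file
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file data to disk before the rename.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Replace the old file with the new one in a single rename.
    os.replace(tmp, path)

    # Also flush the directory so the rename itself is on disk (Linux behaviour).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Called as atomic_replace("/some/dir/file", b"new contents\n"), a reader
of the file should see either the old contents or the complete new ones.
Skipping the os.fsync(fd) call is the case the bug report describes,
where the rename can reach the disk before the data does.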
