Slower performance with ext4
Chan Chung Hang Christopher
christopher.chan at bradbury.edu.hk
Mon Nov 2 13:49:09 UTC 2009
Mark Kirkwood wrote:
> Christopher Chan wrote:
>
>> Mark Kirkwood wrote:
>>
>>
>>> Christopher Chan wrote:
>>>
>>>> Journaling only for metadata is not 'as much journaling as any other
>>>> candidates.' You cannot treat metadata-only journaling as equivalent to
>>>> the data and metadata journaling that is possible with ext3. XFS's
>>>> journaling only provides filesystem metadata consistency, which is why
>>>> you get files full of NULLs after a crash/power outage. MTAs rely on fsync
>>>> calls, and how a filesystem behaves in regard to fsync requests is the
>>>> real determiner of whether there is a data guarantee or not. XFS does
>>>> not provide a data guarantee. It, at best, provides a metadata guarantee.
>>>> XFS should not be used for MTA queues unless it is in conjunction with
>>>> hardware RAID that has a BBU cache. XFS is best suited for streaming
>>>> applications where data loss is tolerated.
>>>>
>>> Sorry, but that is completely incorrect. Applications that use fsync are
>>> safe with any filesystem - fsync forces the modified buffers to *disk*,
>>> so all discussions about OS and filesystem caching are irrelevant[1].
>>>
>>>
>>>
>> Yes...where *disk* = journal. Which for JFS, XFS and ext3 data=ordered
>> means metadata only. Only ext3 data=journal guarantees data and
>> metadata. Feel free to ask any filesystem developer to confirm this,
>> because you won't get any other answer than what I have just posted.
>>
> Not so - disk != journal. At fsync the buffers are written through the
> os buffer to the physical disk cache, and the cache is instructed to
> write 'em to the rotating media. This is for data, *not* always metadata
> (see man for fsync vs fdatasync). In fact it is the metadata that has
> historically caused the most problems - hence the need to journal this.
>
> The vast majority of the world's databases and mail servers depend on the
> fact that fsync forces modified *data* buffers to their respective files
> on disk.
>
Maybe things have changed for XFS now, but for ext3, disk = journal.
http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/fs/ext3/fsync.c#L71
With data=journal, the file's data and metadata are written to the journal
and then fsync returns. End of story.
With data=ordered, fsync returns once the metadata has been written via
sync_inode(), and you hope nothing happens within the next half second if
you want data consistency too.
This is why an ext3 filesystem on software RAID, mounted data=journal
and with an external journal on a BBU NVRAM card, will blow
away other filesystems in performance and data consistency.
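
To check which data= mode a given mount is actually using, the options column of /proc/mounts can be inspected. The following is only a minimal sketch: the function name ext_data_mode and the sample text are hypothetical, and it assumes the kernel lists a data= option for the mount, which may not happen when the default mode is in effect.

```python
def ext_data_mode(mounts_text, mount_point):
    """Return the data= journaling mode listed for an ext3/ext4 mount, or None."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4:
            continue
        _device, where, fstype, options = fields[:4]
        if where != mount_point or fstype not in ("ext3", "ext4"):
            continue
        for opt in options.split(","):
            if opt.startswith("data="):
                return opt[len("data="):]
        return None  # mounted, but no data= option was listed
    return None  # mount point not found


# Hypothetical sample in /proc/mounts format, not taken from a real system.
SAMPLE = (
    "/dev/md0 /var/spool/postfix ext3 rw,noatime,data=journal 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)

if __name__ == "__main__":
    print(ext_data_mode(SAMPLE, "/var/spool/postfix"))
```

On a live system the first argument would be the contents of /proc/mounts rather than the sample string.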
Comments for your pleasure:
/*
 * data=writeback:
 *  The caller's filemap_fdatawrite()/wait will sync the data.
 *  sync_inode() will sync the metadata
 *
 * data=ordered:
 *  The caller's filemap_fdatawrite() will write the data and
 *  sync_inode() will write the inode if it is dirty.  Then the caller's
 *  filemap_fdatawait() will wait on the pages.
 *
 * data=journal:
 *  filemap_fdatawrite won't do anything (the buffers are clean).
 *  ext3_force_commit will write the file data into the journal and
 *  will wait on that.
 *  filemap_fdatawait() will encounter a ton of newly-dirtied pages
 *  (they were dirtied by commit).  But that's OK - the blocks are
 *  safe in-journal, which is all fsync() needs to ensure.
 */
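
For reference, the application side of this argument (what an MTA does before it acknowledges a message) can be sketched roughly as below. This is an illustration only, not code from any particular MTA: write_durably is a hypothetical helper, it assumes a POSIX system where a directory can be opened and fsynced, and what the fsync calls actually guarantee still depends on the filesystem and mount mode being argued about in this thread.

```python
import os
import tempfile


def write_durably(path, payload):
    """Write payload (bytes) to path; return only after fsync() has succeeded."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()             # drain Python's userspace buffer
            os.fsync(tmp.fileno())  # ask the kernel to commit the file
        os.replace(tmp_path, path)  # atomically publish under the final name
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory as well so the rename itself is committed.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Only after write_durably() returns would the caller report success to the sender; os.fdatasync() could be substituted for the file-level os.fsync() where skipping non-essential metadata updates is acceptable.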