Slower performance with ext4

Chan Chung Hang Christopher christopher.chan at bradbury.edu.hk
Mon Nov 2 13:49:09 UTC 2009


Mark Kirkwood wrote:
> Christopher Chan wrote:
>> Mark Kirkwood wrote:
>>> Christopher Chan wrote:
>>>> Journaling only for metadata is not 'as much journaling as any other 
>>>> candidates.' You cannot treat metadata-only journaling as equivalent to 
>>>> the data and metadata journaling that is possible with ext3. XFS's 
>>>> journaling only provides filesystem metadata consistency, which is why 
>>>> you get files full of NULLs after a crash or power outage. MTAs rely on 
>>>> fsync calls, and how a filesystem behaves with regard to fsync requests 
>>>> is the real determiner of whether there is a data guarantee or not. XFS 
>>>> does not provide a data guarantee. It, at best, provides a metadata 
>>>> guarantee. XFS should not be used for MTA queues unless it is in 
>>>> conjunction with hardware RAID that has a BBU cache. XFS is best suited 
>>>> for streaming applications where data loss is tolerated.
>>>>
>>> Sorry, but that is completely incorrect. Applications that use fsync are 
>>> safe with any filesystem - fsync forces the modified buffers to *disk*, 
>>> so all discussions about OS and filesystem caching are irrelevant[1].
>>>
>> Yes...where *disk* = journal. Which for JFS, XFS and ext3 data=ordered 
>> means metadata only. Only ext3 data=journal guarantees data and 
>> metadata. Feel free to get any filesystem developer to confirm this for 
>> me, because you won't get any other answer than what I have just posted.
>>
> Not so - disk != journal. At fsync the buffers are written through the 
> OS buffers to the physical disk cache, and the cache is instructed to 
> write 'em to the rotating media. This is for data, *not* always metadata 
> (see the man pages for fsync vs fdatasync). In fact it is the metadata 
> that has historically caused the most problems - hence the need to 
> journal this.
>
> The vast majority of the world's databases and mail servers depend on the 
> fact that fsync forces modified *data* buffers to their respective files 
> on disk.
>


Maybe things have changed for XFS by now, but for ext3, disk = journal.

http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/fs/ext3/fsync.c#L71

When data=journal, data and metadata for the file are written to the 
journal and then fsync returns. End of story.

When data=ordered, fsync returns once the metadata has been written via 
sync_inode(), and you hope nothing happens within the next half second 
if you want data consistency too.
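
For reference, the application side of this argument is small. Below is a 
minimal sketch in C of how an MTA-style program might write a queue file 
and only report success after fsync() has returned. The helper name 
write_durably is made up for illustration and is not taken from any real 
MTA; what a successful fsync() actually guarantees on a given filesystem 
and mount mode is exactly what is being debated in this thread.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): write len bytes from buf to
 * path and return 0 only after fsync() has reported success.
 * Returns -1 on any error. */
int write_durably(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is handed over. */
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            close(fd);
            return -1;
        }
        off += (size_t)n;
    }

    /* The caller treats the message as "safe" only if this succeeds.
     * Where the data really is at that point depends on the filesystem. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would do something like write_durably("/tmp/example.msg", msg, 
strlen(msg)) and acknowledge the message to the sender only on a 0 return.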

This is why an ext3 filesystem on software RAID, mounted data=journal 
with an external journal on a BBU NVRAM card, will blow away other 
filesystems in performance and data consistency.
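
Since all of this hinges on the data= mount option, here is a rough C 
sketch for reading it back from a mount table such as /proc/mounts, using 
the glibc getmntent() family. The function name data_mode_of and its 
return convention are invented for this example. Note that a missing 
data= entry only means the table did not list the option; it does not by 
itself tell you which mode is in effect.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (illustration only): look up mount point dir in the
 * mount table file and copy the value of its "data=" option into out.
 * Returns 0 if found, 1 if dir is mounted but lists no data= option,
 * -1 if dir is not in the table or the table cannot be read. */
int data_mode_of(const char *table, const char *dir, char *out, size_t outlen)
{
    if (outlen == 0)
        return -1;

    FILE *fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;

    int rc = -1;
    struct mntent *m;
    while ((m = getmntent(fp)) != NULL) {
        if (strcmp(m->mnt_dir, dir) != 0)
            continue;

        /* A later entry for the same directory overrides an earlier one. */
        rc = 1;
        out[0] = '\0';

        const char *opt = hasmntopt(m, "data");
        if (opt != NULL && opt[4] == '=') {
            size_t n = strcspn(opt + 5, ",");   /* value ends at ',' or NUL */
            if (n >= outlen)
                n = outlen - 1;
            memcpy(out, opt + 5, n);
            out[n] = '\0';
            rc = 0;
        }
    }
    endmntent(fp);
    return rc;
}
```

For example, data_mode_of("/proc/mounts", "/var/spool", mode, sizeof mode) 
would fill mode with "journal" for a filesystem mounted data=journal on 
/var/spool (the mount point here is just an example).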

Comments for your pleasure:

        /*
         * data=writeback:
         *  The caller's filemap_fdatawrite()/wait will sync the data.
         *  sync_inode() will sync the metadata
         *
         * data=ordered:
         *  The caller's filemap_fdatawrite() will write the data and
         *  sync_inode() will write the inode if it is dirty.  Then the caller's
         *  filemap_fdatawait() will wait on the pages.
         *
         * data=journal:
         *  filemap_fdatawrite won't do anything (the buffers are clean).
         *  ext3_force_commit will write the file data into the journal and
         *  will wait on that.
         *  filemap_fdatawait() will encounter a ton of newly-dirtied pages
         *  (they were dirtied by commit).  But that's OK - the blocks are
         *  safe in-journal, which is all fsync() needs to ensure.
         */





