Slower performance with ext4
Mark Kirkwood
markir at paradise.net.nz
Wed Nov 4 09:52:36 UTC 2009
Chan Chung Hang Christopher wrote:
>
>
> ROTFL. Nice OPTIMIZATION? One is possibly doing almost DOUBLE the
> writes. It is really only an optimization if you are using ext3
> data=journal for a mail queue and the journal is on a uber fast nvram
> card (memory speed versus disk speed) because most mails should not
> queue and if you have a nice big nvram card to act as a buffer and speed
> up response to fsync calls for other cases. Hence why most people use
> raid cards with nice big bbu caches nowadays. /me jumps up and down on a
> bunch of 3ware 75xx/85xx cards.
>
>
>>
>
> Not so fast pal. data=writeback issues a flush for data...and nothing
> else (goto flush ... out) and data=ordered issues a call that syncs the
> inode only. The only part where data buffers are synced is
> data=writeback (just like what others have explained about
> data=writeback) and there is no data buffer related call for data =
> ordered. Just an inode sync.
>
> However, I do have my doubts about the journal being used when
> data=ordered/writeback. I have not spent a lot of time but I cannot find
> where the inode sync call puts anything in the journal...the call is
> generic and not specific to ext3 too. It appears things have changed
> since barriers were introduced.
>
>>
Actually I think we have both misunderstood this point - because the
code we are looking at is not the whole story. How it works is that an
application calls fsync(), which will then call sys_fsync(), which will
(amongst other things) call:
- generic_block_fdatasync() to sync the *data* blocks
- ext3_sync_file() to sort out the metadata and journal stuff
Note the comments in the links you posted actually mention this. We have
been looking at the latter code only in isolation. I think this article:
http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/ssd’s-journaling-and-noatimerelatime
discusses the business quite well: data=journal *does* write the data
twice! Once to the files themselves and once to the journal. However,
under specialized circumstances this is still faster than the other
journal modes.
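For reference, the application side of that call chain looks roughly like
this. It is only a minimal sketch in Python: the name durable_write is made
up for illustration, and os.fsync() is just the thin wrapper around fsync(2).
Which kernel functions run underneath depends on the filesystem and kernel
version.

```python
import os


def durable_write(path, data):
    """Write bytes to path, then ask the kernel to flush them via fsync(2).

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write() may write fewer bytes than requested, so loop.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Blocks until the kernel reports the file's data and metadata
        # as written out to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
```

With the disk write cache on, a successful return here still only means the
drive accepted the blocks, not that they are on the platter.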
> Again, in Linux there ain't no signal to the disk write cache to flush.
> Either you turn it off or suffer the consequences. Did you miss the
> Notes at the end of the fsync (2) man page?
>
>
Exactly - that is precisely the point I was making previously. Note that
SCSI/SAS disks generally default to the write cache being *off* which
makes 'em safer choices for serious storage. Write cache *on* means you
are at the mercy of how good the barrier support is (not that great
generally it seems), no matter what journal options are used.
Now I think that our differing emphasis on data vs metadata is probably
due to you minding mail servers (lots of important metadata changes from
new files etc) and me minding databases (typically no important metadata
changes - e.g. innodb typically has everything in 3 files...but very
important data changes - e.g. transaction logs).
In your use case, it makes sense to use data=journal. In mine typically
it does not (note that a database transaction log functions like a
journal - a serially appended file of transactions - so
data=ordered, data=writeback or even xfs journaling etc is not only fine but
optimal [1])!
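To illustrate the "transaction log functions like a journal" point, here is
a rough sketch in Python. It is not how innodb or any real database does it:
the TxLog class, the replay() helper and the length-prefixed record layout
are all hypothetical, chosen only to show a serially appended file that is
fsync'd on every commit.

```python
import os
import struct


class TxLog:
    """Hypothetical append-only transaction log (illustration only)."""

    def __init__(self, path):
        # O_APPEND: every write lands at the current end of the file.
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def commit(self, payload):
        # Record layout (an assumption): 4-byte little-endian length + payload.
        record = memoryview(struct.pack("<I", len(payload)) + payload)
        while record:
            written = os.write(self.fd, record)
            record = record[written:]
        # The commit is only reported as done after fsync returns.
        os.fsync(self.fd)

    def close(self):
        os.close(self.fd)


def replay(path):
    """Read records back in order, stopping at a torn (incomplete) tail."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            (length,) = struct.unpack("<I", header)
            payload = f.read(length)
            if len(payload) < length:
                break  # partially written last record, ignore it
            records.append(payload)
    return records
```

After a crash, replay() returns every fully written record and drops a
half-written tail, which is the same job the filesystem journal does for its
own metadata. That is why journaling the data a second time underneath buys
little here.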
regards
Mark
[1] Or even ext2 in some cases.