Slower performance with ext4

Mon Nov 2 06:51:41 UTC 2009

Mark Kirkwood wrote:
> Christopher Chan wrote:
>> Mark Kirkwood wrote:
>>> Christopher Chan wrote:
>>>> Journaling only for metadata is not 'as much journaling as any other 
>>>> candidates.' You cannot treat metadata-only journaling as equivalent to 
>>>> the data and metadata journaling that is possible with ext3. XFS's 
>>>> journaling only provides filesystem metadata consistency which is why 
>>>> you get files full of NULLs after a crash or power outage. MTAs rely on 
>>>> fsync calls, and how a filesystem behaves with regard to fsync requests 
>>>> is the real determiner of whether there is a data guarantee or not. XFS 
>>>> does not provide a data guarantee. It, at best, provides a metadata 
>>>> guarantee. XFS should not be used for MTA queues unless it is in 
>>>> conjunction with hardware RAID that has a BBU cache. XFS is best suited 
>>>> for streaming applications where data loss is tolerated.
>>>>
>>> Sorry, but that is completely incorrect. Applications that use fsync are 
>>> safe with any filesystem - fsync forces the modified buffers to *disk*, 
>>> so all discussions about OS and filesystem caching are irrelevant[1].
>> Yes...where *disk* = journal. Which for JFS, XFS and ext3 data=ordered 
>> means metadata only. Only ext3 data=journal guarantees data and 
>> metadata. Feel free to get any filesystem developer to confirm this for 
>> me, because you won't get any other answer than what I have just posted.
>>
> Not so - disk != journal. At fsync the buffers are written through the 
> OS buffer to the physical disk cache, and the cache is instructed to 
> write 'em to the rotating media. This is for data, *not* always metadata 
> (see the man pages for fsync vs fdatasync). In fact it is the metadata that has 
> historically caused the most problems - hence the need to journal this.
>   

Maybe you want to first VERIFY with the various filesystem developers 
before you start yapping about what appears to be the only sensible 
explanation but is in fact a myth. On Linux, XFS, JFS and ext3 
data=ordered return from fsync as soon as the metadata hits the journal on 
disk, before the data and metadata are committed to their final locations 
on the filesystem. ext3 data=journal returns after both data and metadata 
are committed to the JOURNAL on disk and before they are written to their 
locations in the filesystem. I have not yet looked at ext4 so I will not 
say anything about what it does.
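
For reference, this is roughly what the call sequence under discussion looks 
like from the application side. It is only a minimal sketch in Python: the 
function name write_durably and the file name are made up for illustration, 
and it says nothing about what the filesystem or the drive has actually done 
by the time os.fsync returns, which is the point being argued here.

```python
import os
import tempfile


def write_durably(path, payload):
    """Write payload (bytes) to path, then ask the kernel to flush it.

    Hypothetical helper for illustration only. Returns the byte count.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        # os.write may write fewer bytes than requested, so loop until done.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Request that the file's modified buffers be flushed to the device.
        # How much is on stable storage when this returns is up to the
        # filesystem, its mount options, and the drive's write cache.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(payload)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "msg")
        write_durably(target, b"queued message\n")
        print(os.path.getsize(target))
```

Python also exposes os.fdatasync on Linux, which asks for the data to be 
flushed without necessarily flushing all of the file's metadata.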

> The vast majority of the world's databases and mail servers depend on the 
> fact that fsync forces modified *data* buffers to their respective file 
> on disk.
>   

Sure. Too bad that is not always true.

> The zero length files that people dislike so much on xfs are caused by 
> applications that do *not* request an fsync - and also cheap sata disks 
> that do not honor fsync's request to actually write the buffers... 
> thankfully these are less common now (especially for serious sata drives 
> like WD's Velociraptor).
>
>   

Heh, what do you know? I have been burned by XFS after a power loss and 
got over 4000 zero-length files in a postfix queue. No filesystem 
corruption, just zero data files. You want to tell me that postfix does 
not use fsync? You can guess what I did to the XFS filesystem mounted 
for the queue directory. I destroyed it and put ext3 there instead, in full 
data journal mode. I then repeated that on all the other MTAs that had an 
XFS filesystem for their mail queue.
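
To make the queue-file case concrete, here is a sketch of the kind of sequence 
an application might use to put a message into a queue directory: write to a 
temporary name, fsync the file, rename it into place, then fsync the 
directory. This is an assumption-laden illustration and is NOT taken from the 
postfix source; the function name enqueue and the ".tmp" suffix are 
hypothetical. It assumes a POSIX system where a directory can be opened and 
passed to fsync.

```python
import os
import tempfile


def enqueue(queue_dir, name, payload):
    """Place payload (bytes) into queue_dir under name.

    Hypothetical helper for illustration; not how any particular MTA does it.
    """
    tmp_path = os.path.join(queue_dir, name + ".tmp")
    final_path = os.path.join(queue_dir, name)

    # O_EXCL makes creation fail if a stale temporary file already exists.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(payload)
        while view:
            view = view[os.write(fd, view):]
        # Ask for the file contents to be flushed before the rename.
        os.fsync(fd)
    finally:
        os.close(fd)

    # The rename is atomic within a single filesystem on POSIX.
    os.rename(tmp_path, final_path)

    # Flush the directory so the new name itself is requested to be durable.
    dir_fd = os.open(queue_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(enqueue(d, "A1B2C3", b"From: someone\n\nbody\n"))
```

Whether a crash right after these calls can still leave a zero-length file 
depends on the filesystem, its mount options and the drive cache, which is 
what the rest of this thread is about.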