(fwd) BUG: mmapfile/writev spurious zero bytes still in the wild

Bron Gondwana brong at fastmail.fm
Sat Oct 4 00:30:06 UTC 2008


I should have just added you as CC.  I've tried reporting this via
launchpad, but it just keeps timing out on me again :(

I'm also CCing the user who reported the problem to me.  He's running
2.6.14-19-xen, but the problem exists in all 64 bit kernels in Ubuntu
if it exists in one.  This causes massive data corruption to Cyrus
running on kernels between 2.6.23 and 2.6.25.8.

You can read the full backthread here:

http://lkml.org/lkml/2008/6/17/9

Linus posted the patch here:

http://lkml.org/lkml/2008/6/17/234

And the fix was rolled into 2.6.25.8:

commit 864f24395c72b6a6c48d13f409f986dc71a5cf4a
Author: Linus Torvalds <torvalds at linux-foundation.org>
Date:   Tue Jun 17 17:47:50 2008 -0700

    x86-64: Fix "bytes left to copy" return value for copy_from_user()
    
    commit 42a886af728c089df8da1b0017b0e7e6c81b5335 upstream
    
    Most users by far do not care about the exact return value (they only
    really care about whether the copy succeeded in its entirety or not),
    but a few special core routines actually care deeply about exactly how
    many bytes were copied from user space.
    
    And the unrolled versions of the x86-64 user copy routines would
    sometimes report that it had copied more bytes than it actually had.
    
    Very few uses actually have partial copies to begin with, but to make
    this bug even harder to trigger, most x86 CPU's use the "rep string"
    instructions for normal user copies, and that version didn't have this
    issue.
    
    To make it even harder to hit, the one user of this that really cared
    about the return value (and used the uncached version of the copy that
    doesn't use the "rep string" instructions) was the generic write
    routine, which pre-populated its source, once more hiding the problem by
    avoiding the exception case that triggers the bug.
    
    In other words, very special thanks to Bron Gondwana who not only
    triggered this, but created a test-program to show it, and bisected the
    behavior down to commit 08291429cfa6258c4cd95d8833beb40f828b194e ("mm:
    fix pagecache write deadlocks") which changed the access pattern just
    enough that you can now trigger it with 'writev()' with multiple
    iovec's.
    
    That commit itself was not the cause of the bug, it just allowed all the
    stars to align just right that you could trigger the problem.
    
    [ Side note: this is just the minimal fix to make the copy routines
      (with __copy_from_user_inatomic_nocache as the particular version that
      was involved in showing this) have the right return values.
    
      We really should improve on the exceptional case further - to make the
      copy do a byte-accurate copy up to the exact page limit that causes it
      to fail.  As it is, the callers have to do extra work to handle the
      limit case gracefully. ]
    
    Reported-by: Bron Gondwana <brong at fastmail.fm>
    Cc: Nick Piggin <npiggin at suse.de>
    Cc: Andrew Morton <akpm at linux-foundation.org>
    Cc: Andi Kleen <andi at firstfloor.org>
    Cc: Al Viro <viro at ZenIV.linux.org.uk>
    Acked-by: Ingo Molnar <mingo at elte.hu>
    Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh at suse.de>


----- Forwarded message from Bron Gondwana <brong at fastmail.fm> -----

Date: Fri, 3 Oct 2008 21:44:14 +1000
From: Bron Gondwana <brong at fastmail.fm>
Subject: BUG: mmapfile/writev spurious zero bytes still in the wild
Cc: Bron Gondwana <brong at fastmail.fm>,
	Linux Kernel Mailing List <linux-kernel at vger.kernel.org>,
	Nick Piggin <npiggin at suse.de>,
	Andrew Morton <akpm at linux-foundation.org>,
	Rob Mueller <robm at fastmail.fm>, Andi Kleen <andi at firstfloor.org>,
	Ingo Molnar <mingo at elte.hu>
To: Linus Torvalds <torvalds at linux-foundation.org>
Organization: brong.net

On Tue, Jun 17, 2008 at 02:20:49PM -0700, Linus Torvalds wrote:

[reminder from way back: this bug was caused by writev containing 
mmaped pages that weren't paged in, it's 64 bit only.  It
particularly affects Cyrus Imapd's database formats]

> On Tue, 17 Jun 2008, Linus Torvalds wrote:
> > 
> > Hmm. Something like this *may* salvage it.
> > 
> > Untested, so far (I'll reboot and test soon enough), but even if it fixes 
> > things, it's not really very good. 
> 
> Ok, so I just rebooted with this, and it does indeed fix the bug.
> 
> I'd be happier with a more complete fix (ie being byte-accurate and 
> actually doing the partial copy when it hits a fault in the middle), but 
> this seems to be the minimal fix, and at least fixes the totally bogus 
> return values from the x86-64 __copy_user*() functions.

Has this been revisited since?  I haven't noticed, but I really only
skim LKML - have to save some time in the day for my real job[tm] of
keeping an email service running!

> Not that I checked that I got _all_ cases correct (and maybe there are 
> other versions of __copy_user that I missed entirely), but Bron's 
> test-case at least seems to work properly for me now.
> 
> Bron? If you have a more complete test-suite (ie the real-world case that 
> made you find this), it would be good to verify the whole thing. 

It's been fine for us since, but unfortunately most of the world is
still running distribution "stable" kernels.  I've just been helping a
user who's getting corrupted flat file databases on Ubuntu's stable 64
bit xen kernels, and it looks like it's the same issue.

Is there a standard way to tell backporters "you really need to add this
patch for your users' sanity"?

Bron ( I tried reporting it in Launchpad, but kept getting timeout
       errors, so I figured reposting here might get noticed.  Besides,
       I can follow up on the "more complete fix" at the same time! )

----- End forwarded message -----




More information about the kernel-team mailing list