[ACK] [PATCH Onerici, Natty] UBUNTU: SAUCE: x86/paravirt: Partially revert "remove lazy mode in interrupts"

Andy Whitcroft apw at canonical.com
Wed Oct 19 13:36:41 UTC 2011


On Tue, Oct 18, 2011 at 06:05:44PM +0200, Stefan Bader wrote:
> From: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
> 
> which has git commit b8bcfe997e46150fedcc3f5b26b846400122fdd9.
> 
> The unintended consequence of removing the flushing of MMU
> updates when doing kmap_atomic (or kunmap_atomic) is that we can
> hit a dereference bug when processing a "fork()" under a heavy loaded
> machine. Specifically we can hit:
> 
> BUG: unable to handle kernel paging request at f573fc8c
> IP: [<c01abc54>] swap_count_continued+0x104/0x180
> *pdpt = 000000002a3b9027 *pde = 0000000001bed067 *pte = 0000000000000000
> Oops: 0000 [#1] SMP
> Modules linked in:
> Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1
> EIP: 0061:[<c01abc54>] EFLAGS: 00210246 CPU: 3
> EIP is at swap_count_continued+0x104/0x180
> .. snip..
> Call Trace:
>  [<c01ac222>] ? __swap_duplicate+0xc2/0x160
>  [<c01040f7>] ? pte_mfn_to_pfn+0x87/0xe0
>  [<c01ac2e4>] ? swap_duplicate+0x14/0x40
>  [<c01a0a6b>] ? copy_pte_range+0x45b/0x500
>  [<c01a0ca5>] ? copy_page_range+0x195/0x200
>  [<c01328c6>] ? dup_mmap+0x1c6/0x2c0
>  [<c0132cf8>] ? dup_mm+0xa8/0x130
>  [<c013376a>] ? copy_process+0x98a/0xb30
>  [<c013395f>] ? do_fork+0x4f/0x280
>  [<c01573b3>] ? getnstimeofday+0x43/0x100
>  [<c010f770>] ? sys_clone+0x30/0x40
>  [<c06c048d>] ? ptregs_clone+0x15/0x48
>  [<c06bfb71>] ? syscall_call+0x7/0xb
> 
> The problem looks that in copy_page_range we turn lazy mode on, and then
> in swap_entry_free we call swap_count_continued which ends up in:
> 
>          map = kmap_atomic(page, KM_USER0) + offset;
> 
> and then later touches *map.
> 
> Since we are running in batched mode (lazy) we don't actually set up the
> PTE mappings and the kmap_atomic is not done synchronously and ends up
> trying to dereference a page that has not been set.
> 
> Looking at kmap_atomic_prot_pfn, it uses 'arch_flush_lazy_mmu_mode' and
> sprinkling that in kmap_atomic_prot and __kunmap_atomic makes the problem
> go away.
> 
> CC: Thomas Gleixner <tglx at linutronix.de>
> CC: Ingo Molnar <mingo at redhat.com>
> CC: "H. Peter Anvin" <hpa at zytor.com>
> CC: x86 at kernel.org
> CC: Peter Zijlstra <a.p.zijlstra at chello.nl>
> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge at citrix.com>
> CC: stable at kernel.org
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk at oracle.com>
> 
> BugLink: http://bugs.launchpad.net/bugs/854050
> 
> (cherry-picked from ab67482036cee590753dd42b7f66aada97e6dcde linux-next)
> Signed-off-by: Stefan Bader <stefan.bader at canonical.com>
> ---
>  arch/x86/mm/highmem_32.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/x86/mm/highmem_32.c b/arch/x86/mm/highmem_32.c
> index b499626..f4f29b1 100644
> --- a/arch/x86/mm/highmem_32.c
> +++ b/arch/x86/mm/highmem_32.c
> @@ -45,6 +45,7 @@ void *kmap_atomic_prot(struct page *page, pgprot_t prot)
>  	vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
>  	BUG_ON(!pte_none(*(kmap_pte-idx)));
>  	set_pte(kmap_pte-idx, mk_pte(page, prot));
> +	arch_flush_lazy_mmu_mode();
>  
>  	return (void *)vaddr;
>  }
> @@ -88,6 +89,7 @@ void __kunmap_atomic(void *kvaddr)
>  		 */
>  		kpte_clear_flush(kmap_pte-idx, vaddr);
>  		kmap_atomic_idx_pop();
> +		arch_flush_lazy_mmu_mode();
>  	}
>  #ifdef CONFIG_DEBUG_HIGHMEM
>  	else {

Looks to do what is claimed, and it seems obvious that if your lazy mmu
mappings are not applied then you won't be able to use the mapping.
Seems to be going upstream and stable also.  Therefore:

Acked-by: Andy Whitcroft <apw at canonical.com>

I am also inclinded to say it look safe enough to me for all releases
and as you say may compromise a small amount of performance for
correctness.

-apw




More information about the kernel-team mailing list