[3.8.y.z extended stable] Patch "mm, thp: close race between mremap() and split_huge_page()" has been added to staging queue

Kamal Mostafa kamal at canonical.com
Mon Jun 23 21:17:31 UTC 2014


This is a note to let you know that I have just added a patch titled

    mm, thp: close race between mremap() and split_huge_page()

to the linux-3.8.y-queue branch of the 3.8.y.z extended stable tree 
which can be found at:

 http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.8.y-queue

This patch is scheduled to be released in version 3.8.13.25.

If you, or anyone else, feel it should not be added to this tree, please
reply to this email.

For more information about the 3.8.y.z tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

------

From a9a74bfe0fc256e2a8b3266ccbe9861316cd8c84 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov at linux.intel.com>
Date: Fri, 9 May 2014 15:37:00 -0700
Subject: [PATCH 12/66] mm, thp: close race between mremap() and
 split_huge_page()

commit dd18dbc2d42af75fffa60c77e0f02220bc329829 upstream.

It's critical for split_huge_page() (and migration) to catch and freeze
all PMDs on rmap walk.  It gets tricky if there's a concurrent fork() or
mremap(), since we usually copy/move page table entries in dup_mm() or
move_page_tables() without the rmap lock taken.  To make this work we
rely on the rmap walk order so that no entry is missed: the walk must
see the destination VMA after the source one.

But after switching the rmap implementation to an interval tree, it's
not always possible to preserve the expected walk order.

It works fine for dup_mm() since the new VMA has the same
vma_start_pgoff() / vma_last_pgoff() and we explicitly insert the dst
VMA after the src one with vma_interval_tree_insert_after().

But on move_vma() the destination VMA can be merged into an adjacent one
and, as a result, shifted left in the interval tree.  Fortunately, we
can detect this situation and prevent the race with the rmap walk by
moving page table entries under the rmap lock.  See commit 38a76013ad80.

The problem is that we miss the lock when we move a transhuge PMD.  Most
likely this bug caused the crash [1].

[1] http://thread.gmane.org/gmane.linux.kernel.mm/96473

Fixes: 108d6642ad81 ("mm anon rmap: remove anon_vma_moveto_tail")

Signed-off-by: Kirill A. Shutemov <kirill.shutemov at linux.intel.com>
Reviewed-by: Andrea Arcangeli <aarcange at redhat.com>
Cc: Rik van Riel <riel at redhat.com>
Acked-by: Michel Lespinasse <walken at google.com>
Cc: Dave Jones <davej at redhat.com>
Cc: David Miller <davem at davemloft.net>
Acked-by: Johannes Weiner <hannes at cmpxchg.org>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
Signed-off-by: Kamal Mostafa <kamal at canonical.com>
---
 mm/mremap.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/mremap.c b/mm/mremap.c
index 7b26643..e251eaf 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -174,10 +174,17 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 			break;
 		if (pmd_trans_huge(*old_pmd)) {
 			int err = 0;
-			if (extent == HPAGE_PMD_SIZE)
+			if (extent == HPAGE_PMD_SIZE) {
+				VM_BUG_ON(vma->vm_file || !vma->anon_vma);
+				/* See comment in move_ptes() */
+				if (need_rmap_locks)
+					anon_vma_lock_write(vma->anon_vma);
 				err = move_huge_pmd(vma, new_vma, old_addr,
 						    new_addr, old_end,
 						    old_pmd, new_pmd);
+				if (need_rmap_locks)
+					anon_vma_unlock_write(vma->anon_vma);
+			}
 			if (err > 0) {
 				need_flush = true;
 				continue;
--
1.9.1




