[SRU Bionic] mm/mremap: hold the rmap lock in write mode when moving page table entries.
Cengiz Can
cengiz.can at canonical.com
Tue Nov 22 06:19:12 UTC 2022
From: "Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com>
commit 97113eb39fa7972722ff490b947d8af023e1f6a2 upstream.
To avoid a race between the rmap walk and mremap, mremap takes the rmap
locks via take_rmap_locks(). The lock is taken to ensure that the rmap walk
doesn't miss a page table entry due to PTE moves done by move_page_tables().
The kernel further optimizes this locking: if the newly added vma will be
found after the old vma during the rmap walk, the rmap lock is not taken.
This is because the rmap walk visits the vmas in the same order, so if we
don't find the page table entry attached to the older vma, we will still
find it at the new vma, which is iterated later.
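As a rough, self-contained sketch of that ordering rule (the helper name and
types below are hypothetical and only illustrate the check; in the kernel the
decision is made when the vma is set up for the move):

#include <stdbool.h>

/*
 * Hypothetical illustration: the rmap walk visits vmas in ascending
 * page-offset order.  If the destination vma is visited after the source
 * vma, an entry missed at the source is still found later at the
 * destination, so the rmap lock can be skipped; otherwise it must be held.
 */
static bool mremap_needs_rmap_locks(unsigned long old_pgoff,
				    unsigned long new_pgoff)
{
	return new_pgoff <= old_pgoff;
}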
As explained in commit eb66ae030829 ("mremap: properly flush TLB before
releasing the page") mremap is special in that it doesn't take ownership
of the page. The optimized version for PUD/PMD aligned mremap also
doesn't hold the ptl lock. This can result in stale TLB entries, as shown
below.
This patch updates the rmap locking requirement in mremap to handle the race
condition, explained below, that exists with the optimized mremap path:
Optimized PMD move

CPU 1                           CPU 2                           CPU 3

mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one

mmap_write_lock_killable()

                                addr = old_addr
                                lock(pte_ptl)
lock(pmd_ptl)
pmd = *old_pmd
pmd_clear(old_pmd)
flush_tlb_range(old_addr)

*new_pmd = pmd
                                                                *new_addr = 10; and fills
                                                                TLB with new addr and
                                                                old pfn

unlock(pmd_ptl)
                                ptep_clear_flush()
                                old pfn is free.
                                                                Stale TLB entry
The optimized PUD move suffers from a similar race. Both of the above race
conditions can be fixed by forcing the mremap path to take the rmap lock in
write mode, which serializes the PMD/PUD move against the concurrent rmap
walk (and therefore against try_to_unmap_one()).
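For reference, take_rmap_locks()/drop_rmap_locks(), which the hunk below now
calls unconditionally, look roughly like the sketch here (reproduced from
memory of mm/mremap.c in kernels of this era, shown only to make explicit
that the anon_vma and i_mmap locks are taken in write mode; see mm/mremap.c
for the authoritative definitions):

static void take_rmap_locks(struct vm_area_struct *vma)
{
	if (vma->vm_file)
		i_mmap_lock_write(vma->vm_file->f_mapping);
	if (vma->anon_vma)
		anon_vma_lock_write(vma->anon_vma);
}

static void drop_rmap_locks(struct vm_area_struct *vma)
{
	if (vma->anon_vma)
		anon_vma_unlock_write(vma->anon_vma);
	if (vma->vm_file)
		i_mmap_unlock_write(vma->vm_file->f_mapping);
}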
Link: https://lkml.kernel.org/r/20210616045239.370802-7-aneesh.kumar@linux.ibm.com
Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP3S-huwbPw@mail.gmail.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
Acked-by: Hugh Dickins <hughd at google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov at linux.intel.com>
Cc: Christophe Leroy <christophe.leroy at csgroup.eu>
Cc: Joel Fernandes <joel at joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh at google.com>
Cc: Kirill A. Shutemov <kirill at shutemov.name>
Cc: Michael Ellerman <mpe at ellerman.id.au>
Cc: Nicholas Piggin <npiggin at gmail.com>
Cc: Stephen Rothwell <sfr at canb.auug.org.au>
Cc: <stable at vger.kernel.org>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
[patch rewritten for backport since the code was refactored since]
Signed-off-by: Jann Horn <jannh at google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
CVE-2022-41222
(backported from commit 79e522101cf40735f1936a10312e17f937b8dcad linux-5.4.y)
[cengizcan: adapt context]
Signed-off-by: Cengiz Can <cengiz.can at canonical.com>
---
mm/mremap.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 2bdb255cde9a9..473cf0e4c5f13 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -230,12 +230,10 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
 			if (extent == HPAGE_PMD_SIZE) {
 				bool moved;
 				/* See comment in move_ptes() */
-				if (need_rmap_locks)
-					take_rmap_locks(vma);
+				take_rmap_locks(vma);
 				moved = move_huge_pmd(vma, old_addr, new_addr,
 						    old_end, old_pmd, new_pmd);
-				if (need_rmap_locks)
-					drop_rmap_locks(vma);
+				drop_rmap_locks(vma);
 				if (moved)
 					continue;
 			}
--
2.37.2