[3.13.y.z extended stable] Patch "mm: use paravirt friendly ops for NUMA hinting ptes" has been added to staging queue

Kamal Mostafa kamal at canonical.com
Fri May 9 20:34:06 UTC 2014

This is a note to let you know that I have just added a patch titled

    mm: use paravirt friendly ops for NUMA hinting ptes

to the linux-3.13.y-queue branch of the 3.13.y.z extended stable tree 
which can be found at:


This patch is scheduled to be released in version

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.13.y.z tree, see



From dec1d8e2ff4765d17f3e97850736c56f5812b860 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mgorman at suse.de>
Date: Fri, 18 Apr 2014 15:07:21 -0700
Subject: mm: use paravirt friendly ops for NUMA hinting ptes

commit 29c7787075c92ca8af353acd5301481e6f37082f upstream.

David Vrabel identified a regression when using automatic NUMA balancing
under Xen whereby page table entries were getting corrupted due to the
use of native PTE operations.  Quoting him

	Xen PV guest page tables require that their entries use machine
	addresses if the present bit (_PAGE_PRESENT) is set, and (for
	successful migration) non-present PTEs must use pseudo-physical
	addresses.  This is because on migration MFNs in present PTEs are
	translated to PFNs (canonicalised) so they may be translated back
	to the new MFN in the destination domain (uncanonicalised).

	pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
	set and clear the _PAGE_PRESENT bit using pte_set_flags(),
	pte_clear_flags(), etc.

	In a Xen PV guest, these functions must translate MFNs to PFNs
	when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
	_PAGE_PRESENT.
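
As a rough illustration of the problem described in the quote (not part
of the patch), the sketch below models a toy guest in userspace C.  The
p2m table, the bit positions, and the names pv_pte_val(), pv_make_pte(),
native_mknuma() and pv_mknuma() are all hypothetical stand-ins; the real
Xen and kernel accessors are considerably more involved.

```c
#include <stdint.h>

/* Hypothetical, simplified model; NOT the kernel's or Xen's real code. */
typedef uint64_t pteval_t;
typedef struct { pteval_t pte; } pte_t;	/* raw value as stored in the table */

#define PAGE_SHIFT	12
#define _PAGE_PRESENT	((pteval_t)1 << 0)	/* assumed bit position */
#define _PAGE_NUMA	((pteval_t)1 << 8)	/* assumed bit position */
#define FLAGS_MASK	(((pteval_t)1 << PAGE_SHIFT) - 1)

/* Toy guest with 4 pages: pseudo-physical frame i is backed by MFN p2m[i]. */
static const uint64_t p2m[4] = { 7, 5, 6, 4 };

static inline uint64_t pfn_to_mfn(uint64_t pfn)
{
	return p2m[pfn];
}

static inline uint64_t mfn_to_pfn(uint64_t mfn)
{
	for (uint64_t pfn = 0; pfn < 4; pfn++)
		if (p2m[pfn] == mfn)
			return pfn;
	return UINT64_MAX;	/* not a frame of this toy guest */
}

/* Paravirt-style read: a present entry holds an MFN, hand back a PFN. */
static inline pteval_t pv_pte_val(pte_t pte)
{
	pteval_t v = pte.pte;

	if (v & _PAGE_PRESENT)
		v = (mfn_to_pfn(v >> PAGE_SHIFT) << PAGE_SHIFT) | (v & FLAGS_MASK);
	return v;
}

/* Paravirt-style write: translate PFN to MFN only if the entry is present. */
static inline pte_t pv_make_pte(pteval_t v)
{
	pte_t out;

	if (v & _PAGE_PRESENT)
		v = (pfn_to_mfn(v >> PAGE_SHIFT) << PAGE_SHIFT) | (v & FLAGS_MASK);
	out.pte = v;
	return out;
}

/* Native-style flag edit: touches the raw value, no translation at all. */
static inline pte_t native_mknuma(pte_t pte)
{
	pte.pte = (pte.pte | _PAGE_NUMA) & ~_PAGE_PRESENT;
	return pte;
}

/* Same flag edit routed through the translating accessors. */
static inline pte_t pv_mknuma(pte_t pte)
{
	pteval_t val = pv_pte_val(pte);

	val &= ~_PAGE_PRESENT;
	val |= _PAGE_NUMA;
	return pv_make_pte(val);
}
```

In this model, a present entry built for PFN 1 stores MFN 5.  After
native_mknuma() the now non-present entry still carries MFN 5, which is
the inconsistency the quote describes, whereas pv_mknuma() leaves PFN 1
in the non-present entry.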

His suggested fix converted p[te|md]_[set|clear]_flags to use
paravirt-friendly ops but this is overkill.  He suggested an alternative
of using p[te|md]_modify in the NUMA page table operations but this
does more work than necessary and would require looking up a VMA for
protections.

This patch modifies the NUMA page table operations to use paravirt
friendly operations to set/clear the flags of interest.  Unfortunately
this will take a performance hit when updating the PTEs on
CONFIG_PARAVIRT but I do not see a way around it that does not break
Xen.
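
For reference, the "flags of interest" can be sketched on plain integer
values as below.  This is only an illustration: the bit positions are
assumed, the names val_mknuma() and val_mknonnuma() are hypothetical, and
the set of flags restored by the non-NUMA direction is taken from the
pte_set_flags() call that the diff removes.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

/* Assumed bit positions for illustration; real values are per-architecture. */
#define _PAGE_PRESENT	((pteval_t)1 << 0)
#define _PAGE_ACCESSED	((pteval_t)1 << 5)
#define _PAGE_NUMA	((pteval_t)1 << 8)

/* Mark an entry as a NUMA hinting entry: not present, NUMA bit set. */
static inline pteval_t val_mknuma(pteval_t val)
{
	val &= ~_PAGE_PRESENT;
	val |= _PAGE_NUMA;
	return val;
}

/* Undo the hint: clear the NUMA bit, make the entry present and accessed. */
static inline pteval_t val_mknonnuma(pteval_t val)
{
	val &= ~_PAGE_NUMA;
	val |= (_PAGE_PRESENT | _PAGE_ACCESSED);
	return val;
}
```

Applying val_mknuma() and then val_mknonnuma() to a present entry returns
the original value with _PAGE_ACCESSED additionally set; the frame bits
are untouched by either step.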

Signed-off-by: Mel Gorman <mgorman at suse.de>
Acked-by: David Vrabel <david.vrabel at citrix.com>
Tested-by: David Vrabel <david.vrabel at citrix.com>
Cc: Ingo Molnar <mingo at kernel.org>
Cc: Peter Anvin <hpa at zytor.com>
Cc: Fengguang Wu <fengguang.wu at intel.com>
Cc: Linus Torvalds <torvalds at linux-foundation.org>
Cc: Steven Noonan <steven at uplinklabs.net>
Cc: Rik van Riel <riel at redhat.com>
Cc: Peter Zijlstra <peterz at infradead.org>
Cc: Andrea Arcangeli <aarcange at redhat.com>
Cc: Dave Hansen <dave.hansen at intel.com>
Cc: Srikar Dronamraju <srikar at linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov at gmail.com>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
BugLink: http://bugs.launchpad.net/bugs/1313450
Cc: Stefan Bader <stefan.bader at canonical.com>
Signed-off-by: Kamal Mostafa <kamal at canonical.com>
 include/asm-generic/pgtable.h | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 8e4f41d..eaa0a65 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -680,32 +680,47 @@ static inline int pmd_numa(pmd_t pmd)
 #ifndef pte_mknonnuma
 static inline pte_t pte_mknonnuma(pte_t pte)
 {
-	pte = pte_clear_flags(pte, _PAGE_NUMA);
-	return pte_set_flags(pte, _PAGE_PRESENT|_PAGE_ACCESSED);
+	pteval_t val = pte_val(pte);
+	val &= ~_PAGE_NUMA;
+	val |= (_PAGE_PRESENT|_PAGE_ACCESSED);
+	return __pte(val);
 }
 #endif

 #ifndef pmd_mknonnuma
 static inline pmd_t pmd_mknonnuma(pmd_t pmd)
 {
-	pmd = pmd_clear_flags(pmd, _PAGE_NUMA);
-	return pmd_set_flags(pmd, _PAGE_PRESENT|_PAGE_ACCESSED);
+	pmdval_t val = pmd_val(pmd);
+	val &= ~_PAGE_NUMA;
+	val |= (_PAGE_PRESENT|_PAGE_ACCESSED);
+	return __pmd(val);
 }
 #endif

 #ifndef pte_mknuma
 static inline pte_t pte_mknuma(pte_t pte)
 {
-	pte = pte_set_flags(pte, _PAGE_NUMA);
-	return pte_clear_flags(pte, _PAGE_PRESENT);
+	pteval_t val = pte_val(pte);
+	val &= ~_PAGE_PRESENT;
+	val |= _PAGE_NUMA;
+	return __pte(val);
 }
 #endif

 #ifndef pmd_mknuma
 static inline pmd_t pmd_mknuma(pmd_t pmd)
 {
-	pmd = pmd_set_flags(pmd, _PAGE_NUMA);
-	return pmd_clear_flags(pmd, _PAGE_PRESENT);
+	pmdval_t val = pmd_val(pmd);
+	val &= ~_PAGE_PRESENT;
+	val |= _PAGE_NUMA;
+	return __pmd(val);
 }
 #endif
