[PATCH 4.2.y-ckt 2/5] mm: migrate dirty page without clear_page_dirty_for_io etc

Kamal Mostafa kamal at canonical.com
Fri Jul 8 17:07:01 UTC 2016

4.2.8-ckt13 -stable review patch.  If anyone has any objections, please let me know.


From: Hugh Dickins <hughd at google.com>

commit 42cb14b110a5698ccf26ce59c4441722605a3743 upstream.

clear_page_dirty_for_io() has accumulated writeback and memcg subtleties
since v2.6.16 first introduced page migration; and the set_page_dirty()
which completed its migration of PageDirty, later had to be moderated to
__set_page_dirty_nobuffers(); then PageSwapBacked had to skip that too.

No actual problems seen with this procedure recently, but if you look into
what the clear_page_dirty_for_io(page)+set_page_dirty(newpage) is actually
achieving, it turns out to be nothing more than moving the PageDirty flag,
and its NR_FILE_DIRTY stat from one zone to another.

It would be good to avoid a pile of irrelevant decrementations and
incrementations, and improper event counting, and unnecessary descent of
the radix_tree under tree_lock (to set the PAGECACHE_TAG_DIRTY which
radix_tree_replace_slot() left in place anyway).

Do the NR_FILE_DIRTY movement, like the other stats movements, while
interrupts still disabled in migrate_page_move_mapping(); and don't even
bother if the zone is the same.  Do the PageDirty movement there under
tree_lock too, where old page is frozen and newpage not yet visible:
bearing in mind that as soon as newpage becomes visible in radix_tree, an
un-page-locked set_page_dirty() might interfere (or perhaps that's just
not possible: anything doing so should already hold an additional
reference to the old page, preventing its migration; but play safe).

But we do still need to transfer PageDirty in migrate_page_copy(), for
those who don't go the mapping route through migrate_page_move_mapping().

Signed-off-by: Hugh Dickins <hughd at google.com>
Cc: Christoph Lameter <cl at linux.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov at linux.intel.com>
Cc: Rik van Riel <riel at redhat.com>
Cc: Vlastimil Babka <vbabka at suse.cz>
Cc: Davidlohr Bueso <dave at stgolabs.net>
Cc: Oleg Nesterov <oleg at redhat.com>
Cc: Sasha Levin <sasha.levin at oracle.com>
Cc: Dmitry Vyukov <dvyukov at google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro at jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
[bwh: Backported to 3.16: adjust context.  This is not just an optimisation,
 but turned out to fix a possible oops (CVE-2016-3070).]
Signed-off-by: Ben Hutchings <ben at decadent.org.uk>
Signed-off-by: Luis Henriques <luis.henriques at canonical.com>
Signed-off-by: Kamal Mostafa <kamal at canonical.com>
 mm/migrate.c | 51 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index fcb6204..a14784c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -30,6 +30,7 @@
 #include <linux/mempolicy.h>
 #include <linux/vmalloc.h>
 #include <linux/security.h>
+#include <linux/backing-dev.h>
 #include <linux/memcontrol.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
@@ -310,6 +311,8 @@ int migrate_page_move_mapping(struct address_space *mapping,
 		struct buffer_head *head, enum migrate_mode mode,
 		int extra_count)
+	struct zone *oldzone, *newzone;
+	int dirty;
 	int expected_count = 1 + extra_count;
 	void **pslot;
@@ -320,6 +323,9 @@ int migrate_page_move_mapping(struct address_space *mapping,
+	oldzone = page_zone(page);
+	newzone = page_zone(newpage);
 	pslot = radix_tree_lookup_slot(&mapping->page_tree,
@@ -360,6 +366,13 @@ int migrate_page_move_mapping(struct address_space *mapping,
 		set_page_private(newpage, page_private(page));
+	/* Move dirty while page refs frozen and newpage not yet exposed */
+	dirty = PageDirty(page);
+	if (dirty) {
+		ClearPageDirty(page);
+		SetPageDirty(newpage);
+	}
 	radix_tree_replace_slot(pslot, newpage);
@@ -369,6 +382,9 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	page_unfreeze_refs(page, expected_count - 1);
+	spin_unlock(&mapping->tree_lock);
+	/* Leave irq disabled to prevent preemption while updating stats */
 	 * If moved to a different zone then also account
 	 * the page for that zone. Other VM counters will be
@@ -379,13 +395,19 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	 * via NR_FILE_PAGES and NR_ANON_PAGES if they
 	 * are mapped to swap space.
-	__dec_zone_page_state(page, NR_FILE_PAGES);
-	__inc_zone_page_state(newpage, NR_FILE_PAGES);
-	if (!PageSwapCache(page) && PageSwapBacked(page)) {
-		__dec_zone_page_state(page, NR_SHMEM);
-		__inc_zone_page_state(newpage, NR_SHMEM);
+	if (newzone != oldzone) {
+		__dec_zone_state(oldzone, NR_FILE_PAGES);
+		__inc_zone_state(newzone, NR_FILE_PAGES);
+		if (PageSwapBacked(page) && !PageSwapCache(page)) {
+			__dec_zone_state(oldzone, NR_SHMEM);
+			__inc_zone_state(newzone, NR_SHMEM);
+		}
+		if (dirty && mapping_cap_account_dirty(mapping)) {
+			__dec_zone_state(oldzone, NR_FILE_DIRTY);
+			__inc_zone_state(newzone, NR_FILE_DIRTY);
+		}
-	spin_unlock_irq(&mapping->tree_lock);
+	local_irq_enable();
@@ -509,20 +531,9 @@ void migrate_page_copy(struct page *newpage, struct page *page)
 	if (PageMappedToDisk(page))
-	if (PageDirty(page)) {
-		clear_page_dirty_for_io(page);
-		/*
-		 * Want to mark the page and the radix tree as dirty, and
-		 * redo the accounting that clear_page_dirty_for_io undid,
-		 * but we can't use set_page_dirty because that function
-		 * is actually a signal that all of the page has become dirty.
-		 * Whereas only part of our page may be dirty.
-		 */
-		if (PageSwapBacked(page))
-			SetPageDirty(newpage);
-		else
-			__set_page_dirty_nobuffers(newpage);
- 	}
+	/* Move dirty on pages not done by migrate_page_move_mapping() */
+	if (PageDirty(page))
+		SetPageDirty(newpage);
 	 * Copy NUMA information to the new page, to prevent over-eager

More information about the kernel-team mailing list