[PATCH 4.2.y-ckt 180/268] zsmalloc: fix migrate_zspage-zs_free race condition

Kamal Mostafa kamal at canonical.com
Wed Jan 27 20:34:01 UTC 2016


4.2.8-ckt3 -stable review patch.  If anyone has any objections, please let me know.

---8<------------------------------------------------------------

From: Junil Lee <junil0814.lee at lge.com>

commit c102f07ca0b04f2cb49cfc161c83f6239d17f491 upstream.

record_obj() in migrate_zspage() does not preserve handle's
HANDLE_PIN_BIT, set by find_aloced_obj()->trypin_tag(), and implicitly
(accidentally) un-pins the handle, while migrate_zspage() still performs
an explicit unpin_tag() on the that handle.  This additional explicit
unpin_tag() introduces a race condition with zs_free(), which can pin
that handle by this time, so the handle becomes un-pinned.

Schematically, it goes like this:

  CPU0                                        CPU1
  migrate_zspage
    find_alloced_obj
      trypin_tag
        set HANDLE_PIN_BIT                    zs_free()
                                                pin_tag()
  obj_malloc() -- new object, no tag
  record_obj() -- remove HANDLE_PIN_BIT           set HANDLE_PIN_BIT
  unpin_tag()  -- remove zs_free's HANDLE_PIN_BIT

The race condition may result in a NULL pointer dereference:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  CPU: 0 PID: 19001 Comm: CookieMonsterCl Tainted:
  PC is at get_zspage_mapping+0x0/0x24
  LR is at obj_free.isra.22+0x64/0x128
  Call trace:
     get_zspage_mapping+0x0/0x24
     zs_free+0x88/0x114
     zram_free_page+0x64/0xcc
     zram_slot_free_notify+0x90/0x108
     swap_entry_free+0x278/0x294
     free_swap_and_cache+0x38/0x11c
     unmap_single_vma+0x480/0x5c8
     unmap_vmas+0x44/0x60
     exit_mmap+0x50/0x110
     mmput+0x58/0xe0
     do_exit+0x320/0x8dc
     do_group_exit+0x44/0xa8
     get_signal+0x538/0x580
     do_signal+0x98/0x4b8
     do_notify_resume+0x14/0x5c

This patch keeps the lock bit in migration path and update value
atomically.

Signed-off-by: Junil Lee <junil0814.lee at lge.com>
Signed-off-by: Minchan Kim <minchan at kernel.org>
Acked-by: Vlastimil Babka <vbabka at suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work at gmail.com>
Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
Signed-off-by: Kamal Mostafa <kamal at canonical.com>
---
 mm/zsmalloc.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 0a7f81a..88fbc8b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -302,7 +302,12 @@ static void free_handle(struct zs_pool *pool, unsigned long handle)
 
 static void record_obj(unsigned long handle, unsigned long obj)
 {
-	*(unsigned long *)handle = obj;
+	/*
+	 * lsb of @obj represents handle lock while other bits
+	 * represent object value the handle is pointing so
+	 * updating shouldn't do store tearing.
+	 */
+	WRITE_ONCE(*(unsigned long *)handle, obj);
 }
 
 /* zpool driver */
@@ -1638,6 +1643,13 @@ static int migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		free_obj = obj_malloc(d_page, class, handle);
 		zs_object_copy(used_obj, free_obj, class);
 		index++;
+		/*
+		 * record_obj updates handle's value to free_obj and it will
+		 * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, which
+		 * breaks synchronization using pin_tag(e,g, zs_free) so
+		 * let's keep the lock bit.
+		 */
+		free_obj |= BIT(HANDLE_PIN_BIT);
 		record_obj(handle, free_obj);
 		unpin_tag(handle);
 		obj_free(pool, class, used_obj);
-- 
1.9.1





More information about the kernel-team mailing list