[3.16.y-ckt stable] Patch "ext4: don't read blocks from disk after extents being swapped" has been added to the 3.16.y-ckt tree

Luis Henriques luis.henriques at canonical.com
Tue Mar 8 12:24:41 UTC 2016


This is a note to let you know that I have just added a patch titled

    ext4: don't read blocks from disk after extents being swapped

to the linux-3.16.y-queue branch of the 3.16.y-ckt extended stable tree 
which can be found at:

    http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.16.y-queue

This patch is scheduled to be released in version 3.16.7-ckt26.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.16.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Luis

---8<------------------------------------------------------------

>From ce6288ccdf8e0c8c24bf3fc0375608f37de19c7e Mon Sep 17 00:00:00 2001
From: Eryu Guan <guaneryu at gmail.com>
Date: Fri, 12 Feb 2016 01:20:43 -0500
Subject: ext4: don't read blocks from disk after extents being swapped

commit bcff24887d00bce102e0857d7b0a8c44a40f53d1 upstream.

I notice ext4/307 fails occasionally on ppc64 host, reporting md5
checksum mismatch after moving data from original file to donor file.

The reason is that move_extent_per_page() calls __block_write_begin()
and block_commit_write() to write saved data from original inode blocks
to donor inode blocks, but __block_write_begin() not only maps buffer
heads but also reads block content from disk if the size is not block
size aligned.  At this time the physical block number in mapped buffer
head is pointing to the donor file not the original file, and that
results in reading wrong data to page, which get written to disk in
following block_commit_write call.

This also can be reproduced by the following script on 1k block size ext4
on x86_64 host:

    mnt=/mnt/ext4
    donorfile=$mnt/donor
    testfile=$mnt/testfile
    e4compact=~/xfstests/src/e4compact

    rm -f $donorfile $testfile

    # reserve space for donor file, written by 0xaa and sync to disk to
    # avoid EBUSY on EXT4_IOC_MOVE_EXT
    xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile

    # create test file written by 0xbb
    xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile

    # compute initial md5sum
    md5sum $testfile | tee md5sum.txt
    # drop cache, force e4compact to read data from disk
    echo 3 > /proc/sys/vm/drop_caches

    # test defrag
    echo "$testfile" | $e4compact -i -v -f $donorfile
    # check md5sum
    md5sum -c md5sum.txt

Fix it by creating & mapping buffer heads only but not reading blocks
from disk, because all the data in page is guaranteed to be up-to-date
in mext_page_mkuptodate().

Signed-off-by: Eryu Guan <guaneryu at gmail.com>
Signed-off-by: Theodore Ts'o <tytso at mit.edu>
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques <luis.henriques at canonical.com>
---
 fs/ext4/move_extent.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 2484c7ec6a72..146f8cd627b2 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -915,10 +915,11 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
 	unsigned long blocksize = orig_inode->i_sb->s_blocksize;
 	unsigned int w_flags = 0;
 	unsigned int tmp_data_size, data_size, replaced_size;
-	int err2, jblocks, retries = 0;
+	int i, err2, jblocks, retries = 0;
 	int replaced_count = 0;
 	int from = data_offset_in_page << orig_inode->i_blkbits;
 	int blocks_per_page = PAGE_CACHE_SIZE >> orig_inode->i_blkbits;
+	struct buffer_head *bh = NULL;

 	/*
 	 * It needs twice the amount of ordinary journal buffers because
@@ -1027,8 +1028,16 @@ data_copy:
 	}
 	/* Perform all necessary steps similar write_begin()/write_end()
 	 * but keeping in mind that i_size will not change */
-	*err = __block_write_begin(pagep[0], from, replaced_size,
-				   ext4_get_block);
+	if (!page_has_buffers(pagep[0]))
+		create_empty_buffers(pagep[0], 1 << orig_inode->i_blkbits, 0);
+	bh = page_buffers(pagep[0]);
+	for (i = 0; i < data_offset_in_page; i++)
+		bh = bh->b_this_page;
+	for (i = 0; i < block_len_in_page; i++) {
+		*err = ext4_get_block(orig_inode, orig_blk_offset + i, bh, 0);
+		if (*err < 0)
+			break;
+	}
 	if (!*err)
 		*err = block_commit_write(pagep[0], from, from + replaced_size);





More information about the kernel-team mailing list