Fwd: [PATCH] writeback: Fix periodic writeback after fs mount

Tim Gardner tim.gardner at canonical.com
Thu May 30 12:23:29 UTC 2013


On 05/30/2013 03:44 AM, Bert De Jonghe wrote:
> Dear Sirs,
> 
> Although I'm sure you'll see the patch below pass by, here is a quick
> mail to make extra sure, since possible data loss is involved.
> 
> The problem is that, in certain cases, ext4 delayed allocation blocks
> are not flushed out to disk but remain in memory for a very long time.
> A power failure or reboot without cleanly unmounting the disks will thus
> result in data loss (possibly of quite old data). We believe this is what
> happened at a customer site, where it resulted in a number of zero-length
> files, roughly 15 days old, after a power outage.
> 
> We're using Ubuntu 12.04 LTS but it's certainly an upstream problem.
> 
> If you like, I can send a script to reproduce the issue.
> 
> Best regards,
> 
> Bert.
> 
> -------- Original Message --------
> Subject: 	[PATCH] writeback: Fix periodic writeback after fs mount
> Date: 	Thu, 30 May 2013 10:44:19 +0200
> From: 	Jan Kara <jack at suse.cz>
> To: 	Jens Axboe <axboe at kernel.dk>
> CC: 	Wu Fengguang <fengguang.wu at intel.com>,
> linux-fsdevel at vger.kernel.org, Bert De Jonghe
> <Bert.DeJonghe at amplidata.com>, Jan Kara <jack at suse.cz>,
> stable at vger.kernel.org
> 
> 
> 
> Code in fs/block_dev.c moves a device inode to default_backing_dev_info
> when the last reference to the device is put and moves the device inode
> back to its bdi when the first reference is acquired. This includes
> moving it to the wb.b_dirty list if the device inode is dirty. However,
> the code does not set up a timer to wake the corresponding flusher
> thread, and while the wb.b_dirty list is non-empty __mark_inode_dirty()
> will not set one up either. Thus periodic writeback is effectively
> disabled until a sync(2) call, which can lead to unexpected data loss in
> case of a crash or power failure.
> 
> Fix the problem by setting up a timer for periodic writeback when we
> add the first dirty inode to the wb.b_dirty list in
> bdev_inode_switch_bdi().
> 
> Reported-by: Bert De Jonghe <Bert.DeJonghe at amplidata.com>
> CC: stable at vger.kernel.org # >= 3.0
> Signed-off-by: Jan Kara <jack at suse.cz>
> ---
>  fs/block_dev.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 2091db8..85f5c85 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -58,17 +58,24 @@ static void bdev_inode_switch_bdi(struct inode *inode,
>  			struct backing_dev_info *dst)
>  {
>  	struct backing_dev_info *old = inode->i_data.backing_dev_info;
> +	bool wakeup_bdi = false;
>  
>  	if (unlikely(dst == old))		/* deadlock avoidance */
>  		return;
>  	bdi_lock_two(&old->wb, &dst->wb);
>  	spin_lock(&inode->i_lock);
>  	inode->i_data.backing_dev_info = dst;
> -	if (inode->i_state & I_DIRTY)
> +	if (inode->i_state & I_DIRTY) {
> +		if (bdi_cap_writeback_dirty(dst) && !wb_has_dirty_io(&dst->wb))
> +			wakeup_bdi = true;
>  		list_move(&inode->i_wb_list, &dst->wb.b_dirty);
> +	}
>  	spin_unlock(&inode->i_lock);
>  	spin_unlock(&old->wb.list_lock);
>  	spin_unlock(&dst->wb.list_lock);
> +
> +	if (wakeup_bdi)
> +		bdi_wakeup_thread_delayed(dst);
>  }
>  
>  /* Kill _all_ buffers and pagecache , dirty or not.. */
> -- 
> 1.8.1.4
> 
> 
> 

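The failure mode in the commit message above can be modelled in a few
lines of userspace C. This is only a toy sketch, not kernel code: the
names toy_wb, toy_mark_inode_dirty, toy_switch_dirty_inode and
toy_scenario are hypothetical, the dirty list is reduced to a counter,
the flusher timer to a flag, and the bdi_cap_writeback_dirty() check
from the patch is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these types are the kernel's. */
struct toy_wb {
	size_t nr_dirty;	/* stands in for the length of wb.b_dirty */
	bool timer_armed;	/* stands in for the flusher wakeup timer */
};

/*
 * Models the __mark_inode_dirty() behaviour described in the commit
 * message: the timer is armed only when the dirty list goes from
 * empty to non-empty.
 */
static void toy_mark_inode_dirty(struct toy_wb *wb)
{
	bool was_empty = (wb->nr_dirty == 0);

	wb->nr_dirty++;
	if (was_empty)
		wb->timer_armed = true;
}

/*
 * Models bdev_inode_switch_bdi() moving a dirty device inode onto dst.
 * With fix == false this is the pre-patch behaviour (no timer setup);
 * with fix == true the timer is armed if the list was empty.
 */
static void toy_switch_dirty_inode(struct toy_wb *dst, bool fix)
{
	bool wakeup = fix && (dst->nr_dirty == 0);

	dst->nr_dirty++;
	if (wakeup)
		dst->timer_armed = true;
}

/* Returns whether periodic writeback ends up armed after a mount. */
static bool toy_scenario(bool fix)
{
	struct toy_wb wb = { 0, false };

	toy_switch_dirty_inode(&wb, fix);	/* dirty device inode moved to its bdi */
	toy_mark_inode_dirty(&wb);		/* a later write dirties another inode */
	return wb.timer_armed;
}
```

In this model toy_scenario(false) leaves the timer unarmed, because the
later toy_mark_inode_dirty() call sees a non-empty list, while
toy_scenario(true) arms it at switch time.
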
Please let me know when this gets merged into Linus' repo. Given that
this patch is marked for stable, it will naturally get merged into
Ubuntu in due course.

rtg
-- 
Tim Gardner tim.gardner at canonical.com