Fwd: [PATCH] writeback: Fix periodic writeback after fs mount
Tim Gardner
tim.gardner at canonical.com
Thu May 30 12:23:29 UTC 2013
On 05/30/2013 03:44 AM, Bert De Jonghe wrote:
> Dear Sirs,
>
> Although I'm sure you'll see the patch below pass by, this is a quick mail
> to make extra sure, since possible data loss is involved.
>
> The problem is that, in certain cases, ext4 delayed allocation blocks are
> not flushed out to disk but remain in memory for a very long time.
> A power failure or reboot without cleanly unmounting the disks will thus
> result in data loss (possibly quite old data). We believe this is what
> happened at a customer site, where it resulted in a number of zero-length
> files, about 15 days old, after a power outage.
>
> We're using Ubuntu 12.04 LTS but it's certainly an upstream problem.
>
> If you like, I can send a script to reproduce the issue.
>
> Best regards,
>
> Bert.
>
> -------- Original Message --------
> Subject: [PATCH] writeback: Fix periodic writeback after fs mount
> Date: Thu, 30 May 2013 10:44:19 +0200
> From: Jan Kara <jack at suse.cz>
> To: Jens Axboe <axboe at kernel.dk>
> CC: Wu Fengguang <fengguang.wu at intel.com>,
> linux-fsdevel at vger.kernel.org, Bert De Jonghe
> <Bert.DeJonghe at amplidata.com>, Jan Kara <jack at suse.cz>,
> stable at vger.kernel.org
>
>
>
> Code in fs/block_dev.c moves a device inode to default_backing_dev_info when
> the last reference to the device is put and moves the device inode back
> to its bdi when the first reference is acquired. This includes moving it to
> the wb.b_dirty list if the device inode is dirty. The code, however, doesn't
> set up a timer to wake the corresponding flusher thread, and while the
> wb.b_dirty list is non-empty __mark_inode_dirty() will not set it up either.
> Thus periodic writeback is effectively disabled until a sync(2) call, which
> can lead to unexpected data loss in case of a crash or power failure.
>
> Fix the problem by setting up a timer for periodic writeback when we
> add the first dirty inode to the wb.b_dirty list in bdev_inode_switch_bdi().
>
> Reported-by: Bert De Jonghe <Bert.DeJonghe at amplidata.com>
> CC: stable at vger.kernel.org # >= 3.0
> Signed-off-by: Jan Kara <jack at suse.cz>
> ---
> fs/block_dev.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 2091db8..85f5c85 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -58,17 +58,24 @@ static void bdev_inode_switch_bdi(struct inode *inode,
>  			struct backing_dev_info *dst)
>  {
>  	struct backing_dev_info *old = inode->i_data.backing_dev_info;
> +	bool wakeup_bdi = false;
>  
>  	if (unlikely(dst == old))		/* deadlock avoidance */
>  		return;
>  	bdi_lock_two(&old->wb, &dst->wb);
>  	spin_lock(&inode->i_lock);
>  	inode->i_data.backing_dev_info = dst;
> -	if (inode->i_state & I_DIRTY)
> +	if (inode->i_state & I_DIRTY) {
> +		if (bdi_cap_writeback_dirty(dst) && !wb_has_dirty_io(&dst->wb))
> +			wakeup_bdi = true;
>  		list_move(&inode->i_wb_list, &dst->wb.b_dirty);
> +	}
>  	spin_unlock(&inode->i_lock);
>  	spin_unlock(&old->wb.list_lock);
>  	spin_unlock(&dst->wb.list_lock);
> +
> +	if (wakeup_bdi)
> +		bdi_wakeup_thread_delayed(dst);
>  }
>  
>  /* Kill _all_ buffers and pagecache , dirty or not.. */
> --
> 1.8.1.4
>
>
>
Please let me know when this gets merged into Linus' repo. Given that
this patch is marked for stable, it will naturally get merged into
Ubuntu in due course.
rtg
--
Tim Gardner tim.gardner at canonical.com