[3.11.y.z extended stable] Patch "xfs: block allocation work needs to be kswapd aware" has been added to staging queue
Luis Henriques
luis.henriques at canonical.com
Fri Jun 27 09:41:32 UTC 2014
Dave Chinner <david at fromorbit.com> writes:
> On Thu, Jun 26, 2014 at 11:36:31AM +0100, Luis Henriques wrote:
>> This is a note to let you know that I have just added a patch titled
>>
>> xfs: block allocation work needs to be kswapd aware
>>
>> to the linux-3.11.y-queue branch of the 3.11.y.z extended stable tree
>> which can be found at:
>>
>> http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.11.y-queue
>>
>> If you, or anyone else, feels it should not be added to this tree, please
>> reply to this email.
>>
>> For more information about the 3.11.y.z tree, see
>> https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable
>
> No, do not apply this commit to stable kernels - it is due to be
> reverted as it causes major memory reclaim behaviour regressions.
>
> http://oss.sgi.com/pipermail/xfs/2014-June/036851.html
>
> Cheers,
>
> Dave.
>
Oops, I wasn't aware of that. Thanks a lot, I'll drop this patch from
the 3.11 queue.
Cheers,
--
Luís
>>
>> Thanks.
>> -Luis
>>
>> ------
>>
>> From 979242514e5dbb38d8c68e91bb7adcf9cffce846 Mon Sep 17 00:00:00 2001
>> From: Dave Chinner <dchinner at redhat.com>
>> Date: Fri, 6 Jun 2014 15:59:59 +1000
>> Subject: xfs: block allocation work needs to be kswapd aware
>>
>> commit 1f6d64829db78a7e1d63e15c9f48f0a5d2b5a679 upstream.
>>
>> Upon memory pressure, kswapd calls xfs_vm_writepage() from
>> shrink_page_list(). This can result in delayed allocation occurring
>> and that gets deferred to the allocation workqueue.
>>
>> The allocation then runs outside kswapd context, which means if it
>> needs memory (and it does to demand page metadata from disk) it can
>> block in shrink_inactive_list() waiting for IO congestion. These
>> blocking waits are normally avoided in kswapd context, so under
>> memory pressure writeback from kswapd can be arbitrarily delayed by
>> memory reclaim.
>>
>> To avoid this, pass the kswapd context to the allocation being done
>> by the workqueue, so that memory reclaim understands correctly that
>> the work is being done for kswapd and therefore it is not blocked
>> and does not delay memory reclaim.
>>
>> To avoid issues with int->char conversion of flag fields (as noticed
>> in v1 of this patch) convert the flag fields in the struct
>> xfs_bmalloca to bool types. pahole indicates these variables are
>> still single byte variables, so no extra space is consumed by this
>> change.
>>
>> Reported-by: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp>
>> Signed-off-by: Dave Chinner <dchinner at redhat.com>
>> Reviewed-by: Christoph Hellwig <hch at lst.de>
>> Signed-off-by: Dave Chinner <david at fromorbit.com>
>> [ luis: backported to 3.11: file renames:
>> - fs/xfs/xfs_bmap_util.c -> fs/xfs/xfs_bmap.c
>> - fs/xfs/xfs_bmap_util.h -> fs/xfs/xfs_bmap.h ]
>> Signed-off-by: Luis Henriques <luis.henriques at canonical.com>
>> ---
>> fs/xfs/xfs_bmap.c | 16 +++++++++++++---
>> fs/xfs/xfs_bmap.h | 13 +++++++------
>> 2 files changed, 20 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
>> index 05c698ccb238..ed6dc2d02573 100644
>> --- a/fs/xfs/xfs_bmap.c
>> +++ b/fs/xfs/xfs_bmap.c
>> @@ -4763,14 +4763,23 @@ xfs_bmapi_allocate_worker(
>> struct xfs_bmalloca *args = container_of(work,
>> struct xfs_bmalloca, work);
>> unsigned long pflags;
>> + unsigned long new_pflags = PF_FSTRANS;
>>
>> - /* we are in a transaction context here */
>> - current_set_flags_nested(&pflags, PF_FSTRANS);
>> + /*
>> + * we are in a transaction context here, but may also be doing work
>> + * in kswapd context, and hence we may need to inherit that state
>> + * temporarily to ensure that we don't block waiting for memory reclaim
>> + * in any way.
>> + */
>> + if (args->kswapd)
>> + new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
>> +
>> + current_set_flags_nested(&pflags, new_pflags);
>>
>> args->result = __xfs_bmapi_allocate(args);
>> complete(args->done);
>>
>> - current_restore_flags_nested(&pflags, PF_FSTRANS);
>> + current_restore_flags_nested(&pflags, new_pflags);
>> }
>>
>> /*
>> @@ -4789,6 +4798,7 @@ xfs_bmapi_allocate(
>>
>>
>> args->done = &done;
>> + args->kswapd = current_is_kswapd();
>> INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
>> queue_work(xfs_alloc_wq, &args->work);
>> wait_for_completion(&done);
>> diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
>> index 1cf1292d29b7..035afc71a9ca 100644
>> --- a/fs/xfs/xfs_bmap.h
>> +++ b/fs/xfs/xfs_bmap.h
>> @@ -130,12 +130,13 @@ typedef struct xfs_bmalloca {
>> xfs_extlen_t total; /* total blocks needed for xaction */
>> xfs_extlen_t minlen; /* minimum allocation size (blocks) */
>> xfs_extlen_t minleft; /* amount must be left after alloc */
>> - char eof; /* set if allocating past last extent */
>> - char wasdel; /* replacing a delayed allocation */
>> - char userdata;/* set if is user data */
>> - char aeof; /* allocated space at eof */
>> - char conv; /* overwriting unwritten extents */
>> - char stack_switch;
>> + bool eof; /* set if allocating past last extent */
>> + bool wasdel; /* replacing a delayed allocation */
>> + bool userdata;/* set if is user data */
>> + bool aeof; /* allocated space at eof */
>> + bool conv; /* overwriting unwritten extents */
>> + bool stack_switch;
>> + bool kswapd; /* allocation in kswapd context */
>> int flags;
>> struct completion *done;
>> struct work_struct work;
>> --
>> 1.9.1
>>
>>
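
For anyone reading the quoted patch, here is a minimal userspace sketch of the
flag save/set/restore pattern used in xfs_bmapi_allocate_worker(). It is not
kernel code. The bit values are made up for illustration, task_flags is a
hypothetical stand-in for current->flags, and the two helpers assume that the
XFS current_set_flags_nested()/current_restore_flags_nested() macros save the
old flags and later restore only the named bits.

```c
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's real PF_* constants. */
#define PF_FSTRANS   0x1UL
#define PF_MEMALLOC  0x2UL
#define PF_SWAPWRITE 0x4UL
#define PF_KSWAPD    0x8UL

/* Hypothetical stand-in for current->flags of the worker thread. */
static unsigned long task_flags;

/* Save the current flags into *saved, then set the requested bits. */
static void set_flags_nested(unsigned long *saved, unsigned long f)
{
    *saved = task_flags;
    task_flags |= f;
}

/* Put only the bits named in f back to their saved state. */
static void restore_flags_nested(const unsigned long *saved, unsigned long f)
{
    task_flags = (task_flags & ~f) | (*saved & f);
}

/*
 * Same shape as the patched worker: build the flag mask, set it,
 * do the work, restore. Returns the flags in effect during the work.
 */
static unsigned long worker_flags_during(bool kswapd)
{
    unsigned long pflags;
    unsigned long new_pflags = PF_FSTRANS;
    unsigned long seen;

    /* Inherit the submitter's kswapd state only when it was kswapd. */
    if (kswapd)
        new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;

    set_flags_nested(&pflags, new_pflags);
    seen = task_flags;              /* the allocation would run here */
    restore_flags_nested(&pflags, new_pflags);

    return seen;
}
```

The point of passing the same mask to both helpers is that any bit the worker
thread already had set before the call is still set afterwards, while bits
added only for the duration of the allocation are cleared again.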