[3.11.y.z extended stable] Patch "xfs: block allocation work needs to be kswapd aware" has been added to staging queue

Luis Henriques luis.henriques at canonical.com
Fri Jun 27 09:41:32 UTC 2014


Dave Chinner <david at fromorbit.com> writes:

> On Thu, Jun 26, 2014 at 11:36:31AM +0100, Luis Henriques wrote:
>> This is a note to let you know that I have just added a patch titled
>> 
>>     xfs: block allocation work needs to be kswapd aware
>> 
>> to the linux-3.11.y-queue branch of the 3.11.y.z extended stable tree 
>> which can be found at:
>> 
>>  http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.11.y-queue
>> 
>> If you, or anyone else, feels it should not be added to this tree, please 
>> reply to this email.
>> 
>> For more information about the 3.11.y.z tree, see
>> https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable
>
> No, do not apply this commit to stable kernels - it is due to be
> reverted as is causes major memory reclaim behaviour regressions.
>
> http://oss.sgi.com/pipermail/xfs/2014-June/036851.html
>
> Cheers,
>
> Dave.
>

Ups, I wasn't aware of that.  Thanks a lot, I'll drop this patch from
the 3.11 queue.

Cheers,
-- 
Luís


>> 
>> Thanks.
>> -Luis
>> 
>> ------
>> 
>> From 979242514e5dbb38d8c68e91bb7adcf9cffce846 Mon Sep 17 00:00:00 2001
>> From: Dave Chinner <dchinner at redhat.com>
>> Date: Fri, 6 Jun 2014 15:59:59 +1000
>> Subject: xfs: block allocation work needs to be kswapd aware
>> 
>> commit 1f6d64829db78a7e1d63e15c9f48f0a5d2b5a679 upstream.
>> 
>> Upon memory pressure, kswapd calls xfs_vm_writepage() from
>> shrink_page_list(). This can result in delayed allocation occurring
>> and that gets deferred to the the allocation workqueue.
>> 
>> The allocation then runs outside kswapd context, which means if it
>> needs memory (and it does to demand page metadata from disk) it can
>> block in shrink_inactive_list() waiting for IO congestion. These
>> blocking waits are normally avoiding in kswapd context, so under
>> memory pressure writeback from kswapd can be arbitrarily delayed by
>> memory reclaim.
>> 
>> To avoid this, pass the kswapd context to the allocation being done
>> by the workqueue, so that memory reclaim understands correctly that
>> the work is being done for kswapd and therefore it is not blocked
>> and does not delay memory reclaim.
>> 
>> To avoid issues with int->char conversion of flag fields (as noticed
>> in v1 of this patch) convert the flag fields in the struct
>> xfs_bmalloca to bool types. pahole indicates these variables are
>> still single byte variables, so no extra space is consumed by this
>> change.
>> 
>> Reported-by: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp>
>> Signed-off-by: Dave Chinner <dchinner at redhat.com>
>> Reviewed-by: Christoph Hellwig <hch at lst.de>
>> Signed-off-by: Dave Chinner <david at fromorbit.com>
>> [ luis: backported to 3.11: files rename:
>>   - fs/xfs/xfs_bmap_util.c -> fs/xfs/xfs_bmap.c
>>   - fs/xfs/xfs_bmap_util.h -> fs/xfs/xfs_bmap.h ]
>> Signed-off-by: Luis Henriques <luis.henriques at canonical.com>
>> ---
>>  fs/xfs/xfs_bmap.c | 16 +++++++++++++---
>>  fs/xfs/xfs_bmap.h | 13 +++++++------
>>  2 files changed, 20 insertions(+), 9 deletions(-)
>> 
>> diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
>> index 05c698ccb238..ed6dc2d02573 100644
>> --- a/fs/xfs/xfs_bmap.c
>> +++ b/fs/xfs/xfs_bmap.c
>> @@ -4763,14 +4763,23 @@ xfs_bmapi_allocate_worker(
>>  	struct xfs_bmalloca	*args = container_of(work,
>>  						struct xfs_bmalloca, work);
>>  	unsigned long		pflags;
>> +	unsigned long		new_pflags = PF_FSTRANS;
>> 
>> -	/* we are in a transaction context here */
>> -	current_set_flags_nested(&pflags, PF_FSTRANS);
>> +	/*
>> +	 * we are in a transaction context here, but may also be doing work
>> +	 * in kswapd context, and hence we may need to inherit that state
>> +	 * temporarily to ensure that we don't block waiting for memory reclaim
>> +	 * in any way.
>> +	 */
>> +	if (args->kswapd)
>> +		new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
>> +
>> +	current_set_flags_nested(&pflags, new_pflags);
>> 
>>  	args->result = __xfs_bmapi_allocate(args);
>>  	complete(args->done);
>> 
>> -	current_restore_flags_nested(&pflags, PF_FSTRANS);
>> +	current_restore_flags_nested(&pflags, new_pflags);
>>  }
>> 
>>  /*
>> @@ -4789,6 +4798,7 @@ xfs_bmapi_allocate(
>> 
>> 
>>  	args->done = &done;
>> +	args->kswapd = current_is_kswapd();
>>  	INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
>>  	queue_work(xfs_alloc_wq, &args->work);
>>  	wait_for_completion(&done);
>> diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
>> index 1cf1292d29b7..035afc71a9ca 100644
>> --- a/fs/xfs/xfs_bmap.h
>> +++ b/fs/xfs/xfs_bmap.h
>> @@ -130,12 +130,13 @@ typedef struct xfs_bmalloca {
>>  	xfs_extlen_t		total;	/* total blocks needed for xaction */
>>  	xfs_extlen_t		minlen;	/* minimum allocation size (blocks) */
>>  	xfs_extlen_t		minleft; /* amount must be left after alloc */
>> -	char			eof;	/* set if allocating past last extent */
>> -	char			wasdel;	/* replacing a delayed allocation */
>> -	char			userdata;/* set if is user data */
>> -	char			aeof;	/* allocated space at eof */
>> -	char			conv;	/* overwriting unwritten extents */
>> -	char			stack_switch;
>> +	bool			eof;	/* set if allocating past last extent */
>> +	bool			wasdel;	/* replacing a delayed allocation */
>> +	bool			userdata;/* set if is user data */
>> +	bool			aeof;	/* allocated space at eof */
>> +	bool			conv;	/* overwriting unwritten extents */
>> +	bool			stack_switch;
>> +	bool			kswapd;	/* allocation in kswapd context */
>>  	int			flags;
>>  	struct completion	*done;
>>  	struct work_struct	work;
>> --
>> 1.9.1
>> 
>> 




More information about the kernel-team mailing list