ACK: [SRU T/X/A/B][C][PATCH 1/1] UBUNTU: SAUCE: CacheFiles: fix a read_waiter/read_copier race

Stefan Bader stefan.bader at canonical.com
Tue Jun 5 19:51:44 UTC 2018


On 03.06.2018 23:42, Daniel Axtens wrote:
> From: Lei Xue <carmark.dlut at gmail.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1774336
> 
> There is a potential race in fscache operation enqueuing for reading and
> copying multiple pages from cachefiles to netfs.
> 
> If this race occurs, an oops similar to the following is seen:
> 
> [585042.202316] FS-Cache:
> [585042.202343] FS-Cache: Assertion failed
> [585042.202367] FS-Cache: 6 == 5 is false
> [585042.202452] ------------[ cut here ]------------
> [585042.202480] kernel BUG at fs/fscache/operation.c:494!
> ...
> [585042.209600] Call Trace:
> [585042.211233]  [<ffffffffc034c29a>] fscache_op_work_func+0x2a/0x50 [fscache]
> [585042.212677]  [<ffffffff81095a70>] process_one_work+0x150/0x3f0
> [585042.213550]  [<ffffffff810961ea>] worker_thread+0x11a/0x470
> ...
> 
> The race occurs in the following situation:
> 
> One thread is in cachefiles_read_waiter:
>  1) object->work_lock is taken.
>  2) the operation is added to the to_do list.
>  3) the work lock is dropped.
>  4) fscache_enqueue_retrieval is called, which takes a reference.
> 
> Another thread is in cachefiles_read_copier:
>  1) object->work_lock is taken
>  2) an item is popped off the to_do list.
>  3) object->work_lock is dropped.
>  4) some processing is done on the item, and fscache_put_retrieval()
>     is called, dropping a reference.
> 
> Now if the this process in cachefiles_read_copier takes place
> *between* steps 3 and 4 in cachefiles_read_waiter, a reference will be
> dropped before it is taken, which leads to the object's reference count
> hitting zero, which leads to lifecycle events for the object happening
> too soon, leading to the assertion failure later on.
> 
> Move fscache_enqueue_retrieval under the lock in
> cachefiles_read_waiter. This means that the object cannot be popped
> off the to_do list until it is in a fully consistent state with the
> reference taken.
> 
> Signed-off-by: Lei Xue <carmark.dlut at gmail.com>
> Reviewed-by: Daniel Axtens <dja at axtens.net>
> [dja: rewrite and expand commit message]
> (From https://www.redhat.com/archives/linux-cachefs/2018-February/msg00000.html
>  This patch has been sitting on the mailing list for months with no
>  response from the maintainer. A similar patch fixing the same issue
>  was posted as far back as May 2017, and likewise had no response:
>  https://www.redhat.com/archives/linux-cachefs/2017-May/msg00002.html
>  I poked the list recently and also got nothing:
>  https://www.redhat.com/archives/linux-cachefs/2018-May/msg00000.html
>  and the problem was again reported and this patch validated by
>  another user:
>  https://www.redhat.com/archives/linux-cachefs/2018-May/msg00001.html
>  Hence the submission as a sauce patch.)
> Signed-off-by: Daniel Axtens <daniel.axtens at canonical.com>
Acked-by: Stefan Bader <stefan.bader at canonical.com>

> ---
>  fs/cachefiles/rdwr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
> index c0f3da3926a0..d9bb47d1e16d 100644
> --- a/fs/cachefiles/rdwr.c
> +++ b/fs/cachefiles/rdwr.c
> @@ -58,9 +58,9 @@ static int cachefiles_read_waiter(wait_queue_t *wait, unsigned mode,
>  
>  	spin_lock(&object->work_lock);
>  	list_add_tail(&monitor->op_link, &monitor->op->to_do);
> +	fscache_enqueue_retrieval(monitor->op);
>  	spin_unlock(&object->work_lock);
>  
> -	fscache_enqueue_retrieval(monitor->op);
>  	return 0;
>  }
>  
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20180605/99fcc2ad/attachment.sig>


More information about the kernel-team mailing list