ACK: [SRU T/X/A/B][C][PATCH 1/1] UBUNTU: SAUCE: CacheFiles: fix a read_waiter/read_copier race
Stefan Bader
stefan.bader at canonical.com
Tue Jun 5 19:51:44 UTC 2018
On 03.06.2018 23:42, Daniel Axtens wrote:
> From: Lei Xue <carmark.dlut at gmail.com>
>
> BugLink: https://bugs.launchpad.net/bugs/1774336
>
> There is a potential race in fscache operation enqueuing for reading and
> copying multiple pages from cachefiles to netfs.
>
> If this race occurs, an oops similar to the following is seen:
>
> [585042.202316] FS-Cache:
> [585042.202343] FS-Cache: Assertion failed
> [585042.202367] FS-Cache: 6 == 5 is false
> [585042.202452] ------------[ cut here ]------------
> [585042.202480] kernel BUG at fs/fscache/operation.c:494!
> ...
> [585042.209600] Call Trace:
> [585042.211233] [<ffffffffc034c29a>] fscache_op_work_func+0x2a/0x50 [fscache]
> [585042.212677] [<ffffffff81095a70>] process_one_work+0x150/0x3f0
> [585042.213550] [<ffffffff810961ea>] worker_thread+0x11a/0x470
> ...
>
> The race occurs in the following situation:
>
> One thread is in cachefiles_read_waiter:
> 1) object->work_lock is taken.
> 2) the operation is added to the to_do list.
> 3) the work lock is dropped.
> 4) fscache_enqueue_retrieval is called, which takes a reference.
>
> Another thread is in cachefiles_read_copier:
> 1) object->work_lock is taken
> 2) an item is popped off the to_do list.
> 3) object->work_lock is dropped.
> 4) some processing is done on the item, and fscache_put_retrieval()
> is called, dropping a reference.
>
> Now if the this process in cachefiles_read_copier takes place
> *between* steps 3 and 4 in cachefiles_read_waiter, a reference will be
> dropped before it is taken, which leads to the object's reference count
> hitting zero, which leads to lifecycle events for the object happening
> too soon, leading to the assertion failure later on.
>
> Move fscache_enqueue_retrieval under the lock in
> cachefiles_read_waiter. This means that the object cannot be popped
> off the to_do list until it is in a fully consistent state with the
> reference taken.
>
> Signed-off-by: Lei Xue <carmark.dlut at gmail.com>
> Reviewed-by: Daniel Axtens <dja at axtens.net>
> [dja: rewrite and expand commit message]
> (From https://www.redhat.com/archives/linux-cachefs/2018-February/msg00000.html
> This patch has been sitting on the mailing list for months with no
> response from the maintainer. A similar patch fixing the same issue
> was posted as far back as May 2017, and likewise had no response:
> https://www.redhat.com/archives/linux-cachefs/2017-May/msg00002.html
> I poked the list recently and also got nothing:
> https://www.redhat.com/archives/linux-cachefs/2018-May/msg00000.html
> and the problem was again reported and this patch validated by
> another user:
> https://www.redhat.com/archives/linux-cachefs/2018-May/msg00001.html
> Hence the submission as a sauce patch.)
> Signed-off-by: Daniel Axtens <daniel.axtens at canonical.com>
Acked-by: Stefan Bader <stefan.bader at canonical.com>
> ---
> fs/cachefiles/rdwr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
> index c0f3da3926a0..d9bb47d1e16d 100644
> --- a/fs/cachefiles/rdwr.c
> +++ b/fs/cachefiles/rdwr.c
> @@ -58,9 +58,9 @@ static int cachefiles_read_waiter(wait_queue_t *wait, unsigned mode,
>
> spin_lock(&object->work_lock);
> list_add_tail(&monitor->op_link, &monitor->op->to_do);
> + fscache_enqueue_retrieval(monitor->op);
> spin_unlock(&object->work_lock);
>
> - fscache_enqueue_retrieval(monitor->op);
> return 0;
> }
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20180605/99fcc2ad/attachment.sig>
More information about the kernel-team
mailing list