[3.13.y-ckt stable] Patch "rbd: drop an unsafe assertion" has been added to staging queue

Kamal Mostafa kamal at canonical.com
Tue Apr 7 19:27:49 UTC 2015


This is a note to let you know that I have just added a patch titled

    rbd: drop an unsafe assertion

to the linux-3.13.y-queue branch of the 3.13.y-ckt extended stable tree 
which can be found at:

 http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.13.y-queue

This patch is scheduled to be released in version 3.13.11-ckt19.

If you, or anyone else, feels it should not be added to this tree, please 
reply to this email.

For more information about the 3.13.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

------

>From 916f52288d75082945f07fa98b47e1758bea800a Mon Sep 17 00:00:00 2001
From: Alex Elder <elder at linaro.org>
Date: Tue, 25 Mar 2014 15:36:02 +0200
Subject: rbd: drop an unsafe assertion

commit 638c323c4d1f8eaf25224946e21ce8818f1bcee1 upstream.

Olivier Bonvalet reported having repeated crashes due to a failed
assertion he was hitting in rbd_img_obj_callback():

    Assertion failure in rbd_img_obj_callback() at line 2165:
	rbd_assert(which >= img_request->next_completion);

With a lot of help from Olivier with reproducing the problem
we were able to determine the object and image requests had
already been completed (and often freed) at the point the
assertion failed.

There was a great deal of discussion on the ceph-devel mailing list
about this.  The problem only arose when there were two (or more)
object requests in an image request, and the problem was always
seen when the second request was being completed.

The problem is due to a race in the window between setting the
"done" flag on an object request and checking the image request's
next completion value.  When the first object request completes, it
checks to see if its successor request is marked "done", and if
so, that request is also completed.  In the process, the image
request's next_completion value is updated to reflect that both
the first and second requests are completed.  By the time the
second request is able to check the next_completion value, it
has been set to a value *greater* than its own "which" value,
which caused an assertion to fail.

Fix this problem by skipping over any completion processing
unless the completing object request is the next one expected.
Test only for inequality (not >=), and eliminate the bad
assertion.

Tested-by: Olivier Bonvalet <ob at daevel.fr>
Signed-off-by: Alex Elder <elder at linaro.org>
Reviewed-by: Sage Weil <sage at inktank.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov at inktank.com>
Signed-off-by: Kamal Mostafa <kamal at canonical.com>
---
 drivers/block/rbd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 912a068..43d562d 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2137,7 +2137,6 @@ static void rbd_img_obj_callback(struct rbd_obj_request *obj_request)
 	rbd_assert(img_request->obj_request_count > 0);
 	rbd_assert(which != BAD_WHICH);
 	rbd_assert(which < img_request->obj_request_count);
-	rbd_assert(which >= img_request->next_completion);

 	spin_lock_irq(&img_request->completion_lock);
 	if (which != img_request->next_completion)
--
1.9.1





More information about the kernel-team mailing list