SRU request for LP#208551
tim.gardner at canonical.com
Thu Sep 11 02:57:25 UTC 2008
Colin Ian King wrote:
> SRU justification:
> Impact: mdadm RAID 5 gets stuck in uninterruptible sleep under heavy I/O
> load. Copying data to a RAID 5 XFS partition permanently locks up
> several related processes, which get stuck in the D(+) state. This
> occurs when large quantities of data (10-40 GB) are copied. The
> processes become unkillable, and the system cannot reboot without
> power cycling the server.
> Fix: The patch from commit 6ed3003c19a96fe18edf8179c4be6fe14abbebbc. The
> fix is to not make any generic_make_request() calls in raid5
> make_request until all waiting has been done. We do this by simply
> setting STRIPE_HANDLE instead of calling handle_stripe(). This causes a
> performance hit, so this patch also only calls raid5_activate_delayed()
> at unplug time, never in raid5. This seems to bring back the
> performance numbers. [quoting the commit message]
> Testing: Without the patch, RAID 5 using md on an XFS filesystem locks
> up under heavy data copying - this is repeatable. With the patch, the
> lockup does not occur.
> Patch tested in my PPA by Andrew Cholakian
> on two 64-bit servers.
> Patch attached.
ACK. How far back does this bug go? Might this patch be appropriate for
Tim Gardner tim.gardner at canonical.com
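
For anyone reproducing the test case, a quick way to confirm the symptom
described under "Impact" is to scan /proc for processes in the D
(uninterruptible sleep) state. The following is a minimal sketch and not
part of the SRU; the function names and the sample process names are
hypothetical. It relies only on the documented /proc/<pid>/stat layout
(pid, then the command name in parentheses, then the state character).

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the file could not be parsed
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

Processes pass through the D state briefly during normal I/O, so a single
hit is not evidence of the bug. The lockup described above would show the
same PIDs in the list on every run.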
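
The idea in the quoted commit message (flag a stripe for later handling
instead of handling it inline from make_request) can be illustrated with a
toy model. This is only a sketch of the deferral pattern, not the kernel
code: STRIPE_HANDLE, handle_stripe and make_request are names taken from
the commit message, while the classes, the pending list and run_deferred()
are hypothetical stand-ins.

```python
STRIPE_HANDLE = "STRIPE_HANDLE"


class Stripe:
    def __init__(self, sector):
        self.sector = sector
        self.flags = set()


class ToyArray:
    def __init__(self):
        self.pending = []             # stripes flagged for later handling
        self.issued = []              # order in which lower-level I/O was submitted
        self.in_make_request = False  # True while make_request may still wait

    def handle_stripe(self, stripe):
        # Stand-in for the work that ends in generic_make_request().
        assert not self.in_make_request, "I/O issued inside make_request"
        stripe.flags.discard(STRIPE_HANDLE)
        self.issued.append(stripe.sector)

    def make_request(self, sectors):
        self.in_make_request = True
        for sector in sectors:
            stripe = Stripe(sector)
            # Only set the flag; do not call handle_stripe() inline.
            stripe.flags.add(STRIPE_HANDLE)
            self.pending.append(stripe)
        self.in_make_request = False

    def run_deferred(self):
        # Hypothetical worker that processes flagged stripes afterwards.
        while self.pending:
            self.handle_stripe(self.pending.pop(0))


if __name__ == "__main__":
    arr = ToyArray()
    arr.make_request([0, 8, 16])
    print("issued during make_request:", arr.issued)
    arr.run_deferred()
    print("issued after deferral:", arr.issued)
```

In this model no I/O is submitted until make_request has returned, which is
the property the commit message describes. The model does not attempt to
reproduce the deadlock itself or the raid5_activate_delayed() change.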