ACK/Cmnt: [SRU][B][F][G][PATCH 0/7] raid10: Block discard is very slow, causing severe delays for mkfs and fstrim operations

Stefan Bader stefan.bader at canonical.com
Mon Nov 2 08:29:04 UTC 2020


On 02.11.20 01:13, Matthew Ruffell wrote:
> On 29/10/20 9:46 pm, Stefan Bader wrote:
>>> There is also some additional commits which is required, and was merged after 
>>> "md/raid10: improve raid10 discard request" was merged. The following commits 
>>> enables Radid10 to use large discards, instead of splitting into many bios, 
>>> since the technical hurdles have now been removed.
>>
>> The below two patches are marked up as only needed for F and G. What about
>> Bionic? If the changes they refer to were in 4.12, then those would have to go
>> to Bionic as well.
> 
> Sorry, it seems I have confused you in my SRU template. See backport note #3
> about the two commits for F and G:
> 
>> 3) The 4.15 kernel does not need "dm raid: fix discard limits for raid1 and raid10"
>> and "dm raid: remove unnecessary discard limits for raid10" due to not having
>> the following commit, which was merged in 5.1-rc1:
>>
>> commit 61697a6abd24acba941359c6268a94f4afe4a53d
>> Author: Mike Snitzer <snitzer at redhat.com>
>> Date:   Fri Jan 18 14:19:26 2019 -0500
>> Subject: dm: eliminate 'split_discard_bios' flag from DM target interface
>> Link: https://github.com/torvalds/linux/commit/61697a6abd24acba941359c6268a94f4afe4a53d
> 
> Now, "dm: eliminate 'split_discard_bios' flag from DM target interface" is a
> really messy backport to Bionic, it also changes the DM API and version
> requirements, and it also needs a bunch of dependency commits to enable
> blk_queue_split() to handle splitting discards based on queue_limits. In 4.15,
> this is all handled by DM core.
> 
> The two commits marked for F and G but not B, do not actually modify performance
> in any way, and technically aren't needed to solve the problem. I included them
> for F and G to provide a more complete fix, but they can be omitted if necessary.

I guess part of the confusion was that, while your statement mentioned them not
needed, they sound like being performance related and patch #7 in the commit
message refers to patch #4 which is provided for Bionic as well.

> 
> I decided that making the two commits marked for F and G work in Bionic is too
> much of a regression risk, and did not include them in the SRU. The benefits for
> the 4.15 kernel do not outweigh the risks, while for the 5.4 and 5.8 kernels,
> they will get one of the commits via upstream -stable anyway, so we may as well
> ship a complete solution.
> 
> As for the paragraph that mentions the changes in 4.12, it is referencing the
> overall architecture changes required for raid0 to gain better performance, as
> it was used as a design template for the changes to raid10. I included it more
> or less as a statement that the new code follows a design implemented in 4.12,
> and that the design still holds up today, in order to try lend the new patches
> some credibility, since they are more or less the same refactor as introduced
> in:
> 
>> All the commits mentioned follow a similar strategy which was implemented in 
>> Raid0 in the below commit, which was merged in 4.12-rc2, which fixed block 
>> discard performance issues in Raid0:
>>
>> commit 29efc390b9462582ae95eb9a0b8cd17ab956afc0
>> Author: Shaohua Li <shli at fb.com>
>> Date: Sun May 7 17:36:24 2017 -0700
>> Subject: md/md0: optimize raid0 discard handling
>> Link: https://github.com/torvalds/linux/commit/29efc390b9462582ae95eb9a0b8cd17ab956afc0 
> 
> "follow a similar strategy which was implemented in Raid0" was a poor choice of
> words by me to say that the whole patchset is derived from the refactor of raid0
> to fix similar problems back in 4.12.
> 
> Sorry for the confusion.

This is a rather complex case. Confusion to some degree is probably inevitable.

> 
>> Beside that, I am not sure how exactly that might be better phrased, but
>> personally I stumbled over "remove 'address of' pointer for...". Mabye "do
>> not use a pointer for one of the arguments to ..." but not sure.
> 
> Possibly we could change the wording to "remove pointer reference for parameter
> in ...".
> 
> e.g. for "md/raid10: improve raid10 discard request":
> 
> (backported from commit bcc90d280465ebd51ab8688be86e1f00c62dccf9)
> [mruffell: change submit_bio_noacct() to generic_make_request(), remove pointer
> reference for parameter in bio_split(), mempool_alloc(), bio_clone_fast()]
> Signed-off-by: Matthew Ruffell <matthew.ruffell at canonical.com>
> 
> e.g. for "md/raid10: improve discard request for far layout":
> 
> (backported from commit d3ee2d8415a6256c1c41e1be36e80e640c3e6359)
> [mruffell: remove pointer reference for parameter in mempool_alloc()]
> Signed-off-by: Matthew Ruffell <matthew.ruffell at canonical.com>
> 
> Would you like me to send a V2? Or is it possible for you to change in place?

This is not something that must change. That I got confused does not mean it is
wrong. So using the "remove pointer reference" form in future will make it
simpler for me to know what I am looking for. If this gets applied we might or
might not change it on the fly, but I do not consider this a blocker.

> 
> Let me know if you have any more concerns. I know this is a big SRU request, so
> I have tried to document it the best I can. A customer would also really
> like this SRU to go through, since they are avoiding all use of raid10 due to
> the severe delays with block discard, but they want to use raid10 on NVMe
> devices, and are frustrated that they can't. At the same time, I also understand
> if you have regression concerns due to the amount of code changing.
>

On the good side, there is only one change to md code that is more generic
usage. And that looked to be mostly shifting existing code into a separate
function. So ok. And the other changes are raid10 discipline, so with targeted
testing / verification I think we can be fairly safe.

So with this:
Acked-by: Stefan Bader <stefan.bader at canonical.com>

> Thanks,
> Matthew
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20201102/914737f4/attachment.sig>


More information about the kernel-team mailing list