ACK: [SRU][N][O][PATCH 0/3] Introduce and use sendpages_ok() instead of sendpage_ok() in nvme-tcp and drbg

Koichiro Den koichiro.den at canonical.com
Wed Feb 19 06:34:15 UTC 2025


On Thu, Jan 30, 2025 at 02:26:40PM GMT, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2093871
> 
> [Impact]
> 
> Currently the nvme-tcp and drbg subsystems try to enable the MSG_SPLICE_PAGES
> flag on pages to be written, and when MSG_SPLICE_PAGES is set, eventually it
> calls skb_splice_from_iter(), which then checks all pages with sendpage_ok()
> to see if all the pages are sendable.
> 
> At the moment, both subsystems only check the first page in a potentially
> contiguous block of pages, if they are sendpage_ok(), and if the first page is,
> then it just assumes all the rest are sendpage_ok() too, and sends the I/O off
> to eventually be found out by skb_splice_from_iter(). If one or more of the
> pages in the contiguous block is not sendpage_ok(), then we get a warn printed,
> data transfer is aborted. In the nvme-tcp case, IO then hangs.
> 
> This patchset introduces sendpages_ok() which iterates over each page in a
> contiguous block, checks if it is sendpage_ok(), and only returns true if all
> of them are.
> 
> This resolves the whole MSG_SPLICE_PAGES flag situation, since you can now
> depend on the result of sendpages_ok(), instead of just assuming everything is
> okay.
> 
> This issue is what caused bug 2075110 [0] to be discovered in the first place,
> since it was responsible for contigious blocks of pages where the first was
> sendpage_ok(), but pages further into the block were not.
> 
> [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2075110
> 
> Even with "md/md-bitmap: fix writing non bitmap pages" applied, the issue can
> still happen, e.g. with merged IO pages, so this fix is still needed to
> eliminate the issue.
> 
> [Fix]
> 
> The fixes landed in mainline 6.12-rc1:
> 
> commit 23a55f4492fcf868d068da31a2cd30c15f46207d
> Author: Ofir Gal <ofir.gal at volumez.com>
> Date:   Thu Jul 18 11:45:12 2024 +0300
> Subject: net: introduce helper sendpages_ok()
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23a55f4492fcf868d068da31a2cd30c15f46207d
>     
> commit 6af7331a70b4888df43ec1d7e1803ae2c43b6981
> Author: Ofir Gal <ofir.gal at volumez.com>
> Date:   Thu Jul 18 11:45:13 2024 +0300
> Subject: nvme-tcp: use sendpages_ok() instead of sendpage_ok()
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6af7331a70b4888df43ec1d7e1803ae2c43b6981
>     
> commit 7960af373ade3b39e10106ef415e43a1d2aa48c6
> Author: Ofir Gal <ofir.gal at volumez.com>
> Date:  Thu Jul 18 11:45:14 2024 +0300
> Subject: drbd: use sendpages_ok() instead of sendpage_ok()
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7960af373ade3b39e10106ef415e43a1d2aa48c6
> 
> They are needed for noble and oracular.
> 
> [Testcase]
> 
> This is the same testcase as the original bug 2075110 [0], as the fix is
> designed to prevent it or similar other bugs from happening again.
> 
> [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2075110
> 
> Because of this, the fix:
> 
> commit ab99a87542f194f28e2364a42afbf9fb48b1c724
> Author: Ofir Gal <ofir.gal at volumez.com>
> Date: Fri Jun 7 10:27:44 2024 +0300
> Subject: md/md-bitmap: fix writing non bitmap pages
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab99a87542f194f28e2364a42afbf9fb48b1c724
> 
> needs to be reverted during your test runs, or you won't see the issue
> reproduce.
> 
> You can use this ppa for updated kernels with the revert to trigger the issue:
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-revert
> 
> This can be reproduced by running blktests md/001 [1], which the author of the fix created to act as a regression test for this issue.
> 
> [1] https://github.com/osandov/blktests/commit/a24a7b462816fbad7dc6c175e53fcc764ad0a822
> 
> Deploy a fresh Noble VM, that has a scratch NVME disk.
> 
> $ sudo apt install build-essential fio
> $ git clone https://github.com/osandov/blktests.git
> $ cd blktests
> $ make
> $ echo "TEST_DEVS=(/dev/nvme0n1)" > config
> $ sudo ./check md/001
> 
> The md/001 test will hang an affected system, and the above oops message will be visible in dmesg.
> 
> A test kernel is available in the following ppa:
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-test
> 
> This has both the fixes for this bug, and also bug 2075110. The issue will not
> reproduce.
> 
> There is also a test kernel available with the fix for this bug present, and the
> fix for bug 2075110 reverted, so you can see the impact of these patches only:
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-repro
> 
> This will also not reproduce the issue anymore.
> 
> [Where problems could occur]
> 
> What we are changing is rather simple. Instead of checking the first page and
> assuming all the rest in the contiguous block are sendpage_ok(), we now
> check each page in the contiguous block to see if all of them are sendpage_ok().
> 
> If any aren't, then we abort the write to the driver, and try again later. This
> saves us time.
> 
> However, it does take longer to call sendpage_ok() on each of the pages in the
> contiguous block, so there will be a minor performance hit.
> 
> Small performance hit for correctness should be okay.
> 
> Currently we are only applying to nvme-tcp and drbg subsystems. If a regression
> were to occur, it would affect users of those subsystems only.
> 
> [Other info]
> 
> Upstream mailing list:
> https://lore.kernel.org/all/20240718084515.3833733-1-ofir.gal@volumez.com/T/#u
> 
> Ofir Gal (3):
>   net: introduce helper sendpages_ok()
>   nvme-tcp: use sendpages_ok() instead of sendpage_ok()
>   drbd: use sendpages_ok() instead of sendpage_ok()
> 
>  drivers/block/drbd/drbd_main.c |  2 +-
>  drivers/nvme/host/tcp.c        |  2 +-
>  include/linux/net.h            | 19 +++++++++++++++++++
>  3 files changed, 21 insertions(+), 2 deletions(-)
> 

Acked-by: Koichiro Den <koichiro.den at canonical.com>



More information about the kernel-team mailing list