ACK: [SRU][N][O][PATCH 0/3] Introduce and use sendpages_ok() instead of sendpage_ok() in nvme-tcp and drbg
Koichiro Den
koichiro.den at canonical.com
Wed Feb 19 06:34:15 UTC 2025
On Thu, Jan 30, 2025 at 02:26:40PM GMT, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2093871
>
> [Impact]
>
> Currently the nvme-tcp and drbg subsystems try to enable the MSG_SPLICE_PAGES
> flag on pages to be written, and when MSG_SPLICE_PAGES is set, eventually it
> calls skb_splice_from_iter(), which then checks all pages with sendpage_ok()
> to see if all the pages are sendable.
>
> At the moment, both subsystems only check the first page in a potentially
> contiguous block of pages, if they are sendpage_ok(), and if the first page is,
> then it just assumes all the rest are sendpage_ok() too, and sends the I/O off
> to eventually be found out by skb_splice_from_iter(). If one or more of the
> pages in the contiguous block is not sendpage_ok(), then we get a warn printed,
> data transfer is aborted. In the nvme-tcp case, IO then hangs.
>
> This patchset introduces sendpages_ok() which iterates over each page in a
> contiguous block, checks if it is sendpage_ok(), and only returns true if all
> of them are.
>
> This resolves the whole MSG_SPLICE_PAGES flag situation, since you can now
> depend on the result of sendpages_ok(), instead of just assuming everything is
> okay.
>
> This issue is what caused bug 2075110 [0] to be discovered in the first place,
> since it was responsible for contigious blocks of pages where the first was
> sendpage_ok(), but pages further into the block were not.
>
> [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2075110
>
> Even with "md/md-bitmap: fix writing non bitmap pages" applied, the issue can
> still happen, e.g. with merged IO pages, so this fix is still needed to
> eliminate the issue.
>
> [Fix]
>
> The fixes landed in mainline 6.12-rc1:
>
> commit 23a55f4492fcf868d068da31a2cd30c15f46207d
> Author: Ofir Gal <ofir.gal at volumez.com>
> Date: Thu Jul 18 11:45:12 2024 +0300
> Subject: net: introduce helper sendpages_ok()
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=23a55f4492fcf868d068da31a2cd30c15f46207d
>
> commit 6af7331a70b4888df43ec1d7e1803ae2c43b6981
> Author: Ofir Gal <ofir.gal at volumez.com>
> Date: Thu Jul 18 11:45:13 2024 +0300
> Subject: nvme-tcp: use sendpages_ok() instead of sendpage_ok()
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6af7331a70b4888df43ec1d7e1803ae2c43b6981
>
> commit 7960af373ade3b39e10106ef415e43a1d2aa48c6
> Author: Ofir Gal <ofir.gal at volumez.com>
> Date: Thu Jul 18 11:45:14 2024 +0300
> Subject: drbd: use sendpages_ok() instead of sendpage_ok()
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7960af373ade3b39e10106ef415e43a1d2aa48c6
>
> They are needed for noble and oracular.
>
> [Testcase]
>
> This is the same testcase as the original bug 2075110 [0], as the fix is
> designed to prevent it or similar other bugs from happening again.
>
> [0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2075110
>
> Because of this, the fix:
>
> commit ab99a87542f194f28e2364a42afbf9fb48b1c724
> Author: Ofir Gal <ofir.gal at volumez.com>
> Date: Fri Jun 7 10:27:44 2024 +0300
> Subject: md/md-bitmap: fix writing non bitmap pages
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ab99a87542f194f28e2364a42afbf9fb48b1c724
>
> needs to be reverted during your test runs, or you won't see the issue
> reproduce.
>
> You can use this ppa for updated kernels with the revert to trigger the issue:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-revert
>
> This can be reproduced by running blktests md/001 [1], which the author of the fix created to act as a regression test for this issue.
>
> [1] https://github.com/osandov/blktests/commit/a24a7b462816fbad7dc6c175e53fcc764ad0a822
>
> Deploy a fresh Noble VM, that has a scratch NVME disk.
>
> $ sudo apt install build-essential fio
> $ git clone https://github.com/osandov/blktests.git
> $ cd blktests
> $ make
> $ echo "TEST_DEVS=(/dev/nvme0n1)" > config
> $ sudo ./check md/001
>
> The md/001 test will hang an affected system, and the above oops message will be visible in dmesg.
>
> A test kernel is available in the following ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-test
>
> This has both the fixes for this bug, and also bug 2075110. The issue will not
> reproduce.
>
> There is also a test kernel available with the fix for this bug present, and the
> fix for bug 2075110 reverted, so you can see the impact of these patches only:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf404844-repro
>
> This will also not reproduce the issue anymore.
>
> [Where problems could occur]
>
> What we are changing is rather simple. Instead of checking the first page and
> assuming all the rest in the contiguous block are sendpage_ok(), we now
> check each page in the contiguous block to see if all of them are sendpage_ok().
>
> If any aren't, then we abort the write to the driver, and try again later. This
> saves us time.
>
> However, it does take longer to call sendpage_ok() on each of the pages in the
> contiguous block, so there will be a minor performance hit.
>
> Small performance hit for correctness should be okay.
>
> Currently we are only applying to nvme-tcp and drbg subsystems. If a regression
> were to occur, it would affect users of those subsystems only.
>
> [Other info]
>
> Upstream mailing list:
> https://lore.kernel.org/all/20240718084515.3833733-1-ofir.gal@volumez.com/T/#u
>
> Ofir Gal (3):
> net: introduce helper sendpages_ok()
> nvme-tcp: use sendpages_ok() instead of sendpage_ok()
> drbd: use sendpages_ok() instead of sendpage_ok()
>
> drivers/block/drbd/drbd_main.c | 2 +-
> drivers/nvme/host/tcp.c | 2 +-
> include/linux/net.h | 19 +++++++++++++++++++
> 3 files changed, 21 insertions(+), 2 deletions(-)
>
Acked-by: Koichiro Den <koichiro.den at canonical.com>
More information about the kernel-team
mailing list