NACK: [SRU][F][PATCH 1/1] ocfs2: fix DIO failure due to insufficient transaction credits
Massimiliano Pellizzer
massimiliano.pellizzer at canonical.com
Wed Oct 16 20:51:20 UTC 2024
On Wed, 16 Oct 2024 at 20:26, Manuel Diewald
<manuel.diewald at canonical.com> wrote:
>
> On Tue, Oct 15, 2024 at 01:40:49PM +0200, Massimiliano Pellizzer wrote:
> > From: Jan Kara <jack at suse.cz>
> >
> > commit be346c1a6eeb49d8fda827d2a9522124c2f72f36 upstream.
> >
> > The code in ocfs2_dio_end_io_write() estimates number of necessary
> > transaction credits using ocfs2_calc_extend_credits(). This however does
> > not take into account that the IO could be arbitrarily large and can
> > contain arbitrary number of extents.
> >
> > Extent tree manipulations do often extend the current transaction but not
> > in all of the cases. For example if we have only single block extents in
> > the tree, ocfs2_mark_extent_written() will end up calling
> > ocfs2_replace_extent_rec() all the time and we will never extend the
> > current transaction and eventually exhaust all the transaction credits if
> > the IO contains many single block extents. Once that happens a
> > WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in
> > jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to
> > this error. This was actually triggered by one of our customers on a
> > heavily fragmented OCFS2 filesystem.
> >
> > To fix the issue make sure the transaction always has enough credits for
> > one extent insert before each call of ocfs2_mark_extent_written().
> >
> > Heming Zhao said:
> >
> > ------
> > PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"
> >
> > PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA"
> > #0 machine_kexec at ffffffff8c069932
> > #1 __crash_kexec at ffffffff8c1338fa
> > #2 panic at ffffffff8c1d69b9
> > #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
> > #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
> > #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
> > #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
> > #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
> > #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
> > #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
>
> Not sure how it happened but the commit message seems to be altered and
> is missing the rest of the stack trace:
>
> #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
> #11 dio_complete at ffffffff8c2b9fa7
> #12 do_blockdev_direct_IO at ffffffff8c2bc09f
> #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
> #14 generic_file_direct_write at ffffffff8c1dcf14
> #15 __generic_file_write_iter at ffffffff8c1dd07b
> #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
> #17 aio_write at ffffffff8c2cc72e
> #18 kmem_cache_alloc at ffffffff8c248dde
> #19 do_io_submit at ffffffff8c2ccada
> #20 do_syscall_64 at ffffffff8c004984
> #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba
>
> Can you resubmit with the original commit message?
>
Not sure how it happened either. I will send a v2 soon. Thanks.
> >
> > Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz
> > Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz
> > Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io")
> > Signed-off-by: Jan Kara <jack at suse.cz>
> > Reviewed-by: Joseph Qi <joseph.qi at linux.alibaba.com>
> > Reviewed-by: Heming Zhao <heming.zhao at suse.com>
> > Cc: Mark Fasheh <mark at fasheh.com>
> > Cc: Joel Becker <jlbec at evilplan.org>
> > Cc: Junxiao Bi <junxiao.bi at oracle.com>
> > Cc: Changwei Ge <gechangwei at live.cn>
> > Cc: Gang He <ghe at suse.com>
> > Cc: Jun Piao <piaojun at huawei.com>
> > Cc: <stable at vger.kernel.org>
> > Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
> > Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
> > (backported from commit a68b896aa56e435506453ec8835bc991ec3ae687 linux-5.10.y)
> > [mpellizzer: backported using handle->h_buffer_credits instead of
> > jbd2_handle_buffer_credits(handle) since the latter is not defined
> > in focal]
> > CVE-2024-42077
> > Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer at canonical.com>
>
> Other than the commit message issue the patch looks good.
>
> --
> Manuel
--
Massimiliano Pellizzer
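
For readers following the thread, the failure mode and the fix described in the quoted commit message can be illustrated with a small userspace simulation. This is only a sketch under stated assumptions: struct sim_handle and the sim_* functions are hypothetical names invented for illustration and are not the ocfs2 or jbd2 API, and the credit costs are arbitrary. It shows that a fixed credit budget runs dry when many single-block extents are converted without ever extending the transaction, while topping the handle up before each conversion does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a journal handle: it only tracks credits. */
struct sim_handle {
    int credits;     /* buffer credits left in the running transaction */
    int extensions;  /* how many times the handle had to be topped up */
};

/* Consume credits for one extent update; fail if the handle runs dry.
 * In the real kernel this is roughly where the WARN_ON and the
 * subsequent OCFS2 abort from the quoted trace would occur. */
static bool sim_mark_extent_written(struct sim_handle *h, int cost)
{
    if (h->credits < cost)
        return false;
    h->credits -= cost;
    return true;
}

/* Make sure at least `needed` credits are available (assumed to succeed). */
static void sim_ensure_credits(struct sim_handle *h, int needed)
{
    if (h->credits < needed) {
        h->credits = needed;
        h->extensions++;
    }
}

/* Convert n single-block extents; return how many conversions succeeded.
 * With ensure == false the initial estimate is all the handle ever gets. */
static size_t sim_convert_extents(struct sim_handle *h, size_t n,
                                  int cost, bool ensure)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        if (ensure)
            sim_ensure_credits(h, cost);
        if (!sim_mark_extent_written(h, cost))
            break;
        done++;
    }
    return done;
}
```

With an initial budget of 10 credits and a cost of 3 per extent, the unguarded loop stops after three extents, whereas the guarded loop completes all of them at the price of extending the handle along the way.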