[B/D/E] [PATCH 0/1] md raid0/linear doesn't show error state if an array member is removed and allows successful writes
Kleber Souza
kleber.souza at canonical.com
Thu Oct 17 15:17:48 UTC 2019
On 14.10.19 01:47, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1847773
>
> [Impact]
>
> * Currently, mounted raid0/md-linear arrays show no indication/warning when one
> or more members are removed or suffer a non-recoverable error condition.
>
> * Given that, such arrays stay mounted, and data written to them goes through
> the page cache and appears to have been successfully written to the devices,
> even though the writeback threads cannot actually write it out. For users, this
> can potentially cause data corruption, since even the "sync" command returns
> success although the data is never written to the disk. Kernel messages will
> show I/O errors, though.
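>
> For illustration, assuming such a raid0 array is still mounted at /mnt/md0
> after losing a member (paths here are just examples):
>
>     $ dd if=/dev/zero of=/mnt/md0/file bs=1M count=4 && sync
>     $ echo $?
>     0                  # dd and sync both report success
>     $ dmesg | tail     # ...yet the kernel log shows write I/O errors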
>
> * The patch proposed in this SRU addresses the issue on two levels: first, it
> fast-fails write I/Os to raid0/md-linear array devices with one or more
> failed members. Second, it introduces the "broken" state, which is analogous to
> "clean" but indicates that the array is not in a good/correct state. A message
> shown in dmesg helps clarify when such an array has a member removed or failed.
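>
> A rough sketch of the resulting behavior (device names are examples, and
> assuming the new state is exposed through the usual array_state sysfs
> attribute):
>
>     $ cat /sys/block/md0/md/array_state
>     broken                # new state, analogous to "clean"
>     $ dd if=/dev/zero of=/dev/md0 bs=4k count=1 oflag=direct
>     dd: error writing '/dev/md0': Input/output error    # fast-failed write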
>
> * The commit proposed here, available in Linus' tree as 62f7b1989c02
> ("md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone")
> [http://git.kernel.org/linus/62f7b1989c02], was thoroughly discussed upstream
> and received a good amount of review/analysis from both the current md
> maintainer and a former maintainer.
>
> * One important note here is that this patch requires a counterpart in the
> mdadm tool to be fully functional, which was SRUed in LP: #1847924.
> It works fine without this counterpart, but for broken arrays the
> "mdadm --detail" command won't show "broken" and will instead report
> "clean, FAILED", as illustrated below.
>
> * We hereby ask the kernel team for an exception to have this backported to
> kernel 4.15 *only in Bionic* and not in Xenial. The reason is that the mdadm
> code changed too much, and we didn't want to introduce a potential regression
> in the Xenial version of that tool, so we only backported the mdadm
> counterpart of this patch to Bionic, Disco and Eoan - hence, we'd like the
> backported kernel versions to match.
Hi Guilherme,

In the paragraph above you mentioned that mdadm works fine without the
counter-part patch. Is that the case for Xenial as well?

We would strongly prefer to carry this patch to xenial/linux-hwe as well.

Thanks,
Kleber
>
> [Test case]
>
> * To test this patch, create a raid0 or linear md array on Linux using mdadm,
> for example:
> "mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1";
>
> * Format the array using a filesystem of your choice (for example ext4) and
> mount the array;
>
> * Remove one member of the array, for example using the sysfs interface (for
> nvme: echo 1 > /sys/block/nvme0nX/device/device/remove; for scsi:
> echo 1 > /sys/block/sdX/device/delete);
>
> * Without this patch, the array can still be written to successfully, and
> "mdadm --detail" will show a clean state; a consolidated example session
> follows below.
>
> [Regression potential]
>
> * There's not much potential for regression here; we fail write I/Os to bad
> arrays and show a message/status accordingly, exposing the array's broken
> state. We believe the most likely "issue" that could be reported against this
> patch is a userspace tool that relies on the success of I/O writes or on the
> "clean" state of an array - after this patch, such a tool may observe
> different behavior in the case of a broken array.
>
> Guilherme G. Piccoli (1):
> md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone
>
> drivers/md/md-linear.c | 5 +++++
> drivers/md/md.c | 22 ++++++++++++++++++----
> drivers/md/md.h | 16 ++++++++++++++++
> drivers/md/raid0.c | 6 ++++++
> 4 files changed, 45 insertions(+), 4 deletions(-)
>