[B/D/E] [PATCH 0/1] md raid0/linear doesn't show error state if an array member is removed and allows successful writes

Kleber Souza kleber.souza at canonical.com
Thu Oct 17 15:17:48 UTC 2019


On 14.10.19 01:47, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1847773
> 
> [Impact]
> 
> * Currently, mounted raid0/md-linear arrays give no indication/warning when one
> or more members are removed or suffer a non-recoverable error condition.
> 
> * Given that, such arrays stay mounted, and data regularly written to them goes
> through the page cache and appears to have been successfully written to the
> devices, even though the writeback threads cannot actually write it out. For
> users, this can potentially cause data corruption, given that even the "sync"
> command returns success despite the data not being written to disk. Kernel
> messages will show I/O errors, though.
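> 
> A minimal way to observe this pre-patch behavior, assuming the degraded array
> is still mounted at /mnt/md0 (mount point, file name and sizes are examples):
> 
>   dd if=/dev/zero of=/mnt/md0/testfile bs=1M count=16
>   sync; echo $?   # prints 0, even though writeback to the array is failing
>   dmesg | tail    # the I/O errors are only visible here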
> 
> * The patch proposed in this SRU addresses the issue on two levels: first, it
> fast-fails write I/Os to raid0/md-linear arrays that have one or more failed
> members. Second, it introduces the "broken" state, which is analogous to
> "clean" but indicates that the array is not in a good/correct state. A message
> printed to dmesg helps to clarify when such an array has had a member
> removed/failed.
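> 
> Once an array is marked broken, the new state should be visible from
> userspace; a quick check, assuming the state is exposed through the usual md
> array_state sysfs attribute like the existing states (device name is an
> example):
> 
>   cat /sys/block/md0/md/array_state   # expected to report "broken"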
> 
> * The commit proposed here, available in Linus' tree as 62f7b1989c02
> ("md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone")
> [http://git.kernel.org/linus/62f7b1989c02], was extensively discussed upstream
> and received a good amount of review/analysis from both the current md
> maintainer and a former maintainer.
> 
> * One important note here is that this patch requires a counterpart in the
> mdadm tool to be fully functional, which was SRUed in LP: #1847924.
> It works fine without this counterpart, but in the case of broken arrays the
> "mdadm --detail" command won't show "broken", and instead shows "clean, FAILED".
> 
> * We hereby ask the kernel team for an exception to have this backported to
> kernel 4.15 *only in Bionic* and not in Xenial. The reason is that the mdadm
> code has changed too much and we didn't want to introduce a potential
> regression in the Xenial version of that tool, so we only backported the mdadm
> counterpart of this patch to Bionic, Disco and Eoan - hence, we'd like the
> backported kernel versions to match.

Hi Guilherme,

In the paragraph above you mentioned that mdadm works fine without the
counterpart patch. Is that the case for Xenial as well?

We would strongly prefer to carry this patch in xenial/linux-hwe as well.


Thanks,
Kleber

> 
> [Test case]
> 
> * To test this patch, create a raid0 or linear md array using mdadm, for
> example:
> "mdadm --create md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1";
> 
> * Format the array using a filesystem of your choice (for example ext4) and
> mount the array;
> 
> * Remove one member of the array, for example using sysfs interface (for nvme:
> echo 1 > /sys/block/nvme0nX/device/device/remove, for scsi:
> echo 1 > /sys/block/sdX/device/delete);
> 
> * Without this patch, the array can still be written to successfully and
> "mdadm --detail" will report a clean state. The full sequence is collected in
> the sketch below.
> 
> [Regression potential]
> 
> * There's not much regression potential here; we fail write I/Os to broken
> arrays and print a message/status accordingly, reporting the array's broken
> state. We believe the most common "issue" that could be reported against this
> patch is a userspace tool that relies on the success of write I/Os or on the
> "clean" state of an array - after this patch, such a tool can potentially see
> different behavior in the case of a broken array.
> 
> Guilherme G. Piccoli (1):
>   md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone
> 
>  drivers/md/md-linear.c |  5 +++++
>  drivers/md/md.c        | 22 ++++++++++++++++++----
>  drivers/md/md.h        | 16 ++++++++++++++++
>  drivers/md/raid0.c     |  6 ++++++
>  4 files changed, 45 insertions(+), 4 deletions(-)
> 
