[Bug 1847924] Re: Introduce broken state parsing to mdadm
Guilherme G. Piccoli
1847924 at bugs.launchpad.net
Wed Oct 30 17:51:03 UTC 2019
Hi Robie, thanks for your concerns! They are valid and make sense. Let me
first give you some context on why this is needed in stable releases:
the mdadm patch is the counterpart of a kernel modification that
does 2 things: (a) exposes a 'broken' state when raid0/linear arrays
have a member missing; (b) refuses write I/O to such broken arrays.
Without this kernel change, we can continue writing to a broken md device until the FS notices some error, which may take a while. Because of the writeback/page-cache mechanism, this is transparent to the regular user and is noticed only by users looking at logs and disk-write stats. The kernel page cache leads to a situation in which commands like 'dd' or even 'sync' succeed on a broken device, leaving a partially written/corrupted file.
So, given that we are in the process of SRUing the kernel change, the mdadm counterpart is highly desirable because of part (a) above. Otherwise, we get incomplete functionality.
Regarding your points:
1) Agreed! The change was well-tested though, and validated by 3 maintainers (2 from the kernel md subsystem, one from mdadm itself). Also, although the patch is a bit large, it basically adds this 'broken' state to the places where it's needed, like arrays and defines (hence its size is not so small), and clearly restricts the change to raid0/linear, not affecting any other levels at all.
2) You're partially right here: mdadm keeps working *exactly* the same
for everything _except_ the raid0/linear status in the '--detail' option.
For those 2 levels (raid0/linear), we now read the array state from
sysfs. So, it's a behavior change, but a correct (or at least,
well-accepted) one. And it's concise/scope-reduced, as it doesn't
change any other level or any other functionality in raid0/linear,
except the state query.
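
To illustrate the state query described above, here is a minimal Python sketch that reads an array's state from sysfs. It is not the mdadm implementation (mdadm is written in C); the helper names read_array_state and is_broken, and the sysfs_root parameter, are hypothetical and exist only for this example. It assumes the kernel exposes the state at /sys/block/<md device>/md/array_state and that a kernel with the counterpart change reports "broken" there.

```python
from pathlib import Path


def read_array_state(md_name: str, sysfs_root: str = "/sys/block") -> str:
    """Return the array_state string for an md device such as 'md0'.

    sysfs_root is a parameter only so the function can be exercised
    against a fake directory tree; on a real system keep the default.
    """
    state_file = Path(sysfs_root) / md_name / "md" / "array_state"
    # The kernel writes a single word followed by a newline.
    return state_file.read_text().strip()


def is_broken(state: str) -> bool:
    """True when the kernel reports the array as 'broken'."""
    return state == "broken"
```

On a system with an md0 array, is_broken(read_array_state("md0")) would tell whether the kernel currently considers that array broken; on kernels without the counterpart change the "broken" value is simply never reported.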
Feel free to comment in case you have more questions, or ping me on IRC
(gpiccoli on freenode).
Now, I have one question for you: do you prefer that we do a merge in Focal before accepting this SRU? In order to sync Focal with upstream, I mean.
Thanks,
Guilherme
--
https://bugs.launchpad.net/bugs/1847924
Title:
Introduce broken state parsing to mdadm
Status in mdadm package in Ubuntu:
In Progress
Status in mdadm source package in Bionic:
In Progress
Status in mdadm source package in Disco:
In Progress
Status in mdadm source package in Eoan:
In Progress
Status in mdadm source package in Focal:
In Progress
Status in mdadm package in Debian:
New
Bug description:
[Impact]
* Currently, mounted raid0/md-linear arrays give no indication/warning
when one or more members are removed or suffer from some
non-recoverable error condition. The mdadm tool shows "clean" state
regardless of whether a member was removed.
* The patch proposed in this SRU addresses this issue by introducing a
new state "broken", which is analogous to "clean" but indicates that
the array is not in a good/correct state. The commit, available upstream
as 43ebc910 ("mdadm: Introduce new array state 'broken' for
raid0/linear") [0], was extensively discussed and received a good
amount of review/analysis from both the current mdadm maintainer and
a former maintainer.
* One important note here is that this patch requires a counter-part in the kernel to be fully functional, which was SRUed in LP: #1847773.
It works fine/transparently without this kernel counter-part though.
[Test case]
* To test this patch, create a raid0 or linear md array on Linux using
mdadm, like: "mdadm --create md0 --level=0 --raid-devices=2
/dev/nvme0n1 /dev/nvme1n1";
* Format the array using a FS of your choice (for example ext4) and
mount the array;
* Remove one member of the array, for example using sysfs interface
(for nvme: echo 1 > /sys/block/nvme0n1/device/device/remove, for scsi:
echo 1 > /sys/block/sdX/device/delete);
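* Without this patch, the array state shown by "mdadm --detail" is
"clean", even though a member is missing/failed. With the patch (and
the kernel counterpart from LP: #1847773), the state is expected to be
"broken".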
[Regression potential]
* There's not much regression potential here; we just show the array
state as "broken" if it has one or more missing/failed members. We
believe the most common "issue" that could be reported from this patch
is that a userspace tool relying on the array status always being
"clean", even for broken devices, may behave differently with this
patch.
* Note that we *proactively* skipped Xenial SRU here, in order to
prevent potential regressions - Xenial mdadm tool lacks code
infrastructure used by this patch, so the decision was for
safety/stability, by only SRUing Bionic / Disco / Eoan mdadm versions.
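
As an illustration of the userspace-tool concern above, here is a minimal Python sketch of a tool that parses the "State :" line of "mdadm --detail" output and handles "broken" explicitly instead of assuming "clean". The function names parse_detail_state and classify_state are hypothetical, and the exact set of flags mdadm may print next to "broken" is not assumed here; the sketch just splits whatever comma-separated values appear.

```python
import re
from typing import List


def parse_detail_state(detail_output: str) -> List[str]:
    """Return the comma-separated values of the 'State :' line, if any."""
    for line in detail_output.splitlines():
        # Match only the array-level 'State : ...' line, not the
        # 'RaidDevice State' column header of the device table.
        match = re.match(r"\s*State\s*:\s*(.*)$", line)
        if match:
            return [flag.strip() for flag in match.group(1).split(",") if flag.strip()]
    return []


def classify_state(flags: List[str]) -> str:
    """Map the parsed flags to 'ok', 'broken', 'other' or 'unknown'."""
    if not flags:
        return "unknown"
    if "broken" in flags:
        return "broken"
    if flags in (["clean"], ["active"]):
        return "ok"
    return "other"
```

A monitoring tool could feed the text printed by "mdadm --detail" for an array into parse_detail_state and alert whenever classify_state returns anything other than "ok", rather than treating any unrecognized state as healthy.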
[0]
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=43ebc910
[other info]
As mdadm for focal hasn't been merged yet, this will need to be added
there during or after merge.