[Bug 1847924] Re: Introduce broken state parsing to mdadm
Launchpad Bug Tracker
1847924 at bugs.launchpad.net
Wed Dec 4 18:56:15 UTC 2019
This bug was fixed in the package mdadm - 4.1-4ubuntu1
---------------
mdadm (4.1-4ubuntu1) focal; urgency=medium
[ dann frazier ]
* Merge from Debian unstable. Remaining changes:
- Ship finalrd hook.
- Do not install mdadm-shutdown.service on Ubuntu.
- Drop broken and unused init scripts, which can cause failure to
reconfigure the mdadm package under certain confinement types, in
favor of native systemd units.
- Drop /etc/cron.d/mdadm and migrate to systemd mdcheck_start|continue
timer units.
- Drop /etc/cron.daily/mdadm and migrate to the systemd mdmonitor-oneshot
timer unit.
- mdcheck_start.timer schedules the mdcheck on the first Sunday of the
month, with a randomized start delay of up to 24h, and runs for at
most 6h. mdcheck_continue.timer kicks off daily, with a randomized
start delay of up to 12h, and continues the mdcheck for at most 6h.
- mdmonitor-oneshot.timer runs daily, with a randomized start delay of
up to 24h.
- One can use systemd drop-ins to change the .timer units' timings, set
environment variables to decrease/increase the length of checking,
or start the checks by hand. The previously used checkarray is still
available, albeit not used by the timer units.
- The above ensures that the previous daily / monthly checks are still
performed, but randomized, so that the performance impact is spread
across a cluster of machines.
* Honor the debconf daily autoscan setting in the systemd timer.
[ Guilherme G. Piccoli ]
* Introduce "broken" state for RAID0/Linear in mdadm (LP: #1847924)
-- dann frazier <dannf at ubuntu.com> Wed, 04 Dec 2019 07:05:07 -0700
** Changed in: mdadm (Ubuntu Focal)
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/1847924
Title:
Introduce broken state parsing to mdadm
Status in mdadm package in Ubuntu:
Fix Released
Status in mdadm source package in Bionic:
In Progress
Status in mdadm source package in Disco:
In Progress
Status in mdadm source package in Eoan:
In Progress
Status in mdadm source package in Focal:
Fix Released
Status in mdadm package in Debian:
New
Bug description:
[Impact]
* Currently, mounted raid0/md-linear arrays give no indication/warning
when one or more members are removed or suffer some non-recoverable
error condition. The mdadm tool shows a "clean" state regardless of
whether a member was removed.
* The patch proposed in this SRU addresses the issue by introducing a
new state, "broken", which is analogous to "clean" but indicates that
the array is not in a good/consistent state. The commit, available
upstream as 43ebc910 ("mdadm: Introduce new array state 'broken' for
raid0/linear") [0], was extensively discussed and received a good
amount of review/analysis from both the current mdadm maintainer and
a former maintainer.
* One important note here is that this patch requires a counterpart in the kernel to be fully functional, which was SRUed in LP: #1847773.
It works fine/transparently without the kernel counterpart, though.
* We had reports of users testing failed raid0 array scenarios, and
seeing 'clean' in mdadm caused confusion and did not help them notice
that something had gone wrong with the arrays.
* The potential situation this patch (with its kernel counterpart)
addresses is: a user has a mounted raid0/linear array. If one member
fails and gets removed (either physically, due to a power or firmware
issue, or in software, like a driver-induced removal after a detected
failure), _without_ this patch (and its kernel counterpart) there is
nothing to let the user know it failed, except filesystem errors in
dmesg. Also, non-direct (buffered) writes to the filesystem will
succeed, due to how page-cache/writeback works; even a 'sync' command
will succeed.
* The case described in the above bullet was tested, and the writes to
failed devices succeeded - after a reboot, the files written were
present in the array, but corrupted. A user would not notice that
unless the writes were direct (O_DIRECT) or some checksum was
performed on the files. With this patch (and its kernel counterpart),
writes to such a failed raid0/linear array are fast-failed and the
filesystem goes read-only quickly.
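The buffered-write symptom described above can be probed with a short
guarded sketch (the mount point /mnt/md-test is an illustrative name for
wherever the degraded array is mounted; the commands do nothing on a
machine without such a mount):

```shell
# Illustrative probe: on an unpatched kernel, buffered writes to a
# mounted-but-broken raid0 still appear to succeed, while direct I/O
# bypasses the page cache and surfaces the error immediately.
# /mnt/md-test is a hypothetical mount point for the degraded array.
if mountpoint -q /mnt/md-test 2>/dev/null; then
    dd if=/dev/zero of=/mnt/md-test/probe bs=1M count=4   # buffered: "succeeds"
    sync                                                  # even sync returns 0
    # Direct write: fails fast once the member is gone:
    dd if=/dev/zero of=/mnt/md-test/probe-direct bs=1M count=4 oflag=direct
else
    echo "skipped: /mnt/md-test is not a mounted array"
fi
```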
[Test case]
* To test this patch, create a raid0 or linear md array on Linux using
mdadm, for example: "mdadm --create /dev/md0 --level=0 --raid-devices=2
/dev/nvme0n1 /dev/nvme1n1";
* Format the array using a FS of your choice (for example ext4) and
mount the array;
* Remove one member of the array, for example using sysfs interface
(for nvme: echo 1 > /sys/block/nvme0n1/device/device/remove, for scsi:
echo 1 > /sys/block/sdX/device/delete);
* Without this patch, the array state shown by "mdadm --detail" is
"clean", regardless of whether a member is missing/failed.
[Regression potential]
* There are mainly two potential regressions here; the first is user-
visible changes introduced by this mdadm patch. The second is if the
patch itself has some unnoticed bug.
* For the first type of potential regression: this patch introduces a
change in how the array state is displayed in "mdadm --detail <array>"
output for raid0/linear arrays *only*. Currently, the tool shows just
2 states, "clean" or "active". In the patch being SRUed here, this
changes for raid0/linear arrays to read the sysfs array state instead.
So, for example, we could read a "readonly" state here for raid0/linear
if the user (or some tool) changes the array to such a state. This only
affects raid0/linear; the output for other levels does not change at
all.
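As a hypothetical illustration of what a consumer of that output might
do with the sysfs-derived states, they can be bucketed with a tiny
helper. The state list follows the kernel's documented array_state
values; the function name and messages are made up:

```shell
# Hypothetical helper: bucket an md array_state string, as read from
# /sys/block/mdX/md/array_state, for a raid0/linear array.
classify_state() {
    case "$1" in
        broken)
            echo "failed or missing member" ;;
        clear|inactive|suspended)
            echo "array not running" ;;
        readonly|read-auto|clean|active|write-pending|active-idle)
            echo "array usable" ;;
        *)
            echo "unknown state: $1" ;;
    esac
}

classify_state broken    # -> failed or missing member
classify_state clean     # -> array usable
classify_state readonly  # -> array usable
```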
* Regarding potential unnoticed issues in the code, we changed mainly
structs and the "detail" command. Structs were extended with the new
"broken" state, and the detail output was changed for raid0/linear
as discussed in the previous bullet.
* Note that we *proactively* skipped Xenial SRU here, in order to
prevent potential regressions - Xenial mdadm tool lacks code
infrastructure used by this patch, so the decision was for
safety/stability, by only SRUing Bionic / Disco / Eoan mdadm versions.
[0]
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=43ebc910
[Other info]
As mdadm for focal (20.04) hasn't been merged yet, this will need to
be added there during or after merge.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1847924/+subscriptions
More information about the foundations-bugs
mailing list