[Bug 1852747] Re: mdcheck_start.service trying to start unexisting file
Eric Desrochers
1852747 at bugs.launchpad.net
Thu Oct 1 16:09:04 UTC 2020
Actually, I marked the bug as verification-done too early; one item
that I wanted to see is still missing.
I'd like feedback on the 'natural' run that should happen on October 4
(Sunday).
I'll wait for your feedback.
For now, I'm switching the LP bug back to verification-needed.
- Eric
** Tags removed: verification-done verification-done-focal
** Tags added: verification-needed verification-needed-focal
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/1852747
Title:
mdcheck_start.service trying to start unexisting file
Status in mdadm package in Ubuntu:
Fix Released
Status in mdadm source package in Bionic:
Won't Fix
Status in mdadm source package in Focal:
Fix Committed
Status in mdadm package in Debian:
New
Bug description:
[Impact]
The mdadm package is missing the mdcheck script. This has two
consequences:
In the immediate term, we get failed systemd units on all of our
physical machines (because they have mirrored disks) as we upgrade
them to 20.04. This raises alarms in our monitoring system, since we
monitor systemd unit failures.
In the longer term, this means that the arrays are not being checked.
If a drive develops a bad sector, this would normally be caught by the
check, and a good copy would be rewritten from the other side of the
mirror. Without the check, that does not happen. If the other drive
(the one with the good version of the sector) then dies, that sector's
data is lost permanently. The consequences depend on what that sector
was storing, but they are clearly bad.
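The scrubbing argument above can be illustrated with a toy Python
model. This is a hypothetical sketch, not md's implementation: each
disk is a list of sectors, None stands for an unreadable sector, and
the names scrub_mirror and BAD are made up for illustration. It shows
why a regular check matters: a bad sector can be repaired only while
the other side of the mirror still holds a good copy.

```python
from typing import List, Optional, Tuple

BAD = None  # hypothetical marker for an unreadable sector


def scrub_mirror(a: List[Optional[bytes]],
                 b: List[Optional[bytes]]) -> Tuple[List[int], List[int]]:
    """Walk both halves of a two-way mirror, rewriting bad sectors.

    Returns (repaired, lost): indices rewritten from the good copy, and
    indices where neither side had a readable copy left.
    """
    repaired: List[int] = []
    lost: List[int] = []
    for i in range(min(len(a), len(b))):
        x, y = a[i], b[i]
        if x is BAD and y is BAD:
            lost.append(i)  # no good copy remains: the data is gone
        elif x is BAD:
            a[i] = y  # rewrite from the other side of the mirror
            repaired.append(i)
        elif y is BAD:
            b[i] = x
            repaired.append(i)
    return repaired, lost


if __name__ == "__main__":
    disk0 = [b"A", BAD, b"C"]
    disk1 = [b"A", b"B", b"C"]
    print(scrub_mirror(disk0, disk1), disk0)
```

Without periodic scrubbing, sector 1 on disk0 would stay bad until
disk1 failed, at which point it would land in the "lost" list instead.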
[Test Case]
* systemctl start mdcheck_start.service
* journalctl -u mdcheck_start
-- Logs begin at Wed 2020-09-23 18:33:35 UTC, end at Wed 2020-09-23 18:40:27 UTC. --
Sep 23 18:40:27 mdadmgroovy systemd[1]: Starting MD array scrubbing...
Sep 23 18:40:27 mdadmgroovy systemd[1515]: mdcheck_start.service: Failed to execute command: No such file or directory
Sep 23 18:40:27 mdadmgroovy systemd[1515]: mdcheck_start.service: Failed at step EXEC spawning /usr/share/mdadm/mdcheck: No such file or directory
Sep 23 18:40:27 mdadmgroovy systemd[1]: mdcheck_start.service: Main process exited, code=exited, status=203/EXEC
Sep 23 18:40:27 mdadmgroovy systemd[1]: mdcheck_start.service: Failed with result 'exit-code'.
Sep 23 18:40:27 mdadmgroovy systemd[1]: Failed to start MD array scrubbing.
* ls -altr /usr/share/mdadm/mdcheck
ls: cannot access '/usr/share/mdadm/mdcheck': No such file or directory
* dpkg -l mdadm
ii mdadm 4.1-5ubuntu1 amd64 tool to administer Linux MD arrays (software RAID)
* dpkg -L mdadm | grep -i mdcheck
/lib/systemd/system/mdcheck_continue.service
/lib/systemd/system/mdcheck_continue.timer
/lib/systemd/system/mdcheck_start.service
/lib/systemd/system/mdcheck_start.timer
* We'd also like to confirm that mdcheck runs under its 'natural'
scheduled execution (i.e. on the next scheduled Sunday), and have
impacted users report feedback backed by logs.
* We found a regression that has been fixed upstream:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=6636788aaf4ec0cacaefb6e77592e4a68e70a957
* We then found a regression introduced by the above fix, pushed a fix for it into Groovy, and submitted that fix upstream to the linux-raid mailing list:
https://marc.info/?l=linux-raid&m=160130979927617&w=2
* We'd like to confirm that when mdcheck_start is enabled,
mdcheck_continue is enabled too.
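A hedged sketch of why mdcheck_continue matters: mdcheck_start only
gets a limited time window, so on a large array the check has to
resume from where it stopped. The model below assumes progress is
saved as a simple sector offset and that every window covers the same
number of sectors; Checkpoint, run_window and windows_needed are
hypothetical names, not part of mdcheck.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical saved progress of a check, in sectors."""
    position: int = 0


def run_window(cp: Checkpoint, total: int, sectors_per_window: int) -> bool:
    """Check one window's worth of sectors from the saved position.

    Returns True once the whole array has been covered.
    """
    cp.position = min(total, cp.position + sectors_per_window)
    return cp.position >= total


def windows_needed(total: int, sectors_per_window: int,
                   continue_enabled: bool) -> int:
    """Windows until the check completes, or -1 if it never does.

    The first window models mdcheck_start; later ones model mdcheck_continue.
    """
    if sectors_per_window <= 0:
        raise ValueError("sectors_per_window must be positive")
    cp = Checkpoint()
    runs = 0
    while True:
        runs += 1
        if run_window(cp, total, sectors_per_window):
            return runs
        if not continue_enabled:
            return -1  # nothing ever resumes from the checkpoint
```

In this model, an array larger than one window is never fully checked
unless the continue step is enabled as well.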
[Regression Potential]
* 'misc/mdcheck' will be introduced in Ubuntu for the first time, and
is also quite new in Debian's mdadm packaging (introduced on Sept 12,
2020).
No known fix has been added on top of it since Debian introduced it,
roughly two weeks ago.
$ git log --oneline --grep="mdcheck"
5a3db0f Install misc/mdcheck; turn on hardening; enable dh_lintian. (Closes: #960132)
f258a5e mdcheck: improve cleanup
ea83549 mdcheck: add some logging.
979b1fe mdcheck: be careful when sourcing the output of "mdadm --detail --export"
36dab45 mdcheck: don't git error if not /dev/md?* devices exist.
868ab80 mdcheck: don't pass the '+' to "date".
df881f7 mdcheck: new script to help with regular checks of md arrays.
No new bugs related to the introduction of mdcheck have been opened.
On code inspection, the 'mdcheck' script seems harmless, at least at
first glance. Of course, real-world testing across different RAID
types will be needed before drawing conclusions during the
verification phase. If possible, running the script in debug mode
(set -xv) would be a good way to see its workflow in action.
This change will allow 'mdcheck' to run on the first Sunday of each
month for 6 hours (mdcheck_start.timer: OnCalendar=Sun *-*-1..7
1:00:00), and then every subsequent morning until the check is
finished (mdcheck_continue.timer: OnCalendar=daily).
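To sanity-check the expected 'natural' run dates, here is a small
Python sketch approximating the mdcheck_start.timer expression above
(a Sunday falling on day 1..7 of the month, at 01:00) plus the 6-hour
window. It is an approximation that ignores time zones and DST;
`systemd-analyze calendar 'Sun *-*-1..7 1:00:00'` is the authoritative
way to evaluate the expression on a real system. The function names
are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Assumption: the 6-hour window described above for mdcheck_start.
CHECK_WINDOW = timedelta(hours=6)


def first_sunday(year: int, month: int) -> date:
    """Return the Sunday that falls on day 1..7 of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday == 0 ... Sunday == 6
    return first + timedelta(days=(6 - first.weekday()) % 7)


def next_mdcheck_start(after: datetime) -> datetime:
    """Approximate the next 'Sun *-*-1..7 1:00:00' trigger strictly after `after`.

    Naive local time; DST and time zone handling are ignored.
    """
    year, month = after.year, after.month
    while True:
        candidate = datetime.combine(first_sunday(year, month), time(1, 0))
        if candidate > after:
            return candidate
        month += 1
        if month > 12:
            year, month = year + 1, 1


if __name__ == "__main__":
    start = next_mdcheck_start(datetime(2020, 10, 1, 16, 9, 4))
    print("next run:", start, "window ends:", start + CHECK_WINDOW)
```

Evaluated at the time of this comment (Thu Oct 1 2020), it yields
2020-10-04 01:00, i.e. the October 4 run discussed here.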
It's not a script that one would typically run manually on a regular
basis.
The script uses 'logger' to write messages to the system log, so, in
addition to the usual systemd unit and timer logs, we will have a
trace of when it begins, pauses and continues. In my upload I also
added a patch that makes mdcheck log its completion as well. This lets
users know how long the RAID check took, which I think is essential
information to include with the introduction of this script in Ubuntu.
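With both the start and the completion messages in the system log,
the check duration is the difference between two timestamps. A
minimal sketch, assuming syslog-style 'Mon DD HH:MM:SS' stamps copied
from the journal (these carry no year, so it must be supplied); the
function names are hypothetical. Note that the result is elapsed
calendar time, including paused periods between daily windows, not
pure checking time.

```python
from datetime import datetime, timedelta


def parse_syslog_ts(ts: str, year: int) -> datetime:
    """Parse a syslog-style 'Mon DD HH:MM:SS' stamp for the given year."""
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def check_duration(start_ts: str, end_ts: str, year: int) -> timedelta:
    """Elapsed time between the start and completion log entries."""
    start = parse_syslog_ts(start_ts, year)
    end = parse_syslog_ts(end_ts, year)
    if end < start:  # completion logged after a New Year rollover
        end = end.replace(year=year + 1)
    return end - start
```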
I suggest we not release the package to focal-updates until we have
at least one sample of a 'natural' scheduled execution on the first
Sunday of the month (the next one should be October 4th), and
impacted users have reported feedback backed by logs.
I think running it on Sunday is reasonable (just like fstrim, zfs
scrub, ...). Sunday is typically when cron jobs and timers run
maintenance tasks like this.
One thing I would like to confirm, though it may not be a blocker for
this case, is that 'mdcheck_continue' starts correctly when its
conditions are met. It has never been tested, because 'mdcheck_start'
always failed due to the missing 'mdcheck' script.
[Other Info]
Debian bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960132
salsa commit:
https://salsa.debian.org/lechner/mdadm/-/commit/5a3db0f5429fc81e0f53cbf9aa473059b74fe057
[Original Description]
mdcheck_start.service trying to start unexisting file
root@d:~# cat /lib/systemd/system/mdcheck_start.service | grep Exec
ExecStart=/usr/share/mdadm/mdcheck --duration $MDADM_CHECK_DURATION
root@d:~# ls -la /usr/share/mdadm/mdcheck
ls: cannot access '/usr/share/mdadm/mdcheck': No such file or directory
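The failure above comes down to the unit's ExecStart= pointing at a
file that does not exist. Here is a rough Python sketch for spotting
that condition in a unit file's text. It approximates systemd's
command-line parsing with shlex and strips only the common
special-executable prefixes, so treat it as a diagnostic aid, not a
reimplementation of systemd. The function names are hypothetical.

```python
import os
import shlex
from typing import Optional


def exec_start_path(unit_text: str) -> Optional[str]:
    """Return the executable named by the first ExecStart= line, or None."""
    for line in unit_text.splitlines():
        line = line.strip()
        if line.startswith("ExecStart="):
            # systemd allows prefixes such as '-' or '@' before the path
            cmd = line[len("ExecStart="):].lstrip("-@:+!")
            parts = shlex.split(cmd)
            return parts[0] if parts else None
    return None


def missing_exec(unit_text: str) -> bool:
    """True if ExecStart= names a path that does not exist on this system."""
    path = exec_start_path(unit_text)
    return path is not None and not os.path.exists(path)
```

For example, passing the text of /lib/systemd/system/mdcheck_start.service
to missing_exec should return True on an affected system.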
ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: mdadm 4.1-2ubuntu3
ProcVersionSignature: Ubuntu 5.3.0-19.20-generic 5.3.1
Uname: Linux 5.3.0-19-generic x86_64
ApportVersion: 2.20.11-0ubuntu8.2
Architecture: amd64
Date: Fri Nov 15 13:13:17 2019
Lspci: Error: [Errno 2] No such file or directory: 'lspci': 'lspci'
Lsusb: Error: [Errno 2] No such file or directory: 'lsusb': 'lsusb'
MachineType: HP HP EliteBook x360 1030 G3
ProcEnviron:
LANG=C
TERM=screen
PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-19-generic root=/dev/mapper/system-root ro cryptdevice=UUID=95c107ea-73d0-4206-a31c-fb0ed6d7d6a9:cryptlvm mem_sleep_default=deep
ProcMDstat:
Personalities :
unused devices: <none>
SourcePackage: mdadm
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/07/2019
dmi.bios.vendor: HP
dmi.bios.version: Q90 Ver. 01.08.01
dmi.board.name: 8438
dmi.board.vendor: HP
dmi.board.version: KBC Version 14.3F.00
dmi.chassis.asset.tag: 5CD9296RDC
dmi.chassis.type: 31
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrQ90Ver.01.08.01:bd08/07/2019:svnHP:pnHPEliteBookx3601030G3:pvr:rvnHP:rn8438:rvrKBCVersion14.3F.00:cvnHP:ct31:cvr:
dmi.product.family: 103C_5336AN HP EliteBook x360
dmi.product.name: HP EliteBook x360 1030 G3
dmi.product.sku: 5SR46ES#ACB
dmi.sys.vendor: HP
etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'
initrd.files: Error: [Errno 2] No such file or directory: '/boot/initrd.img-5.3.0-19-generic'
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1852747/+subscriptions