[Bug 688541] Re: race condition on shutdown (leads to corrupted fs)
Launchpad Bug Tracker
688541 at bugs.launchpad.net
Wed Feb 15 08:10:15 UTC 2012
This bug was fixed in the package mysql-5.5 - 5.5.20-0ubuntu1
---------------
mysql-5.5 (5.5.20-0ubuntu1) precise; urgency=low
* New upstream release.
* d/mysql-server-5.5.mysql.upstart: Fix stop on to make sure mysql is
fully stopped before shutdown commences. (LP: #688541) Also simplify
start on as it is redundant.
* d/control: Depend on upstart version which has apparmor profile load
script to prevent failure on upgrade from lucid to precise.
(LP: #907465)
* d/apparmor-profile: need to allow /run since that is the true path
of /var/run files. (LP: #917542)
* d/control: mysql-server-5.5 has files in it that used to be owned
by libmysqlclient-dev, so it must break/replace it. (LP: #912487)
* d/rules, d/control: 5.5.20 Fixes segfault on tests with gcc 4.6,
change compiler back to system default.
* d/rules: Turn off embedded libedit/readline.(Closes: #659566)
-- Clint Byrum <clint at ubuntu.com> Tue, 14 Feb 2012 23:59:22 -0800
** Changed in: mysql-5.5 (Ubuntu Precise)
Status: Invalid => Fix Released
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to sysvinit in Ubuntu.
https://bugs.launchpad.net/bugs/688541
Title:
race condition on shutdown (leads to corrupted fs)
Status in “mysql-5.1” package in Ubuntu:
Invalid
Status in “mysql-5.5” package in Ubuntu:
Fix Released
Status in “sysvinit” package in Ubuntu:
Fix Released
Status in “mysql-5.1” source package in Oneiric:
Invalid
Status in “mysql-5.5” source package in Oneiric:
Invalid
Status in “sysvinit” source package in Oneiric:
Fix Released
Status in “mysql-5.1” source package in Precise:
Invalid
Status in “mysql-5.5” source package in Precise:
Fix Released
Status in “sysvinit” source package in Precise:
Fix Released
Bug description:
== SRU JUSTIFICATION ==
IMPACT: potential data loss or extension of downtime. MySQL, for
example, if sent a SIGKILL before it is done flushing its buffers into
MyISAM tables, will lose that data. If using InnoDB, the transactions
will have to be replayed from the transaction log at startup, which
can take far longer than completing the flush procedure which the 300
second kill timeout in its job file allows for.
TEST CASE:
1. create a script, /usr/local/bin/15seconds.py with this as the
content:
##### BEGIN COPY/PASTE #####
#!/usr/bin/python
import time
import signal
import logging
import sys
logging.basicConfig(level=logging.INFO,format="TEST: %(asctime)s:
%(message)s")
def shutdown_process(sig, frame):
logging.info("sleeping 15 seconds...")
time.sleep(15)
logging.info("now exitting...")
sys.exit(0)
signal.signal(signal.SIGTERM, shutdown_process)
logging.info("Entering infinite loop")
while True:
time.sleep(1)
##### END COPY/PASTE #####
chmod +x /usr/local/bin/15seconds.py
2. Create an upstart job file, /etc/init/15sec.conf to run this:
##### BEGIN COPY/PASTE #####
start on runlevel [2345]
stop on runlevel [016]
respawn
kill timeout 17
console output
exec /usr/local/bin/15seconds.py
##### END COPY/PASTE #####
3. sudo initctl start 15sec
4. sudo shutdown -h now
On an affected system, the job will be sent SIGKILL before the 15
second kill timeout, so your shutdown log will look something like
this:
Checking for running unattended-upgrades:
TEST: 2011-12-13 01:02:54,638: sleeping 15 seconds...
* Asking all remaining processes to terminate... TEST: 2011-12-13 01:02:54,818: now exitting...
TEST: 2011-12-13 01:02:54,819: sleeping 15 seconds...
[ OK ]
* Killing all remaining processes... [fail]
* Deconfiguring network interfaces... [ OK ]
* Deactivating swap... [ OK ]
* Will now halt
[ 68.020383] System halted.
An unaffected system will look like this:
Checking for running unattended-upgrades:
TEST: 2011-12-13 00:52:30,476: sleeping 15 seconds...
* Asking all remaining processes to terminate... [ OK ]
TEST: 2011-12-13 00:52:45,497: now exitting...
* All processes ended within 16 seconds.... [ OK ]
* Deconfiguring network interfaces... [ OK ]
* Deactivating swap... [ OK ]
* Will now halt
[ 356.481556] System halted.
Note that the 15sec job is waited on once the bug is fixed, where in
the unpatched version it is killed immediately.
DEV FIX: The sendsigs script has not been changed in precise other
than for this patch.
REGRESSION POTENTIAL: There may be scenarios and jobs that have very
high kill timeouts which will cause system shutdowns to wait for up to
300 seconds instead of the pervious 10. This is considered a good
balance between waiting long enough for any reasonable application to
flush its buffers and short enough that we won't run up against any
battery backup systems running out of battery power.
======
I'm using mysql-server-5.1 on a 10.04 LTS installation.
The mysql db is around 27GB and on a separate partition mounted as /var/lib/mysql.
On shutdown I get the following error message:
Checking for running unattended-upgrades: * Asking all remaining processes to terminate...
[80G
[74G[ OK ]
* All processes ended within 1 seconds....
[80G
[74G[ OK ]
* Deconfiguring network interfaces...
[80G
[74G[ OK ]
* Deactivating swap...
[80G
[74G[ OK ]
* Unmounting local filesystems...
[80G umount2: Device or resource busy
umount: /var/lib/mysql: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
umount2: Device or resource busy
umount: /tmp: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
umount2: Device or resource busy
[74G[
[31mfail
[39;49m]
mount: / is busy
* Will now restart
[ 3369.429751] Restarting system.
On the next reboot the file system is corrupt and need to be fsck-ed.
I think the problem is, that mysql uses an upstart job (/etc/init/mysql.conf) and has
stop on runlevel [016]
The rc.conf job is also triggered on runlevel 0 and 6, so they
basically run at the same time.As
When /etc/rc0.d/S20sendsigs is run, it deliberatly does not wait or
kill any upstart jobs.
As my mysqld process takes some time to shutdown, S40umountfs and
S60umountroot are run before the mysqld has quit.
Leading to the fs not being properly unmounted. It is event possible
that mysqld is forcefully killed by halt in S90halt if it hasn't
stopped by then.
This is a serious issue, as it can (and will) lead to data loss.
Other upstart jobs, like rsyslog.conf, use the same "stop on runlevel
[016]" stanza, so they are probably affected too.
ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: mysql-server-5.1 5.1.49-1ubuntu8.1
Uname: Linux 2.6.32-5-686 i686
NonfreeKernelModules: michael_mic arc4 ecb lib80211_crypt_tkip aes_i586 aes_generic lib80211_crypt_ccmp sco bnep rfcomm l2cap binfmt_misc acpi_cpufreq ppdev lp cpufreq_userspace cpufreq_stats vboxnetadp cpufreq_powersave vboxnetflt cpufreq_conservative vboxdrv fuse pcmcia snd_intel8x0m snd_intel8x0 snd_ac97_codec btusb bluetooth rfkill ac97_bus yenta_socket ipw2200 snd_pcm 8139too firewire_ohci snd_seq 8139cp firewire_core sg uhci_hcd snd_timer rsrc_nonstatic libipw snd_seq_device pcmcia_core crc_itu_t parport_pc smsc_ircc2 ehci_hcd mii joydev lib80211 sr_mod parport i2c_i801 irda snd usbcore wbsd soundcore shpchp mmc_core pcspkr rng_core cdrom psmouse container crc_ccitt snd_page_alloc serio_raw pci_hotplug ac battery nls_base processor evdev ppp_generic slhc loop autofs4 ext4 mbcache jbd2 crc16 dm_mod sd_mod crc_t10dif radeon ttm ata_generic drm_kms_helper ata_piix drm i2c_algo_bit libata video thermal i2c_core scsi_mod output thermal_sys button
Architecture: i386
Date: Fri Dec 10 13:41:52 2010
ProcEnviron:
PATH=(custom, no user)
LANG=de_DE.utf8
SHELL=/bin/bash
SourcePackage: mysql-5.1
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mysql-5.1/+bug/688541/+subscriptions
More information about the foundations-bugs
mailing list