[Bug 71567] LVM Snapshot removal causes intermittent kernel panic
acutler
acutler at orchestrate.it
Wed Nov 15 01:41:41 UTC 2006
Public bug reported:
Binary package hint: lvm2
We have a script that automates the creation and removal of LVM
snapshots on our VMware servers. Three times now we have had machines go
down when the snapshot was removed. I have logs showing the machine
going down immediately after the snapshot removal script has fired
(triggered and logged by our backup software).
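The script itself is not included in this report. As a rough illustration only, a wrapper of this kind might look like the sketch below. The volume names are taken from the lvscan output later in this report; the 5G snapshot size, the function names and the DRY_RUN switch are assumptions, and the sync before removal is a precaution, not a known fix for the panic.

```shell
#!/bin/sh
# Hypothetical snapshot wrapper, NOT the reporter's actual script.
# Defaults mirror the volumes named in this report; override via environment.
VG=${VG:-vg_sys}
LV=${LV:-lv_vmware}
SNAP=${SNAP:-${LV}_snap}
SNAP_SIZE=${SNAP_SIZE:-5G}   # assumed size

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

snapshot_create() {
    run lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP" "/dev/$VG/$LV"
}

snapshot_remove() {
    # Flush dirty pages before removing the snapshot (precaution only).
    run sync
    run lvremove --force "/dev/$VG/$SNAP"
}
```

Setting DRY_RUN=1 before calling snapshot_create or snapshot_remove prints the commands without touching any volume, which is a safe way to check the names first.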
This has occurred on both an HP Pavilion Desktop (uniprocessor, single
disk) and a Sun Sunfire V60X (SMP, md raid1).
The crash leaves the LV in an inconsistent state with device nodes and
snapshot names completely out of sync. On all occasions I have been able
to recover the volume by following the steps below:
Ubuntu 6.06 LTS, LVM Hard Crash repair
--------------------------------------
observe kernel oops.
perform hard reset.
machine comes back up with md2 array dirty, starting background reconstruction.
fails on mounting partitions, boots to single-user mode.
login at console.
mount /usr
vi /etc/fstab
comment out snapshotted lvm partition (/vmware)
exit. System boots to multi user.
open ssh shell to system
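The fstab edit in the steps above was done by hand in vi. A minimal sketch of the same edit as a filter is shown here, assuming a standard fstab layout with the mount point in the second field; the function name fstab_comment_out is hypothetical. It writes the result to stdout rather than editing in place, so the output can be reviewed before it replaces /etc/fstab.

```shell
# Print the given fstab with the entry for mount point $1 commented out.
# $2 is the file to read (defaults to /etc/fstab). Nothing is modified.
fstab_comment_out() {
    mnt=$1
    file=${2:-/etc/fstab}
    # Skip lines that are already comments; match the mount point exactly.
    awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { print "#" $0; next } { print }' "$file"
}
```

For example, fstab_comment_out /vmware > /tmp/fstab.new produces a candidate file that can be diffed against /etc/fstab before copying it into place.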
**** some info before we begin
root at anvil:~# lvscan
ACTIVE '/dev/vg_sys/lv_tmp' [4.00 GB] inherit
ACTIVE '/dev/vg_sys/lv_swap' [4.00 GB] inherit
ACTIVE '/dev/vg_sys/lv_var' [1.00 GB] inherit
ACTIVE '/dev/vg_sys/lv_usr' [1.00 GB] inherit
inactive Original '/dev/vg_sys/lv_vmware' [52.00 GB] contiguous
inactive Snapshot '/dev/vg_sys/lv_vmware_snap' [5.00 GB] inherit
root at anvil:~# pvscan
PV /dev/md2 VG vg_sys lvm2 [67.33 GB / 340.00 MB free]
Total: 1 [67.33 GB] / in use: 1 [67.33 GB] / in no VG: 0 [0 ]
root at anvil:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[1]
70605568 blocks [2/2] [UU]
[=====>...............] resync = 26.8% (18967232/70605568) finish=18.0min speed=47688K/sec
md1 : active raid1 sda2[0] sdb2[1]
979840 blocks [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
96256 blocks [2/2] [UU]
unused devices: <none>
root at anvil:~# uname -a
Linux anvil 2.6.15-27-server #1 SMP Sat Sep 16 02:57:21 UTC 2006 i686 GNU/Linux
root at anvil:~# ls /dev/mapper/
control vg_sys-lv_swap vg_sys-lv_tmp vg_sys-lv_usr vg_sys-lv_var vg_sys-lv_vmware vg_sys-lv_vmware-real
root at anvil:~# ls /dev/vg_sys/
lv_swap lv_tmp lv_usr lv_var
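The lvscan output above marks the origin and its snapshot as inactive, and neither appears under /dev/vg_sys/. To spot that state quickly after a crash, a small filter over lvscan output may help. This is only a sketch: the function name inactive_lvs is hypothetical, and it assumes the output format shown above (status word first, LV path in single quotes).

```shell
# Read lvscan output on stdin and print the path of every LV
# whose status column is "inactive".
inactive_lvs() {
    # Split on single quotes so that field 2 is the quoted LV path.
    awk -F"'" '/^[[:space:]]*inactive/ { print $2 }'
}
```

Usage would be lvscan | inactive_lvs, which on the system above should list lv_vmware and lv_vmware_snap.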
** let's repair the system
* create some missing device nodes
root at anvil:~# vgmknodes
* fix up the device mapper mess
root at anvil:~# mv /dev/mapper/vg_sys-lv_vmware /dev/mapper/vg_sys-lv_vmware_snap
root at anvil:~# mv /dev/mapper/vg_sys-lv_vmware-real /dev/mapper/vg_sys-lv_vmware
* check that our fs still exists
root at anvil:~# fsck /dev/mapper/vg_sys-lv_vmware
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/mapper/vg_sys-lv_vmware: recovering journal
/dev/mapper/vg_sys-lv_vmware: clean, 75/6815744 files, 10826547/13631488 blocks
* remove the snapshot
root at anvil:~# lvremove /dev/vg_sys/lv_vmware_snap
Logical volume "lv_vmware_snap" successfully removed
* re-enable the vmware lvm partition
root at anvil:~# vi /etc/fstab
root at anvil:~# touch /forcefsck
root at anvil:~# reboot
** system fscks, and boots normally.
** Affects: linux-source-2.6.15 (Ubuntu)
Importance: Undecided
Status: Needs Info
--
LVM Snapshot removal causes intermittent kernel panic
https://launchpad.net/bugs/71567