[Bug 71567] LVM Snapshot removal causes intermittent kernel panic

acutler acutler@orchestrate.it
Wed Nov 15 01:41:41 UTC 2006


Public bug reported:

Binary package hint: lvm2

We have a script that automates the creation and removal of LVM
snapshots on our VMware servers. Three times now we have had machines go
down when the snapshot was removed. I have logs showing the machine
going down immediately after the snapshot removal script fired
(triggered and logged by our backup software).
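For reference, the script follows the general pattern below. This is an illustrative sketch only: the volume names, snapshot size, and mount point shown here are hypothetical stand-ins, not the exact script, which is driven by our backup software.

```shell
#!/bin/sh
# Sketch of the snapshot-backup cycle (names/sizes are hypothetical).
VG=vg_sys
LV=lv_vmware
SNAP=${LV}_snap
SNAP_SIZE=5G
MNT=/mnt/snapshot

snapshot_backup() {
    # 1. create a snapshot of the origin LV
    lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP" "/dev/$VG/$LV" || return 1
    # 2. mount it read-only so the backup can run against $MNT
    mount -o ro "/dev/$VG/$SNAP" "$MNT" || { lvremove -f "/dev/$VG/$SNAP"; return 1; }
    # ... backup of $MNT happens here ...
    umount "$MNT"
    # 3. remove the snapshot -- the crashes occur immediately after this step
    lvremove -f "/dev/$VG/$SNAP"
}
```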

This has occurred on both an HP Pavilion Desktop (uniprocessor, single
disk) and a Sun Sunfire V60X (SMP, md raid1).

The crash leaves the LV in an inconsistent state with device nodes and
snapshot names completely out of sync. On all occasions I have been able
to recover the volume by following the steps below:

Ubuntu 6.06 LTS, LVM Hard Crash repair
--------------------------------------

observe kernel oops.
perform hard reset.
machine comes back up with md2 array dirty, starting background reconstruction.
fails on mounting partitions, boots to single user.
login at console.
mount /usr
vi /etc/fstab
comment out snapshotted lvm partition (/vmware)
exit. System boots to multi user.
open ssh shell to system

**** some info before we begin

root@anvil:~# lvscan
  ACTIVE            '/dev/vg_sys/lv_tmp' [4.00 GB] inherit
  ACTIVE            '/dev/vg_sys/lv_swap' [4.00 GB] inherit
  ACTIVE            '/dev/vg_sys/lv_var' [1.00 GB] inherit
  ACTIVE            '/dev/vg_sys/lv_usr' [1.00 GB] inherit
  inactive Original '/dev/vg_sys/lv_vmware' [52.00 GB] contiguous
  inactive Snapshot '/dev/vg_sys/lv_vmware_snap' [5.00 GB] inherit

root@anvil:~# pvscan
  PV /dev/md2   VG vg_sys   lvm2 [67.33 GB / 340.00 MB free]
  Total: 1 [67.33 GB] / in use: 1 [67.33 GB] / in no VG: 0 [0   ]

root@anvil:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[1]
      70605568 blocks [2/2] [UU]
      [=====>...............]  resync = 26.8% (18967232/70605568) finish=18.0min speed=47688K/sec

md1 : active raid1 sda2[0] sdb2[1]
      979840 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      96256 blocks [2/2] [UU]

unused devices: <none>

root@anvil:~# uname -a
Linux anvil 2.6.15-27-server #1 SMP Sat Sep 16 02:57:21 UTC 2006 i686 GNU/Linux

root@anvil:~# ls /dev/mapper/
control  vg_sys-lv_swap  vg_sys-lv_tmp  vg_sys-lv_usr  vg_sys-lv_var  vg_sys-lv_vmware  vg_sys-lv_vmware-real

root@anvil:~# ls /dev/vg_sys/
lv_swap  lv_tmp  lv_usr  lv_var

** let's repair the system

* create some missing device nodes
root@anvil:~# vgmknodes

* fix up the device mapper mess
root@anvil:~# mv /dev/mapper/vg_sys-lv_vmware /dev/mapper/vg_sys-lv_vmware_snap
root@anvil:~# mv /dev/mapper/vg_sys-lv_vmware-real /dev/mapper/vg_sys-lv_vmware
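If it is unclear which node corresponds to which mapping, the device-mapper tables can be inspected before renaming anything. A sketch using standard dmsetup commands (we did not capture this output at the time):

```shell
# Hypothetical helper: show the device-mapper state so the origin /
# -real / snapshot relationship can be confirmed before renaming nodes.
show_dm_state() {
    dmsetup ls       # names of all mapped devices
    dmsetup table    # target type (linear / snapshot / snapshot-origin) per device
}
```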

* check that our fs still exists
root@anvil:~# fsck /dev/mapper/vg_sys-lv_vmware
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/mapper/vg_sys-lv_vmware: recovering journal
/dev/mapper/vg_sys-lv_vmware: clean, 75/6815744 files, 10826547/13631488 blocks

* remove the snapshot
root@anvil:~# lvremove /dev/vg_sys/lv_vmware_snap
  Logical volume "lv_vmware_snap" successfully removed

* re-enable the vmware lvm partition in fstab
root@anvil:~# vi /etc/fstab
root@anvil:~# touch /forcefsck
root@anvil:~# reboot
** system fscks and boots normally.

** Affects: linux-source-2.6.15 (Ubuntu)
     Importance: Undecided
         Status: Needs Info

-- 
LVM Snapshot removal causes intermittent kernel panic
https://launchpad.net/bugs/71567

More information about the kernel-bugs mailing list