[Bug 1475247] Re: ceph-disk-prepare --zap-disk hang
Andreas Hasenack
andreas at canonical.com
Wed Sep 2 19:04:51 UTC 2015
I still see sgdisk hanging in D state. The command line is correct now:
3642 ? Ssl 0:00 /var/lib/juju/tools/unit-ceph-1/jujud unit --data-dir /var/lib/juju --unit-name ceph/1 --debug
23810 ? S 0:02 \_ /usr/bin/python /var/lib/juju/agents/unit-ceph-1/charm/hooks/mon-relation-changed
27816 ? S 0:00 \_ /bin/sh /usr/sbin/ceph-disk-prepare --fs-type xfs --zap-disk /dev/sdb
27818 ? S 0:00 \_ /usr/bin/python /usr/sbin/ceph-disk prepare --fs-type xfs --zap-disk /dev/sdb
27833 ? D 0:00 \_ /sbin/sgdisk --zap-all -- /dev/sdb
The unit log is stuck at:
2015-09-02 18:57:54 INFO mon-relation-changed Warning! One or more CRCs don't match. You should repair the disk!
2015-09-02 18:57:54 INFO mon-relation-changed
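A process in D state (uninterruptible sleep) shows a D in the STAT column of the ps listing. For anyone reproducing this, here is a minimal sketch for picking such processes out of captured ps output. It assumes the column layout seen in this report (PID, TTY, STAT, TIME, COMMAND); the function name find_d_state is hypothetical and not part of ceph-disk or the charm.

```python
def find_d_state(ps_output):
    """Return (pid, command) pairs for processes in uninterruptible sleep.

    Assumes `ps afx`-style columns: PID TTY STAT TIME COMMAND.
    """
    stuck = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip headers and malformed lines
        pid, _tty, stat, _time, command = fields
        if stat.startswith("D"):
            # Drop the tree-drawing prefix that `ps f` adds to child processes.
            command = command.lstrip("\\_| ").strip()
            stuck.append((int(pid), command))
    return stuck


# Two lines taken from the listing in this report.
sample = """\
27818 ? S 0:00 \\_ /usr/bin/python /usr/sbin/ceph-disk prepare --fs-type xfs --zap-disk /dev/sdb
27833 ? D 0:00 \\_ /sbin/sgdisk --zap-all -- /dev/sdb
"""
print(find_d_state(sample))
```

On the sample lines this reports only the sgdisk process (PID 27833).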
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1475247
Title:
ceph-disk-prepare --zap-disk hang
Status in ceph package in Ubuntu:
Fix Released
Status in ceph source package in Trusty:
Fix Committed
Status in ceph source package in Utopic:
Won't Fix
Status in ceph source package in Vivid:
Fix Released
Status in ceph source package in Wily:
Fix Released
Status in ceph package in Juju Charms Collection:
Fix Released
Status in ceph-osd package in Juju Charms Collection:
Fix Released
Bug description:
[Impact]
Disks with invalid metadata can cause hangs during cleaning, resulting in stuck deployments.
[Test Case]
Initialize a disk with invalid metadata using the '--zap-disk' option.
[Regression Potential]
Minimal; already in later Ubuntu releases.
[Original Bug Report]
During an Autopilot deployment on gMAAS, Juju hung while running a mon-relation-changed hook:
$ ps afxwww | grep -A 4 [m]on-relation-changed
29118 ? S 0:03 \_ /usr/bin/python /var/lib/juju/agents/unit-ceph-1/charm/hooks/mon-relation-changed
37996 ? S 0:00 \_ /bin/sh /usr/sbin/ceph-disk-prepare --fs-type xfs --zap-disk /dev/sdb
37998 ? S 0:00 \_ /usr/bin/python /usr/sbin/ceph-disk prepare --fs-type xfs --zap-disk /dev/sdb
38016 ? D 0:00 \_ /sbin/sgdisk --zap-all --clear --mbrtogpt -- /dev/sdb
It had been in this state for more than 10 minutes. The logs[1] from the
unit in question showed that something was wrong with the partition
tables on that disk.
I fixed this by hand using gdisk[2].
[1] https://pastebin.canonical.com/135426/
[2] http://paste.ubuntu.com/11887096/
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1475247/+subscriptions