[Bug 1475247] Re: ceph-disk-prepare --zap-disk hang

Lirim lirim.osmani at canonical.com
Fri Oct 30 11:37:05 UTC 2015


Hello,

I wonder if my problem relates to the same issue. I was on-site last
week and deploying the Ceph cluster failed; specifically, preparing the
disks failed. I have to brute-force the zap manually 2-3 times before it
works, as seen below.

1st time:

root at hp-3:~# ceph-disk zap /dev/sdk
Warning! Disk size is smaller than the main header indicates! Loading
secondary header from the last sector of the disk! You should use 'v' to
verify disk integrity, and perhaps options on the experts' menu to repair
the disk.
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
 
Warning! One or more CRCs don't match. You should repair the disk!
 
Invalid partition data!
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
ceph-disk: Error: Command '['/sbin/sgdisk', '--zap-all', '--', '/dev/sdk']' returned non-zero exit status 2
 

2nd time:

root at hp-3:~# ceph-disk zap /dev/sdk
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
root at hp-3:~# ceph-disk zap /dev/sdk
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
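Since a second or third invocation succeeds once sgdisk has regenerated the headers, the manual brute-forcing above can be sketched as a small retry wrapper. This is illustrative only; `retry` is not a ceph or charm helper, just a generic POSIX sh function:

```shell
# Generic retry helper: run a command up to $1 times, stopping at the
# first success. Returns the failure status if every attempt fails.
retry() {
    max="$1"; shift
    i=1
    while [ "$i" -le "$max" ]; do
        "$@" && return 0
        echo "attempt $i of $max failed: $*" >&2
        i=$((i + 1))
    done
    return 1
}

# Example (destructive to the target disk!):
#   retry 3 ceph-disk zap /dev/sdk
```

This only papers over the underlying sgdisk failure, but it matches the observed behaviour that the zap eventually succeeds after the first pass rewrites the GPT.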

I even downgraded to the 3.13 kernel, yet the result is the same.
However, when trying in the lab or deploying Ceph in a VM environment, I
can't reproduce the error.

I'm back with the customer on-site next week and this appears to be the
major roadblock.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1475247

Title:
  ceph-disk-prepare --zap-disk hang

Status in ceph package in Ubuntu:
  Fix Released
Status in ceph source package in Trusty:
  Fix Committed
Status in ceph source package in Utopic:
  Won't Fix
Status in ceph source package in Vivid:
  Fix Committed
Status in ceph source package in Wily:
  Fix Released
Status in ceph package in Juju Charms Collection:
  Fix Released
Status in ceph-osd package in Juju Charms Collection:
  Fix Released

Bug description:
  [Impact]
  Disks with invalid metadata can cause hangs during cleaning, resulting in stuck deployments.

  [Test Case]
  Initialize a disk with invalid metadata using the '--zap-disk' option.

  [Regression Potential]
  Minimal; already in later Ubuntu releases.

  [Original Bug Report]
  During an Autopilot deployment on gMAAS, Juju had hung running a mon-relation-changed hook.

  $ ps afxwww | grep -A 4 [m]on-relation-changed
    29118 ?        S      0:03  \_ /usr/bin/python /var/lib/juju/agents/unit-ceph-1/charm/hooks/mon-relation-changed
    37996 ?        S      0:00      \_ /bin/sh /usr/sbin/ceph-disk-prepare --fs-type xfs --zap-disk /dev/sdb
    37998 ?        S      0:00          \_ /usr/bin/python /usr/sbin/ceph-disk prepare --fs-type xfs --zap-disk /dev/sdb
    38016 ?        D      0:00              \_ /sbin/sgdisk --zap-all --clear --mbrtogpt -- /dev/sdb

  This had been in this state for > 10m. The logs[1] from the unit in
  question showed that something was up with the partition tables on
  that disk.

  I fixed this by hand using gdisk[2]

  [1] https://pastebin.canonical.com/135426/
  [2] http://paste.ubuntu.com/11887096/
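  The by-hand fix with gdisk boils down to blanking the on-disk GPT
  structures so sgdisk sees an empty disk instead of corrupt metadata. A
  non-interactive sketch of that idea, assuming 512-byte sectors (the
  `wipe_gpt` helper is hypothetical, not part of ceph-disk or gdisk):

```shell
# Hypothetical helper: zero the primary GPT (first 34 sectors: protective
# MBR, GPT header, 128 partition entries) and the backup GPT (last 34
# sectors) of a device, assuming 512-byte logical sectors.
wipe_gpt() {
    dev="$1"       # device node, e.g. /dev/sdb
    sectors="$2"   # total 512-byte sectors, e.g. $(blockdev --getsz "$dev")
    dd if=/dev/zero of="$dev" bs=512 count=34 conv=notrunc 2>/dev/null
    dd if=/dev/zero of="$dev" bs=512 count=34 \
        seek=$((sectors - 34)) conv=notrunc 2>/dev/null
}

# Example (destroys all partition data on the disk!):
#   wipe_gpt /dev/sdb "$(blockdev --getsz /dev/sdb)"
```

  After this, `sgdisk --zap-all --clear --mbrtogpt` has nothing stale to
  trip over; interactive gdisk repair (as in [2]) achieves the same end.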

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1475247/+subscriptions



More information about the Ubuntu-openstack-bugs mailing list