[Bug 1384062] Re: os-prober kills ceph OSD

Launchpad Bug Tracker 1384062 at bugs.launchpad.net
Tue Feb 16 15:22:34 UTC 2016


Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: ceph (Ubuntu)
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to ceph in Ubuntu.
https://bugs.launchpad.net/bugs/1384062

Title:
  os-prober kills ceph OSD

Status in ceph package in Ubuntu:
  Confirmed
Status in ceph package in Juju Charms Collection:
  Invalid

Bug description:
  This morning, an automatic upgrade of the following packages on our
  running system:

  libsigc++-2.0-0c2a, libssl1.0.0, man-db, libgtk2.0-common,
  libgtk2.0-bin, libgtk2.0-0, openssh-sftp-server, openssh-server,
  openssh-client, grub-pc, grub-pc-bin, grub2-common, grub-common,
  openssl, python-cryptography, python-pygraphviz

  killed five of the 15 OSDs on our Ceph 0.80.6 cluster of 5 machines:

  root@g2:/var/log/ceph# grep -E ^2014-10-22 ceph-osd.7.log
  2014-10-22 07:41:36.783358 7f4d33d55700 -1 journal FileJournal::write_bl : write_fd failed: (1) Operation not permitted
  2014-10-22 07:41:36.783617 7f4d33d55700 -1 journal FileJournal::do_write: write_bl(pos=793935872) failed
  2014-10-22 07:41:36.800201 7f4d33d55700 -1 os/FileJournal.cc: In function 'void FileJournal::do_write(ceph::bufferlist&)' thread 7f4d33d55700 time 2014-10-22 07:41:36.783629
  2014-10-22 07:41:36.847389 7f4d33d55700 -1 *** Caught signal (Aborted) **

  root@n7:/var/log/ceph# grep -E ^2014-10-22 ceph-osd.10.log|cut -c-120
  2014-10-22 07:42:18.169142 7f9b977df700 -1 journal FileJournal::write_bl : write_fd failed: (1) Operation not permitted

  root@n7:/var/log/ceph# grep -E ^2014-10-22 ceph-osd.9.log|cut -c-120
  2014-10-22 07:42:17.509579 7f6efa27b700 -1 osd.9 15390 heartbeat_check: no reply from osd.13 since back 2014-10-22 07:41
  2014-10-22 07:42:17.509593 7f6efa27b700 -1 osd.9 15390 heartbeat_check: no reply from osd.14 since back 2014-10-22 07:41
  2014-10-22 07:42:17.945433 7f6ef6a74700 -1 journal FileJournal::do_write: pwrite(fd=23, hbp.length=4096) failed :(1) Ope
  2014-10-22 07:42:17.960678 7f6ef6a74700 -1 os/FileJournal.cc: In function 'void FileJournal::do_write(ceph::bufferlist&)

  root@stri:/var/log/ceph# grep -E ^2014-10-22 ceph-osd.13.log
  2014-10-22 00:42:01.140574 7fa929b8a700 -1 journal FileJournal::write_bl : write_fd failed: (1) Operation not permitted
  2014-10-22 00:42:01.141439 7fa929b8a700 -1 journal FileJournal::do_write: write_bl(pos=3496448000) failed

  root@stri:/var/log/ceph# grep -E ^2014-10-22 ceph-osd.14.log
  2014-10-22 00:41:54.828719 7f438eb45700 -1 osd.14 15388 heartbeat_check: no reply from osd.7 since back 2014-10-22 00:41:34.499777 front 2014-10-22 00:41:34.499777 (cutoff 2014-10-22 00:41:34.828717)
  2014-10-22 00:41:55.241586 7f437217f700  0 -- 192.168.99.246:6811/17136 >> 192.168.99.253:6806/25800 pipe(0x7f439f5fd900 sd=182 :6811 s=0 pgs=0 cs=0 l=0 c=0x7f43a71f1180).accept connect_seq 34 vs existing 33 state standby
  2014-10-22 00:42:01.235014 7f438b33e700 -1 journal FileJournal::write_bl : write_fd failed: (1) Operation not permitted
  2014-10-22 00:42:01.235032 7f438b33e700 -1 journal FileJournal::do_write: write_bl(pos=4626878464) failed

  According to the logs, the OSDs all died just after a run of os-prober:

  Oct 22 07:41:36 g2 os-prober: debug: running /usr/lib/os-probes/mounted/05efi on mounted /dev/sda1

  os-prober likely performed an operation on the journal partition,
  causing the write errors on the OSDs.
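
  A minimal sketch of how the timing could be checked on each host,
  assuming GNU grep and the log line formats quoted above. The function
  name first_journal_failure is hypothetical and not part of ceph or
  os-prober. It prints the timestamp (to the second) of the first journal
  write failure in each OSD log, for comparison with the os-prober
  entries in the system log.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ceph or os-prober): for each OSD
# log given as an argument, print "<log file> <timestamp>" for the first
# journal write failure found in it.
first_journal_failure() {
    for log in "$@"; do
        # Match the FileJournal failure lines quoted in this report and
        # keep only the leading "YYYY-MM-DD HH:MM:SS" (19 characters).
        ts=$(grep -m1 -E 'journal FileJournal::(write_bl|do_write).*failed' "$log" | cut -c1-19)
        if [ -n "$ts" ]; then
            printf '%s %s\n' "$log" "$ts"
        fi
    done
    return 0
}

# Example use on an affected host (paths are assumptions; /var/log/syslog
# is the usual Ubuntu location for the os-prober debug messages):
#   first_journal_failure /var/log/ceph/ceph-osd.*.log
#   grep 'os-prober' /var/log/syslog
```

  Timestamps should be compared per host: the excerpts from stri show
  00:41-00:42 where g2 and n7 show 07:41-07:42, so the hosts appear to
  log in different time zones.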

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1384062/+subscriptions


