[Bug 1713032] Re: [luminous] ceph-mon crashes when it is elected leader (s390x)

Tue Jan 22 14:39:17 UTC 2019

If you add the debug symbols repository as described here:

https://wiki.ubuntu.com/Debug%20Symbol%20Packages#Manual_install_of_debug_packages

then the debug symbols should be available to install. It was
mentioned that they may be available for luminous but not for mimic.
In that case luminous can be used, since the bug is exactly the same
in both versions.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1713032

Title:
  [luminous] ceph-mon crashes when it is elected leader (s390x)

Status in Ubuntu Cloud Archive:
  Triaged
Status in Ubuntu on IBM z Systems:
  In Progress
Status in ceph package in Ubuntu:
  In Progress

Bug description:
  ceph-mon - s390x - 1 of my 3 nodes decides it is the leader, then
  crashes:

  Summary:

  2017-08-25 10:30:49.764717 3ff9a7ff910  1 mon.juju-a9ec9d-1 at 0(electing).elector(105) init, last seen epoch 105
  2017-08-25 10:30:55.288336 3ff9a7ff910  0 log_channel(cluster) log [INF] : mon.juju-a9ec9d-1 at 0 won leader election with quorum 0,1
  2017-08-25 10:30:55.487872 3ff9a7ff910  0 log_channel(cluster) log [INF] : HEALTH_ERR; no osds; 1 mons down, quorum 0,1 juju-a9ec9d-1,juju-a9ec9d-0
  2017-08-25 10:30:56.047020 3ff8bfff910  0 log_channel(cluster) log [INF] : monmap e1: 3 mons at {juju-a9ec9d-0=10.0.8.105:6789/0,juju-a9ec9d-1=10.0.8.84:6789/0,noname-b=10.0.8.179:6789/0}
  2017-08-25 10:30:56.047050 3ff8bfff910  0 log_channel(cluster) log [INF] : pgmap 0 pgs: ; 0 bytes data, 0 kB used, 0 kB / 0 kB avail
  2017-08-25 10:30:56.047073 3ff8bfff910  0 log_channel(cluster) log [DBG] : fsmap 
  2017-08-25 10:30:56.047122 3ff8bfff910  1 mon.juju-a9ec9d-1 at 0(leader).osd e0 create_pending setting backfillfull_ratio = 0.9
  2017-08-25 10:30:56.047135 3ff8bfff910  1 mon.juju-a9ec9d-1 at 0(leader).osd e0 create_pending setting full_ratio = 0.95
  2017-08-25 10:30:56.047137 3ff8bfff910  1 mon.juju-a9ec9d-1 at 0(leader).osd e0 create_pending setting nearfull_ratio = 0.85
  2017-08-25 10:30:56.047288 3ff8bfff910  1 mon.juju-a9ec9d-1 at 0(leader).osd e0 encode_pending skipping prime_pg_temp; mapping job did not start
  2017-08-25 10:30:56.051808 3ff8bfff910 -1 *** Caught signal (Aborted) **
   in thread 3ff8bfff910 thread_name:ms_dispatch

   ceph version 12.1.2 (b661348f156f148d764b998b65b90451f096cb27) luminous (rc)
   1: (()+0x9334b4) [0x2aa0f9334b4]
   2: [0x3ff8bff9b66]
   3: (gsignal()+0x30) [0x3ffa16381b8]
   4: (abort()+0x14e) [0x3ffa1639726]
   5: (__gnu_cxx::__verbose_terminate_handler()+0x19c) [0x3ffa1a28e2c]
   6: (()+0xa6776) [0x3ffa1a26776]
   7: (()+0xa67d8) [0x3ffa1a267d8]
   8: (__cxa_rethrow()+0x64) [0x3ffa1a26adc]
   9: (CrushWrapper::decode(ceph::buffer::list::iterator&)+0xdc2) [0x2aa0f8b4d92]
   10: (OSDMap::decode(ceph::buffer::list::iterator&)+0x5c4) [0x2aa0f739d44]
   11: (OSDMap::decode(ceph::buffer::list&)+0x44) [0x2aa0f73c434]
   12: (OSDMap::apply_incremental(OSDMap::Incremental const&)+0x1782) [0x2aa0f73dbe2]
   13: (OSDMonitor::encode_pending(std::shared_ptr<MonitorDBStore::Transaction>)+0x212) [0x2aa0f55cf3a]
   14: (PaxosService::propose_pending()+0x2be) [0x2aa0f5214d6]
   15: (PaxosService::_active()+0x2be) [0x2aa0f521bbe]
   16: (Context::complete(int)+0x1e) [0x2aa0f3f1d86]
   17: (void finish_contexts<Context>(CephContext*, std::__cxx11::list<Context*, std::allocator<Context*> >&, int)+0x212) [0x2aa0f3fb0ea]
   18: (Paxos::finish_round()+0x194) [0x2aa0f51937c]
   19: (Paxos::handle_last(boost::intrusive_ptr<MonOpRequest>)+0xfb2) [0x2aa0f51a97a]
   20: (Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x2d4) [0x2aa0f51b2c4]
   21: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xf20) [0x2aa0f3e86b8]
   22: (Monitor::_ms_dispatch(Message*)+0x64e) [0x2aa0f3e91ae]
   23: (Monitor::ms_dispatch(Message*)+0x34) [0x2aa0f41919c]
   24: (DispatchQueue::entry()+0xf0c) [0x2aa0f8df744]
   25: (DispatchQueue::DispatchThread::entry()+0x18) [0x2aa0f6ed828]
   26: (()+0x7934) [0x3ffa1e87934]
   27: (()+0xedd1a) [0x3ffa16edd1a]
   NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
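
A minimal sketch of how the frames above could be turned into
file-relative offsets once a binary with debug symbols is at hand. It
assumes that frame 1, which has no symbol name, reports its offset from
the load address of the ceph-mon executable, and that frame 9 lives in
the same object (both addresses share the 0x2aa0f... range). The helper
names `parse_frame` and `module_base` are hypothetical and are not part
of ceph or apport.

```python
import re

# Matches backtrace frames of the form "N: (symbol+0xOFF) [0xADDR]".
FRAME_RE = re.compile(
    r"^\s*(?P<idx>\d+): \((?P<sym>.*)\+0x(?P<off>[0-9a-f]+)\)"
    r" \[0x(?P<addr>[0-9a-f]+)\]\s*$"
)


def parse_frame(line):
    """Return the parsed fields of one frame, or None if it has no symbol part."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return {
        "index": int(m.group("idx")),
        "symbol": m.group("sym"),
        "offset": int(m.group("off"), 16),
        "address": int(m.group("addr"), 16),
    }


def module_base(frame):
    """Estimate the load address of the object containing an unnamed frame.

    Assumption: for "()" frames the offset is measured from the start of
    the mapped object, so base = address - offset.
    """
    if frame["symbol"] != "()":
        raise ValueError("offset is relative to a symbol, not to the object")
    return frame["address"] - frame["offset"]


frame1 = parse_frame("   1: (()+0x9334b4) [0x2aa0f9334b4]")
frame9 = parse_frame(
    "   9: (CrushWrapper::decode(ceph::buffer::list::iterator&)+0xdc2)"
    " [0x2aa0f8b4d92]"
)

base = module_base(frame1)
rel9 = frame9["address"] - base

print(hex(base))  # 0x2aa0f000000
print(hex(rel9))  # 0x8b4d92
```

If these assumptions hold, the relative offset is the kind of value
that a tool such as `addr2line -e <executable>` accepts for a
position-independent binary. Frames that carry only an address, such as
frame 2, are returned as None and need separate handling.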

  Full log:

  https://pastebin.canonical.com/196718/

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: ceph 12.1.2-0ubuntu2~cloud0 [origin: Canonical]
  ProcVersionSignature: Ubuntu 4.4.0-87.110-generic 4.4.73
  Uname: Linux 4.4.0-87-generic s390x
  NonfreeKernelModules: ebtable_broute vport_gre ip_gre gre ip_tunnel xt_CT xt_mac xt_physdev br_netfilter xt_set ip_set_hash_net ip_set nfnetlink xt_REDIRECT nf_nat_redirect nf_conntrack_ipv6 ip6table_mangle xt_nat xt_mark xt_connmark ip6table_raw iptable_raw xt_conntrack ipt_REJECT nf_reject_ipv4 ebtable_filter nbd openvswitch nf_defrag_ipv6 ebt_arp ebt_dnat ebt_ip scsi_transport_iscsi binfmt_misc veth ip6table_filter ip6_tables xt_CHECKSUM iptable_mangle xt_tcpudp ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_comment iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack zfs zunicode zcommon znvpair spl zavl zlib_deflate iptable_filter ip_tables ebt_snat ebtable_nat ebtables x_tables bridge 8021q garp mrp stp llc xfs libcrc32c dm_snapshot dm_bufio ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 qeth_l2 sha_common chsc_sch eadm_sch qeth ctcm ccwgroup fsm zfcp qdio scsi_transport_fc dasd_eckd_mod dasd_mod
  ApportVersion: 2.20.1-0ubuntu2.10
  Architecture: s390x
  CrashDB:
   {
      "impl": "launchpad",
      "project": "cloud-archive",
      "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml",
   }
  Date: Fri Aug 25 11:17:36 2017
  SourcePackage: ceph
  UpgradeStatus: No upgrade log present (probably fresh install)
