[Bug 1969000] Re: [SRU] mon crashes when improper json is passed to rados

nikhil kshirsagar 1969000 at bugs.launchpad.net
Tue Sep 13 08:43:47 UTC 2022


This additional fix might also be needed -
https://github.com/ceph/ceph/pull/48044

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1969000

Title:
  [SRU] mon crashes when improper json is passed to rados

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive ussuri series:
  New
Status in ceph package in Ubuntu:
  New
Status in ceph source package in Focal:
  New
Status in ceph source package in Impish:
  New
Status in ceph source package in Jammy:
  New
Status in ceph source package in Kinetic:
  New

Bug description:
  [Impact]
  Passing improper JSON data to rados, either via a manual curl command against the restful API or via a script such as the Python reproducer shown below, can end up crashing the mon.

  [Test Plan]
  Set up a Ceph Octopus cluster. A manual run of curl with a malformed request like the following results in the crash.

  curl -k -H "Authorization: Basic $TOKEN"
  "https://juju-3b3d82-10-lxd-0:8003/request" -X POST -d
  '{"prefix":"auth add","entity":"client.testuser02","caps":"mon
  '\''allow r'\'' osd '\''allow rw pool=testpool01'\''"}'

  The request status shows it is still in the queue if you check with
  curl -k -X GET "$endpoint/request"

  [
      {
          "failed": [],
          "finished": [],
          "has_failed": false,
          "id": "140576245092648",
          "is_finished": false,
          "is_waiting": false,
          "running": [
              {
                  "command": "auth add entity=client.testuser02 caps=mon 'allow r' osd 'allow rw pool=testpool01'",
                  "outb": "",
                  "outs": ""
              }
          ],
          "state": "pending",
          "waiting": []
      }
  ]

  This also reproduces without the restful API.

  Use this Python script to reproduce the issue. Run it on the mon node:

  root at juju-8c5f4a-sts-stein-bionic-0:/root# cat testcrashnorest.py
  #!/usr/bin/env python3
  import json
  import rados
  c = rados.Rados(conffile='/etc/ceph/ceph.conf')
  c.connect()
  cmd = json.dumps({"prefix":"auth add","entity":"client.testuser02","caps":"mon '\''allow r'\'' osd '\''allow rw pool=testpool01'\''"})
  print(c.mon_command(cmd, b''))
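  The crash stems from the shape of the caps value in the payload above: it is
  packed into a single string with embedded quotes rather than the flat list of
  daemon/capability pairs that the mon command parser expects. A minimal sketch
  contrasting the two payloads (the list form is our assumption of the intended,
  well-formed invocation; no cluster is needed to see the structural difference):

```python
import json

# Malformed payload from the reproducer: caps is one quoted string
bad_cmd = json.dumps({
    "prefix": "auth add",
    "entity": "client.testuser02",
    "caps": "mon 'allow r' osd 'allow rw pool=testpool01'",
})

# Assumed well-formed payload: caps as a flat list of daemon/capability pairs
good_cmd = json.dumps({
    "prefix": "auth add",
    "entity": "client.testuser02",
    "caps": ["mon", "allow r", "osd", "allow rw pool=testpool01"],
})

# Either string would be handed to the mon via c.mon_command(cmd, b'')
print(type(json.loads(bad_cmd)["caps"]).__name__)   # str
print(type(json.loads(good_cmd)["caps"]).__name__)  # list
```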

  root at juju-8c5f4a-sts-stein-bionic-0:/root# ceph -s
    cluster:
      id:     6123c916-a12a-11ec-bc02-fa163e9f86e0
      health: HEALTH_WARN
              mon is allowing insecure global_id reclaim
              1 monitors have not enabled msgr2
              Reduced data availability: 69 pgs inactive
              1921 daemons have recently crashed

    services:
      mon: 1 daemons, quorum juju-8c5f4a-sts-stein-bionic-0 (age 92s)
      mgr: juju-8c5f4a-sts-stein-bionic-0(active, since 22m)
      osd: 3 osds: 3 up (since 22h), 3 in

    data:
      pools:   4 pools, 69 pgs
      objects: 0 objects, 0 B
      usage:   0 B used, 0 B / 0 B avail
      pgs:     100.000% pgs unknown
               69 unknown

  root at juju-8c5f4a-sts-stein-bionic-0:/root# ./testcrashnorest.py

  ^C

  (note the script hangs)

  The mon logs (https://pastebin.com/Cuu9jkmu) show the crash; systemd
  then restarts the mon, so ceph -s hangs for a while before restart
  messages such as the following appear:

  --- end dump of recent events ---
  2022-03-16T05:35:30.111+0000 7ffaf0e3b540 0 set uid:gid to 64045:64045 (ceph:ceph)
  2022-03-16T05:35:30.111+0000 7ffaf0e3b540 0 ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable), process ceph-mon, pid 490328
  2022-03-16T05:35:30.111+0000 7ffaf0e3b540 0 pidfile_write: ignore empty --pid-file
  2022-03-16T05:35:30.139+0000 7ffaf0e3b540 0 load: jerasure load: lrc load: isa
  2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option compression = kNoCompression
  2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option level_compaction_dynamic_level_bytes = true
  2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option write_buffer_size = 33554432
  2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option compression = kNoCompression
  2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option level_compaction_dynamic_level_bytes = true
  2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option write_buffer_size = 33554432
  2022-03-16T05:35:30.143+0000 7ffaf0e3b540 1 rocksdb: do_open column families: [default]
  2022-03-16T05:35:30.143+0000 7ffaf0e3b540 4 rocksdb: RocksDB version: 6.1.2

  [Where problems could occur]
  If malformed input like the JSON in the Python script triggers
  exceptions in other parts of the code that are not caught by this fix,
  a termination or crash could still occur in those situations.
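  Until a cluster is running a fixed mon, a client-side guard can reject
  such payloads before they ever reach the monitor. A hypothetical sketch
  (validate_auth_add is our own illustrative helper, not a Ceph API, and the
  caps list shape it enforces is an assumption about the well-formed command):

```python
import json

def validate_auth_add(cmd_json):
    """Reject 'auth add' payloads whose caps field is not a flat
    list of daemon/capability string pairs (hypothetical helper)."""
    cmd = json.loads(cmd_json)
    if cmd.get("prefix") == "auth add":
        caps = cmd.get("caps")
        if not (isinstance(caps, list)
                and len(caps) % 2 == 0
                and all(isinstance(c, str) for c in caps)):
            raise ValueError("caps must be a flat list of daemon/cap pairs")
    return cmd

# The malformed payload from the reproducer is rejected client-side
try:
    validate_auth_add(json.dumps({
        "prefix": "auth add",
        "entity": "client.testuser02",
        "caps": "mon 'allow r' osd 'allow rw pool=testpool01'",
    }))
    print("accepted")
except ValueError:
    print("rejected")
```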

  [Other Info]
  Reported upstream at https://tracker.ceph.com/issues/54558 (including reproducer, and fix testing details) and fixed through https://github.com/ceph/ceph/pull/45547

  PR for Octopus is at https://github.com/ceph/ceph/pull/45891

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1969000/+subscriptions



