[Bug 1969000] Re: [SRU] mon crashes when improper json is passed to rados
nikhil kshirsagar
1969000 at bugs.launchpad.net
Tue Sep 13 08:43:47 UTC 2022
This additional fix might also be needed:
https://github.com/ceph/ceph/pull/48044
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1969000
Title:
[SRU] mon crashes when improper json is passed to rados
Status in Ubuntu Cloud Archive:
New
Status in Ubuntu Cloud Archive ussuri series:
New
Status in ceph package in Ubuntu:
New
Status in ceph source package in Focal:
New
Status in ceph source package in Impish:
New
Status in ceph source package in Jammy:
New
Status in ceph source package in Kinetic:
New
Bug description:
[Impact]
If improperly formed JSON is passed to rados, either through a manual curl command or through a script such as the Python example shown below, it can crash the mon.
[Test Plan]
Set up a Ceph Octopus cluster. Manually running curl with a malformed request like this one results in the crash:
curl -k -H "Authorization: Basic $TOKEN" \
  "https://juju-3b3d82-10-lxd-0:8003/request" -X POST -d \
  '{"prefix":"auth add","entity":"client.testuser02","caps":"mon '\''allow r'\'' osd '\''allow rw pool=testpool01'\''"}'
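For contrast with the malformed request above, here is a minimal sketch of building the same command with "caps" as a list. It rests on an assumption not stated in this report: that the monitor expects "caps" to be a flat list of alternating service and capability strings rather than one quoted string. The helper name build_auth_add_cmd is hypothetical and is not part of Ceph.

```python
import json


def build_auth_add_cmd(entity, caps):
    """Return a JSON mon command string for 'auth add'.

    Hypothetical helper, not part of Ceph. Assumption: the monitor expects
    'caps' as a flat list such as ["mon", "allow r", "osd", "allow rw ..."].
    """
    # Reject the shape used in the reproducer: one string holding all caps.
    if isinstance(caps, (str, bytes)):
        raise TypeError("caps must be a list of strings, not a single string")
    caps = list(caps)
    # Each service name must be paired with exactly one capability string.
    if len(caps) % 2 != 0 or not all(isinstance(c, str) for c in caps):
        raise ValueError("caps must alternate service and capability strings")
    return json.dumps({"prefix": "auth add", "entity": entity, "caps": caps})


cmd = build_auth_add_cmd(
    "client.testuser02",
    ["mon", "allow r", "osd", "allow rw pool=testpool01"],
)
print(cmd)
```

Validating on the client side only avoids sending the bad shape from one's own tooling; it does not replace the monitor-side fix, since any other client can still send malformed input.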
The request status shows it is still in the queue if you check with
curl -k -X GET "$endpoint/request"
[
    {
        "failed": [],
        "finished": [],
        "has_failed": false,
        "id": "140576245092648",
        "is_finished": false,
        "is_waiting": false,
        "running": [
            {
                "command": "auth add entity=client.testuser02 caps=mon 'allow r' osd 'allow rw pool=testpool01'",
                "outb": "",
                "outs": ""
            }
        ],
        "state": "pending",
        "waiting": []
    }
]
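The status check above can be automated. The following is a rough sketch that flags requests still sitting in the queue, using only the field names visible in the response shown here ("state", "running", "is_finished", "id"). The function name stuck_requests is hypothetical, and the sample is a trimmed copy of the response above.

```python
import json


def stuck_requests(status_json):
    """Return the ids of requests that are pending with commands still running."""
    stuck = []
    for req in json.loads(status_json):
        pending = req.get("state") == "pending"
        # A non-empty "running" list means a command was dispatched
        # but has not reported a result yet.
        if pending and req.get("running") and not req.get("is_finished"):
            stuck.append(req["id"])
    return stuck


# Trimmed copy of the response shown above.
sample = """
[
    {
        "failed": [],
        "finished": [],
        "has_failed": false,
        "id": "140576245092648",
        "is_finished": false,
        "is_waiting": false,
        "running": [{"command": "auth add", "outb": "", "outs": ""}],
        "state": "pending",
        "waiting": []
    }
]
"""
print(stuck_requests(sample))
```

A request that stays in this list across repeated polls is a hint that the mon never answered, which matches the hang described in this report.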
This also reproduces without the restful API.
Use this Python script to reproduce the issue. Run it on the mon node:
root@juju-8c5f4a-sts-stein-bionic-0:/root# cat testcrashnorest.py
#!/usr/bin/env python3
import json
import rados
c = rados.Rados(conffile='/etc/ceph/ceph.conf')
c.connect()
cmd = json.dumps({"prefix":"auth add","entity":"client.testuser02","caps":"mon '\''allow r'\'' osd '\''allow rw pool=testpool01'\''"})
print(c.mon_command(cmd, b''))
root@juju-8c5f4a-sts-stein-bionic-0:/root# ceph -s
  cluster:
    id:     6123c916-a12a-11ec-bc02-fa163e9f86e0
    health: HEALTH_WARN
            mon is allowing insecure global_id reclaim
            1 monitors have not enabled msgr2
            Reduced data availability: 69 pgs inactive
            1921 daemons have recently crashed
  services:
    mon: 1 daemons, quorum juju-8c5f4a-sts-stein-bionic-0 (age 92s)
    mgr: juju-8c5f4a-sts-stein-bionic-0(active, since 22m)
    osd: 3 osds: 3 up (since 22h), 3 in
  data:
    pools:   4 pools, 69 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             69 unknown
root@juju-8c5f4a-sts-stein-bionic-0:/root# ./testcrashnorest.py
^C
(Note that the script hangs.)
The mon logs (https://pastebin.com/Cuu9jkmu) show the crash. systemd then
appears to restart ceph-mon, so ceph -s hangs for a while, after which
restart messages like the following appear:
--- end dump of recent events ---
2022-03-16T05:35:30.111+0000 7ffaf0e3b540 0 set uid:gid to 64045:64045 (ceph:ceph)
2022-03-16T05:35:30.111+0000 7ffaf0e3b540 0 ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable), process ceph-mon, pid 490328
2022-03-16T05:35:30.111+0000 7ffaf0e3b540 0 pidfile_write: ignore empty --pid-file
2022-03-16T05:35:30.139+0000 7ffaf0e3b540 0 load: jerasure load: lrc load: isa
2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option compression = kNoCompression
2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option level_compaction_dynamic_level_bytes = true
2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option write_buffer_size = 33554432
2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option compression = kNoCompression
2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option level_compaction_dynamic_level_bytes = true
2022-03-16T05:35:30.143+0000 7ffaf0e3b540 0 set rocksdb option write_buffer_size = 33554432
2022-03-16T05:35:30.143+0000 7ffaf0e3b540 1 rocksdb: do_open column families: [default]
2022-03-16T05:35:30.143+0000 7ffaf0e3b540 4 rocksdb: RocksDB version: 6.1.2
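To confirm from a log file that the mon was restarted after the crash, one option is to scan for the startup banner seen in the excerpt above. This is only a sketch: the regular expression is derived from the single "ceph version ... process ceph-mon, pid N" line shown here, so it is an assumption that other releases format the line the same way. The function name mon_starts is hypothetical.

```python
import re

# Derived from the startup line in the excerpt above:
# <timestamp> <thread id> <level> ceph version <ver> (...), process ceph-mon, pid <n>
START_RE = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+\d+\s+ceph version (?P<version>\S+) .*"
    r"process ceph-mon, pid (?P<pid>\d+)"
)


def mon_starts(lines):
    """Return (timestamp, version, pid) for every ceph-mon startup line."""
    starts = []
    for line in lines:
        match = START_RE.match(line)
        if match:
            starts.append((match["ts"], match["version"], int(match["pid"])))
    return starts


# Lines copied from the log excerpt above.
sample_log = [
    "--- end dump of recent events ---",
    "2022-03-16T05:35:30.111+0000 7ffaf0e3b540 0 set uid:gid to 64045:64045 (ceph:ceph)",
    "2022-03-16T05:35:30.111+0000 7ffaf0e3b540 0 ceph version 15.2.14 "
    "(cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable), "
    "process ceph-mon, pid 490328",
    "2022-03-16T05:35:30.111+0000 7ffaf0e3b540 0 pidfile_write: ignore empty --pid-file",
]
print(mon_starts(sample_log))
```

More than one startup entry with different pids in a short window would be consistent with the crash-and-restart loop described above.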
[Where problems could occur]
If malformed input, like the JSON in the Python script, triggers
exceptions in other parts of the code that this fix does not catch,
the mon could still terminate or crash in those situations.
[Other Info]
Reported upstream at https://tracker.ceph.com/issues/54558 (including the reproducer and fix testing details) and fixed through https://github.com/ceph/ceph/pull/45547
The PR for Octopus is at https://github.com/ceph/ceph/pull/45891