[Bug 1900690] [NEW] [Ubuntu 20.04] ceph: messages, mds: Fix decoding of enum types on big-endian systems

Launchpad Bug Tracker 1900690 at bugs.launchpad.net
Tue Oct 20 11:19:50 UTC 2020


You have been subscribed to a public bug:

How to reproduce:

On the initial installation, the Z cluster had 1 monitor node, 3 OSDs, 1 MDS and 1 MGR. In order to form a quorum, 2 more nodes, which were already OSDs, were added as monitor nodes.
The Z cluster then had 3 monitor nodes, of which 2 are both OSDs and monitors.

However, at some point in time during the stress-ng run, the monitor
daemon crashed repeatedly on the cluster back to back. The crash stopped
only after removing both the monitor nodes which are OSDs from the
quorum and then the cluster remained stable.

Topology:

root@m8330013:~# ceph node ls all
{
    "mon": {
        "m8330013": [
            "m8330013"
        ],
        "m8330014": [
            "m8330014"
        ],
        "m8330015": [
            "m8330015"
        ]
    },
    "osd": {
        "m8330014": [
            0
        ],
        "m8330015": [
            1
        ],
        "m8330016": [
            2
        ]
    },
    "mds": {
        "m8330013": [
            "m8330013"
        ]
    },
    "mgr": {
        "m8330013": [
            "m8330013"
        ],
        "m8330015": [
            "m8330015"
        ]
    }
}
root@m8330013:~#
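
The overlap between monitor and OSD hosts can be read directly off the `ceph node ls all` output. A minimal sketch, assuming the JSON above is available as a string (the helper name `colocated_hosts` is hypothetical and not part of any Ceph tooling):

```python
import json

# Topology as printed by `ceph node ls all` above (whitespace condensed).
NODE_LS = """
{
    "mon": {"m8330013": ["m8330013"], "m8330014": ["m8330014"], "m8330015": ["m8330015"]},
    "osd": {"m8330014": [0], "m8330015": [1], "m8330016": [2]},
    "mds": {"m8330013": ["m8330013"]},
    "mgr": {"m8330013": ["m8330013"], "m8330015": ["m8330015"]}
}
"""


def colocated_hosts(node_ls, first="mon", second="osd"):
    """Return the sorted list of hosts that run both daemon types."""
    return sorted(set(node_ls.get(first, {})) & set(node_ls.get(second, {})))


topology = json.loads(NODE_LS)
print(colocated_hosts(topology))  # ['m8330014', 'm8330015']
```

These are the two monitor nodes that had to be removed from the quorum before the cluster became stable.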

The job file below runs each filesystem stressor sequentially, one per
CPU, for 5 minutes and then shows the cumulative user and system time of
all the processes at the end of the stress run.

Stress-ng job file:

run sequential
metrics
verbose
timeout 5m
times
timestamp

#0 means 1 stressor per CPU
access 0
bind-mount 0
chdir 0
chmod 0
chown 0
copy-file 0
dentry 0
dir 0
dirdeep 0
dnotify 0
dup 0
eventfd 0
fallocate 0
fanotify 0
fcntl 0
fiemap 0
file-ioctl 0
filename 0
flock 0
fstat 0
getdent 0
handle 0
inode-flags 0
inotify 0
io 0
iomix 0
ioprio 0
lease 0
link 0
locka 0
lockf 0
lockofd 0
mknod 0
open 0
procfs 0
rename 0
symlink 0
sync-file 0
utime 0
xattr 0

Command for Execution:

stress-ng --job <job_file> --temp-path <cephfs_mountpoint> --log-file <log_file>
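
For scripted reruns, the same invocation can be assembled programmatically. This is only a sketch: the function name and the example paths (`fs.job`, `/mnt/cephfs`, `stress-ng.log`) are hypothetical placeholders for the three values in the command above.

```python
import shlex


def stress_ng_command(job_file, cephfs_mountpoint, log_file):
    """Build the argument list for the stress-ng invocation shown above."""
    return [
        "stress-ng",
        "--job", job_file,
        "--temp-path", cephfs_mountpoint,
        "--log-file", log_file,
    ]


# Placeholder values; substitute the real job file, CephFS mount and log path.
cmd = stress_ng_command("fs.job", "/mnt/cephfs", "stress-ng.log")
print(shlex.join(cmd))
# To actually launch the run: subprocess.run(cmd, check=True)
```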

A proposed fixup sent to upstream:
https://github.com/ceph/ceph/pull/36697

As mentioned above, the fix for this issue landed upstream in PR:

https://github.com/ceph/ceph/pull/36697

which was backported to the Octopus (15.2.x) release in PR:

https://github.com/ceph/ceph/pull/36813


This backported patch seems to apply cleanly to ceph-15.2.3 in the
focal-updates git tree at:

https://git.launchpad.net/ubuntu/+source/ceph/log/?h=applied/ubuntu/focal-updates

Please apply the backported patch to this tree. Thanks.

Please be aware that upstream's backport PR
https://github.com/ceph/ceph/pull/36813 combines 2 patches from the
master branch:

https://github.com/ceph/ceph/pull/35920
https://github.com/ceph/ceph/pull/36697

We need both of them.
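
This report does not spell out the failure mechanism. As a generic illustration of why decoding an enum can be endian-sensitive (an assumption for illustration only, not taken from the referenced pull requests), consider a decoder that writes a 1-byte wire value through a narrow reference into a zero-initialised 4-byte enum. The helper name below is hypothetical.

```python
def decode_u8_into_enum32(wire_byte, byteorder):
    """Simulate a 1-byte write to the start of a 4-byte enum, then read the
    whole enum back using the host byte order given in `byteorder`."""
    storage = bytearray(4)      # 4-byte enum storage, initially 0
    storage[0] = wire_byte      # the narrow write lands at the lowest address
    return int.from_bytes(storage, byteorder)


print(decode_u8_into_enum32(3, "little"))  # 3
print(decode_u8_into_enum32(3, "big"))     # 50331648 (0x03000000)
```

On a little-endian host the lowest address holds the least significant byte, so the value survives; on a big-endian host such as s390x the same write lands in the most significant byte. Decoding into a temporary of the exact wire type and then converting to the enum avoids this dependence on byte order.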

** Affects: ceph (Ubuntu)
     Importance: Undecided
     Assignee: Skipper Bug Screeners (skipper-screen-team)
         Status: New


** Tags: architecture-s39064 bugnameltc-188070 severity-high targetmilestone-inin2004
-- 
[Ubuntu 20.04] ceph: messages,mds: Fix decoding of enum types on big-endian systems
https://bugs.launchpad.net/bugs/1900690
You received this bug notification because you are a member of Ubuntu OpenStack, which is subscribed to ceph in Ubuntu.


