[Bug 1681410] Re: fstrim corrupts ocfs2 filesystems when clustered
Kyle O'Donnell
kyleo at 0b10.mx
Mon Apr 10 17:10:26 UTC 2017
It is one device.
We have 2 LUNs for 2 different ocfs2 filesystems, mounted on all 6 servers
in the cluster. They are presented via Fibre Channel from our SAN.
I think the issue is that if you run fstrim at the same time from all the
servers that mount the same ocfs2 filesystem, the filesystem gets corrupted.
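This assumes concurrent trims from several nodes really are the trigger. That is the hypothesis above, not a confirmed cause. Under that assumption, one possible interim workaround is to keep the per-node weekly job away from ocfs2 and let a single designated node trim the shared filesystems. The sketch below does that. `TRIM_NODE` and its default `ocfs2-node1` are hypothetical; set `TRIM_NODE` to the hostname of one real node. Nothing here has been verified to prevent the corruption.

```sh
#!/bin/sh
# Sketch: trim shared ocfs2 filesystems from ONE designated node only,
# assuming corruption requires concurrent trims from several nodes.
# TRIM_NODE is a hypothetical setting; set it to one node's hostname.
TRIM_NODE="${TRIM_NODE:-ocfs2-node1}"

# Succeeds only on the node whose name matches TRIM_NODE.
is_trim_node() {
    [ "$(uname -n)" = "$TRIM_NODE" ]
}

# On the designated node, trim each mounted ocfs2 filesystem in turn
# (findmnt -l: list format, -n: no header, -t: filter by type).
# Every other node does nothing.
if is_trim_node; then
    findmnt -l -n -t ocfs2 -o TARGET | while read -r mnt; do
        /sbin/fstrim -v "$mnt" || true
    done
fi
```

The trims run sequentially on one host, so no two nodes issue FITRIM against the same ocfs2 volume at the same time.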
We are using multipath:
WWPN-THINGEE-HERE dm-3 TEGILE,INTELLIFLASH
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 0:0:0:16 sdb 8:16 active ready running
| `- 1:0:0:16 sdf 8:80 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
|- 0:0:1:16 sdd 8:48 active ready running
`- 1:0:1:16 sdh 8:112 active ready running
WWPN-THINGEE-HERE dm-2 TEGILE,INTELLIFLASH
size=2.0T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| |- 0:0:1:15 sdc 8:32 active ready running
| `- 1:0:1:15 sdg 8:96 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
|- 0:0:0:15 sda 8:0 active ready running
`- 1:0:0:15 sde 8:64 active ready running
--
https://bugs.launchpad.net/bugs/1681410
Title:
fstrim corrupts ocfs2 filesystems when clustered
Status in util-linux package in Ubuntu:
Incomplete
Bug description:
We recently upgraded from trusty to xenial and found that our ocfs2
filesystems, which are mounted simultaneously on a number of nodes,
became corrupt over the weekend:
[Sun Apr 9 06:46:35 2017] OCFS2: ERROR (device dm-2): ocfs2_validate_gd_self: Group descriptor #516096 has bad signature
[Sun Apr 9 06:46:35 2017] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
[Sun Apr 9 06:46:35 2017] OCFS2: File system is now read-only.
[Sun Apr 9 06:46:35 2017] (fstrim,1080,8):ocfs2_trim_fs:7399 ERROR: status = -30
[Sun Apr 9 06:46:35 2017] OCFS2: ERROR (device dm-3): ocfs2_validate_gd_self: Group descriptor #516096 has bad signature
[Sun Apr 9 06:46:36 2017] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
[Sun Apr 9 06:46:36 2017] OCFS2: File system is now read-only.
[Sun Apr 9 06:46:36 2017] (fstrim,1080,10):ocfs2_trim_fs:7399 ERROR: status = -30
We found that the cron.weekly job runs at almost exactly that time:
47 6 * * 7 root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
# cat /etc/cron.weekly/fstrim
#!/bin/sh
# trim all mounted file systems which support it
/sbin/fstrim --all || true
We have disabled this job on all our servers running clustered ocfs2 filesystems. I think either fstrim itself or the cron job should skip ocfs2 (and perhaps GFS2?) filesystems.
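One way to make the weekly job skip clustered filesystems, as suggested above, is to replace `fstrim --all` with a loop over `/proc/mounts`. This is a minimal sketch, not the packaged script. Several choices in it are assumptions made for this example:

- the skip list (`ocfs2 gfs2`)
- the function names `list_trim_targets` and `trim_targets`
- the restriction to `/dev/*` sources, a rough stand-in for the device checks `fstrim --all` performs

```sh
#!/bin/sh
# Hypothetical replacement for /etc/cron.weekly/fstrim (sketch only).
# Trims block-device filesystems but skips shared-disk cluster
# filesystems. The skip list below is an assumption; extend as needed.
SKIP_TYPES="ocfs2 gfs2"

# Read /proc/mounts-format lines on stdin and print one mount point per
# block device that is considered safe to trim from this node.
list_trim_targets() {
    seen=""
    while read -r src mnt fstype _rest; do
        # Only consider mounts backed by a block device.
        case "$src" in
            /dev/*) ;;
            *) continue ;;
        esac
        # Skip clustered filesystem types.
        skip=no
        for t in $SKIP_TYPES; do
            if [ "$fstype" = "$t" ]; then skip=yes; fi
        done
        [ "$skip" = yes ] && continue
        # Trim each device only once, even if it is mounted more than once.
        case " $seen " in
            *" $src "*) continue ;;
        esac
        seen="$seen $src"
        printf '%s\n' "$mnt"
    done
}

# Trim every candidate; like the stock job, never fail the cron run.
# Assumes mount points contain no whitespace (/proc/mounts escapes it).
trim_targets() {
    list_trim_targets < /proc/mounts | while read -r mnt; do
        /sbin/fstrim "$mnt" || true
    done
}

# run-parts executes this file as /etc/cron.weekly/fstrim; only trim when
# invoked under that name, so the functions can be sourced for testing.
case "${0##*/}" in
    fstrim) trim_targets ;;
esac
```

To preview what would be trimmed without touching any disk, source the file and run `list_trim_targets < /proc/mounts`. Multipath devices such as `/dev/mapper/...` pass the `/dev/*` check, so the ocfs2 mounts are excluded only by their filesystem type.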