Bus error for long-running calculations on remote ZFS file system (?)

Florian Oppermann florian.oppermann at itp.uni-hannover.de
Mon Apr 4 10:54:23 UTC 2016


Dear Ubuntu users,

We’re using a file server with ZFS to provide storage for numerical
calculations via the internal ZFS NFS server. Our calculations run on
user workstations.

Recently we’ve found that long-running calculations (40+ hours)
sometimes end with bus errors.

> Program received signal SIGBUS: Access to an undefined portion of a memory object.

We tried to run a program 10 times on our usual (ZFS) NFS mount, and 10
times on a “normal” NFS mount (with an underlying ext4 file system); the
bus errors occurred only in the ZFS case.
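
For reference, a comparison like this could be scripted roughly as
follows. This is only a minimal sketch, not our actual setup: the
function name count_sigbus and the command and directory passed to it
are placeholders. It relies on Python's subprocess module reporting a
child that was killed by a signal as a negative return code.

```python
import signal
import subprocess


def count_sigbus(cmd, cwd, runs=10):
    """Run cmd `runs` times inside cwd and count the runs killed by SIGBUS.

    cmd is an argument list such as ["./simulation"] (hypothetical name),
    cwd is the directory on the mount under test.
    """
    killed = 0
    for _ in range(runs):
        result = subprocess.run(cmd, cwd=cwd)
        # On POSIX, a child terminated by signal N has returncode == -N.
        if result.returncode == -signal.SIGBUS:
            killed += 1
    return killed
```

Calling it once with a working directory on the ZFS-backed mount and
once with a directory on the ext4-backed mount gives the two counts to
compare. With 40+ hour runs this obviously takes a long time.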

As this also happened with completely different programs, and it doesn’t
always happen, we don’t believe that the actual program code causes the
errors. That’s why we suspect ZFS.

On the other hand, the programs don’t necessarily open or access files at
crash time. Everything should be in memory, except perhaps a pipe to a
local logfile (in /tmp). The file system *shouldn’t matter*. (Right?)

Unfortunately the problem is not reproducible with reduced computation
time, which makes it really hard to trace.

Are there any known problems with ZFS mounts and programs running for a
long time?

Best regards,
Florian
