[Bug 622742] [NEW] Bacula Storage-daemon dies with segfault if a File-daemon can’t be contacted
Philipp Lindt
622742 at bugs.launchpad.net
Mon Aug 23 15:02:23 BST 2010
Public bug reported:
Binary package hint: bacula
Error message in daemon.log:
bacula-sd: Bacula interrupted by signal 11: Segmentation violation
Error messages given by the bacula-director:
Not accessible clients:
17-Aug 04:37 cxl05010-dir JobId 869: Warning: bsock.c:129 Could not connect to Client: CXW11010-fd on 192.168.11.10:9102. ERR=Connection timed out
Retrying ...
17-Aug 04:40 cxl05010-dir JobId 869: Fatal error: bsock.c:135 Unable to connect to Client: CXW11010-fd on 192.168.11.10:9102. ERR=Connection timed out
17-Aug 04:42 cxl05010-dir JobId 870: Warning: bsock.c:129 Could not connect to Client: CXW11011-fd on 192.168.11.11:9102. ERR=Connection timed out
Retrying ...
17-Aug 04:45 cxl05010-dir JobId 870: Fatal error: bsock.c:135 Unable to connect to Client: CXW11011-fd on 192.168.11.11:9102. ERR=Connection timed out
17-Aug 04:45 cxl05010-dir JobId 870: Error: openssl.c:86 TLS read/write failure.: ERR=error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac
Subsequent backup-jobs which failed:
17-Aug 04:45 cxl05010-dir JobId 0: Warning: bsock.c:129 Could not connect to Storage daemon on cxb24010.consultix.admin:9103. ERR=Connection refused
(notice that the JobId is 0)
17-Aug 04:15 cxl05006-fd JobId 860: Fatal error: backup.c:892 Network send error to SD. ERR=Broken pipe
17-Aug 04:50 cxl05010-dir JobId 871: Warning: bsock.c:129 Could not connect to Storage daemon on cxb24010.consultix.admin:9103. ERR=Connection refused
17-Aug 05:15 cxl05010-dir JobId 0: Fatal error: bsock.c:135 Unable to connect to Storage daemon on cxb24010.consultix.admin:9103. ERR=Connection refused
17-Aug 11:01 cxl05010-dir JobId 0: Fatal error: bsock.c:135 Unable to connect to Storage daemon on cxb24010.consultix.admin:9103. ERR=Connection refused
This error occurs if one (or more) File-daemons can’t be contacted by
the bacula-director.
The Storage-daemon dies when the job whose file-daemon is unreachable is canceled
by the director.
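For triage, the unreachable clients can be pulled out of the director log mechanically. The following is only a minimal sketch in Python, assuming the message format seen in the "Not accessible clients" excerpts above; the names CLIENT_ERR and unreachable_clients are made up for illustration and are not part of Bacula.

```python
import re

# Matches the director's "cannot connect to Client" messages as quoted in
# this report. The format is assumed from those excerpts only.
CLIENT_ERR = re.compile(
    r"JobId (?P<jobid>\d+): (?:Warning|Fatal error): bsock\.c:\d+ "
    r"(?:Could not|Unable to) connect to Client: (?P<client>\S+) "
    r"on (?P<host>[^\s:]+):(?P<port>\d+)\."
)


def unreachable_clients(log_text):
    """Return {client_name: (host, port)} for clients the director could not reach."""
    found = {}
    for match in CLIENT_ERR.finditer(log_text):
        # Later messages for the same client simply overwrite earlier ones.
        found[match.group("client")] = (match.group("host"), int(match.group("port")))
    return found
```

Connection errors to the Storage daemon are deliberately not matched, so the result lists only the File-daemons that could have triggered the crash.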
The biggest problem is that a single client whose file-daemon can’t be
reached will crash the storage-daemon, and all running and subsequent
backups will then fail.
The segfault occurred several times: twice because the IP address of the client changed while
it wasn’t changed in the bacula-fd.conf, and once when the client was shut down.
In terms of performance there should be no problems, as the RAID array
can handle a throughput to disk of 650 Mbyte/s. The network capacity
should be no problem either, as the storage-server has a 10 Gbit
connection and the clients a 1 Gbit one. CPU and memory load (on the
storage-server) is low too (about 10% of one CPU core per backup-job,
about 900 MB of RAM usage).
To make sure it is not a temporary issue, the bacula-director and the storage-daemon were restarted.
To make sure it is not an issue depending on the number of concurrent jobs, we tested it with three jobs at a time.
To make sure it is not a hardware-related issue, another storage-daemon system was installed and tested.
We could always reproduce the crash of the storage-daemon on other
hardware under similar conditions.
Additional Info:
ProblemType: Bug
Uname: 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:28:05 UTC 2010 x86_64
Architecture: x86_64
Package the bug was found in: bacula-sd (5.0.1-1ubuntu1)
SourcePackage: bacula
Release: Ubuntu 10.04 LTS
** Affects: bacula (Ubuntu)
Importance: Undecided
Status: New
** Tags: bacula bacula-sd segfault
--
Bacula Storage-daemon dies with segfault if a File-daemon can’t be contacted
https://bugs.launchpad.net/bugs/622742
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to bacula in ubuntu.