Worsening of selftest thread leaking
Martin (gzlist)
gzlist at googlemail.com
Wed May 19 09:24:24 BST 2010
For those not familiar with the threading problems with
selftest[bug392127], the root cause is essentially that a number of the
client-server tests spin up a new thread and don't rejoin it. These
threads generally finish of their own accord, or quietly block till
process termination. They can cause some knock-on problems[bug531746]
but are generally harmless provided you have the RAM for their stacks.
However, they've gone from being an annoyance (issue seen by other
people) to a real problem (issue seen by *me*!) causing random
failures, hangs, and crazy memory usage.
To try and work out when my box started having troubles I ran the
suite on a bunch of different past versions of Bazaar. There are a few
problems with doing this. I need to run the whole thing to get the ill
effects, which takes a while. Also the symptoms vary, and not for
any particular reason. So though I've recorded some tangible
differences, as Vincent pointed out to me, this may not actually help
track down a specific problem.
The results for a recent revision[r5235] record a bunch of leaks, and
a random hang late on. Some runs[r5200] finish, but record loads of
"can't start new thread" and OOM failures.
Comparing with my known-good version[r4919], there are actually a
similar number of leaking tests, but vastly different memory usage.
Testing the next revision is problematic, as it crashes[r4920] due to
the introduction of testtools changing the teardown
semantics[testtools].
However, the next working revision[r4937] is similar to the current
one: high memory usage and a hang. The next ten or so revisions I
tried reliably deadlock in bt.test_remote.TestStacking as well, but
none of them have the large numbers of thread-related failures before
the hang that current versions have. To narrow the range down, I did
a version of the testtools introduction change with the crash fix
backported[r4920+4936..4937], which behaves much the same.
So, while I can't say what exactly, *something* is perhaps the fault
of testtools. As Vincent's workaround of segmenting the tests is no
longer sufficient to avoid these problems on the Windows buildbot,
we need to at least get back to the old level of brokenness. I have
the detailed results if anyone is interested, and am happy to try
experiments.
Martin
[bug392127]: selftest fails with "can't start new thread"
<https://bugs.launchpad.net/bzr/+bug/392127>
[bug531746]: Intermittent test failure during _finishLogFile
<https://bugs.launchpad.net/bzr/+bug/531746>
[testtools]: Questions after testtools merge
<https://lists.ubuntu.com/archives/bazaar/2009q4/065580.html>
[r4919]: Results of bzr selftest for r4919 from 2009-12-22
22593 tests run in total, of which:
19393 Passed without problems
805 Parameters of test do not apply
1039 Lacking required feature to run test
1310 Skipped for another reason
29 Known to fail a particular assertion
6 Failed a given assertion
11 Raised an unexpected exception
Also 1562 tests leaked threads.
Time real 4676.9688 seconds
user 2970.3594 seconds
sys 1086.0938 seconds
Working set 177070080 bytes
Pagefile 207454208 bytes
[r4920]: Results of bzr selftest for r4920 from 2009-12-23
1142 tests run in total, of which:
965 Passed without problems
4 Parameters of test do not apply
30 Lacking required feature to run test
138 Skipped for another reason
2 Known to fail a particular assertion
3 Raised an unexpected exception
Also 21 tests leaked threads.
CRASH: Access violation
after bb.test_version.TestVersionUnicodeOutput.test_unicode_bzr_home
Time real 449.9219 seconds
user 255.5938 seconds
sys 140.3438 seconds
Working set 111452160 bytes
Pagefile 117743616 bytes
[r4920+4936..4937]: Results of bzr selftest for r4921 from 2010-05-18
20780 tests run in total, of which:
17651 Passed without problems
804 Parameters of test do not apply
982 Lacking required feature to run test
1304 Skipped for another reason
23 Known to fail a particular assertion
4 Failed a given assertion
12 Raised an unexpected exception
Also 1583 tests leaked threads.
HANG: Possible deadlock
after bt.test_remote.TestStacking.test_stacked_get_stream_groupcompress
Time real 5772.5938 seconds
user 3822.8438 seconds
sys 1084.1094 seconds
Working set 500199424 bytes
Pagefile 547721216 bytes
[r4937]: Results of bzr selftest for r4937 from 2010-01-07
20812 tests run in total, of which:
17685 Passed without problems
804 Parameters of test do not apply
982 Lacking required feature to run test
1304 Skipped for another reason
23 Known to fail a particular assertion
3 Failed a given assertion
11 Raised an unexpected exception
Also 1660 tests leaked threads.
HANG: Possible deadlock
after bt.test_remote.TestStacking.test_stacked_get_stream_topological
Time real 5449.9375 seconds
user 3681.0938 seconds
sys 1052.7813 seconds
Working set 501645312 bytes
Pagefile 549441536 bytes
[r5200]: Results of bzr selftest for r5200 from 2010-05-03
23166 tests run in total, of which:
19735 Passed without problems
829 Parameters of test do not apply
1054 Lacking required feature to run test
1323 Skipped for another reason
34 Known to fail a particular assertion
10 Failed a given assertion
181 Raised an unexpected exception
Also 1770 tests leaked threads.
Time real 3444.0469 seconds
user 2169.2813 seconds
sys 509.2500 seconds
Working set 557449216 bytes
Pagefile 613650432 bytes
[r5235]: Results of bzr selftest for r5235 from 2010-05-14
20432 tests run in total, of which:
17278 Passed without problems
826 Parameters of test do not apply
971 Lacking required feature to run test
1317 Skipped for another reason
22 Known to fail a particular assertion
3 Failed a given assertion
15 Raised an unexpected exception
Also 1673 tests leaked threads.
HANG: Possible deadlock
after bt.test_lockable_files.TestLockableFiles_LockDir.test_unlock_after_lock_write_with_token
Time real 3085.7813 seconds
user 1901.3906 seconds
sys 454.5938 seconds
Working set 507011072 bytes
Pagefile 563527680 bytes