[MERGE] Delete faulty test that can leak a subprocess and/or hang the test suite.

John Arbash Meinel john at arbash-meinel.com
Thu Mar 19 18:21:26 GMT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
> Encountered while experimenting with the parallelized test suite,
> where a process with command :
> 
>  bzr serve --no-plugins --port localhost:0
> 
> was left around causing zombies and hangs.
> 
> I couldn't reproduce the problem on demand which made it hard to
> debug, but yet, once out of 3 runs on average is not that bad[1] :-)
> 
> It turns out the test deleted by this patch (test_breakin) was
> not really different than the following one
> (test_breakin_harder), so I just copy the missing bits and get
> rid of the offending one.
> 

So I'm not sure that "test_breakin_harder" really ends up testing the
right thing.

Specifically, if we didn't have a SIGQUIT handler, wouldn't it still
pass? So removing "test_breakin" means that if we completely removed ^|
functionality (by accident, presumably) then the test wouldn't catch it.

I may be wrong, but the code seems to just keep calling os.kill(child,
SIGQUIT) until the process exits with SIGQUIT. It *doesn't* have a check
that the first SIGQUIT actually did anything.

Now I see that you preserved the "err = proc.stderr.readline()" and
checking that it says "entering debugger". What if instead of sending
the 'q\n' message *before* we readline, we send it *after* the readline?
Wouldn't that be enough of a sync to ensure that it doesn't hang?

Certainly I think the important bit is using os.waitpid() until the
child really does clean up (since that prevents it from becoming a zombie.)

So maybe something like:

=== modified file 'bzrlib/tests/blackbox/test_breakin.py'
- --- bzrlib/tests/blackbox/test_breakin.py       2009-02-23 15:29:35 +0000
+++ bzrlib/tests/blackbox/test_breakin.py       2009-03-19 18:20:04 +0000
@@ -57,9 +57,18 @@
         proc.stderr.readline()
         # first sigquit pops into debugger
         os.kill(proc.pid, signal.SIGQUIT)
+        err = proc.stderr.readline()
+        time.sleep(.1)
         proc.stdin.write("q\n")
- -        time.sleep(.5)
- -        err = proc.stderr.readline()
+        # Now wait for the child to die
+        for i in range(100):
+            time.sleep(0.1)
+            r = os.waitpid(proc.pid, os.WNOHANG)
+            if r != (0, 0):
+                # The process exited
+                break
+        else:
+            self.fail("subprocess didn't die from 'quit' in debugger")
         self.assertContainsRe(err, r'entering debugger')

     def test_breakin_harder(self):

Now, I suppose this suffers if stderr is buffered, since the child may
send the message, but it is sitting in the outbound pipe since the child
isn't dying or sending enough info to trigger a flush. But the way *you*
changed the code would suffer the same bug.

BB:tweak

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknCjSYACgkQJdeBCYSNAANnNgCgi9uX376Ix/thm4k9Gb1DN23H
elUAoJRvf64nmki5cdWEPBM7hbqiPFS4
=BnOD
-----END PGP SIGNATURE-----



More information about the bazaar mailing list