AW: AW: AW: AW: bzr selftest (on solaris 10): too may open files

Vincent Ladeuil v.ladeuil+lp at free.fr
Wed Nov 19 18:34:36 GMT 2008


>>>>> "jam" == John Arbash Meinel <john at arbash-meinel.com> writes:

    jam> ...

    >> *and* this is also true with python-2.5.2.
    >> 
    >> Interrupting the selftest and using pfiles <pid> reveals that in
    >> fact the open files are sockets...
    >> 
    >> I have yet to understand why this happens on Solaris and not on
    >> Linux but it means that only selftest is concerned by the problem
    >> and should not have consequences when using bzr itself.
    >> 
    >> I'll run the test suite by parts to check which tests are really
    >> failing and keep you informed, but the important result is that
    >> you should be safe using bzr.
    >> 
    >> Vincent
    >> 
    >> 

    jam> I would guess that this is the "spawned threads" not
    jam> getting cleaned up quickly. (Run the test suite on other
    jam> platforms and it will tell you that we leaked XX
    jam> threads.)

It says so there too :)

    jam> This is generally caused by any Remote test, because it
    jam> spawns a smart server in the second thread, which waits
    jam> on a socket to respond to user requests. And as we don't
    jam> have an explicit "close()" for remote connections, the
    jam> service tends to stay around for a while.

Same goes for http tests and maybe some others.

    jam> So one possible fix would be to add a timeout to the
    jam> socket, and if there hasn't been a request for XX
    jam> seconds, go ahead and shutdown cleanly. Also we can have
    jam> the test suite itself notice that the test has finished
    jam> and poke at the thread to tell it to shut down.
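To make the two ideas above concrete, here is a minimal sketch (modern
Python 3, not bzrlib code) of a server thread that polls with a short
timeout on its listening socket and can be told to stop by the test that
started it. The names StoppableServer, poll_interval and stop() are
hypothetical, chosen only for illustration, and the echo behaviour just
stands in for a real request handler.

```python
import socket
import threading


class StoppableServer(threading.Thread):
    """Hypothetical test server: echoes one message per connection."""

    def __init__(self, poll_interval=0.1):
        super().__init__(daemon=True)
        self._stop_requested = threading.Event()
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        self.sock.listen(1)
        # Timeout on the listening socket only, so accept() wakes up
        # regularly and the loop can notice a stop request.
        self.sock.settimeout(poll_interval)
        self.port = self.sock.getsockname()[1]

    def run(self):
        try:
            while not self._stop_requested.is_set():
                try:
                    conn, _addr = self.sock.accept()
                except socket.timeout:
                    continue  # nobody connected; re-check the stop flag
                with conn:
                    data = conn.recv(1024)
                    conn.sendall(data)
        finally:
            # Close explicitly instead of waiting for gc to do it.
            self.sock.close()

    def stop(self):
        """Ask the thread to finish, then wait until it has."""
        self._stop_requested.set()
        self.join()
```

A test would call stop() from its cleanup, so that the thread and its
socket are gone before the next test starts. The cost is up to one
poll_interval of extra wait per server, which adds up over a large suite.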

    jam> As I understand it, this causes problems on Windows
    jam> because you can't mix a socket with a timeout with the
    jam> file-like wrappers on a socket.  Also, adding a
    jam> "timeout" on a socket on windows is actually just
    jam> setting O_NDELAY which will raise an exception if the
    jam> socket request *would have* blocked.

Not only on Windows: the Python docs say you shouldn't mix the
two (file-like and timeout), and I fixed a couple of bugs around
that.
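For illustration, one way to stay clear of that mix is to not use the
file-like wrapper at all on a socket that has a timeout, and do the line
buffering by hand on top of recv(). The sketch below (modern Python 3)
assumes a newline-terminated protocol; read_line and its buffer argument
are hypothetical names, not anything from bzrlib.

```python
import socket


def read_line(sock, buffer, timeout=1.0):
    """Return the next newline-terminated line from sock, without the newline.

    buffer is a bytearray owned by the caller; it keeps bytes received
    past the end of the line, so nothing is lost when a recv() times out.
    Returns None if a single recv() waits longer than timeout or if the
    peer closes the connection before a full line has arrived.
    """
    sock.settimeout(timeout)
    while b"\n" not in buffer:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            return None  # partial data stays in buffer for the next call
        if not chunk:
            return None  # peer closed the connection
        buffer.extend(chunk)
    line, _sep, rest = bytes(buffer).partition(b"\n")
    buffer[:] = rest
    return line
```

Because the pending bytes live in a buffer the caller controls, a timeout
leaves them intact and the next call simply carries on from there.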

I also tried at one point to force the socket shutdown but
stopped doing it as that slowed down the test suite too much (or
was too invasive or ugly, I don't remember which).

Instead, I relied on gc to get rid of the server threads (and
sockets) and got warned by Robert about it :) We settled on the
'xxx tests leak sockets' warning.

It may be time to revisit the problem and its various solutions
(2.6 also changes some details in the socket servers used for http,
but I was still able to avoid addressing the core problem :-).

   Vincent


