Incorrect handling of EINTR in Bazaar

Martin (gzlist) gzlist at googlemail.com
Mon Feb 15 18:01:44 GMT 2010


There is trivial issue with module cleanup caused by a couple
patches[1][2] trying to address problems related to IO calls being
interrupted by signals. Looking into fixing this revealed that these
changes are incorrect in a number of significant ways, and a much
better solution[3] has already been applied upstream. I propose that
the attempt to handle these low-level issues in high-level Bazaar code
be backed out, a tighter workaround implemented, and perhaps lobby for
a backport to 2.6 for the Python patch. Following is a detailed
breakdown of what I think is wrong with the code currently in Bazaar.

There are no tests. Instigating workarounds without anything to verify
they are in fact correct means the other issues described are not
detected.

The socket.error exception isn't a subclass of IOError (and doesn't
have the errno attribute?) till Python 2.6 so until_no_eintr is
useless on previous versions.

This approach to the PC loser-ing problem is nix specific, winsock
calls aren't going to be interrupted by signals.

The way the EINTR catching code is structured makes it far from clear
what the methods being wrapped actually are, as I understand it the
objects that it get applied to are:

  SmartServerPipeStreamMedium members _in and _out
    StringIO objects or sys.stdin and sys.stdout (and spot the bug on
windows with binary file mode)
  SmartSimplePipesClientMedium members _readable_pipe and _writeable_pipe
    StringIO objects or subprocess file streams
  SmartSSHClientMedium members _read_from and and _write_to
    socket._fileobject as returned by socket._socketobject.makefile

So, the following methods are wrapped:

  socket._socketobject.recv
    Potentially valid
  socket._socketobject.sendall
    Can't be safely re-called, as send may be called a number of times
before interruption
  socket._socketobject.close
    Bogus as it always invalidates the fd before returning and never
raises, see sock_close in Modules/socketmodule.c
  file.read
    Bogus, seems high level block io (fread) can't be interrupted
  file.write
    Bogus, seems high level block io (fwrite) can't be interrupted
  file.flush
    Bogus, seems high level block io (fflush) can't be interrupted
  file.close
    Bogus as first invalidates the internal handle so no point
re-calling, see close_the_file in Objects/fileobject.c
  StringIO.StringIO.*
    Bogus as will never throw EINTR
  socket._fileobject.flush
    Can't be safely re-called, as it uses socket._socketobject.sendall
internally
  socket._fileobject.close
    Can't be safely re-called, as it uses socket._fileobject.flush and
always calls socket._socketobject.close
  socket._fileobject.write
    Can't be safely re-called, as appends to an instance buffer
unconditionally and may use socket._fileobject.flush
  socket._fileobject.read
    Can't be safely re-called, because it uses a local buffer and
multiple recv calls

Finally, the problem I actually started with, there are bunch of these
at the end of the test suite:

  Exception exceptions.AttributeError: "'NoneType' object has no
attribute 'until_no_eintr'" in <bound method
SmartTCPClientMedium.__del__ of
<bzrlib.smart.medium.SmartTCPClientMedium object at 0x122C1B90>>
ignored

Clearly there shouldn't be any servers left around by the time the
world is being dismantled, but the long-standing leaks mean there are.
Basically, SmartTCPClientMedium.disconnect is called from
SmartClientStreamMedium.__del__ when the module is in process of being
cleaned up so osutils is gone, and an exception trying to call
osutils.until_no_eintr results. Resolving issues like this with
__del__ is easy, but pointless as we've already seen that wrapping
close calls does nothing of use.

There were questions raised about whether until_no_eintr was correct
before on this list[4], but neither that nor the initial code
review[5][6] caught these issues. The only change beyond the backout
that I suggest is catching EINTR in _read_bytes_from_socket and
relying on the fix upstream for the remainder, but I welcome input
from people who's primary platform actually uses signals.

Martin


[1]: http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/revision/3935
[2]: http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/revision/4634.6.16
[3]: http://svn.python.org/view?view=rev&revision=74426
[4]: https://lists.ubuntu.com/archives/bazaar/2009q4/065306.html
[5]: https://lists.ubuntu.com/archives/bazaar/2009q1/051236.html
[6]: https://code.launchpad.net/~mbp/bzr/341535-eintr/+merge/11029



More information about the bazaar mailing list