Incorrect handling of EINTR in Bazaar

Martin Pool mbp at canonical.com
Tue Feb 16 00:59:10 GMT 2010


On 16 February 2010 05:01, Martin (gzlist) <gzlist at googlemail.com> wrote:
> There is trivial issue with module cleanup caused by a couple
> patches[1][2] trying to address problems related to IO calls being
> interrupted by signals. Looking into fixing this revealed that these
> changes are incorrect in a number of significant ways, and a much
> better solution[3] has already been applied upstream. I propose that
> the attempt to handle these low-level issues in high-level Bazaar code
> be backed out, a tighter workaround implemented, and perhaps lobby for
> a backport to 2.6 for the Python patch. Following is a detailed
> breakdown of what I think is wrong with the code currently in Bazaar.

Thanks for analyzing this.

We still support python back to 2.4; considering the deployed
environment of interpreters we prefer to work around bugs in Python if
at all feasible.

> There are no tests. Instigating workarounds without anything to verify
> they are in fact correct means the other issues described are not
> detected.

That's true (on both counts.)  However, this is a bit of an
interesting case to test: one could in principle write objects that
insert EINTR calls, but if the actual problem is a misunderstanding of
interaction with the OS, then the tests are likely to reproduce the
same misunderstanding.

> The socket.error exception isn't a subclass of IOError (and doesn't
> have the errno attribute?) till Python 2.6 so until_no_eintr is
> useless on previous versions.

> This approach to the PC loser-ing problem is nix specific, winsock
> calls aren't going to be interrupted by signals.

Yes, but isn't it harmless to check for them?

Similarly, I raised the issue in review that some python calls already
handle eintr or partial reads, but it seems harmless to check for it.


>
> The way the EINTR catching code is structured makes it far from clear
> what the methods being wrapped actually are, as I understand it the
> objects that it get applied to are:
>
>  SmartServerPipeStreamMedium members _in and _out
>    StringIO objects or sys.stdin and sys.stdout (and spot the bug on
> windows with binary file mode)
>  SmartSimplePipesClientMedium members _readable_pipe and _writeable_pipe
>    StringIO objects or subprocess file streams
>  SmartSSHClientMedium members _read_from and and _write_to
>    socket._fileobject as returned by socket._socketobject.makefile
>
> So, the following methods are wrapped:
>
>  socket._socketobject.recv
>    Potentially valid
>  socket._socketobject.sendall
>    Can't be safely re-called, as send may be called a number of times
> before interruption
>  socket._socketobject.close
>    Bogus as it always invalidates the fd before returning and never
> raises, see sock_close in Modules/socketmodule.c
>  file.read
>    Bogus, seems high level block io (fread) can't be interrupted
>  file.write
>    Bogus, seems high level block io (fwrite) can't be interrupted
>  file.flush
>    Bogus, seems high level block io (fflush) can't be interrupted
>  file.close
>    Bogus as first invalidates the internal handle so no point
> re-calling, see close_the_file in Objects/fileobject.c
>  StringIO.StringIO.*
>    Bogus as will never throw EINTR
>  socket._fileobject.flush
>    Can't be safely re-called, as it uses socket._socketobject.sendall
> internally
>  socket._fileobject.close
>    Can't be safely re-called, as it uses socket._fileobject.flush and
> always calls socket._socketobject.close
>  socket._fileobject.write
>    Can't be safely re-called, as appends to an instance buffer
> unconditionally and may use socket._fileobject.flush
>  socket._fileobject.read
>    Can't be safely re-called, because it uses a local buffer and
> multiple recv calls

So perhaps we should have a decorator object that does handle eintr
and install that only when needed?

>
> Finally, the problem I actually started with, there are bunch of these
> at the end of the test suite:
>
>  Exception exceptions.AttributeError: "'NoneType' object has no
> attribute 'until_no_eintr'" in <bound method
> SmartTCPClientMedium.__del__ of
> <bzrlib.smart.medium.SmartTCPClientMedium object at 0x122C1B90>>
> ignored
>
> Clearly there shouldn't be any servers left around by the time the
> world is being dismantled, but the long-standing leaks mean there are.
> Basically, SmartTCPClientMedium.disconnect is called from
> SmartClientStreamMedium.__del__ when the module is in process of being
> cleaned up so osutils is gone, and an exception trying to call
> osutils.until_no_eintr results. Resolving issues like this with
> __del__ is easy, but pointless as we've already seen that wrapping
> close calls does nothing of use.
>
> There were questions raised about whether until_no_eintr was correct
> before on this list[4], but neither that nor the initial code
> review[5][6] caught these issues. The only change beyond the backout
> that I suggest is catching EINTR in _read_bytes_from_socket and
> relying on the fix upstream for the remainder, but I welcome input
> from people who's primary platform actually uses signals.

Can you be more specific about what you want to see backed out, and
what approach you think we should have instead?

-- 
Martin <http://launchpad.net/~mbp/>



More information about the bazaar mailing list