The case against until_no_eintr

Sun Feb 21 02:01:34 GMT 2010

Martin (gzlist) wrote:
> I realise I may not have conveyed, beyond the "it's currently really
> broken" why I think the until_no_eintr approach is a bad idea.

Thanks for sending this!

[...much good explaining snipped...]
> Using until_no_eintr takes a narrow problem with sockets and pipes and
> spreads it all over the standard library and all over bzrlib.

Would it be accurate to summarise your view as “this needs to be fixed
in Python, because it's impossible for Bazaar to fix it everywhere”?  If
so, I absolutely agree.

However, the Python versions we run on today don't have those fixes.
So I'd like Bazaar to try to do what it reasonably can to mitigate
against the bug.

One approach would be to call signal.siginterrupt(signum, False) for
every signal, or at maybe those with signal handlers other than SIG_IGN.
I'm not totally certain this won't have undesireable side-effects
(anything involving signals always seems to involve unpleasant
surprises), but on my reading so far it might be ok.  This function is
only available since Python 2.6, but maybe we could attempt to use
ctypes or a small C module to do it on 2.4 and 2.5.  What do you think?

Otherwise, I don't see what's wrong with working around the bug as much
as practical in bzrlib.  Yes, we can't fix bugs in e.g. ftplib, but we
can at least make sure that talking to an SSH process is robust (and for
that matter, we can make sure that large uploads via FTP are robust,
even if we can't fix the small writes ftplib will make on the control
channel).  It's not really much different to bzrlib.cleanup, which in
essence provides a workaround for Python 2.4's lack of “try/finally
blocks where exceptions during finally don't override exceptions from
the try”, or using bzrlib.tuned_gzip to because the builtin gzip was too
slow.

> Peronally, I'd be happy with an 80% approach that addresses only those
> places where interruption is actually likely as opposed to
> theoretically possible. What really needs to be avoided are fixes that
> turn a rare exception into silent loss or duplication of data on the
> wire. Bazaar can (and already has to for other reasons) recover from a
> connection being dropped unexpectedly, and there's only so much
> robustness than can be expected from Python (or C) code.

I agree some places are more important to fix than others, and that the
fixes really need to be completely safe (unlike many of the current
uses of until_no_eintr, which are very broken).

I don't think it follows that because Bazaar has to cope with connection
drops, so therefore failures due to EINTR are acceptable.  As a user,
I'm ok with a failure because my internet connection dropped out.  I'm
not ok with a failure because the Bazaar GUI I'm using had a subprocess
finish while during a push.

(And as a developer, it's very inconvenient for the SIGQUIT-for-pdb
feature to break the active connection.)

> Andrew has mentioned in the merge request for my first pass at a
> different method[1] that the sendall-with-bug-workarounds should be a
> public function, and there's a use of sendall in bzrlib.transport.ftp
> that could be using it as well. Well, let's take another example from
> the same module:

Even without the EINTR-workaround, I still think that method should
exist, because of it avoids 10053 errors on windows and it will give
better activity reporting than sock.sendall directly.  I'm certain that
using it in that one place in ftp.py will give noticeable improvement to
the feedback the UI gives during an upload via FTP.  So I think the
argument for a send_all (to socket) method in osutils is orthogonal to
the until_no_eintr discussion.

It's likely that we haven't been filing enough bugs on Python about
these sorts of issues, and if so I'm sorry about that and will think
about how to do better.

-Andrew.