The case against until_no_eintr

Andrew Bennetts andrew.bennetts at canonical.com
Fri Feb 26 00:31:37 GMT 2010


Martin (gzlist) wrote:
> On 21/02/2010, Andrew Bennetts <andrew.bennetts at canonical.com> wrote:
> >
> > Would it be accurate to summarise your view as “this needs to be fixed
> > in Python, because it's impossible for Bazaar to fix it everywhere”?  If
> > so, I absolutely agree.
> 
> Yep, that's about right. I know dealing with upstream is always a pain
> and feels like twice as much work as just hacking around their
> problems in your own code, but generally pays off long-term.

Well, I'm pretty sure we're all in favour of submitting bugs to our upstreams
(not just Python but e.g. Paramiko and Pyrex too).  It's just that
usually we need to workaround it anyway because we want to work with
already released versions, and so that gets done first.  The work to
submit a useful bug report upstream can sometimes be delayed or even
forgotten, but it's not for lack of intention.

> > One approach would be to call signal.siginterrupt(signum, False) for
> > every signal, or at maybe those with signal handlers other than SIG_IGN.
> > I'm not totally certain this won't have undesireable side-effects
> > (anything involving signals always seems to involve unpleasant
> > surprises), but on my reading so far it might be ok.  This function is
> > only available since Python 2.6, but maybe we could attempt to use
> > ctypes or a small C module to do it on 2.4 and 2.5.  What do you think?
> 
> That might be acceptable, but will upset some current expectations.
> 
> Python has been carefully (doing the equivalent of) calling
> siginterrupt(sig, 1) for every signal handler it registers for a long
> while, as otherwise it's possible to hang indefinitely in an
> uninterruptable state.
> 
> The reasoning (for my own future reference as much as anything) as I
> understand it is this: The python C-level signal handler sets some
> state saying a signal has arrived, and calls the python-level function
> when the interpreter resumes. This is because no real work can be done
> in a handler, as the program is in an undetermined state when they
> run. Therefore if a system function is automatically being resumed, it
> will still delay the python code intended to respond to that signal.
> 
> So, while the BSD style is a much nicer interface for C code, it
> doesn't work with the way Python wants to deal with signals unless
> nothing is going to block for an unreasonable length of time.

I don't quite follow this... oh yes I do.  Hmm.

So the most visible manifestation of this is likely to be that Ctrl-\ no
longer can interrupt a read or recv syscall to put you into the
debugger.  If some bytes arrive then the read/recv call can return a
short read and then things will proceed as expected, but if no more
bytes arrive then you're stuck.  Drat.  So we have a choice here between
“SIGQUIT-for-pdb doesn't work reliably” and “SIGQUIT-for-pdb works
reliably but may break IO by causing EINTR”.

Ctrl-C will still interrupt things immediately, because we're not
planning to call siginterrupt for SIGINT, which is good.

I think calling siginterrupt(SIGWINCH, False) when we register a
SIGWINCH handler is probably appropriate.  There's no reason to risk
breaking an IO operation just because someone resized a window.

I think for SIGQUIT we're probably better off not setting siginterrupt,
and making a best-effort attempt to handle EINTR for reading and writing
to pipes.  For other IO that might be broken hopefully the developer can
run a new-enough Python that copes automatically (and hopefully the
developer doesn't need to use the debugger specifically to diagnose a
problem on an older Python...).

[..]
> Don't want to pick a fight, but think this is rather depressing:
> 
> C:\bzr\bzr\dev>bzr blame --long bzrlib\tuned_gzip.py | grep -C1 upstream
> 1641.1.1  robertc at robertcollins.net 20060407 |
>                                              | """Bzrlib specific gzip
> tunings. We plan to feed these to the upstream gzip."""
>                                              |
> 
> Had two years to get into Python 2.6 which Ubuntu have been shipping
> for what, two releases now. Is there even as much as an issue on their
> tracker?

FWIW, my vague recollection is that someone (unrelated to bzr?) had
filed essentially the same bug report and/or patch already when we went
to pass this upstream.  I'm not sure if tuned_gzip has any advantage
over the version in 2.6 anymore, but I'm pretty sure it's still relevant
for at least 2.4.  Robert may know more.

But doing the investigation to find out just which changes matter and
are safe for upstream vs. improvements already made in upstream is
pretty time-consuming, so it's easy to see how it doesn't always get
done as soon as it should :(

> > I agree some places are more important to fix than others, and that the
> > fixes really need to be completely safe (unlike many of the current
> > uses of until_no_eintr, which are very broken).
> >
> > I don't think it follows that because Bazaar has to cope with connection
> > drops, so therefore failures due to EINTR are acceptable.  As a user,
> > I'm ok with a failure because my internet connection dropped out.  I'm
> > not ok with a failure because the Bazaar GUI I'm using had a subprocess
> > finish while during a push.
> 
> Clearly failures are bad, but code trying to address an issue with a
> microsecond window of opportunity might not be worthwhile when the
> effect of that rare possibility is no worse than external problems
> that need to be handled anyway. Making sure we can cover the recv
> calls that are likely to block for seconds at a time seems a priority,
> while even sendall does not as it is much harder to break.

I'm not deeply concerned with microsecond windows either, but we do a
lot of network IO and pipe IO, and the windows there are pretty huge.

> Think I'd actually made this branch before reading your email, see the related:
> <https://code.launchpad.net/~gz/bzr/ignore_sigquit_in_ssh_child_162502/+merge/19711>

Ooh, thanks!  I just reviewed (and approved) that.

-Andrew.




More information about the bazaar mailing list