Network glitches costing 15 minutes a pop
Vincent Ladeuil
v.ladeuil+lp at free.fr
Fri Aug 15 15:54:42 BST 2008
>>>>> "Mark" == Mark Hammond <mhammond at skippinet.com.au> writes:
<snip/>
Mark> 937.223 Exception ShortReadvError(): readv() read unknown bytes rather than
Mark> unknown bytes at unknown for
As you noticed in a later mail, this comes from bzrlib/transport/http/_pycurl.py:
    elif e[0] == CURLE_PARTIAL_FILE:
        # Pycurl itself has detected a short read. We do not have all
        # the information for the ShortReadvError, but that should be
        # enough
        raise errors.ShortReadvError(url,
                                     offset='unknown', length='unknown',
                                     actual='unknown',
                                     extra='Server aborted the request')
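
To make the origin of the "unknown bytes" wording concrete, here
is a minimal, self-contained sketch. It does not import bzrlib or
pycurl: the ShortReadvError class below is a simplified stand-in,
its format string is an assumption modeled on the log line quoted
above, and translate_pycurl_error and the example.com URL are
hypothetical names used only for illustration. The one hard fact
is that libcurl's CURLE_PARTIAL_FILE is error code 18, which
matches the 'got pycurl error: 18' messages in the log.

```python
# libcurl's code for a transfer that ended before all data arrived.
CURLE_PARTIAL_FILE = 18


class ShortReadvError(Exception):
    """Simplified stand-in for bzrlib's ShortReadvError (assumption)."""

    # Format string modeled on the log line quoted in this thread.
    _fmt = ('readv() read %(actual)s bytes rather than %(length)s bytes'
            ' at %(offset)s for "%(path)s"%(extra)s')

    def __init__(self, path, offset, length, actual, extra=None):
        Exception.__init__(self)
        self.path = path
        self.offset = offset
        self.length = length
        self.actual = actual
        self.extra = ': ' + str(extra) if extra else ''

    def __str__(self):
        return self._fmt % self.__dict__


def translate_pycurl_error(code, url):
    """Hypothetical helper: map a pycurl error code to an exception."""
    if code == CURLE_PARTIAL_FILE:
        # pycurl only says "short read", so every detail is 'unknown'.
        return ShortReadvError(url,
                               offset='unknown', length='unknown',
                               actual='unknown',
                               extra='Server aborted the request')
    return None


message = str(translate_pycurl_error(18, 'http://example.com/branch'))
print(message)
```

Because every field is filled with the literal string 'unknown',
the resulting message cannot say how much was read or where.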
<snip/>
Mark> Note the 15 minute gap before the 'got pycurl error: 18' messages.
As Robert noticed, this sounds like a timeout.
I've often seen network timeouts of 15 minutes *as a user*, but
I've never been able to find where they come from :-(
Mark> Off the top of my head, I see at least 1 such error
Mark> every 3rd time pulling from Launchpad.
Then you're welcome to provide some Wireshark traces (I
understand that they can be hard for you to get) :-/
Mark> It seems to me that little network glitches aren't
Mark> particularly unexpected - but waiting 15 minutes when
Mark> it happens isn't that friendly.
Mark> Is this something specific to Windows? Specific to
Mark> pycurl?
Little is known about it; AFAIK you're the first one to report
that behavior with such a high frequency. It may be pycurl, it
may be Windows; I'd prefer to avoid guessing without more data.
Mark> Any suggestions about what we can do to make such
Mark> errors have less of an impact?
Yes.
Since you:
- don't use a proxy,
- don't need NTLM authentication,
- don't need to verify https certificates,
try urllib instead.
Either by using http+urllib: instead of plain http: or by using
the following plugin:
,----
| from bzrlib import transport
|
|
| transport.register_lazy_transport('http://', 'bzrlib.transport.http._urllib',
|                                   'HttpTransport_urllib')
| transport.register_lazy_transport('https://', 'bzrlib.transport.http._urllib',
|                                   'HttpTransport_urllib')
`----
which will make the urllib implementation become the default
instead of pycurl for http.
As shown above, pycurl doesn't give us precise enough information
about *when* this is occurring; urllib will at least be more
precise.
As John mentioned in a later mail, we also have a strange
select/poll error on Linux with pycurl.
I call that one a "Loch Ness Monster" bug: some claim to have
seen it but nobody has proof (i.e. a recipe to reproduce it).
Somehow, sometimes, bzr as an http client is waiting for a packet
while the server is waiting for some ack before sending another
packet.
It may well be that you're seeing a slightly different symptom
of the same cause: client and server are out of sync and,
depending on a yet-to-be-identified cause, the client or the
server aborts the connection before the other.
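
The stalled exchange described above can be illustrated with a
small sketch. This is not bzr's code and says nothing about where
the 15 minutes come from; it only shows, under stated assumptions,
how a read on a silent connection blocks until some timeout fires.
The wait_for_packet helper and the 0.2 second value are
hypothetical, and a local socket pair stands in for the client
and server.

```python
import socket
import time


def wait_for_packet(sock, timeout_seconds):
    """Wait for data on sock, giving up after timeout_seconds."""
    sock.settimeout(timeout_seconds)
    try:
        data = sock.recv(1024)
    except socket.timeout:
        # Nothing arrived: the peer is silent but still connected.
        return 'timed out'
    return 'received %d bytes' % len(data)


# Two connected endpoints; neither side ever sends anything, which
# mimics a client waiting for a packet while the server waits too.
client, server = socket.socketpair()

started = time.time()
outcome = wait_for_packet(client, 0.2)
elapsed = time.time() - started

client.close()
server.close()
print(outcome)
```

Without the settimeout() call the recv() would block for as long
as the connection stays open, so the length of the wait is decided
by whichever layer gives up first.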
Vincent