[MERGE] A hack to make urllib not call recv(1) lots and lots.

Mon Mar 17 07:53:51 GMT 2008

>>>>> "Andrew" == Andrew Bennetts <andrew at canonical.com> writes:

    Andrew> SuperMMX wrote:
    >> Hi, Andrew Bennetts <andrew at canonical.com> :
    >> 
    >> On Sun, 16 Mar 2008 14:56:58 -0500
    >> Andrew Bennetts <andrew at canonical.com> wrote:
    >> 
    >> > Pavel Pergamenshchik at PyCon noticed that "bzr branch
    >> > http://..." on a large branch was using lots of CPU, and
    >> > strace showed that it was making a huge number of
    >> > recv(1) calls to read bytes off the socket one byte at a
    >> > time.  This seems to happen because httplib's
    >> > HTTPResponse doesn't specify a buffer size for the
    >> > fileobject it constructs from its socket, and it doesn't
    >> > provide a way to override this.
    >> > 
    >> > This patch is a hack to fix this.  It's a bit dirty, but
    >> > it massively reduces the number of recv calls bzr makes
    >> > with urllib.

I'm not sure you can do that.

There are several layers of buffering involved (depending on
client http version, server http version and whether or not you're
using https).

I'm not 100% sure, but I think that when the server is http/1.1
there may be cases where we want to read the stream one byte at a
time to avoid issuing a blocking read.
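
To make the trade-off concrete, here is a minimal sketch (Python 3,
not bzr or httplib code) that counts recv() calls when a status line
is read one byte at a time versus through a buffer. CountingRaw and
read_status_line are names made up for this illustration, and the
socketpair stands in for a real HTTP connection.

```python
import io
import socket


class CountingRaw(io.RawIOBase):
    """Raw reader over a socket that counts its recv() calls (hypothetical helper)."""

    def __init__(self, sock):
        super().__init__()
        self.sock = sock
        self.recv_calls = 0

    def readable(self):
        return True

    def readinto(self, buf):
        # One recv() per raw read; ask for at most len(buf) bytes.
        self.recv_calls += 1
        data = self.sock.recv(len(buf))
        buf[:len(data)] = data
        return len(data)


def read_status_line(buffered):
    """Read one line from a local socket pair; return (line, recv_calls)."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b"HTTP/1.1 200 OK\r\n")
        raw = CountingRaw(receiver)
        if buffered:
            # The buffer pulls a large chunk from the socket, then
            # serves the line out of memory.
            line = io.BufferedReader(raw, buffer_size=8192).readline()
        else:
            # Unbuffered: every byte costs one recv(1).
            line = b""
            while not line.endswith(b"\n"):
                line += raw.read(1)
        return line, raw.recv_calls
    finally:
        sender.close()
        receiver.close()
```

Both modes return the same line; only the number of recv() calls
differs, which is the cost the strace output was showing.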

On the other hand, I may be confused by a bug I already fixed
when making our test server 1.1 compliant: the only scenario I
can think of is an http/1.1 server sending a reply without[0]
specifying the body length, which *was* a bug.
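
As a rough illustration of why a declared body length matters, here
is a sketch (Python 3, assuming every reply carries a Content-Length
header; read_response and demo_keep_alive are hypothetical names, not
bzr code). With the length known, a buffered reader can take exactly
that many bytes and stop, so it does not block waiting for more data
on a connection that stays open.

```python
import socket


def read_response(reader):
    """Read one reply from a buffered reader and return its body.

    Assumes the reply declares its body size with Content-Length.
    """
    reader.readline()  # status line, ignored in this sketch
    headers = {}
    while True:
        line = reader.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # blank line ends the header block
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers["content-length"])
    # Read exactly `length` bytes: no more, so no wait for extra data.
    return reader.read(length)


def demo_keep_alive():
    """Send two replies on one open connection and read both bodies."""
    server, client = socket.socketpair()
    try:
        server.sendall(
            b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
            b"HTTP/1.1 200 OK\r\nContent-Length: 6\r\n\r\nworld!"
        )
        reader = client.makefile("rb")  # buffered by default in Python 3
        try:
            first = read_response(reader)
            second = read_response(reader)
        finally:
            reader.close()
        return first, second
    finally:
        server.close()
        client.close()
```

The second reply is read from the same reader without the server
closing the connection; a reply with no declared length would leave
the reader with no way to know where the body ends.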

     Vincent

[0]: length-prefixed protocols ftw!