[MERGE] A hack to make urllib not call recv(1) lots and lots.

Andrew Bennetts andrew at canonical.com
Sun Mar 16 21:21:26 GMT 2008


John Arbash Meinel wrote:
> Andrew Bennetts wrote:
> > Pavel Pergamenshchik at PyCon noticed that "bzr branch http://..." on a large
> > branch was using lots of CPU, and strace showed that it was making a huge number
> > of recv(1) calls to read bytes off the socket one byte at a time.  This seems to
> > happen because httplib's HTTPResponse doesn't specify a buffer size for the
> > fileobject it constructs from its socket, and it doesn't provide a way to
> > override this.
> > 
> > This patch is a hack to fix this.  It's a bit dirty, but it massively reduces
> > the number of recv calls bzr makes with urllib.
> > 
> > -Andrew.
> > 
> > 
> 
> Are we sure that this is safe to do on all platforms?

I'm not sure that it is, although I'd guess it is.  If a Windows user would like
to test it, that'd be helpful.

> Anyway, I like the idea, I would be very curious to see the actual
> effect of this.

I don't have timing differences yet, but it does reduce the number of recv
syscalls when doing an initial branch of bzr-gtk from 277861 to 2912, roughly a
95-fold reduction.  I'd expect that proportion to hold for larger branches, and
I'd expect it to make a significant difference to times.  I'll run some tests
and report back.
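
To illustrate why the buffer size matters so much, here is a minimal sketch.
It is not the patch itself, and it uses the Python 3 io module rather than
httplib and the socket fileobject discussed above.  CountingRaw and
count_low_level_reads are hypothetical names made up for this example; the
raw stream stands in for the socket, and each readinto() call stands in for
one recv() call.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for a socket: counts low-level read calls."""

    def __init__(self, data):
        super().__init__()
        self._src = io.BytesIO(data)
        self.calls = 0  # analogous to the number of recv() syscalls

    def readable(self):
        return True

    def readinto(self, b):
        # Every call here models one recv() on a real socket.
        self.calls += 1
        return self._src.readinto(b)


def count_low_level_reads(data, buffer_size):
    """Read `data` line by line; return (low-level call count, lines)."""
    raw = CountingRaw(data)
    if buffer_size == 0:
        # Unbuffered: readline() has to fetch one byte per low-level call.
        stream = raw
    else:
        # Buffered: each low-level call may fetch up to buffer_size bytes.
        stream = io.BufferedReader(raw, buffer_size)
    lines = []
    while True:
        line = stream.readline()
        if not line:
            break
        lines.append(line)
    return raw.calls, lines


# Assumed sample payload: 1000 short header-like lines.
payload = b"Header-Name: some value\r\n" * 1000

unbuffered_calls, unbuffered_lines = count_low_level_reads(payload, 0)
buffered_calls, buffered_lines = count_low_level_reads(payload, 65536)

print("unbuffered low-level reads:", unbuffered_calls)
print("buffered low-level reads:  ", buffered_calls)
```

Both runs return the same lines; only the number of low-level reads differs.
The counts from this toy will not match the strace numbers above, since real
traffic arrives in chunks and mixes header parsing with body reads.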

-Andrew.



