[MERGE] A hack to make urllib not call recv(1) lots and lots.

Andrew Bennetts andrew at canonical.com
Mon Mar 17 04:26:59 GMT 2008


SuperMMX wrote:
> Hi, Andrew Bennetts <andrew at canonical.com>:
> 
> On Sun, 16 Mar 2008 14:56:58 -0500
> Andrew Bennetts <andrew at canonical.com> wrote:
> 
> > Pavel Pergamenshchik at PyCon noticed that "bzr branch http://..." on a large
> > branch was using lots of CPU, and strace showed that it was making a huge number
> > of recv(1) calls to read bytes off the socket one byte at a time.  This seems to
> > be because httplib's HTTPResponse doesn't specify a buffer size for the
> > fileobject it constructs from its socket, and it doesn't provide a way to
> > override this.
> > 
> > This patch is a hack to fix this.  It's a bit dirty, but it massively reduces
> > the number of recv calls bzr makes with urllib.
> 
> Here are some rough numbers on Windows: without the patch, CPU usage is
> about 45%, but with the patch, CPU usage never exceeds 10%.
> 
> Is there any way to get precise numbers?
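The recv(1) pattern described in the quoted patch note can be reproduced in
miniature. The following is a minimal Python 3 sketch, not bzr's patch and not
httplib's code: a hypothetical CountingRaw class stands in for the socket and
counts low-level reads, and a hypothetical count_reads() helper reads it line by
line, either unbuffered or through io.BufferedReader. An unbuffered readline()
has no read-ahead buffer, so it fetches one byte per call to avoid consuming
data past the newline; that is the analogue of the recv(1) calls seen in strace.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for a socket: serves bytes from memory and counts
    how many low-level read calls (the analogue of recv) are made."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        # One call here corresponds to one recv() on a real socket.
        self.calls += 1
        chunk = self._data[self._pos:self._pos + len(buf)]
        n = len(chunk)
        buf[:n] = chunk
        self._pos += n
        return n


def count_reads(data: bytes, buffer_size: int) -> int:
    """Read every line of `data` and return the number of low-level reads.
    buffer_size == 0 means unbuffered; otherwise wrap in io.BufferedReader."""
    raw = CountingRaw(data)
    stream = raw if buffer_size == 0 else io.BufferedReader(raw, buffer_size)
    while stream.readline():
        pass
    return raw.calls


if __name__ == "__main__":
    data = b"".join(b"line %03d\n" % i for i in range(100))
    print("unbuffered reads:", count_reads(data, 0))
    print("buffered reads:  ", count_reads(data, 8192))
```

On 100 short lines, the unbuffered reader issues one read per byte (plus one at
end of file), while the buffered reader needs only a handful.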

Interesting.  That's actually pretty surprising, given the timings I've just
taken on my laptop.

FWIW, without the hack, it takes me 3m 15s to branch Twisted trunk out of my
bzr-svn import of it (78M of history, using Apache as the server).  With the
hack, it takes 3m 9s.  This is the branch that Pavel originally noticed the
excessive recv calls on.

I did take some care to make sure the cache was hot for both runs, but this is
still a pretty small difference, probably within the natural variation on a
laptop that isn't totally quiescent.  In fact, branching directly from disk took
4m 22s (much longer!), so something is definitely weird.  Anyway, the timings
don't suggest to me that such an uncertain reward is worth the risk of a nasty
hack.
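Whether a 6 second gap (3m 15s versus 3m 9s, about 3%) is real or just noise can
only be judged against the run-to-run spread.  A minimal sketch, assuming each
variant of the branch operation can be wrapped in a Python callable (the `run`
argument below is hypothetical): time several repeats of each variant and
compare the gap between the means with their standard deviations.  The
k-times-stdev rule is a crude heuristic, not a formal significance test.

```python
import statistics
import time
from typing import Callable, List


def sample_runtimes(run: Callable[[], None], repeats: int = 5) -> List[float]:
    """Call `run` `repeats` times; return each wall-clock duration in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return samples


def difference_exceeds_noise(a: List[float], b: List[float], k: float = 2.0) -> bool:
    """Crude heuristic, not a significance test: True if the gap between the
    means is larger than k times the larger of the two sample standard
    deviations.  Each list needs at least two samples for stdev()."""
    spread = max(statistics.stdev(a), statistics.stdev(b))
    return abs(statistics.mean(a) - statistics.mean(b)) > k * spread
```

Each `run` callable has to reset its own state between repeats (for example,
branching into a fresh target directory each time), and this measures
wall-clock time only, not CPU usage.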

So,
bb:reject

But if it turns out to make a big improvement on Windows, I'd be tempted to
reconsider.

-Andrew.



