1.6 fetch regression

John Arbash Meinel john at arbash-meinel.com
Thu Aug 28 14:15:04 BST 2008


Vincent Ladeuil wrote:
>>>>>> "john" == John Arbash Meinel <john at arbash-meinel.com> writes:
> 
> <snip/>
> 
>     >>> (note that this repo is slightly different, but still a packed repo) I'm
>     >>> actually quite surprised to see that bzr-1.5 branching over http:// is
>     >>> *faster* than branching locally. (1m versus 1m15s).
>     >> 
>     >> async behaviour in pycurl, I'd guess.
>     >> 
>     >> -Rob
> 
>     john> I'm using urllib.
> 
> Try with pycurl then :D
> 
> Seriously, I don't know how you consolidate your numbers, but if
> you can redo that for pycurl and find a significant difference
> then it may be related to internal buffering being done
> differently in urllib.
> 
> I'm quite surprised that http comes out faster than the
> alternatives when there is no obvious reason for it.
> 
> The only thing that comes to mind, then, is that urllib tries very
> hard to avoid buffering in readv, i.e. callers of readv read data
> from the underlying socket as directly as possible.
> 

So this is something I've been suspecting in the smart-server code: a
large portion of the time is spent on the line:

  self._in_buffer += bytes
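
As a rough illustration of why that line can hurt (this is my own
sketch, not bzrlib code, and the chunk size and totals are invented):
appending to a str held in an attribute generally defeats CPython's
in-place "str +=" shortcut, so every append copies the whole buffer
accumulated so far:

  # Rough sketch, not bzrlib code; chunk size and totals are invented.
  import time

  class Proto(object):
      def __init__(self):
          self._in_buffer = ''

  chunk = 'x' * 8192                    # pretend this came off the socket
  n = (8 * 1024 * 1024) // len(chunk)   # 8MB total keeps the demo quick

  p = Proto()
  start = time.time()
  for _ in range(n):
      p._in_buffer += chunk             # copies the whole buffer each time
  print('attribute +=: %.2fs' % (time.time() - start))

  chunks = []
  start = time.time()
  for _ in range(n):
      chunks.append(chunk)              # O(1) amortized append
  data = ''.join(chunks)                # one final copy at the end
  print('list + join:  %.2fs' % (time.time() - start))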

And it turns out that:
% time bzr branch http+urllib://
 36.00s user 3.05s system 68% cpu 57.126 total
% time bzr branch http+pycurl://
 34.64s user 1.86s system 47% cpu 1:17.09 total
% time bzr branch bzr+ssh://remote	(remote is a 700MHz machine)
 61.08s user 44.23s system 76% cpu 2:18.44 total
% time bzr branch bzr+ssh://local
 54.47s user 29.92s system 94% cpu 1:28.89 total
% time bzr branch local
 32.84s user 0.77s system 98% cpu 34.247 total

So bzr+ssh:// has a lot of user-time overhead (and my remote machine
really is slow), and pycurl buffers more, which makes things slower.

I wonder if it is just because we can process the pieces of the message
as they arrive, rather than waiting for the whole download and then
coming back to parse it. But I also think some of the time is simply
spent managing the buffer: we copy about 80MB of data in this test, so
I'm guessing part of the cost is repeatedly reallocating the buffer as
each small chunk of data comes in (which urllib avoids).
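
For what it's worth, the usual way around that copying (a sketch of the
general technique only, not a patch; the class and method names are
mine, and cStringIO would do much the same job) is to keep the incoming
chunks in a list and join them once, when a consumer actually needs the
bytes contiguously:

  # Hypothetical copy-avoiding input buffer, not from bzrlib.  Appends
  # are O(1) amortized; the single O(N) copy happens only on demand.

  class ChunkedInBuffer(object):
      def __init__(self):
          self._chunks = []
          self._len = 0

      def append(self, data):
          self._chunks.append(data)     # no copy of earlier chunks
          self._len += len(data)

      def __len__(self):
          return self._len

      def read_all(self):
          data = ''.join(self._chunks)  # one copy, O(N) total
          self._chunks = []
          self._len = 0
          return data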

John
=:->


> This is not the case for pycurl. So my suggestion is that you redo
> your measurements with pycurl, and if the same kind of difference
> appears you may have a new track to follow.
> 
> Hth,
> 
>         Vincent
> 



