Latency leading to very very long pull times
John Arbash Meinel
john at arbash-meinel.com
Thu Jul 19 20:19:21 BST 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Aaron Bentley wrote:
> Andrew King wrote:
>> Hi guys,
>
>> Thanks for all the great work with bzr.
>> Now to the wishlist :)
>
>> I know there is a lot of work being done on performance, and I have
>> searched the lists in particular for "latency" and seen that it can
>> cause bad performance.
>
>> Unfortunately, I am now in the situation where a pull across a remote
>> link (350ms latency WAN) can take 20-30 minutes. (A fresh branch takes
>> hours).
I'm curious what transport you are using.
If you are using sftp, it is known to perform worse than both 'bzr+ssh' and
http because of the protocol itself: you have to issue an open, and wait for
the reply, before you can issue a read.
Also, if you are using http, make sure connection keep-alive is enabled. I don't
believe it defaults to on in Apache. If you do have it on, you might also
consider increasing the maximum number of requests per connection. (I think
Apache defaults to 100 before closing; you could raise it to 200.)
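As a rough back-of-the-envelope sketch of why round trips matter so much on this
link: the 350ms latency and the 20-30 minute pull time come from the report
above, while the assumption that latency is the only cost (bandwidth ignored)
and the file count of 1000 are purely illustrative.

```python
# Rough estimate only: assumes every request waits for the previous reply
# and that latency is the only cost (bandwidth is ignored).

LATENCY = 0.350  # seconds per round trip, from the 350ms WAN link reported above


def round_trips(total_seconds, latency_seconds):
    """How many sequential round trips fit into total_seconds."""
    return int(total_seconds / latency_seconds)


def sequential_seconds(num_files, trips_per_file, latency_seconds):
    """Time spent waiting if each file costs trips_per_file round trips."""
    return num_files * trips_per_file * latency_seconds


# Round trips implied by a 20 to 30 minute pull.
low = round_trips(20 * 60, LATENCY)
high = round_trips(30 * 60, LATENCY)
print("round trips implied:", low, "to", high)

# Hypothetical file count: an open-then-read transport needs two round trips
# per file where a single-request transport needs one.
HYPOTHETICAL_FILES = 1000
print("open + read:", sequential_seconds(HYPOTHETICAL_FILES, 2, LATENCY) / 60.0, "min")
print("single read:", sequential_seconds(HYPOTHETICAL_FILES, 1, LATENCY) / 60.0, "min")
```

Under those assumptions a 20-30 minute pull corresponds to a few thousand
sequential round trips, which is why halving the round trips per file is worth
more here than extra bandwidth.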
>> So, some questions:
>> 1. Is this fixable, or is it a characteristic of the bzr model or data
>> structures?
>
> This is fixable. There are two parallel efforts:
> 1. the smart server
> 2. container-based repositories
>
>> 2. If it is fixable, is there any kind of time line?
>
> Not a fixed one, but I'd be surprised if we didn't have Smart Server
> improvements within a month
>
>> 3. Is this to do with the total size of the repository, or the number
>> of revisions that have changed, or both?
>
> The greatest effect on latency would be the number of files. However,
> the size of the repository and inventory indices can cause
> bandwidth-limiting.
>
>> 5780 revisions
>> 103388 KiB in repo.
>
> This is about twice the size of Bazaar, both in revision count and in
> repository size.
Well, it depends how you count revisions. We have 12.1k ancestral revisions,
but only 2.6k on the mainline ('bzr ancestry | wc -l' versus 'bzr revno').
That said, it seems closer to 3x the total number of bytes.
If pull is also slow (even when you only change a couple of files), then it
could indeed be bandwidth-limited. (We currently have to download all of
.bzr/repository/inventory.kndx and revisions.kndx, plus the .kndx of any file
that was modified.)
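To see whether the index downloads are the bottleneck, one option is to measure
the .kndx files in a local copy of the repository. The sketch below is only
that: it walks a directory and sums the sizes of anything ending in .kndx. The
helper names and the link speed you pass in are assumptions for illustration,
not part of bzr.

```python
import os


def kndx_sizes(repo_dir):
    """Return {relative_path: size_in_bytes} for every .kndx file under repo_dir."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(repo_dir):
        for name in filenames:
            if name.endswith(".kndx"):
                full = os.path.join(dirpath, name)
                sizes[os.path.relpath(full, repo_dir)] = os.path.getsize(full)
    return sizes


def transfer_seconds(num_bytes, bytes_per_second):
    """Time to move num_bytes over a link of the given speed (latency ignored)."""
    return num_bytes / float(bytes_per_second)


if __name__ == "__main__":
    # Run from the root of a branch; the path follows the layout named above.
    repo = os.path.join(".bzr", "repository")
    sizes = kndx_sizes(repo)
    for path in ("inventory.kndx", "revisions.kndx"):
        print(path, sizes.get(path, "not found"))
    print("all .kndx bytes:", sum(sizes.values()))
```

Feeding the inventory.kndx and revisions.kndx sizes into transfer_seconds with
your measured link speed gives a lower bound on what every pull has to pay
before any file data is fetched.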
You might want to consider upgrading to Branch6 (--format=dirstate-tags), since
that format maintains a single 'last_revision' field, rather than a list of all
revisions in 'revision-history'.
>
> One thing you could try is using merge directives instead of a branch.
> Since they're single files, they shouldn't take as long to download.
>
> Aaron
Good point. For small changes, it also means you don't have to read all the
remote .kndx files.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGn7k4JdeBCYSNAAMRAn4cAKC1vDh5K10+icDlaFJ6/PmaHzD71ACfcd4S
xJIfOtHdV2DF1vA+hWmlcg4=
=W+cI
-----END PGP SIGNATURE-----