Latency leading to very very long pull times

Fri Jul 20 14:49:16 BST 2007

On 7/19/07, John Arbash Meinel <john at arbash-meinel.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Aaron Bentley wrote:
> > Andrew King wrote:
> >> Hi guys,
> >
> >> Thanks for all the great work with bzr.
> >> Now to the wishlist :)
> >
> >> I know there is a lot of work being done on performance, and I have
> >> searched the lists in particular for "latency" and seen that it can
> >> cause bad performance.
> >
> >> Unfortunately, I am now in the situation where a pull across a remote
> >> link (350ms latency WAN) can take 20-30 minutes. (A fresh branch takes
> >> hours).
>
> I'm curious what transport you are using.
>

We are using bzr+ssh.
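As a rough illustration of why latency alone could account for these times, here is a back-of-the-envelope sketch in Python. It assumes the 350ms figure is the full round-trip time, that every request is issued strictly one after another, and that bandwidth is not a factor; the function name is made up for illustration and is not part of bzr.

```python
LATENCY = 0.350  # seconds; assumed to be the full round-trip time on the WAN link


def implied_round_trips(total_seconds, latency_seconds=LATENCY):
    """Number of strictly sequential round trips that would fill total_seconds."""
    return round(total_seconds / latency_seconds)


# The 20-30 minute pull times reported above.
for minutes in (20, 30):
    trips = implied_round_trips(minutes * 60)
    print(f"{minutes} min pull ~ {trips} sequential round trips")
```

Under those assumptions, a pull of this length corresponds to several thousand sequential requests, so reducing the number of round trips would matter more than raw bandwidth.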

> If you are using sftp, it is known to have poorer performance than both
> 'bzr+ssh' and http because of the protocol itself (you have to issue an open
> and wait for it to complete before you can issue a read).
>
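To make the open-then-read point concrete, a minimal sketch comparing two round trips per file against one. The file count is purely hypothetical, the 350ms latency is assumed to be a full round trip, and the helper is illustrative only, not a model of how any bzr transport is actually implemented.

```python
LATENCY = 0.350  # seconds per round trip (assumed from the WAN figure above)


def fetch_time(num_files, round_trips_per_file, latency=LATENCY):
    """Seconds spent waiting on the network when requests are issued sequentially."""
    return num_files * round_trips_per_file * latency


num_files = 1000  # hypothetical number of files touched, for illustration only

open_then_read = fetch_time(num_files, 2)  # issue an open, wait, then issue a read
single_request = fetch_time(num_files, 1)  # one request per file
print(f"open+read: {open_then_read:.0f} s, single request: {single_request:.0f} s")
```

Whatever the real file count is, the waiting time under this model scales linearly with the number of round trips per file.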
> Also, if you are using http, make sure you have connection keep-alive. I don't
> believe it defaults to on in Apache. But if you have it on, you might also
> consider increasing the maximum number of requests. (I think Apache defaults to
> 100 before closing, you could make it even 200).
>
>
> >> So, some questions:
> >> 1. Is this fixable, or is it a characteristic of the bzr model or data
> >> structures?
> >
> > This is fixable.  There are two parallel efforts:
> > 1. the smart server
> > 2. container-based repositories
> >
> >> 2. If it is fixable, is there any kind of time line?
> >
> > Not a fixed one, but I'd be surprised if we didn't have Smart Server
> > improvements within a month.
> >
> >> 3. Is this to do with the total size of the repository, or the number
> >> of revisions that have changed, or both?
> >
> > The greatest effect on latency would be the number of files.  However,
> > the size of the repository and inventory indices can cause
> > bandwidth-limiting.
> >
> >> 5780 revisions
> >> 103388 KiB in repo.
> >
> > This is about twice the size of Bazaar, both in revision count and in
> > repository size.
>
> Well, it depends how you count revisions: we have 12.1k ancestral
> revisions, but only 2.6k mainline (bzr ancestry | wc -l versus bzr revno).
>
> That said, it seems closer to 3x the total number of bytes.
>
> If pull is also an issue (even when you only change a couple of files), then
> it could indeed be a bandwidth-limiting issue. (We currently have to download
> all of .bzr/repository/inventory.kndx and revisions.kndx, and any .kndx for a
> file that was modified.)
>
> You might want to consider upgrading to Branch6 (--format=dirstate-tags), since
> that format maintains a single 'last_revision' field, rather than a list of all
> revisions in 'revision-history'.

How do we do this, and what are the implications? Is it stable?
>
> >
> > One thing you could try is using merge directives instead of a branch.
> > Since they're single files, they shouldn't take as long to download.
> >

How do we do this, Aaron? Do you mean just typing bzr merge instead of bzr pull?

> > Aaron
>
> Good point. For small changes, it also means you don't have to read all the
> remote .kndx files.
>
> John
> =:->
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.7 (Darwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGn7k4JdeBCYSNAAMRAn4cAKC1vDh5K10+icDlaFJ6/PmaHzD71ACfcd4S
> xJIfOtHdV2DF1vA+hWmlcg4=
> =W+cI
> -----END PGP SIGNATURE-----
>