Optimising branching and merging big repositories between faraway locations...

Asmodehn Shade asmodehn at gmail.com
Wed Oct 29 03:03:47 GMT 2008


Hi,

First, thanks for all these details; they are quite interesting, and I will
take some time to investigate a bit ;-)

1)
So I ran "bzr branch -r1" with -Dhpss... and bzr has now been stuck there
for a few hours:

28.885     result:   ('ok',)
29.783                80504 body bytes read
29.806  hpss call w/readv: 'readv',
'/home/autobzr/deployBZR/.bzr/repository/indices/c43cd46c0973a8d886552c0f4b8e4e5f.tix'
29.806                5 bytes in readv request
30.171     result:   ('readv',)
30.364                975 body bytes read
30.376  hpss call w/readv: 'readv',
'/home/autobzr/deployBZR/.bzr/repository/indices/f65de233ce31cbdace7370611f49937c.tix'
30.377                7 bytes in readv request
30.609     result:   ('readv',)
31.419                65536 body bytes read
31.432  hpss call:   'get',
'/home/autobzr/deployBZR/.bzr/repository/indices/f65de233ce31cbdace7370611f49937c.tix'
31.432               (to
bzr+ssh://deploy.sgf.in.iz/home/autobzr/deployBZR/GameBZR-Live-Temporary/)
32.445     result:   ('ok',)
60.013                5903449 body bytes read
62.310  hpss call w/readv: 'readv',
'/home/autobzr/deployBZR/.bzr/repository/packs/f65de233ce31cbdace7370611f49937c.pack'
62.312                11675 bytes in readv request


Not sure why it's still stuck like that...
The progress bar is stuck as well, at:
\ [=============================                    ] Copying content texts
3/5

Is there any way to get more logs out of it?


2) I have bzr 1.8 installed everywhere, but as these repositories are big,
I'll try that upgrade as a last resort (it might take some time...)

3) I guess that would be quite effective, since one of my problems is
latency... However, I wish there were an easy way to configure that
without touching the code ;-) I'll give it a try when I get some time,
to check whether I see any improvement.

4) Last time I tried sftp, it was much slower than bzr+ssh (on the same
repositories that I am working with right now), so I think I'll pass on
that one...

5) Mmm, interesting. I'll give that a try too, if it comes down to smart
Bazaar being not so smart in my case ;-)

So, any ideas about 1)?

Thanks a lot for the help ;-)

--
Alex

2008/10/28 John Arbash Meinel <john at arbash-meinel.com>

> Asmodehn Shade wrote:
> > Hi *,
> >
> > Alright I recently updated to bzr 1.8.
> >
> > I have a repository a few GiB in size (because of the size of the files
> > in it), with usually around 100 revisions in each branch.
> >
> > I need to branch from one place to another (usually quite far away =
> > high latency) or merge differences.
> > However, this appears to be quite slow.
> >
> > Despite the bandwidth limitation and the time needed to transfer big
> > files anyway, for some reason it's much slower than scp, for example.
> > Also, despite the large bandwidth available (a few Mbps), the transfer
> > rate can drop to 1 Kbps and stay there for quite a long time...
> >
> > I am using bzr+ssh (the fastest protocol I could find...); the
> > repository format is whatever the default was (on 1.5) when you ran
> > "bzr init-repo".
> >
> > So I was wondering if someone here had advice on how I can make the
> > overall branching / pulling / merging operations faster, if possible...
> > using more bandwidth or something else...
> >
> > Thanks for your advice ;-)
> >
> > --
> > Alex
>
> So there are a few possibilities for what could be happening. If you
> can help debug further, doing runs with "-Dhpss" will add extra debug
> information to ".bzr.log" (use "bzr --version" to find this file). That
> will record which commands we are issuing, along with some timing
> information.
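>
> For example (an illustrative invocation; substitute your own URL):
>
>     bzr -Dhpss branch bzr+ssh://host/path/to/repo local-copy
>     bzr --version   # among other details, prints where .bzr.log lives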
>
> As a guess, I would say we are likely to be slow during index
> operations, where we are probing for more information to see what we
> need to do next.
>
> I know Andrew Bennetts has a patch out that should help some cases. (For
> push/pull we need to find out what one side has that the other doesn't.
> We were doing it "one-revision" at a time, and Andrew updated that to
> make several requests per round trip.)
>
> That landed in bzr.dev as:
> 3795 Canonical.com Patch Queue Manager 2008-10-27 [merge]
>     Reduce round-trips when pushing to an existing repo by using the
>       get_parent_map RPC more,
>       and batching calls to it in _walk_to_common_revisions. (Andrew
>       Bennetts)
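>
> Roughly, the idea looks like this sketch (plain Python for
> illustration; not bzrlib's actual code, and the names are made up):
>
>     # Walk remote ancestry, asking about a whole batch of revisions per
>     # round trip instead of issuing one request per revision.
>     def walk_to_common(get_parent_map, tips, local_revs):
>         searching = set(tips) - set(local_revs)
>         seen = set(searching)
>         round_trips = 0
>         while searching:
>             round_trips += 1
>             parent_map = get_parent_map(searching)  # one RPC per batch
>             next_batch = set()
>             for parents in parent_map.values():
>                 for p in parents:
>                     if p not in seen and p not in local_revs:
>                         seen.add(p)
>                         next_batch.add(p)
>             searching = next_batch
>         return round_trips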
>
> There are other possibilities...
>
> 1) You may consider issuing "bzr pack" on the repository. This will
> collapse all of the history (so far) into a single pack file + index.
> This can make things faster (in general, looking something up in an index
> is O(log N), so having M indexes costs M * log N, rather than log(M*N)).
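>
> To get a rough feel for the difference (made-up numbers), compare 20
> separate indexes of ~1000 keys each against one combined index:
>
>     import math
>     # 20 indexes, binary-search each: ~20 * log2(1000) ~= 199 probes
>     print(20 * math.log(1000, 2))
>     # one combined index after "bzr pack": ~log2(20000) ~= 14 probes
>     print(math.log(20 * 1000, 2))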
>
> We do a certain amount of packing automatically (we check after every
> commit/push/pull). The automatic algorithm isn't very aggressive, as you
> don't really want to redo your whole repository every commit.
>
> 2) We have a better index format written if you want to test it. I would
> make a copy of your existing repository, and then do "bzr upgrade
> --development2". I believe all clients will need to be bzr 1.7 or
> greater.
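>
> Something along these lines (the paths here are just an example):
>
>     cp -a repo repo-dev2                  # upgrade a copy, not the original
>     bzr upgrade --development2 repo-dev2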
>
> The index format is stable, and we are mostly tuning the code before we
> make it a public & stable repository format. For instance, the old index
> code had logic to allow it to prefetch extra data, and I just landed the
> code to do so for the new index code. In many cases the new format is
> sufficiently better that even without prefetching it is faster (often
> significantly so).
>
> If you try it, we would certainly be interested in feedback on how
> well it performs for you.
>
> 3) Speaking of 'prefetch', you could tune the prefetch algorithm a
> little bit. Probably the value in question comes from:
> bzrlib/transport/remote.py
>
> Around line 304 there should be:
>
> def recommended_page_size(self):
>    """Return the recommended page size for this transport."""
>    return 64 * 1024
>
> You could play around with that value and see if larger values work
> better for you.
>
> For example, you could set it to 64MB (64 * 1024 * 1024) instead of
> 64kB. That would likely cause the prefetch code to just always read the
> whole index with every request, rather than just reading a little bit at
> a time.
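>
> That is, an experimental edit along these lines:
>
>     def recommended_page_size(self):
>         """Return the recommended page size for this transport."""
>         # Experiment: prefetch far more per round trip on high-latency
>         # links; 64MB effectively reads a whole index in one request.
>         return 64 * 1024 * 1024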
>
> 4) It is possible that 'sftp://' might be faster than 'bzr+ssh://' for
> some operations, mostly because of the prefetch code, which is what
> Andrew was working on. For push specifically, we would be issuing a
> bunch of "do you have revision X" requests, to which the remote would
> respond "no". When using sftp:// we are reading the remote index,
> so we actually get back "no, but I have these 50 random revisions".
> (Interestingly, if the remote does have the revision, then it responds
> with "yes, *and* I have these 50 ancestors as well".)
>
> 5) I think someone commented that you can actually use
> "nosmart+bzr+ssh://", which turns off the smart-protocol requests but
> retains the better file-access behavior of bzr+ssh. (sftp has the
> problem that to read a little bit from a file, you have to issue an
> 'open + read + close', while bzr+ssh can do the whole thing with a
> single 'read' request.)
>
> So you *might* try doing the same action with "nosmart+bzr+ssh://" and
> see if that changes things.
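>
> For example (again, substitute your own URL):
>
>     bzr branch nosmart+bzr+ssh://host/path/to/repo local-copy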
>
> John
> =:->