Work flow on large repositories
Andrew Bennetts
andrew.bennetts at canonical.com
Wed Jul 28 05:14:21 BST 2010
Michael Hope wrote:
[...]
> My issue is that the various operations are taking too long. Could
> anyone suggest tricks or a different work flow to speed things up?
>
> Some of the operations include:
>
> Creating a mirror branch by doing init-repo, branch lp:gcc-linaro/4.4.
> The finding revisions stage takes about 10 minutes at 1kB/s. The
This would be <https://bugs.launchpad.net/bzr/+bug/388269>. I think we
should try to find some cheap 90% solution here rather than waiting for
an excellent answer, e.g. add an “is_empty” method to repository so
fetch can short circuit this case. There are lots of cases it doesn't
address, but it would fix the common case of “bzr init-repo myrepo; cd
myrepo; bzr branch $big_branch”.
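
To illustrate the short-circuit idea only, here is a toy model in Python. ToyRepository and fetch() are hypothetical names invented for this sketch; they are not bzrlib classes and this is not the real fetch code. The point is simply that when the target holds no revisions, nothing can already be present, so the "which revisions are missing?" search can be skipped entirely.

```python
# Toy model only: ToyRepository and fetch() are hypothetical names,
# not bzrlib classes and not the real fetch implementation.


class ToyRepository:
    def __init__(self, revisions=None):
        # Map of revision id -> revision payload.
        self.revisions = dict(revisions or {})

    def is_empty(self):
        """Return True if this repository holds no revisions at all."""
        return not self.revisions


def fetch(source, target):
    """Copy revisions missing from target.

    Returns (copied, comparisons), where comparisons counts how many
    per-revision "is it already there?" checks were needed.
    """
    if target.is_empty():
        # Short circuit: an empty target cannot contain anything yet,
        # so skip the search and copy everything.
        target.revisions.update(source.revisions)
        return len(source.revisions), 0

    copied = 0
    comparisons = 0
    for rev_id, payload in source.revisions.items():
        comparisons += 1  # stands in for the expensive search step
        if rev_id not in target.revisions:
            target.revisions[rev_id] = payload
            copied += 1
    return copied, comparisons
```

In this model a fetch into a fresh repository does zero comparisons, while a later fetch into the now non-empty repository still pays for the full search, which matches the "lots of cases it doesn't address" caveat.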
A workaround would be to replace:
    mkdir gcc-linaro
    cd gcc-linaro
    bzr init-repo .
    bzr branch lp:gcc-linaro/4.4
With this crude hack:
    mkdir gcc-linaro
    cd gcc-linaro
    bzr branch lp:gcc-linaro/4.4
    bzr init-repo .
    rm -r .bzr/repository
    mv 4.4/.bzr/repository .bzr/
    touch .bzr/repository/shared-storage
i.e. make a standalone branch and then convert it into a shared
repository afterwards. (You can use “bzr reconfigure --use-shared”
instead of the manual poking at .bzr/ contents, but it will be slower.)
> Day-to-day work is done on topic branches. Creating the branch takes
> 46 s, 250 MB of RAM, and creates a 20 MB .bzr directory. Pushing this
> branch to LP for merging involves pushing the full 20 MB, but this is
> acceptable.
What's the “du -hs .bzr/*” output for that? .bzr/branch should be
a small (and constant apart from branch.conf) size, .bzr/repository
shouldn't exist if the branch is inside a shared repo... so I guess
the 20MB is mainly .bzr/checkout? I'm pretty sure that's what would be
going on, but it would be nice to confirm.
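
If du isn't handy, a rough equivalent can be sketched in Python. The function names dir_size and control_dir_breakdown are made up for this example. It sums apparent file sizes rather than disk blocks, so the numbers will differ a little from what du reports.

```python
import os


def dir_size(path):
    """Total apparent size in bytes of the files under path.

    Symlinks are not followed and are not counted.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def control_dir_breakdown(control_dir=".bzr"):
    """Map each entry directly under control_dir to its size in bytes."""
    sizes = {}
    for entry in sorted(os.listdir(control_dir)):
        full = os.path.join(control_dir, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            sizes[entry] = dir_size(full)
        else:
            # Plain file (or symlink): report its own size.
            sizes[entry] = os.lstat(full).st_size
    return sizes


if __name__ == "__main__":
    # Run from the top of a branch; prints the largest entries first.
    if os.path.isdir(".bzr"):
        breakdown = control_dir_breakdown(".bzr")
        for name, size in sorted(breakdown.items(), key=lambda kv: -kv[1]):
            print("%10.1f MB  .bzr/%s" % (size / 1e6, name))
```

If the guess above is right, the checkout entry should account for nearly all of the total and there should be no repository entry at all.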
> Doing a bzr pull on the 4.4 mirror directory may take more than half an
> hour and more than 500 MB of memory.
Wow, that seems terrible. I'll try to reproduce locally and see what I
can find. I don't have a workaround to suggest off the top of my head.
> Doing a bzr checkout takes over 20 minutes and 800 MB of memory on my
> fastest machine. On my netbook and ARM board this causes significant
> swapping. I've yet to complete a checkout on either.
>
> I'd also like to share the mirror with other local machines to skip
> downloading the same 500 MB many times. Running bzr serve and then
> checking out causes 100 % CPU usage for more than 10 minutes on the
> host.
I'll try to reproduce these too and see what I can find. I suspect it's
mostly a combination of known bugs so it's probably not going to be
quick to fix, but we'll see what lsprof and meliae and friends find...
As a workaround for sharing the mirror you could try sharing it via http
or sftp. It is likely to work out even slower for the clients, but it
will put much less load on the server.
-Andrew.