Deploying with Bazaar (or how a big repo can make you crazy)
jelmer at samba.org
Thu Mar 8 18:09:09 UTC 2012
On 08/03/12 17:41, Leonardo Santagada wrote:
> People on the IRC channel said I should share my use story with the
> mailing list, and that is what I will try to do.
> Yesterday I was having problems trying to deploy a bazaar repo
> (lp:openobject-addons/6.1) to a server with 600mb ram and no swap
> space (ubuntu 11.10 bzr 1.4.1). Doing a bzr branch using http was
> getting the process killed by the OOM killer because it was using all
> the available ram. The repository is 493mb in size locally but I think
> it tries to transfer around 600mb+ of data. So here were my steps to
> try to fix the problem:
I'm sorry the experience was so bad for you. Let's see if there is
anything we can do to improve it.
> First people on IRC told me that using ssh is better because it would
> have to transfer less data and that the protocol is smart. My first
> problem is that launchpad doesn't support anonymous access to ssh
> although doing so seems to be very simple (is it still made using the
> twisted ssh implementation?) here is the bug report
> https://bugs.launchpad.net/launchpad/+bug/493389 marked as won't fix.
> So I had to create a launchpad user for a headless machine that only
> downloads code and create/upload a ssh key to launchpad (something
> that isn't easily automated, but I will reuse the key to other
> machines). Surprise, it did not work, actually it transferred like 480
> mb before being killed and was 10x slower than http. Looking at
> .bzr.log you can see the process was still only fetching data. So
> simple bzr branch was out of the question.
> Then someone said to bzr branch -rN where N is a small number to see
> if that would work. I tried to get the speed back so tried http again,
> apparently it tried to download the whole repo again so I stopped it to
> try ssh. SSH worked but then I would have to manually split the
> download so bzr doesn't eat the whole memory. That is way too manual
> to my taste so I gave up on that also.
SSH by itself isn't any faster; the benefit comes from running the bzr
smart server protocol over SSH. The smart server protocol (HPSS) is
optimized for bzr operations. If you use a "dumb" transport, like plain
HTTP, FTP or SFTP, then you're basically reading the raw remote files
and fetching whatever bits you need by parsing them. This can involve
loading (part of) a large pack file into memory. With the smart server
protocol, your client can just ask for specific data.
Launchpad doesn't have smart server support enabled over HTTP, so if you
try to clone a branch from Launchpad using HTTP it will use "dumb"
access. I think the reason bug 493389 was closed is that it makes
more sense to allow smart server protocol access over HTTP or plain TCP
than to allow anonymous SSH (but I can only speculate).
I don't think using -rN will help if you're using "dumb" protocol access
to retrieve the branch; with smart protocol access (like bzr+ssh://,
bzr:// or http:// with the smart server enabled), it will help.
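To make the distinction concrete, here is a minimal Python sketch that
sorts a branch URL into "smart" or "dumb" access based only on the
schemes mentioned above. It is an illustration, not how bzr itself
picks a transport: the function name transport_kind and the
http_smart_enabled flag are made up for this example, and treating
https like http is my assumption.

```python
from urllib.parse import urlsplit

# Schemes that always carry the smart server protocol (HPSS).
SMART_SCHEMES = {"bzr", "bzr+ssh"}
# "Dumb" transports that read the raw remote files.
# Assumption: https behaves like http here.
DUMB_SCHEMES = {"http", "https", "ftp", "sftp"}


def transport_kind(url, http_smart_enabled=False):
    """Return "smart", "dumb" or "unknown" for a branch URL.

    http_smart_enabled is a hypothetical flag describing whether the
    server has smart server support switched on for HTTP.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme in SMART_SCHEMES:
        return "smart"
    if scheme in ("http", "https") and http_smart_enabled:
        return "smart"
    if scheme in DUMB_SCHEMES:
        return "dumb"
    return "unknown"
```

With this classification, -rN would only be expected to reduce the
amount of data fetched for URLs that come back as "smart".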
> Talking about a huge repo, this one has 493mb and around 40k rev. I
> used fastexport to see how big it would be in git and another bad news
> it gets a tad smaller there 401mb after a git repack -a -d -f -F
> (don't ask about all those flags, git cli is crazy). This goes against
> the benchmarks posted on bzr, should they be updated or something?
After repacking the bzr repository it goes down from 490 to 450 MB
here. That's still not the same result as Git, but it's in the same
ballpark. I'm sure you could find examples either way where one is
slightly more efficient than the other.
> Why does bzr use so much memory to do a simple branch, and is
> --stacked the best way to do source deployment?
I think --stacked is probably fine to do deployment. I personally would
use lightweight checkouts ("bzr co --lightweight") or the bzr-upload
plugin (doesn't require bzr to be installed on the server) to do deployment.
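As a rough sketch, the three deployment options could be wrapped in a
small Python helper that builds the corresponding bzr command line.
The helper name deploy_command and the example locations are
hypothetical; only the bzr subcommands and flags themselves (branch
--stacked, checkout --lightweight, and upload from the bzr-upload
plugin) are real. Note that "bzr upload" is run from inside a local
working tree, so only the remote location is passed to it.

```python
import subprocess


def deploy_command(method, source, target):
    """Build the bzr argv for one of the deployment styles above.

    method: "stacked", "lightweight" or "upload" (names chosen for
    this sketch). For "upload", source is unused because the command
    is run from within the local working tree.
    """
    if method == "stacked":
        return ["bzr", "branch", "--stacked", source, target]
    if method == "lightweight":
        return ["bzr", "checkout", "--lightweight", source, target]
    if method == "upload":
        # Requires the bzr-upload plugin on the machine doing the upload.
        return ["bzr", "upload", target]
    raise ValueError("unknown deployment method: %r" % (method,))


def deploy(method, source, target, cwd=None):
    """Run the command; raises CalledProcessError if bzr fails."""
    subprocess.run(deploy_command(method, source, target), cwd=cwd, check=True)
```

For example, deploy_command("lightweight", "lp:openobject-addons/6.1",
"/srv/addons") yields the argv for a lightweight checkout; the target
path here is only a placeholder.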