Fwd: Deploying with Bazaar (or how a big repo can make you crazy)

Leonardo Santagada santagada at gmail.com
Fri Mar 9 00:24:04 UTC 2012

I didn't know this list doesn't set reply-to, sorry.

---------- Forwarded message ----------
From: Leonardo Santagada <santagada at gmail.com>
Date: Thu, Mar 8, 2012 at 3:44 PM
Subject: Re: Deploying with Bazaar (or how a big repo can make you crazy)
To: Jelmer Vernooij <jelmer at samba.org>

On Thu, Mar 8, 2012 at 3:09 PM, Jelmer Vernooij <jelmer at samba.org> wrote:
> Hi Leonardo,
> On 08/03/12 17:41, Leonardo Santagada wrote:
>> People on the IRC channel said I should share my use story with the
>> mailing list, and that is what I will try to do.
>> Yesterday I was having problems trying to deploy a bazaar repo
>> (lp:openobject-addons/6.1) to a server with 600mb of RAM and no swap
>> space (ubuntu 11.10 bzr 1.4.1). Doing a bzr branch over http was
>> getting the process killed by the OOM killer because it was using all
>> the available RAM. The repository is 493mb in size locally but I think
>> it tries to transfer around 600mb+ of data. So here were my steps to
>> try to fix the problem:
> I'm sorry the experience was so bad for you. Let's see if there is anything
> we can do to improve it.
>> First people on IRC told me that using ssh is better because it would
>> have to transfer less data and that the protocol is smart. My first
>> problem is that launchpad doesn't support anonymous access over ssh,
>> although doing so seems to be very simple (is it still made using the
>> twisted ssh implementation?). Here is the bug report, marked as won't fix:
>> https://bugs.launchpad.net/launchpad/+bug/493389
>> So I had to create a launchpad user for a headless machine that only
>> downloads code, and create/upload an ssh key to launchpad (something
>> that isn't easily automated, but I will reuse the key on other
>> machines). Surprise, it did not work: it actually transferred about 480
>> mb before being killed and was 10x slower than http. Looking at
>> .bzr.log you can see the process was still only fetching data. So a
>> simple bzr branch was out of the question.
>> Then someone said to try bzr branch -rN, where N is a small number, to
>> see if that would work. I wanted the speed back so I tried http again,
>> but apparently it tried to download the whole repo again, so I stopped
>> it to try ssh. SSH worked, but then I would have to manually split the
>> download so bzr doesn't eat all the memory. That is way too manual
>> for my taste, so I gave up on that also.
> SSH by itself isn't any faster; what helps is that you're running the bzr
> smart server protocol over SSH. The smart server protocol (HPSS) is
> optimized for bzr operations. If you use a "dumb" transport, like plain
> HTTP, FTP or SFTP, then you're basically reading the raw remote files and
> fetching whatever bits you need by parsing them. This can involve loading
> (part of) a large pack file into memory. With the smart server protocol,
> your client can just ask for specific data.

Any reason why it is using 600mb of ram to fetch a repository even
when using a smart server?

> Launchpad doesn't have smart server support enabled over HTTP, so if you try
> to clone a branch from Launchpad using HTTP it will use "dumb" access. I
> think the reason bug 493389 was closed is that it makes more sense to
> allow smart server protocol access over HTTP or plain TCP than to allow
> anonymous SSH (but I can only speculate).

I can only speculate that allowing anonymous access (or even more
simply, just providing a downloadable private key for a fake user) is
easier than putting the smart server over http.

> I don't think using -rN will help if you're using "dumb" protocol access to
> retrieve the branch; with smart protocol access (like bzr+ssh://, bzr:// or
> http:// with the smart server enabled), it will help.

Seems like what I got.
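
To make the smart/dumb distinction above concrete, here is a minimal
Python sketch. It is only an illustration of Jelmer's description, not
bzr's own transport code: the helper names transport_kind and
revision_limit_helps are hypothetical, the example.com URLs are
placeholders, and treating https the same as http is an assumption.

```python
from urllib.parse import urlsplit

# Schemes named in the thread. The smart/dumb labels follow Jelmer's
# description; this table is illustrative, not bzr's transport registry.
SMART_SCHEMES = {"bzr", "bzr+ssh"}
DUMB_SCHEMES = {"ftp", "sftp"}
# Assumption: https behaves like http for this purpose.
HTTP_SCHEMES = {"http", "https"}


def transport_kind(url, http_smart_enabled=False):
    """Return "smart", "dumb" or "unknown" for a branch URL (hypothetical helper)."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in SMART_SCHEMES:
        return "smart"
    if scheme in HTTP_SCHEMES:
        # HTTP only counts as smart when the server enables the smart protocol.
        return "smart" if http_smart_enabled else "dumb"
    if scheme in DUMB_SCHEMES:
        return "dumb"
    return "unknown"


def revision_limit_helps(url, http_smart_enabled=False):
    """Per the thread, -rN is only expected to reduce the fetch on smart transports."""
    return transport_kind(url, http_smart_enabled) == "smart"


if __name__ == "__main__":
    # Placeholder URLs, not real branches.
    for url in ("http://example.com/branch", "bzr+ssh://example.com/branch"):
        print(url, transport_kind(url), revision_limit_helps(url))
```

With the default http_smart_enabled=False this mirrors the Launchpad
situation described above, where HTTP access is "dumb".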

>> Talking about a huge repo, this one is 493mb with around 40k revisions. I
>> used fastexport to see how big it would be in git and, more bad news,
>> it gets a tad smaller there: 401mb after a git repack -a -d -f -F
>> (don't ask about all those flags, git cli is crazy). This goes against
>> the benchmarks posted on bzr, should they be updated or something?
> After repacking the bzr repository it goes down from 490 to 450 Mb here.
> That's still not the same result as Git, but it's in the same ballpark. I'm
> sure you could find examples either way where one is slightly more efficient
> than the other.

I didn't know there was a way to repack bzr repos :). Is there any way
to ask launchpad to do the repack?
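
For a rough sense of the sizes quoted above, a small Python sketch of the
arithmetic. The figures are the ones from this thread (490 and 450 for
bzr before and after repacking, 401 for git after repacking); they were
measured on two different machines, so comparing them directly is an
assumption, and percent_smaller is a hypothetical helper name.

```python
def percent_smaller(before_mb, after_mb):
    """Percentage by which after_mb is smaller than before_mb."""
    return 100.0 * (before_mb - after_mb) / before_mb


# Sizes in megabytes as quoted in the thread.
bzr_repack = percent_smaller(490, 450)  # bzr before vs. after repacking
git_vs_bzr = percent_smaller(450, 401)  # repacked bzr vs. repacked git

print(f"bzr repack saves {bzr_repack:.1f}%")                  # 8.2%
print(f"git is {git_vs_bzr:.1f}% smaller than repacked bzr")  # 10.9%
```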

>> Why does bzr use so much memory to do a simple branch, and is
>> --stacked the best way to do source deployment?
> I think --stacked is probably fine to do deployment. I personally would use
> lightweight checkouts ("bzr co --lightweight") or the bzr-upload plugin
> (doesn't require bzr to be installed on the server) to do deployment.

What is the difference between bzr co --lightweight and bzr branch --stacked?

> Jelmer

Thanks a lot... let's hope 2.5 lands soon on ubuntu :)


Leonardo Santagada


