Naive questions re hard-linking repositories

Robert Collins robertc at robertcollins.net
Wed Apr 15 08:02:03 BST 2009


On Wed, 2009-04-15 at 16:22 +1000, Martin Pool wrote:
> 2009/4/15 Ian Clatworthy <ian.clatworthy at canonical.com>:
> > Given it takes ~ 4 minutes to branch Emacs outside a shared repo
> > and 6 seconds to branch within one, I'd like to better understand
> > why we don't just hard link the .bzr/repository directory when
> > conditions permit it, e.g. both source and target branch are local
> > and on the same filesystem say.
..
> However, we could plausibly hardlink the pack files within it.

We could, except on Windows, where hardlinking is less often supported
and the GUI environment doesn't understand hard links at all.

> I think we need to look at this at several levels (in descending order):
> 
> 1- how does "I want a new branch and working area" map into the bzr
> model, and in particular does it create a new repository and copy the
> data, or make a stacked branch, or something else?
>
> 2- if you are copying all (or most) of a repository's content locally,
> should you walk the whole graph and transfer the data semantically, or
> should you just copy the repository's packed-up form similar to cp -r?

I'm really against skipping reading the content; we don't know the
provenance of a local repo: it might have come out of a tarball,
or off a damaged disk.
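
To illustrate the point about reading the content rather than blindly
copying it, here is a minimal sketch in Python. It is not how bzr
validates repository data; the function name copy_verified and the idea
of having an expected SHA-1 to hand are assumptions made purely for
illustration. It reads every byte of the source while copying and
refuses to keep the copy if the digest does not match.

```python
import hashlib
import os


def copy_verified(src_path, dst_path, expected_sha1, chunk_size=1 << 16):
    """Copy src_path to dst_path, hashing the bytes as they are read.

    Hypothetical helper: raises ValueError and removes the copy if the
    SHA-1 of the data read differs from expected_sha1 (a hex string).
    """
    digest = hashlib.sha1()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            dst.write(chunk)
    actual = digest.hexdigest()
    if actual != expected_sha1:
        # Do not leave a copy of data we could not verify.
        os.remove(dst_path)
        raise ValueError(
            "content mismatch for %s: expected %s, got %s"
            % (src_path, expected_sha1, actual)
        )
    return actual
```

A hard link or a plain 'cp -r' never looks at the bytes, so damage of
this kind would simply be carried across into the new repository.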

> 3- if you're copying the repository just as a bunch of files should
> you in fact make hard links rather than copying it.
> 
> The top level possibly gives the bigger win, but the bottom ones are
> arguably easier to change and more the topic of your mail.
> 
> I think it would be reasonable to have local branch just hardlink all
> the pack files and make a new repository.  We would still want an
> option, maybe --precise, that walks over the graph, validates it and
> copies only what's strictly needed, but that need not be the default.
> If they can't be hardlinked (eg because of a filesystem limit) then
> you could just copy them.  So the lower bar time for these should be
> the time to do 'cp -r' or 'cp -rl' of the repository directories, plus
> building the tree.
> 
> This has some disadvantages compared to having a shared repository,
> because they're only sharing storage at one point in time: once they
> start to diverge or if one of them is repacked, they'll start using
> more disk space.  Still, it will have saved space at that one
> particular point in time, and future access should be no slower than
> it would be.

It will be surprising to people when the first repack operation happens,
or if they run 'bzr pack'. It's much more efficient to be using a shared
repo, and I think focusing on that is a better way to address the
issues.
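
For reference, the "hardlink the pack files, else copy them" idea
discussed above amounts to something like the following Python sketch.
It is not bzr code; link_or_copy_tree is a hypothetical name, and it
treats the repository as nothing more than a directory of files, which
is an assumption made for illustration.

```python
import os
import shutil


def link_or_copy_tree(src_dir, dst_dir):
    """Mirror src_dir into dst_dir, hard-linking files where possible.

    Hypothetical helper. Falls back to a plain copy when os.link fails
    (for example across filesystems, or where hard links are not
    supported). Returns a (linked, copied) pair of file counts.
    """
    linked = copied = 0
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            try:
                os.link(src, dst)
                linked += 1
            except OSError:
                # No hard link available here: pay for a real copy.
                shutil.copy2(src, dst)
                copied += 1
    return linked, copied
```

The linked files share storage only until one side replaces them, and
a repack writes new files, which is why the saving disappears at
exactly that point.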

> It would take a little care to do this in a clean way, and it would
> mean there's another code path by which data is copied between
> repositories, therefore the possibility for more testing or different
> bugs, but the improvement is potentially quite large.  Unless someone
> else sees a problem you could try to do a patch for it.

We had it in the past; it got removed by virtue of a lot of work. Let's
not roll back the clock.

> > More broadly, I guess I'm asking us to revisit our assumptions about
> > what branch must do vs what it does now. Shared repositories are
> > cool but we ought to have a system that benefits from them, not
> > *requires* them for acceptable performance. I don't have the answers,
> > or even all the questions, so I thought I'd start back at the basics ...
> 
> So up at level #1, the question is why does that branch command even
> need to think about copying all the data when the user story is "I
> want a new logical branch and working tree."
> 
> Early versions of bzr took the approach that a directory holds a
> working tree, a branch pointer, and a repository with the history of
> that branch.  This concept is still in bzr's dna and the defaults are
> oriented towards it.  However, the actual recommended mode at the
> moment is: make a shared repository, then make branch directories in
> there, typically all with working trees.

Did you see my proposal about changing branch? It got disappointingly
little feedback.

> Whether you prefer one or the other, the fact that bzr is still
> oriented towards a mode of operation that isn't what we generally
> recommend is a problem.  I think this lies behind complaints that
> there are too many ways to use bzr.  Having the options is not a
> problem so much as the lack of a clear normal method on which both the
> community and the tool agree.

Agreed.

-Rob