Naive questions re hard-linking repositories

Wed Apr 15 08:26:13 BST 2009

Martin Pool wrote:

> Well, you can't hardlink the directory because the OS won't let you.
> :-) <http://en.wikipedia.org/wiki/Hard_link#Limitations_of_hard_links>
>  I presume this is so that the directory tree remains a tree rather than a
> directed, possibly cyclic graph.  As trivia, some Solaris versions
> would let root hardlink directories, but this could cause a kernel
> panic.

Damn, my master plan is foiled because 21st-century OSs are too limited
in their thinking. Reality can be so frustrating at times, even when one
works for an OS vendor. :-)

> I think we need to look at this at several levels (in descending order):
> 
> 1- how does "I want a new branch and working area" map into the bzr
> model, and in particular does it create a new repository and copy the
> data, or make a stacked branch, or something else?
> 2- if you are copying all (or most) of a repository's content locally,
> should you walk the whole graph and transfer the data semantically, or
> should you just copy the repository's packed-up form similar to cp -r?
> 3- if you're copying the repository just as a bunch of files should
> you in fact make hard links rather than copying it.
> 
> The top level possibly gives the bigger win, but the bottom ones are
> arguably easier to change and more the topic of your mail.

Yes, my email made the mistake of putting potential solutions ahead
of the root problem.

> I think it would be reasonable to have local branch just hardlink all
> the pack files and make a new repository.  We would still want an
> option, maybe --precise, that walks over the graph, validates it and
> copies only what's strictly needed, but that need not be the default.
> If they can't be hardlinked (e.g. because of a filesystem limit) then
> you could just copy them.  So the lower bound on the time for these should
> be the time to do 'cp -r' or 'cp -rl' of the repository directories, plus
> building the tree.
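Hard-linking every pack file and falling back to a plain copy when that fails is roughly `cp -rl` with a `cp -r` fallback. Below is a minimal Python sketch of that idea under stated assumptions: `link_or_copy_tree` is a hypothetical helper, not part of bzrlib, and it just mirrors a directory without knowing anything about pack formats, locking or building the working tree. Directories are always recreated, since the OS won't hard-link them.

```python
import os
import shutil


def link_or_copy_tree(src: str, dst: str) -> dict:
    """Mirror the directory tree at src into a new directory dst.

    Each file is hard-linked if possible (like `cp -rl`); if the link
    fails, e.g. because src and dst are on different filesystems or the
    filesystem does not support hard links, the file is copied instead
    (like `cp -r`). Hypothetical helper, not a bzrlib API.
    """
    os.makedirs(dst)  # fail if dst already exists rather than merging into it
    stats = {"linked": 0, "copied": 0}
    # os.walk is top-down by default, so each parent directory is created
    # before any of its children are visited.
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == os.curdir else os.path.join(dst, rel)
        if rel != os.curdir:
            # Directories cannot be hard-linked, so always make a fresh one.
            os.mkdir(target_dir)
        for name in files:
            src_file = os.path.join(root, name)
            dst_file = os.path.join(target_dir, name)
            try:
                os.link(src_file, dst_file)
                stats["linked"] += 1
            except OSError:
                # Fall back to a real copy (data plus metadata).
                shutil.copy2(src_file, dst_file)
                stats["copied"] += 1
    return stats
```

Because hard-linked files share storage, this is only safe if a pack file is never modified in place once written. That is an assumption to check against the real repository format; the sketch does not verify it.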

Right. These options are a step forward but arguably still solving the
wrong issue ...

> So up at level #1, the question is why that branch command even
> needs to think about copying all the data when the user story is "I
> want a new logical branch and working tree."

> Early versions of bzr took the approach that a directory holds a
> working tree, a branch pointer, and a repository with the history of
> that branch.  This concept is still in bzr's DNA and the defaults are
> oriented towards it.  However, the actual recommended mode at the
> moment is: make a shared repository, then make branch directories in
> there, typically all with working trees.

Right. Defaults matter because they say a lot about how a tool is
expected to be used in the common case. I rarely prefer git's UI over
ours but, going back to the root problem, I think their choice of
two separate commands (clone vs branch) is a wise one. IMO, Doing The
Right Thing differs between "branching" from a remote location and
branching from a local one. For remote, you nearly always mean "get me
my own copy of history", while local branching is all about "start me
a new line of development". Making yet another complete copy of history
in the local case *by default* is somewhere between worthless and
harmful, because it consumes unnecessary time and resources.

Just like pull vs merge vs update, the *intention* is different, so it's
arguably good to have separate commands. For 2.0, perhaps "branch" ought
to always stack by default and "clone" ought not to stack, rather than
just being an alias for branch?

We could always stick with a single command, giving branch an option
such as --stack-if-local (and making that the default mode).
But I typically prefer explicit over implicit and I think it will help
new users to think about clone vs branch as achieving different things.

BTW, I also like Robert's idea of making a shared repo implicitly
but I'd like to think about it more.

> Whether you prefer one or the other, the fact that bzr is still
> oriented towards a mode of operation that isn't what we generally
> recommend is a problem.  I think this lies behind complaints that
> there are too many ways to use bzr.

And equally importantly, just using the tool in the simple, obvious way
performs terribly and chews disk space unnecessarily.

> I think a case where having some flexibility does make sense is that
> some people with large trees or slow build processes may prefer just
> one tree that switches around, whereas others that have many streams
> in progress and modestly sized trees might like lots of checkouts.
> Supporting both is great but it should be clear how you get from one
> to the other.
> 
> And this basically leads into
> <http://bazaar-vcs.org/DraftSpecs/EasyWorkspaceSetup>.

Right. So the usability goals need to be:

1. working well straight-out-of-the-box

2. retaining our adaptability *but* improving the UI for getting to
   and changing between commonly used workspace models.

Easy. :-)

Ian C.