Naive questions re hard-linking repositories

Martin Pool mbp at sourcefrog.net
Wed Apr 15 08:29:05 BST 2009


2009/4/15 Robert Collins <robertc at robertcollins.net>:
> On Wed, 2009-04-15 at 16:22 +1000, Martin Pool wrote:
>> 2009/4/15 Ian Clatworthy <ian.clatworthy at canonical.com>:
>> > Given it takes ~ 4 minutes to branch Emacs outside a shared repo
>> > and 6 seconds to branch within one, I'd like to better understand
>> > why we don't just hard link the .bzr/repository directory when
>> > conditions permit it, e.g. both source and target branch are local
>> > and on the same filesystem say.
> ..
>> However, we could plausibly hardlink the pack files within it.
>
> We could, except on Windows where hardlinking is less often supported,
> and the GUI environment doesn't understand them at all.

I'm given to understand it does work on NTFS.  If in some environments
it fails with "not supported on this filesystem" it's easy enough to
fall back to copying.
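
To make the fallback concrete, here is a minimal sketch in Python.  It
is not bzr code: link_or_copy is a hypothetical helper name, and it
assumes the caller has already decided that the file is safe to share
(for example an immutable pack file).

```python
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link src to dst; fall back to copying if linking fails.

    Hypothetical helper for illustration only, not part of bzr.
    Returns "linked" or "copied" so the caller can report what happened.
    """
    try:
        os.link(src, dst)
    except FileExistsError:
        # Never silently overwrite an existing destination.
        raise
    except (OSError, NotImplementedError):
        # Linking can fail across devices, or on filesystems and
        # platforms that do not support hard links; copy instead.
        shutil.copy2(src, dst)
        return "copied"
    return "linked"
```

Returning which path was taken lets a caller log it, so a user on a
filesystem without hardlinks can see why the branch was slower.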

Presumably GUI tools should never or very rarely be looking around
inside the repository directories, and even more rarely trying to
change them.  So unless they actually cause the tools to blow up,
linking the files seems harmless.  fwiw we're told that hg does use
hardlinks on Windows and they apparently work.

>> I think we need to look at this at several levels (in descending order):
>>
>> 1- how does "I want a new branch and working area" map into the bzr
>> model, and in particular does it create a new repository and copy the
>> data, or make a stacked branch, or something else?
>>
>> 2- if you are copying all (or most) of a repository's content locally,
>> should you walk the whole graph and transfer the data semantically, or
>> should you just copy the repository's packed-up form similar to cp -r?
>
> I'm really against skipping reading the content; we don't know the
> provenance of a local repo: it might have been gotten out of a tarball,
> or a damaged disk.

It's true it might be; of course this is also true for every operation
we do and they're not all as careful as branch currently is.  (For
example commit and status assume that the dirstate is consistent with
the repository's version of the revision it purports to describe.)  So
I'd like to identify what the principle is.  It's not "always
thoroughly check all relevant data", because we clearly don't: if we
did, some past network bugs would have surfaced as more obvious
failures, but networking would probably also be much slower.
I also think that branch itself doesn't check everything it could
possibly check.

So we already make tradeoffs of being less likely to detect external
corruption vs speed of operation, but in the case of local branch
across repositories we probably don't get the tradeoff most users
would prefer.

I think it would be reasonable to at least have the option.

>> This has some disadvantages compared to having a shared repository,
>> because they're only sharing storage at one point in time: once they
>> start to diverge or if one of them is repacked, they'll start using
>> more disk space.  Still, it will have saved space at that one
>> particular point in time, and future access should be no slower than
>> it would be.
>
> It will be surprising to people when the first repack operation happens,
> or if they run 'bzr pack'. It's much more efficient to be using a shared
> repo, and I think focusing on that is a better way to address the
> issues.

Well, the first repack should be no slower, and probably will cause no
noticeable impact on disk use.
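
As an illustration of why the sharing only holds at one point in time,
here is a small Python sketch.  It assumes a repack writes a new file
and renames it into place rather than modifying the old one; the
repack and demo functions and the file names are made up for the
example and are not bzr's.

```python
import os
import tempfile


def repack(path, new_bytes):
    """Stand-in for a repack: write a new file, then rename it over path."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(new_bytes)
    os.replace(tmp, path)


def demo():
    """Link two 'pack files', repack one, and report what is still shared."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.pack")
        b = os.path.join(d, "b.pack")
        with open(a, "wb") as f:
            f.write(b"old pack data")
        os.link(a, b)  # both names now refer to the same storage
        shared_before = os.path.samefile(a, b)
        repack(a, b"repacked data")  # a now points at new storage
        shared_after = os.path.samefile(a, b)
        with open(b, "rb") as f:
            b_content = f.read()
        return shared_before, shared_after, b_content
```

Under that assumption, the other repository's copy is untouched by the
repack, but the two no longer share disk space for that file.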

It will be a problem though if people have many branches of a large
project, which gradually diverge to the point where they're using
nearly N*M disk space (N branches, each holding a repository of size
M).  To fix that we do need to address the level 1
problem of guiding people towards having just one repository.

If someone gets up steam to make bzr (optionally?) hardlink pack files
and the patch is reasonably clean and not excessively risky I'd
consider merging it.  I think fixing the higher level problem is
probably a better use of time though.

> Did you see my proposal about changing branch? It got disappointingly
> small amounts of feedback.

I'll look for it; I might have been away.  I'll try to catch up soon
on that and the easy setup stuff.

-- 
Martin <http://launchpad.net/~mbp/>


