[MERGE] Implement hard-link support for branch and checkout

John Arbash Meinel john at arbash-meinel.com
Wed Jan 23 16:04:02 GMT 2008


Stefan Ring wrote:
> I'm sorry if this has been discussed before but it didn't immediately
> stick out when I searched the list.
> 
> Anyway, wouldn't it make sense to hard-link the repository as well? I
> would very much enjoy to have this feature. For Mercurial, the
> official way to clone a directory is "cp -al". I love that! I tried it
> with bzr when I started playing around with it (0.90) but it seemed to
> corrupt the repositories. Also, with the knitpacks format being more
> or less append-only, it should be fairly safe.
> 
> Or is it supported already maybe? I might have missed this one.

The problem with knits is that you end up with race conditions. We lock
at the Repository layer, and each hard-linked copy is its own repository
with its own lock. That means 2 branches could both hold a lock at the
same time and end up writing to the same files at the same time.

Mercurial gets around this by always breaking the hard-link whenever it
is going to update a revlog file. However, you can only reliably detect
a hard-linked file on the local filesystem, and we support accessing
branches directly over sftp and ftp. (There is no stat that returns the
number of hardlinks on those transports.)
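
To make the "break the hard-link before updating" idea concrete, here is
a minimal Python sketch of the general technique. It is not Mercurial's
actual code, and the function names are made up for illustration. It
assumes a local filesystem where os.stat() reports a link count, which
is exactly what is missing over sftp and ftp.

```python
import os
import shutil
import tempfile


def break_hardlink(path):
    """Give `path` a private copy if other names share its inode.

    Hypothetical helper. Returns True if a link was broken.
    """
    if os.stat(path).st_nlink <= 1:
        return False  # nobody else shares this file
    # Copy into a temp file in the same directory, then rename it over the
    # original name. Other hard links keep pointing at the old inode.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        shutil.copy2(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return True


def append_breaking_links(path, data):
    """Append bytes to `path` without touching other hard-linked names."""
    break_hardlink(path)
    with open(path, "ab") as f:
        f.write(data)
```

After append_breaking_links() runs on one name of a hard-linked pair,
the other name still has the old contents and each file has a link
count of 1.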

So we chose not to support hard-linked repositories for Knits.

We have shared repositories, which work a whole lot better in the
long-term anyway. With Mercurial, as you commit more and more, your
hard-linked branches diverge more and more, rather than continuing to
share the same storage space. Branching from a remote into a shared
repository also gets the same storage benefits, rather than having to do
a local "cp -al" and then pull to get your new branch to match the
remote.


On that note, if you *are* using a shared repository, you could probably 
do "cp -al branch1 branch2" since it isn't hard-linking the repository. 
I'm not sure if you would end up with weird issues with the 
"branch.conf" files, as I haven't tested it at all. But the rest of the 
files I'm sure are atomically overwritten, so they are effectively 
"break hardlinks on write".
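
For illustration, this is roughly what "atomically overwritten" means
and why it breaks hard-links as a side effect. It is a generic Python
sketch, not bzrlib's actual code, and atomic_write is a hypothetical
name: the new contents go into a temp file, which is then renamed over
the target.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the contents of `path` by writing a temp file and renaming.

    The rename points `path` at a brand-new inode, so any other hard link
    to the old file keeps the old contents.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A file that is appended to or rewritten in place would not get this
behaviour, which is why files like that are the ones to worry about
after a "cp -al".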

As for the new knitpack/--pack-0.92 format, those repositories are
perfectly capable of being hardlinked, since the repository files are
write-once. However, you end up suffering the same divergence effect.
When one branch creates new data, the other branch won't see it. Merging
between the branches will start duplicating your data. Branching from
another upstream won't share your local storage.
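
A small Python sketch of that divergence, under stated assumptions: the
helper names and file names are invented, and this is not the real pack
repository layout. hardlink_clone is a rough stand-in for "cp -al", and
new data always goes into a new write-once file.

```python
import os
import shutil


def hardlink_clone(src, dst):
    """Rough equivalent of `cp -al src dst`: new dirs, hard-linked files."""
    shutil.copytree(src, dst, copy_function=os.link)


def add_write_once_file(directory, name, data):
    """Create a new file that is never modified afterwards.

    Mode "xb" refuses to overwrite an existing name.
    """
    with open(os.path.join(directory, name), "xb") as f:
        f.write(data)
```

Existing files stay shared between the two directories, but a file
added to one of them with add_write_once_file() only exists there. The
other directory never sees it, and if it later fetches the same data it
stores its own copy.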

Honestly, shared repositories solve the problem a lot better. It is 
possible we will support a "bzr branch --hardlink" style flag for pack 
repositories because we can, and it shows up on benchmarks when people 
haven't taken the time to set up a shared repository. It isn't terribly
high on the priority list, though.

So if you want to try using "cp -al" with standalone pack branches, it 
should be fine. And I think we'll be responsive to any bug reports you 
file on it.

John
=:->


