[rfc] bzr-colo into core

Thu Mar 24 11:57:34 UTC 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 3/24/2011 11:58 AM, Alexander Belchenko wrote:
> Martin Pool wrote:
>> I have looked a bit at how Mercurial does it. Thanks for explaining it.
>>
>> The repository and branch breakdown is fairly similar; bzr too has a
>> DAG that is pointed to by branch tips.  One difference is that I
>> believe hg generally hardlinks repository-internal files when you make
>> new repositories on the same disk, whereas bzr copies, and rather
>> wants to avoid having multiple repositories unless you really want
>> them to be separate.
> 
> I wish bzr could hardlink packs and indices when cloning a repo locally.
> Actually we're in a better position here than hg: packs and indices
> are immutable; only pack-names needs to differ once you commit new
> stuff and the clones diverge. Packing the repo in one clone should not
> affect other clones, I think, but it may be important to break the links
> before moving old packs into the obsolete area. And it should work on
> Windows too (although Martin gz mentioned some issues there).
> 
> Can we?
> 

On the flip side, bzr doesn't care if you have data in your repository
that isn't used by the branches. In Mercurial, any data in your
repository is always in a "this should probably be merged" state.

Specifically, if I do "hg pull" to grab some of your changes into my
local repository, but then decide *not* to merge them, I have to 'hg
truncate', which actually rewrites the repo to remove your changes. (All
operations that add data record the size of files before the addition,
and 'hg truncate' truncates the files to their original size.)
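
To make the "record the size, then truncate" idea concrete, here is a
minimal Python sketch. It is not Mercurial's code; the function names
and the in-memory journal list are made up for illustration.

```python
import os


def append_with_journal(journal, path, data):
    """Append data to path, first recording the file's current size.

    `journal` is a plain list of (path, size_before) tuples; a
    hypothetical stand-in for an on-disk journal file.
    """
    size = os.path.getsize(path) if os.path.exists(path) else 0
    journal.append((path, size))
    with open(path, "ab") as f:
        f.write(data)


def rollback(journal):
    """Undo the journalled appends by truncating each file back."""
    # Walk the journal backwards so the earliest recorded size wins
    # when the same file was appended to more than once.
    for path, size in reversed(journal):
        with open(path, "r+b") as f:
            f.truncate(size)
    journal.clear()
```

This only works because the store is append-only: nothing before the
recorded offset is ever rewritten, so cutting the tail off restores the
previous state.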

Otherwise it will keep warning you: "you have heads that aren't merged in
your 'default' branch".

I've certainly played around with symlinking .bzr/repository between
projects, and it works just fine. bzr repositories are generally
multi-writer safe and concurrent. (We currently have a bug when 2
concurrent writers fetch exactly the same content, but as long as the
content differs in at least 1 byte [e.g. revision, etc.], everything
works fine.)

On Linux you can't hard-link directories, and on Windows Junction
Points only work on directories (which is fine for a repository).

One of the interesting bits about hg repos is that the more you use them
after cloning, the more they diverge, which can be significant if you
have long-lived clones. For any file that is touched, its full history
is copied in order to add the new revision. So for files like bzr's old
NEWS and bzrlib/builtins.py, which are both large and tend to have a lot
of changes, all of that history gets copied in each of your clones.

Also, if you have a bunch of active clones and they fetch similar
revisions, all of that data gets duplicated, because while they start
out hard-linked they keep diverging.
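
As a rough illustration of why hard-linked clones drift apart, here is a
small Python sketch of a hard-linked clone plus a "break the link before
appending" write. The helper names are hypothetical, and it assumes a
flat directory of regular files on one filesystem; it is neither hg's
nor bzr's implementation.

```python
import os
import shutil


def clone_hardlinked(src_dir, dst_dir):
    """Create dst_dir and hard-link every file from src_dir into it.

    Assumes src_dir contains only regular files and that both
    directories live on the same filesystem.
    """
    os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        os.link(os.path.join(src_dir, name), os.path.join(dst_dir, name))


def append_breaking_link(path, data):
    """Append to path without touching other hard links to it."""
    if os.stat(path).st_nlink > 1:
        # The file is shared: copy the whole thing, then swap the
        # private copy into place. This is the full-history copy.
        tmp = path + ".tmp"
        shutil.copyfile(path, tmp)
        os.replace(tmp, path)
    with open(path, "ab") as f:
        f.write(data)
```

The first append to a shared file pays for a copy of its entire
contents, which is why large, frequently changed files hurt the most.
Immutable files (like packs and indices) never hit that path, since
nothing ever appends to them.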

The main thing I think we are missing is handling the case where you
use disjoint directory structures (though for hard-links to work, they
have to be on the same mount point/hard drive). We could certainly
create a "RepositoryReference" type that just says "the real repository
is *over-there*".

And *that* would be worlds faster than even hard-linking the repository
files. :)
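
Very roughly, and purely as a hypothetical sketch (no such format
exists in bzr; the file name and both functions are invented here), a
reference could be as small as a one-line file in the control
directory:

```python
import os

# Hypothetical file name, not part of any real bzr format.
REFERENCE_FILE = "repository-reference"


def write_repository_reference(control_dir, real_repo_path):
    """Record in control_dir where the real repository lives."""
    os.makedirs(control_dir, exist_ok=True)
    ref = os.path.join(control_dir, REFERENCE_FILE)
    with open(ref, "w", encoding="utf-8") as f:
        f.write(os.path.abspath(real_repo_path) + "\n")


def resolve_repository(control_dir):
    """Return the repository path to use for control_dir.

    Follows the reference file if there is one; otherwise falls back
    to the local 'repository' directory.
    """
    ref = os.path.join(control_dir, REFERENCE_FILE)
    if os.path.exists(ref):
        with open(ref, encoding="utf-8") as f:
            return f.read().strip()
    return os.path.join(control_dir, "repository")
```

Creating the "clone" is then a single small write, no matter how big
the repository is, and it does not depend on the two locations sharing
a filesystem.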

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2LMa4ACgkQJdeBCYSNAAMOHQCg1GjUqXj6inYVh7DJXWWJUBzg
1esAn13bTQXsgqjwaF8whkvubkr9jqVB
=02fR
-----END PGP SIGNATURE-----