bazaar/mercurial meeting
John Arbash Meinel
john at arbash-meinel.com
Mon May 22 18:00:15 BST 2006
Aaron Bentley wrote:
> John Arbash Meinel wrote:
> | Aaron Bentley wrote:
> |
> | I didn't realize that hg had repositories. Though I guess I just found
> | out that with 'clone -U' you can prevent it from 'updating the working
> | directory' ie, don't create a working dir.
>
> This is that "multi-head" thing people kept talking about, with a
> working tree that had several branches in it. It sounded crackful in
> those terms, but in Bzr terms, it's a repository with a working tree at
> its root.
Sure, I think monotone does something similar, though maybe it is just
that their definition of 'branch' allows multiple heads.
We would probably consider their 'branch' more of a 'project', and each
head its own 'branch'.
>
> |>Any kind of merger would be challenging, but there would definitely be
> |>advantages:
>
> | Yeah. The only problem is the semi-fundamental model differences. hg
> | follows the svn model of copy + delete rather than rename, and doesn't
> | use the concept of individual ids per file. (Hence probably has
> | difficulty with merging after renames).
>
> Bzr can handle such a format though: as long as they have unique
> revision IDs, we can synthesize file ids, and convert delete+copy into
> rename.
Well, when I tried to use 'hg copy', it didn't seem to actually copy the
history over. Here is what I did:
hg copy Makefile otherMake
hg rename hgmerge othername
hg commit -m "foo"
Afterwards, I saw new files:
.hg/data/othername.i
.hg/data/otherMake.i
And I could do:
$ hg debugindex .hg/data/otherMake.i
   rev  offset  length  base  linkrev  nodeid        p1            p2
     0       0     855     0     2253  ccde28a35792  000000000000  000000000000
$ hg debugindex .hg/data/Makefile.i
   rev  offset  length  base  linkrev  nodeid        p1            p2
     0       0     323     0     1008  06f2e0185921  000000000000  000000000000
     1     323      55     0     1020  e81018fed6bd  06f2e0185921  000000000000
     2     378      53     0     1423  e042b76d3e3c  e81018fed6bd  000000000000
     3     431      47     0     1426  0864563f2cf4  e042b76d3e3c  000000000000
     4     478      68     0     2207  c9b5eb45aa74  0864563f2cf4  000000000000
     5     546     128     0     2233  e886e0850217  c9b5eb45aa74  000000000000
     6     674      84     0     2234  ef4dbb43259c  e886e0850217  000000000000
     7     758     248     0     2235  63f7feee55ef  ef4dbb43259c  000000000000
     8    1006     528     0     2244  e1fdebcdebf3  63f7feee55ef  000000000000
$ hg debugindex .hg/data/othername.i
   rev  offset  length  base  linkrev  nodeid        p1            p2
     0       0    1985     0     2253  56084c1012e4  000000000000  000000000000
So their magic must be in the manifest, and not in the data files.
I can also say, though, that 'hg annotate' shows only the last revision
for every line of both 'othername' and 'otherMake'.
While this might be a limitation in 'hg annotate', it looks a whole lot
more like they are just creating a new whole-text version, and not
relating either one to the previous history.
I can also say that doing 'hg diff -r 2252 -r 2253' shows everything as
pure adds and deletes. (This may be a desire to stay 'patch' compatible,
though.)
Anyway, just to say that unique file ids may be difficult to come by. I
don't see any case where 'hg' actually records a rename, or even a copy.
Oh, and 'hg log othername' also shows only the last revision, while 'hg
log hgmerge' shows lots of revisions (even though 'hgmerge' doesn't
exist in the working directory anymore).
Maybe I'm missing something, but to me all of this points to the
conclusion that hg doesn't actually track renames in any way. (The best
you could do is git style, where you see an add and a delete, and do
text comparisons to see which files might have been renamed; see the
sketch below.)
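Just to illustrate what that git-style detection would look like, here
is a rough sketch in Python (a hypothetical helper, not code from
either project, and the 0.6 similarity threshold is an arbitrary choice):

  import difflib

  def guess_renames(deleted, added, threshold=0.6):
      """Pair deleted files with added files by text similarity.

      'deleted' and 'added' map filename -> file text; returns a list
      of (old_name, new_name) pairs that look like renames.
      """
      renames = []
      for old_name, old_text in deleted.items():
          best_score, best_name = threshold, None
          for new_name, new_text in added.items():
              score = difflib.SequenceMatcher(
                  None, old_text, new_text).ratio()
              if score >= best_score:
                  best_score, best_name = score, new_name
          if best_name is not None:
              renames.append((old_name, best_name))
      return renames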
>
> | On the flip side, it supports
> | copying history, so that you can 'split' a file, and still retain the
> | ancestry for each line.
>
> Unfortunately, no one's come up with sane semantics for merging with
> copied files, so far as I know.
Yeah, there are lots of issues here that we have discussed (going along
with our discussions about id aliasing).
>
> | If it wasn't for that, I was curious if we could have actually used hg's
> | storage. (It would be slow for annotations, but is obviously fast for
> | other things).
>
> We actually did have a revfile implementation in our source tree, at one
> point, so it's something we've looked at. Knits are a kind of
> weave/revfile hybrid, so I believe they could be competitive with revfiles.
Well, they now use 'revlog', which will actually put the data inside
the index file if the data is small enough (I don't know what the
threshold is).
I remember diving into revfiles, and that we could have used it.
AFAIK, we decided against it for the reasons I gave earlier (fixed size
records, expected fixed size revision-id, no annotations, etc).
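Just to make 'inside the index' concrete, here is a toy illustration of
the idea (my own sketch of the general shape, not hg's actual revlog
format; the 128K threshold is made up):

  class ToyRevlog(object):
      """Toy model of an inline revlog: while the stored data stays
      small, revision texts live in the same file as the index records;
      past some threshold they would be split out to a separate file."""

      INLINE_LIMIT = 128 * 1024   # made-up; hg's real cutoff may differ

      def __init__(self):
          self.index = []         # (offset, length) records
          self.data = ''          # revision texts, inline while small

      def add(self, text):
          self.index.append((len(self.data), len(text)))
          self.data += text
          if len(self.data) > self.INLINE_LIMIT:
              self._split_out()   # would move the data to its own file

      def _split_out(self):
          pass                    # omitted from the sketch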
>
> | I'm not sure how much we can actually merge versus just having a good
> | exchange of ideas between the two projects. But I would agree, if a
> | merge is possible, it could provide a lot of great benefits.
>
> Well, it might be nice to grab their smart server, for example.
>
> Aaron
Definitely. And a discussion about what tricks they use to get things
to go fast would be valuable on its own. We could probably pull in
their binary diff engine if we find it significantly better. (Though we
might also want to look into a binary Patience diff.)
One other thing that I've really liked while using 'hg' is that they
present their revision identifiers as 12-character ids. Now, I don't
know if this is a complete identifier, or if it is just a short name
for the real 40-digit sha1 hash. I'm guessing it's the latter, because
12 hex digits seem like they would collide relatively easily. (It's
only 48 bits.)
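For a rough sense of scale (my own back-of-the-envelope arithmetic, not
anything from the hg docs):

  bits = 12 * 4             # 12 hex digits = 48 bits
  print(2 ** bits)          # 281474976710656 possible short ids
  print(2 ** (bits // 2))   # 16777216: by the birthday bound, this is
                            # roughly where collisions become likely

That 2**24 figure (about 16.7 million revisions) is the usual
birthday-bound estimate for where collisions become more likely than not.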
I'm also curious about hg's manifest format. While I don't think it
records everything that we want, I do think we should look into
alternatives to our current XML formats.
I do find their code a little bit difficult to go through, since almost
no functions are documented (especially what all the parameters mean).
I do like that their code has at least a little bit of i18n (all the
strings are _()-wrapped), and I think it is something we should do in
bzr before 1.0.
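For reference, that style is just standard gettext usage, something
like this (illustrative, not actual hg code):

  import gettext
  _ = gettext.gettext   # a real setup would install a catalog via
                        # gettext.translation(...)

  def complain(path):
      # the _()-wrapped literal is what translators get in the .po files
      print(_("unable to open %s") % path)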
demandload() is something that might be worthwhile. In my tests so far,
I can cut the 'bzr rocks' time in half. But honestly, a lot of our code
depends on a lot of our other code. (I could only shave a little bit of
time off of 'bzr root'.)
With demandload we might factor out more of the code, so that branch.py
doesn't have the implementation of every branch format, which might
decrease the load time.
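The core trick is small enough to sketch here (my own minimal version
of the idea, not mercurial's actual demandload code; IIRC their version
takes the caller's namespace and a list of names, but the deferred
__import__ is the heart of it):

  class LazyModule(object):
      """Placeholder that defers the import until first attribute use."""

      def __init__(self, name):
          self._name = name
          self._module = None

      def __getattr__(self, attr):
          # only called for attributes not found the normal way
          if self._module is None:
              self._module = __import__(self._name)
          return getattr(self._module, attr)

  doctest = LazyModule('doctest')   # costs nothing at startup
  # touching doctest.testmod later triggers the real import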
But one of the big load time killers is actually cElementTree (which
loads ElementTree). For example:
0.030s $ time python -c ''
0.030s $ time python -c 'import os'
0.033s $ time python -c 'import elementtree'
0.103s $ time python -c 'import elementtree.ElementTree'
0.125s $ time python -c 'import cElementTree'
So we have an instant overhead of >0.1 seconds any time we need to do
any XML processing.
Now, I did find that 'iterablefile' has an overhead of 0.1 seconds as
well, because it unconditionally imports 'doctest', which is apparently
very expensive.
0.125 $ time python -c 'import bzrlib.rio'
after moving the import doctest to the bottom of iterablefile:
0.041 $ time python -c 'import bzrlib.rio'
So there are probably still a few things like that where we could clean
up the import structure, and save some startup time.
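The fix is just the usual deferred-import pattern; the shape of the
change (not the exact iterablefile diff, which presumably guards the
import behind the self-test entry point) is:

  def _test():
      # pay for the expensive import only when the tests actually run,
      # not on every 'import bzrlib.iterablefile'
      import doctest
      doctest.testmod()

  if __name__ == '__main__':
      _test()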
John
=:->