bazaar/mercurial meeting
Martin Pool
mbp at canonical.com
Tue May 23 05:44:02 BST 2006
On 22 May 2006, John Arbash Meinel <john at arbash-meinel.com> wrote:
> > | I'm not sure how much we can actually merge versus just having a good
> > | exchange of ideas between the two projects. But I would agree, if a
> > | merge is possible, it could provide a lot of great benefits.
> >
> > Well, it might be nice to grab their smart server, for example.
>
> Definitely. And just a discussion about what tricks they use to get
> things to go fast. We could probably pull in their binary diff engine if
> we find it significantly better. (Though we might also want to look into
> a binary Patience diff).
I think at least there should be a good exchange of ideas. It's
possible that we would all agree enough on ultimate goals and technical
approaches that we would merge into one project, though that couldn't
happen overnight.
As participants in this project it's easy to see all the points of
difference and reasons why anything like a merge would be hard. But to
an outside observer it may be yet another case where it seems a shame
there's so much duplicated effort between open projects (emacs vs xemacs
for example).
> One other thing that I've really liked while using 'hg' is that they
> present their revision identifiers as 12 character ids. Now, I don't
> know if this is a complete identifier, or if it is just a short-name for
> the real 40-digit sha1 hash. I'm guessing it's the latter, because 12 hex
> digits seem like they would collide relatively easily (it's only 48 bits).
I think it is the first 12 characters of the SHA-1; these will typically
be unique but I imagine that if they do collide or if you want to be
safe you can give the whole thing. It's basically the same trick that
gpg uses in presenting key ids as 8 hex characters, and I think monotone lets
you enter any unambiguous prefix. It is quite tasteful.
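As a rough illustration of the prefix trick, here is a minimal sketch in
Python. It is not how hg, gpg, or monotone implement it; the names
resolve_prefix, short_id, AmbiguousPrefix, and collision_probability are
made up for this example, and the ids are just SHA-1 hashes of dummy
strings. The last function uses the usual birthday approximation to
estimate how likely two 48-bit short ids are to collide.

```python
import hashlib
import math


class AmbiguousPrefix(Exception):
    """Raised when a prefix matches more than one known id (hypothetical)."""


def short_id(full_id, length=12):
    # Display form: the first `length` hex characters of the full id.
    return full_id[:length]


def resolve_prefix(prefix, known_ids):
    # Accept any prefix as long as exactly one known id starts with it.
    matches = [i for i in known_ids if i.startswith(prefix)]
    if not matches:
        raise KeyError(prefix)
    if len(matches) > 1:
        raise AmbiguousPrefix("%r matches %d ids" % (prefix, len(matches)))
    return matches[0]


def collision_probability(n, bits=48):
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    return -math.expm1(-n * (n - 1) / (2.0 * 2 ** bits))


# Dummy 40-hex-digit ids standing in for real revision hashes.
ids = [hashlib.sha1(("rev-%d" % i).encode()).hexdigest() for i in range(100)]
full = resolve_prefix(short_id(ids[3]), ids)
```

If a short id ever did become ambiguous, the user would just type a
few more characters, up to the full 40.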
I suppose we could do something similar for our revision ids; perhaps
matching any set of substrings (e.g. 'mbp-20060522'). At some point we
need to work out a good way to point to non-mainline revisions.
> I do find their code a little bit difficult to go through, since almost
> no functions are documented (especially what all the parameters mean).
There's definitely a cultural difference there. Also last time I looked
they didn't seem nearly as test-driven as bzr now is.
> I do like that their code has at least a little bit of i18n (all the
> strings are _() wrapped), and I think it is something that we should do
> with bzr before 1.0.
I agree.
> demandload() is something that might be worthwhile. In my tests so far,
> I can cut the 'bzr rocks' time in half. But honestly a lot of our code
> depends on a lot of our other code. (I could only shave a little bit of
> time off of 'bzr root')
> With demandload we might factor out more of the code, so that branch.py
> doesn't have the implementation of every branch format, which might
> decrease the load time.
> But one of the big load time killers is actually cElementTree (which
> loads ElementTree). For example:
>
> 0.030s $ time python -c ''
> 0.030s $ time python -c 'import os'
> 0.033s $ time python -c 'import elementtree'
> 0.103s $ time python -c 'import elementtree.ElementTree'
> 0.125s $ time python -c 'import cElementTree'
>
> So we have an instant overhead of >0.1 seconds any time we need to do
> any XML processing.
>
> Now, I did find that 'iterablefile' has an overhead of 0.1 seconds as
> well, because it unconditionally imports 'doctest', which is apparently
> very expensive.
>
> 0.125s $ time python -c 'import bzrlib.rio'
> after moving the import doctest to the bottom of iterablefile:
> 0.041s $ time python -c 'import bzrlib.rio'
>
> So there are probably still a few things like that where we could clean
> up the import structure, and save some startup time.
That'd be good.
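For what it's worth, the general idea of deferring an import until it is
first used can be sketched like this. This is not Mercurial's
demandload() and not existing bzrlib code; LazyModule and run_self_tests
are hypothetical names, and the modules used (json, doctest) are only
stand-ins for something expensive like cElementTree.

```python
import importlib


class LazyModule(object):
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def _load(self):
        # The real import happens here, only once.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself.
        return getattr(self._load(), attr)


def run_self_tests():
    # Same effect as moving 'import doctest' out of module scope:
    # the cost is only paid by callers that actually run the tests.
    import doctest
    return doctest.testmod()


json_mod = LazyModule("json")    # nothing is imported yet
encoded = json_mod.dumps([1, 2])  # first use triggers the import
```

Commands that never touch the proxy never pay for the import, which is
the saving being measured above.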
--
Martin Pool