brainstorming - what to do with annotations
Aaron Bentley
aaron.bentley at utoronto.ca
Mon Sep 10 16:48:25 BST 2007
Robert Collins wrote:
> So I have my initial commit work committing mozilla in 1m48 wall clock
> time.
>
> On my laptop tar czf is 0m38. I'm aiming for twice that - 1m16.
>
> The 1m48 is reached in part by disabling annotation caching. I think it
> would be nice to find some way of still having fast annotations. e.g.
> mpdiffs give us annotations for free don't they?
Yes, in the common case. But:
1. Fulltexts will not be annotated properly (the current mpdiff
versionedfile format just treats them as texts with no parents)
2. For texts with multiple parents, it will give an annotation, but it
may be an incorrect value: If a line was introduced in two different
lines of ancestry, mpdiff will attribute it to one of those lines, not
to the current revision.
3. The values will be incorrect if the build-ancestry is not the same as
the true ancestry.
So it's not a complete solution. It seems to me that cached fulltext
annotations are required somewhere, somehow. With my algorithm, we can
zip through tens of revisions quickly enough. It's when we get to
hundreds that performance becomes problematic.
Given that annotations use data we have already calculated (i.e., text
comparisons), they ought to be cheap. Perhaps it's merely the
commingling of annotation and line that makes it expensive? But
annotations can be stored separately, and full-annotation-data doesn't
have to be synchronized with fulltexts.
> But what about ghosts
> and shallow repositories - do they still have full texts
I think every repository format needs fulltexts, so that it doesn't
scale O(n) with the size of history. But see above-- I think
full-annotations and fulltexts can be decoupled. Full-annotations might
happen whenever a text has a ghost parent. Or at least, annotations of
the lines introduced.
> and when
> there is a full text in an mpdiff stream is it annotated? Can we store
> annotations in a different way thats more efficient? ....
Some format that listed revision, followed by ranges, would probably be
much more efficient:
abentley at home-randomstuff: 2-3, 5-7
abentley at home-randomstuff: 1-1, 4-4
Etc.
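
A rough sketch of that idea in Python, to make the layout concrete. This is
only an illustration, not bzr code: the function names and the revision ids
"rev-a" and "rev-b" are made up, and line ranges are assumed to be 1-based
and inclusive, as in the example above.

```python
from itertools import groupby


def compress_annotations(line_revisions):
    """Map each revision id to the (start, end) line ranges it introduced.

    line_revisions holds one revision id per line of the text.
    Ranges are 1-based and inclusive (an assumption of this sketch).
    """
    ranges = {}
    line_no = 1
    for revision, run in groupby(line_revisions):
        length = sum(1 for _ in run)
        ranges.setdefault(revision, []).append((line_no, line_no + length - 1))
        line_no += length
    return ranges


def expand_annotations(ranges):
    """Rebuild the per-line list of revision ids from the range form."""
    total = sum(end - start + 1
                for spans in ranges.values()
                for start, end in spans)
    lines = [None] * total
    for revision, spans in ranges.items():
        for start, end in spans:
            for n in range(start, end + 1):
                lines[n - 1] = revision
    return lines


def format_ranges(ranges):
    """Render one 'revision: a-b, c-d' line per revision."""
    return ["%s: %s" % (revision, ", ".join("%d-%d" % span for span in spans))
            for revision, spans in ranges.items()]


# Hypothetical seven-line text touched by two revisions.
per_line = ["rev-b", "rev-a", "rev-a", "rev-b", "rev-a", "rev-a", "rev-a"]
compact = compress_annotations(per_line)
```

The per-line form stores one revision id for every line, while the range form
stores each id once plus a pair of integers per run, so it should shrink
whenever consecutive lines share a revision. Since the ranges carry no line
content, they could be kept apart from the fulltexts.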
Aaron