Caching 'dotted_revnos' and merge_sort
John Arbash Meinel
john at arbash-meinel.com
Thu Apr 8 03:38:29 BST 2010
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
>> Basically I'm wondering if the benefits of dotted revnos justify the
>> costs (both in existing runtime, and in developer effort to optimise
>> that runtime)?
>
I should also mention a few other things.
1) Currently dotted revnos are a trivial overhead on top of
'merge_sort'.
2) Merge_sort gives us nice graphs, and IMO a very nice way to display
revisions. Both graphically, and in plain 'bzr log'.
3) We could change the numbering algorithm to something that is a bit
easier to compute. Numbering from the merge point, vs numbering from
the start point. It is easier because:
a) merge_sort *sorts* them to be near the merge point. Thus the
numbering matches the clustering.
b) we store the parent map, but not the child map. Thus easier to
walk in that direction.
c) A hypothetical 'child index' would always have to query every pack
file, because new revisions can be created at any time.
Even with the benefits, I'm pretty sure you still end up walking back
to the point that the revision originated, and you probably have to
check a fair amount of the history inbetween (it *could* have been
merged).
gdfo, index on children, and quick mainline lookups can make a lot of
that cheaper. (index on children lets you know quickly that of the
ancestry you've searched so far, none of it could have been merged
because there aren't any 'outbound' child links.)
I'm finishing up the 'lazy' merge_sort code in the plugin now. I have
some basics working, but I think I'm missing some edge cases. (I just
found one where it was loading the whole ancestry when it didn't need
to, because my 'stop' criteria was bogus.)
Anyway, I'll let you know in a bit where I get to. I've found some
decent bounds on all operations. Currently dotted_revnos do have more
overhead than *just* merge_sort. (There are times when I can compute
what revisions where merged, and not yet know what numbers to assign
them.) But often it isn't far off.
Well, I need to finish making it correct, and then run it on more real
histories to see what bits I'm missing in my understanding... :) But at
least for cases I've analyzed the bounds are generally "you have to go
back to the originating revision, but no farther".
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAku9QaUACgkQJdeBCYSNAAObowCeIFXrMh8jPOf8MDSLZyUu0Nu/
9hwAnjWDtkNi/4GYwz670W+UNgQDpL63
=iv47
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list