Caching 'dotted_revnos' and merge_sort

John Arbash Meinel john at arbash-meinel.com
Thu Apr 8 03:38:29 BST 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>> Basically I'm wondering if the benefits of dotted revnos justify the
>> costs (both in existing runtime, and in developer effort to optimise
>> that runtime)?
> 

I should also mention a few other things.

1) Currently dotted revnos are a trivial overhead on top of
   'merge_sort'.

2) Merge_sort gives us nice graphs, and IMO a very nice way to display
   revisions. Both graphically, and in plain 'bzr log'.

3) We could change the numbering algorithm to something that is a bit
   easier to compute. Numbering from the merge point, vs numbering from
   the start point. It is easier because:

   a) merge_sort *sorts* them to be near the merge point. Thus the
      numbering matches the clustering.

   b) we store the parent map, but not the child map. Thus easier to
      walk in that direction.

   c) A hypothetical 'child index' would always have to query every pack
      file, because new revisions can be created at any time.

   Even with the benefits, I'm pretty sure you still end up walking back
   to the point that the revision originated, and you probably have to
   check a fair amount of the history inbetween (it *could* have been
   merged).

   gdfo, index on children, and quick mainline lookups can make a lot of
   that cheaper. (index on children lets you know quickly that of the
   ancestry you've searched so far, none of it could have been merged
   because there aren't any 'outbound' child links.)

I'm finishing up the 'lazy' merge_sort code in the plugin now. I have
some basics working, but I think I'm missing some edge cases. (I just
found one where it was loading the whole ancestry when it didn't need
to, because my 'stop' criteria was bogus.)

Anyway, I'll let you know in a bit where I get to. I've found some
decent bounds on all operations. Currently dotted_revnos do have more
overhead than *just* merge_sort. (There are times when I can compute
what revisions where merged, and not yet know what numbers to assign
them.) But often it isn't far off.

Well, I need to finish making it correct, and then run it on more real
histories to see what bits I'm missing in my understanding... :) But at
least for cases I've analyzed the bounds are generally "you have to go
back to the originating revision, but no farther".

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAku9QaUACgkQJdeBCYSNAAObowCeIFXrMh8jPOf8MDSLZyUu0Nu/
9hwAnjWDtkNi/4GYwz670W+UNgQDpL63
=iv47
-----END PGP SIGNATURE-----



More information about the bazaar mailing list