Incremental merge_sorted?

Robert Collins robertc at robertcollins.net
Mon Jun 26 15:08:59 BST 2006


On Sat, 2006-06-17 at 18:49 -0400, Aaron Bentley wrote:
> 
> I think the biggest problem is that the algorithm requires that a
> merged
> revision appears only the first time it was merged.  I think that
> requires examining the whole graph, if you're iterating through the
> whole graph.

To decide where a particular node should be output requires knowing if
it is destined for a branch to the left of this branch or not.

I.e. if you have:
A:B,C
B:D
C:F
D:E
E:F

breadth first gives you this examination order:
A,
B,C
D,F

so you hit F before you can tell that it should appear on the mainline
(the left most branch). That is, F is not in the set difference of A and
B.

Solving this is hard. Set differences are only cheap if you have the set
precalculated, and we need many sets: one for the mainline, and one for
every relative mainline.

There are somethings that can help (that are not implemented yet)
 * knowing the dominators a-priori helps, because you know that you can
stop at a dominator rather than sucking up the whole graph to determine
what branch things should be assigned to.
 * knowing how many mainline revisions are desired can help: we can walk
that many mainline revisions down, then do a breadth first walk to
gather the full set of ancestry for that point down to the next
dominator, and then resume the normal algorithm.
 * possibly there are other things we can do to improve the locality of
reference of merge_sort.

The problem resolves around the concept we've talked about a lot before
of 'history shortcuts' - it is those shorter paths, like ACF above, that
cause 'grab minimal history' algorithms to be very difficult to get
right.


> But if you're iterating backwards through part of the graph, you can
> cheaply test whether the node is among the ancestors of your oldest
> revision, so you can quickly throw out nodes you'll never use.

I'm not sure I follow this bit.

To make merge_sort faster, I would start by:
 * caching the method symbols to avoid lookups during the inner loop.
 * using slots to avoid attribute lookups.

These are safe, simple, and effective changes to make to such a
performance sensitive piece of code.

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060627/f62facea/attachment.pgp 


More information about the bazaar mailing list