Incremental merge_sorted?
Robert Collins
robertc at robertcollins.net
Mon Jun 26 15:08:59 BST 2006
On Sat, 2006-06-17 at 18:49 -0400, Aaron Bentley wrote:
>
> I think the biggest problem is that the algorithm requires that a
> merged
> revision appears only the first time it was merged. I think that
> requires examining the whole graph, if you're iterating through the
> whole graph.
To decide where a particular node should be output requires knowing if
it is destined for a branch to the left of this branch or not.
I.e. if you have:
A:B,C
B:D
C:F
D:E
E:F
breadth first gives you this examination order:
A,
B,C
D,F
so you hit F before you can tell that it should appear on the mainline
(the left most branch). That is, F is not in the set difference of A and
B.
Solving this is hard. Set differences are only cheap if you have the set
precalculated, and we need many sets: one for the mainline, and one for
every relative mainline.
There are somethings that can help (that are not implemented yet)
* knowing the dominators a-priori helps, because you know that you can
stop at a dominator rather than sucking up the whole graph to determine
what branch things should be assigned to.
* knowing how many mainline revisions are desired can help: we can walk
that many mainline revisions down, then do a breadth first walk to
gather the full set of ancestry for that point down to the next
dominator, and then resume the normal algorithm.
* possibly there are other things we can do to improve the locality of
reference of merge_sort.
The problem resolves around the concept we've talked about a lot before
of 'history shortcuts' - it is those shorter paths, like ACF above, that
cause 'grab minimal history' algorithms to be very difficult to get
right.
> But if you're iterating backwards through part of the graph, you can
> cheaply test whether the node is among the ancestors of your oldest
> revision, so you can quickly throw out nodes you'll never use.
I'm not sure I follow this bit.
To make merge_sort faster, I would start by:
* caching the method symbols to avoid lookups during the inner loop.
* using slots to avoid attribute lookups.
These are safe, simple, and effective changes to make to such a
performance sensitive piece of code.
Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060627/f62facea/attachment.pgp
More information about the bazaar
mailing list