find_difference shortcut failure without walking out

John Arbash Meinel john at arbash-meinel.com
Thu Dec 13 16:01:13 GMT 2007



Aaron Bentley wrote:
> Aaron Bentley wrote:
>> John Arbash Meinel wrote:
>>> The only algorithm I can think of off-hand, is the Greatest Distance From
>>> Origin that Aaron and Robert talked about in London. Basically, every node is:
>>> max(parent_distances) + 1.
>> It seems to me that the problem here is finding the right stop
>> condition; stop too early and you incorrectly determine nodes to be
>> "uncommon", stop too late and you waste resources.
> 
>> So when can we stop?  When it's impossible that we can determine any
>> more nodes to be common.  Since ancestry is a DAG, we know that no
>> descendant of a node can also be its ancestor.  In order for a searcher
>> to find that a previously-uncommon node is common, it must search from
>> an ancestor of that node.
> 
> Okay, I think I swapped some terms here.  Let's try again.
> 
> So when can we stop?  When it's impossible that we can determine any
> more nodes to be common.  Since ancestry is a DAG, we know that no
> descendant of a node can also be its ancestor.  In order for a searcher
> to find that an apparently-uncommon node is actually common, it must
> search from a descendant of that node.
> 
> Therefore, when all search nodes are ancestors of all
> apparently-uncommon nodes, more searching will not find any
> apparently-uncommon nodes to be common.  Therefore, we can stop.
> 
> Aaron
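
(For reference, the max(parent_distances) + 1 numbering from the first quoted paragraph could be sketched roughly like this; the function name and the {node: (parents,)} dict shape are made up for illustration, not bzrlib API:)

```python
def compute_gdfo(parent_map):
    """Greatest Distance From Origin: every node gets
    max(parent distances) + 1; roots (no parents) get 1.

    parent_map maps node -> tuple of parents; roots map to ().
    """
    children = {}
    remaining = {}
    for node, parents in parent_map.items():
        remaining[node] = len(parents)
        for p in parents:
            children.setdefault(p, []).append(node)
    # Roots are ready immediately; a node becomes ready once all of
    # its parents have been numbered (a plain topological walk).
    todo = [n for n, count in remaining.items() if count == 0]
    gdfo = {}
    while todo:
        node = todo.pop()
        parents = parent_map[node]
        gdfo[node] = max((gdfo[p] for p in parents), default=0) + 1
        for child in children.get(node, ()):
            remaining[child] -= 1
            if remaining[child] == 0:
                todo.append(child)
    return gdfo
```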


I understood what you meant, and it does make sense. Now I just need to think
about how to code that up in an efficient manner.

Hmm... maybe we can do the current search, and then once it considers itself
"done" it can do another search starting with the uncommon nodes.

The other problem is that all of the common tips have to be ancestors of *all*
uncommon nodes. Not just *any* uncommon node.
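
Stated naively, that stop check might look like the sketch below (names and the {node: (parents,)} map are invented; a real version would share the ancestor walks rather than redo them every time):

```python
def ancestors_of(node, parent_map):
    """All strict ancestors of node, by a simple graph walk."""
    seen = set()
    todo = list(parent_map.get(node, ()))
    while todo:
        n = todo.pop()
        if n not in seen:
            seen.add(n)
            todo.extend(parent_map.get(n, ()))
    return seen

def can_stop(search_tips, uncommon, parent_map):
    """True once every live search tip is an ancestor of *every*
    apparently-uncommon node, so no tip can change any answer."""
    if not uncommon:
        return True
    shared = None
    for node in uncommon:
        anc = ancestors_of(node, parent_map)
        shared = anc if shared is None else shared & anc
    return set(search_tips) <= shared
```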

I'll try to start with a "correct" implementation, and then see if I can
optimize it. But maybe these steps will work:

1) Perform the current search, which gives a superset of uncommon nodes, and a
good starting set for the common nodes.

2) For the uncommon nodes, find the most restrictive ones, i.e. the most
ancestral uncommon nodes. If B is a child of A, then any node that is an
ancestor of A is also an ancestor of B (and B may have more ancestors besides).
This is sort of an inverse "heads" function. We probably want to restrict it to
"seen" nodes, since a few extra values here will still give the correct answers.
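
A crude version of that inverse "heads" (really a "tails") filter could look like this, again assuming a plain {node: (parents,)} map and invented names:

```python
def ancestors_of(node, parent_map):
    """All strict ancestors of node, by a simple graph walk."""
    seen = set()
    todo = list(parent_map.get(node, ()))
    while todo:
        n = todo.pop()
        if n not in seen:
            seen.add(n)
            todo.extend(parent_map.get(n, ()))
    return seen

def uncommon_tails(uncommon, parent_map):
    """Keep only the uncommon nodes with no other uncommon node among
    their ancestors; a tip that is an ancestor of one of these is
    automatically an ancestor of all its uncommon descendants."""
    uncommon = set(uncommon)
    return {n for n in uncommon
            if not (ancestors_of(n, parent_map) & uncommon)}
```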

At this point, keep walking the common walker, and intersect it with the common
ancestors of the uncommon nodes.

The problem is really about having the "global ancestors" of all uncommon
nodes, since it almost seems defined as another "_find_border_ancestors"
search, but I don't really want to have 1000 BreadthFirstSearchers when there
are 1000 uncommon nodes.  Anyway, it at least gives me some stuff to start with.
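
One way to avoid the per-node searchers might be a single pass from the tips down, carrying for each node the set of uncommon nodes that can reach it; nodes reached by all of them are the "global ancestors". This is only a sketch with invented names, and the per-node sets could get expensive on big graphs:

```python
def ancestors_of_all(uncommon, parent_map):
    """Nodes that are ancestors of *every* node in uncommon, found in
    one topological walk rather than one searcher per uncommon node."""
    uncommon = set(uncommon)
    children = {n: [] for n in parent_map}
    for n, parents in parent_map.items():
        for p in parents:
            children[p].append(n)
    remaining = {n: len(children[n]) for n in parent_map}
    # Start from the heads (nodes with no children) and work down;
    # a node is processed once all of its children have been.
    todo = [n for n in parent_map if remaining[n] == 0]
    reach = {n: set() for n in parent_map}
    while todo:
        n = todo.pop()
        for p in parent_map[n]:
            # Every uncommon node that reaches n also reaches n's parents.
            reach[p] |= reach[n]
            if n in uncommon:
                reach[p].add(n)
            remaining[p] -= 1
            if remaining[p] == 0:
                todo.append(p)
    return {n for n in parent_map if reach[n] == uncommon}
```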

Thanks for mentioning it.

John
=:->
