find_difference shortcut failure without walking out
John Arbash Meinel
john at arbash-meinel.com
Sat Dec 15 02:09:10 GMT 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Aaron Bentley wrote:
> John Arbash Meinel wrote:
>> Hmm... maybe we can do the current search, and then once it considers itself
>> "done" it can do another search starting with the uncommon nodes.
>
>> The other problem is that all of the common tips have to be ancestors of *all*
>> uncommon nodes. Not just *any* uncommon node.
>
> Right. Which means if you're unlucky enough to have an uncommon ghost,
> you have to search the entire graph. But as far as I can tell, that's
> *right*; the ghost could be a descendant of the root node.
Yeah, and we have 2 ghosts and quite a few root nodes. And when you hit an
uncommon root node, it triggers a search through the rest of the graph. Which
is unfortunate, but again it is "correct". You don't know that:
A --- F
/
G
Isn't
G --- F
\ /
---
>
>> 1) Perform the current search, which gives a superset of uncommon nodes, and a
>> good starting set for the common nodes.
>
>> 2) For uncommon nodes, find the most restrictive nodes, ie the ancestors of
>> uncommon nodes.
>
> I was going to suggest that, but you beat me to it.
>
>> The problem is really about having the "global ancestors" of all uncommon
>> nodes, since it almost seems defined as another "_find_border_ancestors"
>> search, but I don't really want to have 1000 BreadthFirstSearchers when there
>> are 1000 uncommon nodes. Anyway, it at least gives me some stuff to start with.
>
> Well, hopefully, by using the most restrictive nodes, you get away from
> that.
>
> Aaron
Yeah.
At the moment, I'm running another test against all merge nodes in bzr.dev, to
see if get_ancestry() gives the same results as this code. I was building up my
own ancestry from get_revision_graph(), but that doesn't have ghosts. And all
of our "_with_ghosts()" apis are a bit weak.
repo.get_revision_graph_with_ghosts().get_ancestors() is okay, but it doesn't
give you nodes for ghosts, it just references them from the child nodes. Which
means you have to go to all the values, rather than just set(...keys())
At this point, it looks like my code is correct, but just slow. I think our
ancestry is such that we have to hit the full graph a bit too often.
The first one that showed up was the "A" revision, where Robert accidentally
committed a merge of a revision with id "A". (test-case gone bad, IIRC.)
As I think I have the algorithm "correct" now, I'll go back and try to optimize
the current form. I think I know a few places I can tweak. But in the end, if
we have to search the bulk of the graph because we have a shortcut to 'null:',
then that is going to hurt us.
But thanks for the idea. It does seem like a "correct" solution. Though the
GDFO might perform a lot better with the cache.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFHYzdGJdeBCYSNAAMRAufPAKCQQmyd4OaM2fl8qyKSBf5bCdH6MQCdFEDd
ggM0kFtFWxi8Xy/JwenkjFo=
=eZrd
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list