[RFC] Non-memoisable BreadthFirstSearcher

Fri Apr 25 04:23:36 BST 2008

On Thu, 2008-04-24 at 21:39 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> I've been working on the find_differences/find_unique_ancestors code,
> and I'm finding that quite a few of the changes to _BreadthFirstSearcher
> made it slower.
> 
> For example, in "start_searching()" there is a call to _do_query(),
> because it wants to know if there are any ghosts.
> 
> However, if I remove that _do_query() call, the average search time
> drops from 0.097s => 0.063s.
> Or almost 30% of the time is spent in that _do_query().
> 
> Some of that might be how I've structured my code. Specifically, I call
> start_searching() with the assumption that any nodes that have have
> already been searched will be trivially discarded. (I did that, because
> otherwise I have to add another parent graph search to find what nodes
> are trivial to discard.)
> 
> Anyway, I can't say that I fully understand Robert's changes, to make
> sure that I don't break anything that he was trying to do.
> 
> So my feeling was to write a new searcher, which would just be the bare
> minimum. I would also even consider having some aggregate search
> objects, because there is a bit of overhead in walking the same graph
> nodes multiple times.
> 
> Part of the problem is that I'll have up to 30 or so searchers going.
> And that is because the only way to know that you've checked enough
> common revisions is when all common search revisions are ancestors of
> *all* non-common revisions. So you have to search a large set of the
> unique revisions (the ones that aren't trivially ancestors of others).
> [This is better for find_unique_ancestors, but it really hurts
> find_differences() on the bzr.dev code base].
> 
> Then again, if everything is fully tested that Robert did, after my
> changes all the tests still pass.
> 
> Thoughts?
> John

I don't think we need two searchers; the searches should perform
identical query counts if used as they were before my changes. There are
two lookup strategies - pre result and post-result. It sounds like
you're using post-result.

Also, I don't understand why you'd ever need more than 3 searchers.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080425/e9d31178/attachment-0002.pgp