find_difference shortcut failure without walking out

John Arbash Meinel john at arbash-meinel.com
Wed Dec 12 17:11:36 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

So I did some testing of my version of the code on our bzr.dev history. And I
found that it still fails under certain conditions. This is a graph which shows it.

NULL_REVISION
    |
    a
    |
    b
    |
    c
    |
    d
    |\
    e f
    | |\
    | g h
    |/| |
    i j |
    | | |
    | k |
    | | |
    | l |
    |/|/
    m n


Steps:

   Walker A	Walker B	Walker Common
1) m		n		None
2) i l		l h		None + l
3) e g		f		k
4) d f		d		j + d f

At this point, all searchers have terminated in common nodes, and the common
searcher has not found 'g' yet. So we have found all the genuinely unique nodes
(m i e, and n h), but we also found 'g' which we think is unique, but it
actually is not.

We have some rather crazy graphs in bzr. For example, if you compare the
ancestries of the parents of:
aaron.bentley at utoronto.ca-20070111031041-cu4tmhma4sqjph48
It has to traverse over 2000 nodes before the uncommon nodes terminate, and
even then there is a shortcut that is triggering this condition. (Well, I'm
somewhat assuming that, as I haven't tried to actually graph 2k nodes.)

Anyway, I'm trying to think about ways to get around this. The cheap and
incorrect way is to just let the common searcher continue going for a few
rounds (10 worked for the ones I was testing). But since g-n can be arbitrarily
 long, that won't work in the real world (i just happened to use that while
testing to figure out why the set difference was giving different results).


One thing I considered. Get rid of the "common searcher". Instead, when
searchers find common nodes, they use "find_seen_ancestors" to jump to the tip
of the ancestry. And then search terminates once all searchers are on the same
nodes.

Something like this, perhaps:

   Walker A	Walker B	Common Nodes
1) m		n		None
2) i l		l h		l=>l is common, but no ancestors found
3) e g k	k f		k=>k (no ancestors)
4) d f j	j d		d=>d, f=>d, j=>d

At this point, all searchers are on the same node "d" and we can stop. I'm
still playing around with it, but it seems to work (at least in my head). I'll
try to write some code to test it at least. And then I'll go back and clean up
my patches.

But if someone wants to shoot holes in my idea before I get too far, I'm all ears.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHYBZIJdeBCYSNAAMRAtJkAKDJD2MfRhOibQcuXlx47q9Idm/yHACfVlIO
6EyfVus0XShJbbc+oH2LyVI=
=haXT
-----END PGP SIGNATURE-----



More information about the bazaar mailing list