[MERGE][1.0] Fixes for find_differences() and get_parents_map()

Sun Dec 9 02:19:45 GMT 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

These aren't strictly dependent on eachother, so I'd be happy to split them out
into separate patches. I also wanted them to be tracked separate from my
earlier heads() work so I'm submitting it as a cherrypick.

I guess the get_parent_map() is actually a more risky change (since people
implementing a parents provider won't have implemented it). So again, I would
be willing to break this into 2 separate submissions. It was just all done on
the same branch, so I'd have to move the stuff around a bit.

Anyway, this changes the _find_border_ancestor() code so that it knows when to
not stop walking the common_searcher. Basically because one of the active
searchers wasn't stopped because it found common nodes, but because it ran off
the edge of the graph. I'm not 100% sure that it is the most efficient
implementation.

The get_parent_map() wasn't very much of a performance improvement, it just
seemed better to me from an api-friction point of view. But I'm happy to pull
it out if people prefer.

This code should always give the correct values, but there is a graph case that
it is sub-optimal, but I think it is rare enough to not worry about.

Specifically, if you have 2 root revisions (like when you merge a plugin into
bzr.dev). If one branch needs to walk up the plugin ancestry, and the other
branch hasn't figured out that it is common yet, it will trigger the common
walker to recurse all the way to the full root. It does this because I don't
have a "oh, that was found common after all". The reason I don't have that is
because of issues with NULL_REVISION. Basically, by the time I've figured out
that I'm off the graph, I've gone through NULL which is common to all branches.

Something like this:
  NULL
  /  \
 A    A'
 |    /
 B   /
 |\ /
 C D
 |/|
 E F
 |
 G
 |
 H
 |
 I
 |
 J

Doing a find_difference() on the above graph will step as:

1) J F (no common)
2) I D (no common yet)
3) H A' (no common yet)
4) G NULL (no common yet)
5) E <off graph> (no common yet) triggers full searching
6) CD <stopped> (D common, eliminates D and A')
7) B (B common, stops searcher)

At this point, we *should* be able to stop, as we know that A' is common, and
we wouldn't need anything before B. (So imagine something like bzr.dev where
A-B is 1500 nodes long).

I went with this algorithm because it will handle ghosts as just another
"walked off the graph" case. My other fix for it (previous revision) imposed a
per-node check to see if it was "about" to walk off the graph (because it
encounters NULL). Also, to detect NULL_REVISION is difficult. The Knit
implementations return [NULL_REVISION], the pack ones return (NULL_REVISION,)
(lists versus tuples). So I can't just do "parents == (NULL_REVISION,)".

Doing the per-node check cost about 10% (5.5ms => 5.9ms). This form costs a lot
less (5.5ms => 5.6ms). Also, the previous form would have to have even more
logic to handle ghosts, etc.

Anyway, this patch will make Graph.find_difference() a safe api to use, and
still perform O(changes) rather than O(history) for most operations.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHW1DBJdeBCYSNAAMRAjB8AKDP/0g0Bh5IS2CoIKpQkVHT+F5IQwCdGHh7
Qz5mnN8s+gvGhStPd1qMJ9s=
=6N7n
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: graph_update_diff.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20071208/6af28bd8/attachment-0001.diff