[PATCH] more work on merge

Sun Jul 10 20:08:42 BST 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John A Meinel wrote:
> How about this, if you look at my current code, it traces back through
> history, and keeps a 'set'.
> 
> Instead of using a set, use a dictionary, mapping rev_id => distance to
> target.

So as it turned out, the problem I was having was just a problem
diagnosing a test failure.  The code I was looking at was fine, but
another piece of code was causing a test failure.

Here's my implementation of dictionary => distance mapping:

def iter_ancestors(revision_id, revision_source):
    ancestors = (revision_id,)
    distance = 0
    while len(ancestors) > 0:
        new_ancestors = []
        for ancestor in ancestors:
            yield ancestor, distance
            try:
                revision = revision_source.get_revision(ancestor)
            except NoSuchRevision, e:
                if e.revision == revision_id:
                    raise e
                else:
                    continue
            new_ancestors.extend([p.revision_id for p in revision.parents])
        ancestors = new_ancestors
        distance += 1

def find_all_ancestors(revision_id, revision_source):
    found_ancestors = {}
    for anc_id, anc_distance in iter_ancestors(revision_id,
revision_source):
        if not found_ancestors.has_key(anc_id):
            found_ancestors[anc_id] = anc_distance
    return found_ancestors

> Basically the same recursive (stack-based) search that exists in
> "get_revision_set" in my changeset or merge code, but when it pops
> something onto the stack, it also pops on the current distance.
> 
> Here is a rough adaptation of my merge code:
> def get_revision_set(branch, target_rev_id=None):
>     """Get the set of all revisions that are in the ancestry
>     of this branch.
>     """
>     this_revs = {}
>     if target_rev_id is None:
>         to_search = [(branch.last_patch(), 0)]
>     else:
>         to_search = [(target_rev_id, 0)]
> 
>     while len(to_search) > 0:
>         rev_id, distance = to_search.pop(0)
>         if rev_id in this_revs and this_revs[rev_id] <= distance:
>             continue
>         this_revs[rev_id] = distance
>         rev = branch.get_revision(rev_id)
> 	distance += 1
>         for parent in rev.parents:
> 	    p_id = parent.revision_id
>             if (p_id not in this_revs or this_revs[p_id] > distance):
>                 to_search.append((parent.revision_id, distance))
>     return this_revs

The results should be the same, but the implementation sure differs, eh?
 One thing worth noting is that I'm not using 'branches' but 'revision
sources', which may be a set of several branches.

> With this code, there is a distance from each revision to the target
> revision, defined as the minimum number of revisions to go inbetween.
> This probably would work for what you are thinking.

Yeah, I'm sure it would.

> I can't say it does the best for crossed history, and it still suffers
> from a simple patch applied late causing a shortcut history.

Could you elaborate on this problem?

> I suppose a different possibility is to track all distances to a
> revision, rather than just the shortest one, but that does imply having
> to trace all routes through history.

I don't think it's hard.  My code could change to:

    for anc_id, anc_distance in iter_ancestors(revision_id,
revision_source):
        if not found_ancestors.has_key(anc_id):
            found_ancestors[anc_id] = [anc_distance]
        else:
            found_ancestors[anc_id].append(anc_distance)

But it's already tracing through all history...

> Otherwise you could do some sort of "average" distance, rather than
> minimum distance.

Anyhow, I'm not clear how we could use this information to pick a better
base.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFC0XI50F+nu1YWqI0RApjJAJ4m6TqDxXprrno2l9YyAdCoOqxDiQCdEOFX
8uF7z5Ui+tm1VnbWvAaM+no=
=mhd7
-----END PGP SIGNATURE-----