[PATCH] more work on merge

Mon Jul 11 18:49:47 BST 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John A Meinel wrote:
>>Wouldn't another way of solving the history shortcut problem be to
>>always record the longest path from CANDIDATE to THIS/OTHER, and then
>>take the candidate with the shortest longest path?
>>
> 
> 
> Yes and no. The problem is that with multiple lines, you need to have
> some way to stop iterating all permutations.

Well, let's look at the utility first, then look at the practicality.
Are there cases where shortest longest-path gives a bad result?

> See my earlier post where
> with a simple 13 revisions, I generated 60 that needed to be searched.
> If I find a longer route to a revision, then I need to update all routes
> out of that revision, since there is now a longer route as well.

Okay, so one approach is to keep a tree in which each node is affixed
only to its most-distant descendant.

So as you're iterating through history breadth-first, each time you hit
an already-seen node, you steal it and make it an ancestor of your node
only.  (You know that the current node is farther than or equal to the
previous one, since you're going breadth-first).

Since you've already seen this node, you don't need to traverse it.
Once the tree is completely built, you do another traversal to find the
distance values.  This should scale O(n) with the number of nodes in the
tree.

> The only thing I can think of is that you could use BFS, and then if you
> encounter a node a second time, you immediately update any child entries
> that exist in the revision=>distance dictionary.

I believe the scaling on this is bad, because you have to update all
children every time you find a longer path.  However, the constant is
probably small.

>>Oh, but you can generate the inventory from the changeset and its base,
>>right?  Or just install the changeset's revision first?
> 
> 
> I can generate the *last* inventory XML in a multi-revision rollup. And
> I can generate all intermediate revision XML entries.

I am highly suspicious of the notion of storing just the revision XML
for a revision.  I like the idea that either you have all the data for a
revision, or you don't have a revision entry for it.

> If you wanted to generate intermediate inventory XML, then you need to
> include intermediate patches. Which negates the advantage of a roll-up.
> If you have to have all ancestry inventory XML, then just send a series
> of changesets.

More and more, I think that a series of changesets is more likely to be
useful than a roll-up.  Martin and Linus both like a series of small
patches rather than one big one, and I suspect that is common.

>>Extracting texts from the current storage format doesn't seem very
>>expensive to me.  We're only talking about modified files, and heck, we
>>can optimize by only grabbing one copy of each text.
> 
> 
> It isn't expensive to get the text, but you need to get the difference
> from it's previous version, so it means running diff for each modified
> file in the history.

I don't see this, or I'm reading it wrong.  We only need to compare the
leading candidates-- those that are not ancestors of other candidates.
In practice, I expect this is two or three, but frequently one.  And we
only need to compare files that differ between THIS and OTHER, which is
usually a small number.

So total comparisons is
len((this, other)) * len(leading_candidates) * differing_files(this, other)

This may frequently resolve to 4. (2 leading candidates, 1 file changed)

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFC0rE70F+nu1YWqI0RAiZnAJ9Vm4r9iV2v/+618pARA9ExtFq6fACfRW46
3hqS7TEarLIffa/cjlsbnts=
=FqLZ
-----END PGP SIGNATURE-----