[ANNOUNCE] Example Cogito Addon - cogito-bundle
Linus Torvalds
torvalds at osdl.org
Fri Oct 20 21:57:27 BST 2006
On Fri, 20 Oct 2006, Aaron Bentley wrote:
>
> Agreed. We start by comparing BASE and OTHER, so all those comparisons
> are in-memory operations that don't hit disk. Only for files where BASE
> and OTHER differ do we even examine the THIS version.
Git just slurps in all three trees. I actually think that the current
merge-recursive.c does it the stupid way (ie it expands all trees
recursively, regardless of whether it's needed or not), but I should
really check with Dscho, since I had nothing to do with that code.
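The per-file three-way rule underneath both approaches can be sketched roughly like this (a hypothetical Python sketch of the general technique, not git's actual code; `merge_file`, `merge_trees` and the flat dict "trees" are inventions for illustration):

```python
def merge_file(base, this, other):
    """Resolve one path's three versions (None means the file is absent)."""
    if other == base:
        return this              # OTHER didn't touch it: THIS wins
    if this == base:
        return other             # THIS didn't touch it: OTHER wins
    if this == other:
        return this              # both sides made the identical change
    return ("CONFLICT", base, this, other)

def merge_trees(base, this, other):
    """Compare BASE and OTHER first; merge_file only runs where they differ."""
    merged = dict(this)
    for path in set(base) | set(other):
        if base.get(path) == other.get(path):
            continue             # unchanged on the OTHER side: keep THIS as-is
        result = merge_file(base.get(path), this.get(path), other.get(path))
        if result is None:
            merged.pop(path, None)   # deletion carried over into the result
        else:
            merged[path] = result
    return merged
```

For example, merging BASE `{"a": "1", "b": "2"}` with THIS `{"a": "1", "b": "2x"}` and OTHER `{"a": "1y", "b": "2"}` resolves to `{"a": "1y", "b": "2x"}`, and only "a" ever reaches the three-way comparison.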
I wrote a tree-level merger that avoided doing the recursive tree reading
when the tree-SHA1's matched entirely, and re-doing the latest merge using
that took all of 0.037s, because it didn't recursively expand any of the
uninteresting trees.
But the default recursive merge was ported from the python script that
did it a full tree at a time, so it's comparatively "slow". But it's fast
enough (witness the under-1s time ;) that I think the motivation to be
smarter about reading the trees just wasn't there, so my
"git-merge-tree" thing is languishing as a proof-of-concept.
So right now, git merging itself doesn't even take advantage of the "you
can compare two whole directories in one go". We do that all over the
place in other situations, though (it's a big reason for why doing a
"diff" between different revisions is so fast - you can cut the problem
space up and ignore the known-identical parts much faster).
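A rough sketch of why that cutting works (hypothetical Python, purely for illustration: real git compares the SHA-1s already stored in the tree objects, so identical subtrees are never even read, whereas this toy version recomputes hashes from nested dicts):

```python
import hashlib

def tree_hash(tree):
    """Content hash of a nested dict tree: {'file': b'...', 'dir': {...}}."""
    h = hashlib.sha1()
    for name in sorted(tree):
        entry = tree[name]
        child = (tree_hash(entry) if isinstance(entry, dict)
                 else hashlib.sha1(entry).hexdigest())
        h.update(f"{name}:{child}\n".encode())
    return h.hexdigest()

def changed_paths(a, b, prefix=""):
    """Diff two trees, pruning any subtree whose hashes already match."""
    if tree_hash(a) == tree_hash(b):
        return []                        # identical subtrees: skip entirely
    out = []
    for name in sorted(set(a) | set(b)):
        ea, eb = a.get(name), b.get(name)
        if isinstance(ea, dict) and isinstance(eb, dict):
            out.extend(changed_paths(ea, eb, prefix + name + "/"))
        elif ea != eb:
            out.append(prefix + name)
    return out
```

The recursion only descends into subtrees whose hashes differ, which is what keeps a whole-tree diff proportional to the size of the change rather than the size of the tree.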
That tree-based data structure turned out to be wonderful. Originally (as
in "first weeks of actual git work" in April 2005) git had a flat "file
manifest" kind of thing, and that really sucked. So the data structures
are important, and I think we got those right fairly early on.
> We can do a do-nothing kernel merge in < 20 seconds, and that's
> comparing every single file in the tree. In Python. I was aiming for
> less than 10 seconds, but didn't quite hit it.
Well, so I know I can do that particular actual merge in 0.037 seconds
(that's not counting the history traversal to actually find the common
parent, which is another 0.01s or more ;), so we should be able to
comfortably do the simple merges in less than a tenth of a second. But beyond
some point, apparently nobody really cares.
Of course, this kind of thing depends a lot on developer behaviour. We had
some performance bugs that we didn't notice simply because the kernel
didn't show any of those patterns, but people using it for other things
had slower merges. Sometimes you don't see the problem, just because you
end up looking at the wrong pattern for performance.
> > So recursive basically generates the matrix of similarity for the
> > new/deleted files, and tries to match them up, and there you have your
> > renames - without ever looking at the history of how you ended up where
> > you are.
>
> So in the simple case, you compare unmatched THIS, OTHER and BASE files
> to find the renames?
Right. Some cases are easy: if one of the branches only added files (which
is relatively common), that obviously cannot be a rename. So you don't
even have to compare all possible combinations - you know you don't have
renames from one branch to the other ;)
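That similarity-matrix pairing can be sketched like so (a hypothetical Python sketch; `detect_renames` and the greedy pairing are inventions for illustration - git's actual diffcore-rename works on hashed content chunks, not difflib):

```python
import difflib

def detect_renames(deleted, added, threshold=0.5):
    """Pair deleted paths with added paths by content similarity.

    deleted/added: dicts mapping path -> text content.
    Returns (old_path, new_path, score) triples scoring above threshold.
    """
    # Build the similarity matrix for all deleted x added combinations.
    scores = []
    for old, old_text in deleted.items():
        for new, new_text in added.items():
            s = difflib.SequenceMatcher(None, old_text, new_text).ratio()
            if s >= threshold:
                scores.append((s, old, new))
    # Greedily take the best-scoring pairs, each path used at most once.
    scores.sort(reverse=True)
    renames, used_old, used_new = [], set(), set()
    for s, old, new in scores:
        if old not in used_old and new not in used_new:
            renames.append((old, new, s))
            used_old.add(old)
            used_new.add(new)
    return renames
```

A file deleted on one side and a near-identical file added on the other get matched up as a rename, without consulting history at all - exactly the "matrix of similarity" idea above.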
But I'm not even the authoritative person to explain all the details of the
current recursive merge, and I might have missed something. Dscho?
Fredrik? Anything you want to add?
Linus
More information about the bazaar mailing list