[ANNOUNCE] Example Cogito Addon - cogito-bundle
Shawn Pearce
spearce at spearce.org
Fri Oct 20 21:53:05 BST 2006
Linus Torvalds <torvalds at osdl.org> wrote:
> On Fri, 20 Oct 2006, Shawn Pearce wrote:
> >
> > I renamed hundreds of small files in one shot and also did a few
> > hundred adds and deletes of other small XML files. Git generated
> > a lot of those unrelated adds/deletes as rename/modifies, as their
> > content was very similar. Some people involved in the project
> > freaked as the files actually had nothing in common with one
> > another... except for a lot of XML elements (as they shared the
> > same DTD).
>
> Heh. We can probably tweak the heuristics (one of the _great_ things about
> content detection is that you can fix it after the fact, unlike the
> alternative).
>
> That said, I've personally actually found the content-based similarity
> analysis to often be quite informative, even when (and perhaps
> _especially_ when) it ended up showing something that the actual author of
> the thing didn't intend.
>
> So yeah, I've seen a few strange cases myself, but they've actually been
> interesting. Like seeing how much of a file was just a copyright license,
> and then a file being considered a "copy" just because it didn't actually
> introduce any real new code.
Aside from that one strange case I just mentioned, I've always seen
the strategy work very well. It's never done something I didn't
expect, and I've never seen copies or renames that I didn't expect
to see, knowing what the author of the change did.

So even though I had a little bit of trouble with that rename
situation above, I'm _very_ happy with the way Git handles renames.

And the truth is that the case above really was quite correct: XML
is very verbose. When 70% of a file is just the required XML framing
around the other 30%, the file's payload, it's not surprising that
files are considered similar when they differ only by a little bit
of payload.
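
To make the framing effect concrete, here is a minimal Python sketch.
It is not Git's similarity algorithm: it substitutes
difflib.SequenceMatcher from the standard library as a stand-in, and
the make_doc helper, the record.dtd name and the payload strings are
all hypothetical. The 0.5 threshold mirrors the 50% default that the
git diff documentation gives for -M rename detection.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio (difflib's measure, NOT Git's)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def make_doc(payload: str) -> str:
    """Build a hypothetical XML file: fixed framing around a small payload."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE record SYSTEM "record.dtd">\n'
        '<record>\n'
        '  <header>\n'
        '    <schema-version>1</schema-version>\n'
        '  </header>\n'
        f'  <body>{payload}</body>\n'
        '</record>\n'
    )


# Assumed threshold, mirroring the documented 50% default of "git diff -M".
THRESHOLD = 0.5

old_payload, new_payload = "alpha", "omega"
old_doc, new_doc = make_doc(old_payload), make_doc(new_payload)

payload_score = similarity(old_payload, new_payload)
doc_score = similarity(old_doc, new_doc)

# The payloads alone look unrelated, but the shared framing dominates
# the whole-file comparison and pushes it over the threshold.
print(f"payload only: {payload_score:.2f}")
print(f"whole file:   {doc_score:.2f}")
print("would pair as rename:", doc_score >= THRESHOLD)
```

Two files with unrelated payloads still score as near-identical here
because almost every byte is shared framing, which is the same effect
described above for files sharing a DTD.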
--
Shawn.