Brief article on benchmarks of Python repository with leading DVCSen

Fri Feb 13 16:06:39 GMT 2009

On 2009-02-13 09:14 (-0600), John Arbash Meinel wrote:

> I'd just like to point out that if you do the natural thing of:
>
>  git init
>  echo content >> foo
>  git add foo
>  git commit -m "create" foo
>  git mv foo bar
>  echo more content >> bar
>  git commit -m "move and add" bar
>
> I believe git's auto-detection becomes even less reliable. I realize
> the "workaround" is to commit inbetween. However, consider a
> refactoring, where you then need to change things like "#include
> <foo.h>" to now be "#include <bar.h>", etc. It seems pretty natural to
> modify the *content* at the same time that you modify the *tree
> shape*.

Git's rename detection does not work with this kind of toy example,
where the content is only a couple of bytes. However, Git calculates a
similarity index for each deleted/added pair, and if the two files are
about 50% similar or more (the default threshold) the pair is detected
as a rename. So even though you are correct that the detection gets
"less reliable", it works very nicely in the real world, where there is
real content in the files. In practice there is rarely any need to
worry about it.
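
As a rough illustration of the real-world case, here is a sketch that
renames a file and edits it in the same commit, but with a hundred
lines of content instead of a couple of bytes. The file names foo.h and
bar.h, the generated declarations and the throwaway repository are all
made up for the example; only the git commands themselves are real.

```shell
#!/bin/sh
# Sketch: rename plus edit in one commit, on a file with "real" content.
# foo.h, bar.h and the generated lines are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# A hundred lines of content instead of a couple of bytes
i=1
while [ "$i" -le 100 ]; do
    echo "int declaration_$i;"
    i=$((i + 1))
done >foo.h
git add foo.h
git commit -q -m "create"

# Change the tree shape and the content in the same commit
git mv foo.h bar.h
echo "int one_more_declaration;" >>bar.h
git add bar.h
git commit -q -m "move and add"

# -M without a number uses the default similarity threshold
result=$(git diff -M --name-status HEAD~1 HEAD)
echo "$result"
```

The status letter in the first column should be R (rename) followed by
the similarity score, rather than a separate D and A pair.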

To make the toy example work we add a little less content:

    echo content >foo
    git add foo
    git commit -m create
    git mv foo bar
    echo more >>bar
    git add bar
    git commit -m "move and add content"

Now we can see the rename by lowering the similarity threshold to 37
percent (a number without a % sign is read as a fraction, so -M37 means
0.37):

    git show -M37
    git log -p -M37
    git diff -M37 HEAD~1..

> For git to infer a rename after-the-fact you generally have to commit
> unmodified texts, so that the sha1 exactly matches.

Not true at all. Just keep about half of the file's content the same or,
if you change more than that, lower the similarity threshold with the
-M option when producing diffs.
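
To make that concrete, here is a small sketch where roughly 70% of a
ten-line file is rewritten in the same commit as the rename. The file
contents, the throwaway repository and the 20% threshold are arbitrary
choices for the example, not anything Git prescribes.

```shell
#!/bin/sh
# Sketch: a rename with most of the content rewritten in the same commit.
# The file contents and the 20% threshold are arbitrary, for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Ten lines of original content
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "original line $i"
done >foo
git add foo
git commit -q -m "create"

# Rename, keep the first three lines and rewrite the other seven
git mv foo bar
head -n 3 bar >bar.tmp
for i in 4 5 6 7 8 9 10; do
    echo "rewritten line $i"
done >>bar.tmp
mv bar.tmp bar
git add bar
git commit -q -m "move and rewrite most of it"

# Default threshold: too little is left in common
default=$(git diff -M --name-status HEAD~1 HEAD)
# Lowered threshold: the same commit is now reported as a rename
lowered=$(git diff -M20% --name-status HEAD~1 HEAD)

echo "default threshold:"
echo "$default"
echo "with -M20%:"
echo "$lowered"
```

With the default threshold the commit should show up as an added bar
and a deleted foo; with -M20% the same commit should show up as a
rename. Nothing is recorded in the repository either way, the threshold
only affects how the diff is presented.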