Brief article on benchmarks of Python repository with leading DVCSen

Fri Feb 13 15:14:04 GMT 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Stephen J. Turnbull wrote:
> Talden writes:
>  > >> I would expect that most java shops will lean heavily towards explicit
>  > >> renames and folder tracking specifically because of the tight
>  > >> relationship between folder structure, file naming and code content.
>  > >
>  > > That is an argument _in favor_ of dealing with file names the same way
>  > > you deal with code content. Any decent VCS detects content changes with
>  > > a diff algorithm, and you still didn't give an argument to do it
>  > > another way for file names.
>  > 
>  > No, I meant what I said.  We NEED tracked renames.
> 
> Yup.  If you run this script (the echo lines should be exact; content
> matters) in a fresh directory,
> 
> git init
> echo content >> foo
> git add foo
> git commit -m "create" foo
> git mv foo bar
> git commit -m "move" foo bar
> echo more content >> bar
> git commit -m "add" bar
> 
> then "git diff -M HEAD~2 HEAD" does *not* detect the rename, while
> "git diff -M HEAD~2 HEAD~1" does.  This particular case can of course be
> done at zero cost (you can just check that the list of file SHA1s is
> identical across commit #2), but more generally you'd have to do a
> diff on every commit, and evidently git doesn't do that.  So the more
> commits, the less likely git is to detect a rename.  I can also
> imagine refactorings where the majority of content moves to a new
> file, and a human being must disambiguate whether the semantics are
> "rename a b; add a; edit" or simply "edit".
> 
> I don't see why tracking renames would be terribly hard to add to git
> (the implementation would be just a variant on reflog, for file names
> rather than branch names) and I can't see a performance implication.
> It's indisputable that git *does not* do it, however.
> 
> I still don't understand why this need for rename tracking doesn't
> lead to a demand for defun-level tracking, too.

I'd just like to point out that if you do the natural thing of:

 git init
 echo content >> foo
 git add foo
 git commit -m "create" foo
 git mv foo bar
 echo more content >> bar
 git commit -m "move and add" foo bar

I believe git's auto-detection becomes even less reliable. I realize the
"workaround" is to commit in between. However, consider a refactoring
where you also need to change things like "#include <foo.h>" to
"#include <bar.h>". It seems pretty natural to modify the *content*
at the same time that you modify the *tree shape*.
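
To check this rather than guess, the sequence above can be run in a
scratch repository and git asked what it infers. This is only a minimal
sketch: the function name rename_status, the temporary directory and the
commit identity are assumptions made for illustration. The status letters
it prints depend on the similarity threshold handed to -M (git documents
the default as 50%) and possibly on the git version.

```shell
# rename_status THRESHOLD
# Rename foo to bar and append to it in a single commit, then print what
# "git diff -M<THRESHOLD> --name-status" reports for that commit.
# (Hypothetical helper; the repository lives in a throwaway directory.)
rename_status() {
    threshold=$1
    repo=$(mktemp -d)
    (
        set -e
        cd "$repo"
        git init -q
        # Placeholder identity so that "git commit" works anywhere.
        git config user.name "Example"
        git config user.email "example@example.invalid"

        echo content >> foo
        git add foo
        git commit -q -m "create"

        # Tree shape and content change in the same commit.
        git mv foo bar
        echo more content >> bar
        git add bar
        git commit -q -m "move and add"

        # R<nnn> means git inferred a rename; a D line plus an A line
        # means it saw an unrelated delete and add.
        git diff -M"$threshold" --name-status HEAD~1 HEAD
    )
    rm -rf "$repo"
}
```

Running "rename_status 50%" and then "rename_status 30%" shows whether
lowering the threshold changes what git reports for the same commit.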

Some people only like to commit when 'make' passes. Others go further
and only commit when 'make test' passes. I'm personally okay with
committing a broken tree, though I usually try to add "(broken)" to the
commit message to make that sort of thing clear (and thus obviously do
it infrequently).

For git to infer a rename after the fact, you generally have to commit
the text unmodified, so that the SHA1 matches exactly. Certainly it could
fall back on a heuristic of "nothing matches exactly, so what matches
most closely?", though there again the user's intent may differ from
what was inferred.
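
The "what matches the closest" idea can be sketched in a few lines of
shell. This is a crude stand-in and not git's actual algorithm: the
helper names similarity and best_match are made up for illustration, and
the score is simply the share of distinct lines two files have in common,
relative to the larger file.

```shell
# similarity FILE_A FILE_B
# Print the percentage (0-100) of distinct lines the two files share,
# measured against whichever file has more distinct lines.
similarity() {
    a=$(mktemp)
    b=$(mktemp)
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    # $(( ... )) also strips the padding some wc implementations emit.
    common=$(( $(comm -12 "$a" "$b" | wc -l) ))
    size_a=$(( $(wc -l < "$a") ))
    size_b=$(( $(wc -l < "$b") ))
    rm -f "$a" "$b"
    max=$size_a
    if [ "$size_b" -gt "$max" ]; then
        max=$size_b
    fi
    if [ "$max" -eq 0 ]; then
        echo 0
        return
    fi
    echo $(( common * 100 / max ))
}

# best_match OLD_FILE CANDIDATE...
# Print "<score> <candidate>" for the candidate most similar to OLD_FILE.
best_match() {
    old=$1
    shift
    best=
    best_score=-1
    for candidate in "$@"; do
        score=$(similarity "$old" "$candidate")
        if [ "$score" -gt "$best_score" ]; then
            best=$candidate
            best_score=$score
        fi
    done
    echo "$best_score $best"
}
```

The highest score is still only a guess. Whether it counts as a rename
depends on a cut-off somebody has to pick, and on what the user actually
meant.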

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmVjjwACgkQJdeBCYSNAAOf6gCgqOhrfxlijaP6zRyShGbTm/ct
2lEAoKeZNsYJf+6HetIsmhIboAxCMzUi
=zqAm
-----END PGP SIGNATURE-----