Brief article on benchmarks of Python repository with leading DVCSen

Stephen J. Turnbull stephen at xemacs.org
Fri Feb 13 01:33:27 GMT 2009


Matthieu Moy writes:
 > Nicholas Allen <nicholas.allen at ableton.com> writes:
 > 
 > > Because name and content are so tightly coupled in
 > > Java it's best to be explicit about both and not just explicit about
 > > the changes to the content

Well, no, that's wrong AFAICS.  You can't *not* be explicit about name
changes per se, except by hiding the real name under an alias.  But
that's not what we're talking about here.

What Java needs is a way to link name changes to content changes when
the semantics are linked.  AFAICS all current VCSes leave that up to
the user.  Even in git you can use "git mv" if you want to be
explicit, although probably that's still heuristic (i.e., represented as
a rm/add pair, which presumably would be confused if you did "git mv a
b; git add a; git commit" -- but of course "git mv a b; git commit; git
add a; git commit" disambiguates).
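
To make that ambiguity concrete, here is a minimal Python sketch. It
assumes a commit is nothing more than a mapping from path to content hash,
which is a simplification of git's tree objects. The names "snapshot" and
"infer_renames" are hypothetical helpers for illustration, not git APIs.
A rename is inferred only when a path disappears and an identical blob
shows up under a new path, so re-adding "a" in the same commit hides the
move.

```python
import hashlib

def snapshot(files):
    """Map each path to a hash of its content (a stand-in for a tree object)."""
    return {path: hashlib.sha1(data).hexdigest() for path, data in files.items()}

def infer_renames(old, new):
    """Pair each deleted path with an added path holding identical content."""
    deleted = {p: h for p, h in old.items() if p not in new}
    added = {p: h for p, h in new.items() if p not in old}
    renames = []
    for src, h in sorted(deleted.items()):
        for dst in sorted(added):
            if added[dst] == h:
                renames.append((src, dst))
                del added[dst]  # each added path can explain only one deletion
                break
    return renames

before = snapshot({"a": b"class A {}\n"})

# git mv a b; git commit
moved = snapshot({"b": b"class A {}\n"})

# git mv a b; git add a; git commit  (a new "a" appears in the same commit)
moved_and_readded = snapshot({"a": b"class A2 {}\n", "b": b"class A {}\n"})

print(infer_renames(before, moved))              # [('a', 'b')]
print(infer_renames(before, moved_and_readded))  # []
```

In the second case nothing was deleted as far as the snapshots can tell,
so the move can only be read as "b added, a modified".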

 > I'm repeating myself, but you are _not_ explicit about the change in
 > content with bzr. The changes in content are _detected_ after the fact
 > by bzr, with a diff algorithm. As any decent VCS would do.

No, decent VCSes don't use diff algorithms to detect changes; they flag
files with changed content.  That way almost all changes that humans
make will lead to a perception of O(1) performance for "$VCS diff" (and
related operations).
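
A rough Python sketch of what I mean by flagging, under the assumption
that the VCS records one content id per path at commit time (the helper
names "content_id" and "flag_changed" are made up for this example, and a
real tool would usually consult cached stat data before hashing anything).
The point is only that the expensive diff algorithm runs on the flagged
files, not on the whole tree.

```python
import hashlib

def content_id(data):
    """Identify file content by its hash."""
    return hashlib.sha1(data).hexdigest()

def flag_changed(recorded_ids, working_files):
    """Return paths whose current content id differs from the recorded one."""
    return sorted(
        path
        for path, data in working_files.items()
        if recorded_ids.get(path) != content_id(data)
    )

committed = {"a.py": b"x = 1\n", "b.py": b"y = 2\n", "c.py": b"z = 3\n"}
recorded_ids = {path: content_id(data) for path, data in committed.items()}

# The user edits one file out of three.
working = {**committed, "c.py": b"z = 4\n"}

print(flag_changed(recorded_ids, working))  # ['c.py']
```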

The big difference with bzr is that presumably

    bzr diff -r1..3 file

has well-defined (and useful ;-) semantics if a mv involving "file"
occurred in rev 2.  "bzr help diff" doesn't say what they are, though.
Git OTOH tracks the *name* rather than the *content* of "file".  So

    git commit
    git mv a b
    git commit
    # change file c
    git commit
    git diff HEAD~2 -- a ==> diff showing entire contents of a removed
    git diff HEAD~2 -- b ==> diff showing entire contents of b added

while presumably what is desired is

    git diff HEAD~2 -- a ==> "a renamed to b"
    git diff HEAD~2 -- b ==> "b renamed from a"

and an empty diff in each case.  That is what bzr does (almost; bzr
uses "a renamed to b" in both cases).  git does have heuristics to
detect this (diff -M), but after a serious refactoring the programmer's
intent would probably go undetected.
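
For illustration, a similarity-based rename heuristic can be sketched in
Python as follows.  This is not git's actual algorithm; it uses difflib's
line-level ratio as a stand-in similarity score, and "detect_renames" is
a hypothetical helper.  The 0.5 threshold mirrors the 50% default that
"git diff -M" documents.  A light edit survives the threshold, a serious
rewrite does not.

```python
from difflib import SequenceMatcher

def detect_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with its most similar added path.

    deleted/added map path -> text.  A pair counts as a rename only if
    the line-level similarity ratio reaches the threshold.
    """
    renames = []
    remaining = dict(added)
    for src, old_text in sorted(deleted.items()):
        best, best_score = None, threshold
        for dst, new_text in sorted(remaining.items()):
            score = SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines()
            ).ratio()
            if score >= best_score:
                best, best_score = dst, score
        if best is not None:
            renames.append((src, best))
            del remaining[best]
    return renames

old = "package x;\nclass A {\n  int f() { return 1; }\n}\n"
light = "package x;\nclass B {\n  int f() { return 1; }\n}\n"
heavy = "package y;\ninterface B {\n  long g();\n  long h();\n}\n"

print(detect_renames({"A.java": old}, {"B.java": light}))  # [('A.java', 'B.java')]
print(detect_renames({"A.java": old}, {"B.java": heavy}))  # []
```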

What I don't understand is why this is so bloody important, when what
you'd really like to have is tracking of semantic units (defuns, at
least, maybe even blocks/sexps) at the sub-file level.  And nobody
does that (although it's a fairly straightforward extension of git to
do so).
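
As a sketch of what sub-file tracking could look like (in Python rather
than as a git extension, and assuming the tracked units are top-level
Python functions), one can hash each function's AST and look the hash up
in the next snapshot.  "unit_ids" and "moved_units" are hypothetical
names; ast.dump() omits line and column numbers by default, so a function
keeps its id when it moves within or between files.

```python
import ast
import hashlib

def unit_ids(source):
    """Map a hash of each top-level function's AST to the function's name."""
    ids = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            digest = hashlib.sha1(ast.dump(node).encode()).hexdigest()
            ids[digest] = node.name
    return ids

def locate(tree):
    """Map unit hash -> (path, name) for every function in a snapshot."""
    return {
        digest: (path, name)
        for path, source in tree.items()
        for digest, name in unit_ids(source).items()
    }

def moved_units(old_tree, new_tree):
    """Report (name, old_path, new_path) for functions that changed file."""
    where_old, where_new = locate(old_tree), locate(new_tree)
    return sorted(
        (name, old_path, where_new[digest][0])
        for digest, (old_path, name) in where_old.items()
        if digest in where_new and where_new[digest][0] != old_path
    )

old_tree = {
    "util.py": "def f(x):\n    return x + 1\n\ndef g(x):\n    return x * 2\n",
}
new_tree = {
    "util.py": "def f(x):\n    return x + 1\n",
    "math2.py": "# moved here\n\n\ndef g(x):\n    return x * 2\n",
}

print(moved_units(old_tree, new_tree))  # [('g', 'util.py', 'math2.py')]
```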





More information about the bazaar mailing list