Brief article on benchmarks of Python repository with leading DVCSen

Fri Feb 13 02:24:00 GMT 2009

>> I would expect that most java shops will lean heavily towards explicit
>> renames and folder tracking specifically because of the tight
>> relationship between folder structure, file naming and code content.
>
> That is an argument _in favor_ of dealing with file names the same way
> you deal with code content. Any decent VCS detects content changes with
> a diff algorithm, and you still didn't give an argument to do it
> another way for file names.

No, I meant what I said.  We NEED tracked renames.  It is necessary to
be able to trace the history of a class as file names, locations in
the tree and contents change during refactoring.  Changing the package
changes the package statement, often changes imports and sometimes
changes invocations within the file (and a refactor rarely occurs just
to rename, so naturally there are content changes too).  Being able to
trace back in history through several renames/moves has often been
invaluable; failure to merge forward through the renames has been
equally painful, just as often.
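
To make "tracked" concrete, here is a minimal sketch of the idea,
assuming a hypothetical model in which each file receives a stable id
when it is first added and every revision records an id -> path
mapping.  The names (path_history, "id-1") and the data layout are
illustrative assumptions, not Bazaar's actual storage format.

```python
# Hypothetical model: a revision is a dict mapping a stable file id to
# the path that file had in that revision.  Illustration only; this is
# not how any particular VCS stores its inventories.

def path_history(revisions, file_id):
    """Return the distinct paths a file id has had, oldest first."""
    paths = []
    for inventory in revisions:
        path = inventory.get(file_id)
        # Record the path only when it differs from the previous one.
        if path is not None and (not paths or paths[-1] != path):
            paths.append(path)
    return paths


revisions = [
    {"id-1": "src/com/example/util/Parser.java"},
    {"id-1": "src/com/example/parse/Parser.java"},       # package move
    {"id-1": "src/com/example/parse/TokenParser.java"},  # class rename
]

history = path_history(revisions, "id-1")
```

Because the lookup is keyed on the id rather than on the content, the
history survives however much the file body changed during each move.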

Just as I appreciate that the Python team need better performance
whilst retaining the freedom of their users, this is an actual need
and you've demonstrated my point by denying the need exists.

Yes, we could do without it and either place notes in files (though we
then couldn't use the IDEs to do the refactoring changes for us) or
sift through history, but why would we want to when we can have rename
tracking?

Diff is not the point here.

>> There's no point arguing
>> * Git doesn't track renames - if that's important to you, git isn't
>> for you
>
> _Tracking_ renames is important for no one. _Managing_ renames is
> important, and Git is very good at that.

For some definition of 'very good'.  Detecting renames would be fine if
it could be correct and consistent.  Git's detection can't be.  Yes,
I've tried it (albeit some time ago - have they added magic recently?).
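
A rough sketch of why content-based detection is uncertain, assuming a
simple line-similarity heuristic with an arbitrary 0.5 threshold.  This
is not Git's actual algorithm; the functions similarity and
detect_rename and the sample files are made up for illustration.

```python
import difflib


def similarity(old_lines, new_lines):
    """Line-level similarity ratio between two versions, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, old_lines, new_lines).ratio()


def detect_rename(old_lines, new_lines, threshold=0.5):
    """Guess whether new_lines is a renamed copy of old_lines.

    The threshold is an assumed value chosen for this example.
    """
    return similarity(old_lines, new_lines) >= threshold


old = [
    "package a;",
    "import a.X;",
    "class Foo {",
    "  void run() { X.go(); }",
    "}",
]

# A move that only touches the package statement.
light = ["package b;"] + old[1:]

# A move combined with a refactor: package, import, class name and
# invocation all change, as described above.
heavy = [
    "package b;",
    "import b.Y;",
    "class Bar {",
    "  void run() { Y.start(); }",
    "}",
]

light_detected = detect_rename(old, light)
heavy_detected = detect_rename(old, heavy)
```

With these inputs the light move is recognised and the heavy one is
not, even though both are the same file as far as the developer is
concerned.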

>> The Bazaar team is working hard to improve performance and make as
>> much of the workflow a personal choice - I can't say I see the same
>> from Git, there it seems to be more an attitude of "if git doesn't do
>> it, you don't need it"...

> The attitude of Git is to refuse arguments like "in theory, it would
> be better to do X, so Git should do X". But come with a real-life
> use-case where you need something, and the community is then _very_
> responsive.

Actually, I meant that if renames are important, don't use git - not
that git must change to support renames.  There may be unavoidable
performance trade-offs in supporting renames, suggesting that, for
those needing bleeding-edge performance, adding rename support to git
would be bad.

>> Oh and this example of a weakness in bzr.
>>
>>>  bzr rm --keep file
>>>  # Oups, my bad, this is not what I wanted.
>>>  bzr add file
>>
>> Surely this is just a UI bug. bzr could easily detect replaces (an rm
>> and an add with the same path) and check the contents for a match and
>> treat the add as a revert of the rm (a "--force-replace" option on add
>> could support the current behaviour).
>
> Notice how you use the word "detect" here in a positive way, and how
> you consider the exact same to be negative for Git.

I didn't truly mean to denigrate the word detect.  I did mean to
denigrate uncertain rename detection as a replacement for explicit
renames (though it is an alternative if renames are less critical).

I expect that in this case detection could be somewhat more absolute,
given the limited number of variables, or at least could usefully warn
for the edge cases.

E.g. rm --keep file, hack hack, add file.

Hmm, did they mean 'revert + modify' or 'replace'?
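
A minimal sketch of that check, assuming the tool remembers a content
digest for each path removed with --keep but not yet committed.  The
function classify_add and the pending_removals mapping are
hypothetical; this is not existing bzr behaviour.

```python
import hashlib


def digest(content):
    """Content fingerprint used to compare the kept file with the re-add."""
    return hashlib.sha1(content).hexdigest()


def classify_add(pending_removals, path, content):
    """Classify an 'add' against uncommitted 'rm --keep' operations.

    pending_removals maps path -> digest of the file when it was removed.
    """
    if path not in pending_removals:
        return "new file"
    if pending_removals[path] == digest(content):
        # Same path, same bytes: safe to treat the add as undoing the rm.
        return "revert of rm"
    # Same path, different bytes: intent is unclear, so warn the user.
    return "ambiguous: revert + modify, or replace"


pending = {"file": digest(b"original contents\n")}

unchanged = classify_add(pending, "file", b"original contents\n")
hacked = classify_add(pending, "file", b"hacked contents\n")
unrelated = classify_add(pending, "other", b"anything\n")
```

Only the middle case needs a question put to the user; the other two
can be decided without guessing.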

> Once you fixed the above, try to fix also
>
>  bzr rm --keep file
>  bzr commit
>  # oops.
>  bzr add file
>
> and the "someone used plain patch to add a file" scenario.

This one is more troublesome and is part of the rename/guess
trade-off.  I don't know the bazaar internals well enough to know if
file-id recovery tricks might be an option (wait, does it already have
that?).

Naturally the expressiveness of patching isn't great so there's always
going to be some user work in expressing their real intent prior to
commit if they want to resolve this (detection in add could do some of
the work and make recommendations).

> Regardless of performance, as someone else said in the thread, I'd use
> Git anyway for all the features bzr doesn't have (the staging area,
> "rebase -i", the reflog, ...).

Core support for rebase would be nice (along with selectively
squashing revisions and re-editing the commit message); I'm less
excited by the staging area other than for the performance advantage.
Co-located branches would be nice, as would GC features for the
repository and filtering/obliterating history (yes, I know this breaks
other branches but sometimes it's what you want).

There's no shortage of features I'd like bzr to have - I'm still
currently preferring bzr over git for features though (you really do
underestimate how much pain the lack of an explicit rename causes some
teams).  Performance and space are another matter, performance because
bzr's still too slow for some tasks (but getting there) and space
purely because selling the DVCS option to the company means doing
painful comparisons with Subversion (despite how cheap consumer disks
are)...

> AFAIK, Bzr doesn't have support for copies, while Git has (for
> example, "git blame" will track code movement and copies between files
> like no other VCS do as of now). It may have changed recently, I don't
> know.

No it doesn't support copies.  We've had a few cases where they would
have been nice in Subversion but since the svn merging doesn't follow
copies forward it's only of use for saving space to us right now.  Man
do I hate Subversion merges.

>> NB: Someone in our team is bitten by the lack of atomic rename in
>> Subversion at least once a month.  The changes are such that Git is
>> unlikely to have detected it after the fact.
>
> And can a merge still happen in the files after such huge
> modifications? I mean: did you do a diff3 or diff+patch manually and
> see the amount of conflicts?

Yes.  Yes they can and do continue to merge with minimal conflict
issues (though they have to hand manage the merge command because of
the lack of tracking forward on copies in Subversion) - tests with
this in bazaar show it working perfectly.  Though we're not doing the
diff+patch scenario, we're merging using the tool.

That we have such different workflows shows how much a tool's
appropriateness changes from team to team.  Neither is a silver
bullet for all development.

--
Talden