Feedback from evaluation in a corporate environment

Fri Jan 8 04:15:39 GMT 2010

Important stuff first:

It seems to me that based on your requirements as stated you should
take a very close look at the bzr-svn stuff.  Subversion already has
some backend distribution features, IIUC, while bzr-svn would provide
the developers' UI, and the local branches features (including sharing
of local branches).

Quibbles follow.

Uri Moszkowicz writes:

 > I don't think you need to change your philosophy about the atomic view of a
 > repository, just that you shouldn't need to view the whole thing at once.
 > What's the problem with merging? If the files aren't checked out, then
 > there's no need to do any merging. Why does Bazaar even need to concern
 > itself with files that haven't been checked out?

Because somebody else may have checked them out and made conflicting
changes that spill over into your subset.  Because interfaces defined
and changed in the files you have checked out may be used in the files
you haven't checked out.

This is why the nested trees design *should* be inflexible.  If you do
not define relatively inflexible modules with "frozen" (or at least
"slushed") interfaces among them, then every developer needs a full
checkout in order to determine whether a change he is making can
affect others' work.
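
A minimal sketch of that point, in Python.  The file names and the
"uses" table are hypothetical; a real project would have to derive the
table from its build system or from the sources themselves.  It only
shows that a change to an interface inside a partial checkout can
reach files outside it.

```python
from collections import deque

# Hypothetical "uses" relation: file -> files whose interfaces it depends on.
USES = {
    "app/main.c": ["lib/parser.h"],
    "app/report.c": ["lib/parser.h", "lib/format.h"],
    "lib/parser.c": ["lib/parser.h"],
    "lib/format.c": ["lib/format.h"],
}


def affected_by(changed, uses):
    """Return every file that depends, directly or transitively, on a changed file."""
    # Invert the relation: file -> set of files that use it.
    used_by = {}
    for user, deps in uses.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(user)

    seen = set()
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


def unseen_breakage(changed, checkout, uses):
    """Files that may break but are absent from the developer's partial checkout."""
    return affected_by(changed, uses) - set(checkout)
```

With only lib/parser.h and lib/parser.c checked out, a change to
lib/parser.h reaches app/main.c and app/report.c, neither of which the
developer can see.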

 > Remember, the SCM tool only provides the ability to describe
 > versions of content and you can certainly describe how only a
 > portion of it needs to change.

That's not merely not "certain"; in my experience it is wrong in
practice if the "portion" in question lacks a well-defined interface
to the rest of the project.

This is an example of what I mean by saying that I don't think your
development management systems could handle such flexibility even if
your VCS could.

 > Yes it is like another repo but not exactly - that's why I say Bazaar is
 > close but not quite there. The repositories don't have to be 100% in sync at
 > all times they just have to appear to be that way at the times that
 > operations are done. The repositories have to be coherent, just like caches
 > have to be coherent in a multi-core CPU.
 > 
 > Caches have it easy though because the cache line updates can't
 > fail.

This is true for DVCSes as well, at least to the extent that they
follow the append-only model.

 > With write proxies, you can run into deadlocks and livelocks that
 > result in repositories that can never be in sync. Imagine that two
 > users simultaneously commit changes to their local repositories.
 > Both make their commits locally and then try to update each other
 > in a way that results in a conflict.

Sure, this happens all the time in DVCSes.  It is inherent in
concurrent work, and it cannot be wished away, even with a magic DVCS
that presents a CVCS face to its users.  In the magic DVCS you will
have exactly the same situation, except that you enforce serialization
("first commit wins") on the developers; later developers cannot
version control their changes *at all* until they update and fix any
merge conflicts.

This means that the developer ends up committing changes on top of a
tree they weren't designed for or tested against.  This is bad news
for understanding certain common kinds of bugs.  The developer
*thinks* he knows what's happening, but in fact he's wrong.

 > When detected, both need to undo their commit to all repositories

No, they do *not*.  A new head will have been created, but it will not
have a name; it is not visible to the "branch" or "checkout" commands.
The first commit wins, here; it inherits the branch name.  This is
coherent (this state can be propagated to all repos automatically),
but in general it's not what you want to send to the customers.
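
A toy model of that situation, in Python.  This is a sketch under my
own assumptions, not how Bazaar or any other DVCS stores revisions;
the class and the revision ids are hypothetical.

```python
class Repo:
    """Toy append-only revision store with a single named branch."""

    def __init__(self):
        self.parents = {}        # revision id -> tuple of parent ids
        self.branch_tip = None   # revision that currently owns the branch name

    def add(self, rev_id, parents):
        """Append a revision; nothing already stored is ever rewritten."""
        if rev_id in self.parents:
            return  # already known, nothing to do
        self.parents[rev_id] = tuple(parents)
        # The branch name only moves to a direct descendant of the current tip.
        if self.branch_tip is None or self.branch_tip in parents:
            self.branch_tip = rev_id

    def heads(self):
        """Revisions that no other revision names as a parent."""
        referenced = {p for ps in self.parents.values() for p in ps}
        return set(self.parents) - referenced


repo = Repo()
repo.add("base", [])
repo.add("alice-1", ["base"])   # first commit wins the branch name
repo.add("bob-1", ["base"])     # accepted too, but becomes an unnamed head
```

After the three calls the store holds both commits, the branch name
points at "alice-1", and heads() reports two heads.  Nobody had to
undo anything; adding a revision whose parents are "alice-1" and
"bob-1" moves the branch name to the merge and leaves a single head.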

Now the problem is to coordinate the merge of the two heads to restore
the semantic coherency of the tree in the head that "owns" the branch
name.  But this coordination cannot be done automatically in any case.
True, many such conflicts are semantically trivial (one programmer
changes a variable name, another updates a comment on the same line),
but many will require the programmers who generated the conflict to
confer to decide who is right, or perhaps that both are wrong and a
new approach should be tried.
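
The variable-name-versus-comment case can be sketched in Python as
below.  It is a deliberately naive line-by-line three-way merge that
assumes neither side adds or removes lines; real merge tools are far
more elaborate, and the function name is hypothetical.

```python
def merge3(base, ours, theirs):
    """Line-by-line three-way merge of equal-length lists of lines.

    Returns (merged, conflicts).  conflicts holds the indices where both
    sides changed the same line in different ways; merged has None there.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles edits that keep the line count")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only theirs changed the line
            merged.append(t)
        elif t == b:      # only ours changed the line
            merged.append(o)
        else:             # both changed it differently: a human must decide
            merged.append(None)
            conflicts.append(i)
    return merged, conflicts


base = ["int cnt = 0;  /* counter */", "return cnt;"]
ours = ["int count = 0;  /* counter */", "return count;"]        # rename
theirs = ["int cnt = 0;  /* number of retries */", "return cnt;"]  # comment
merged, conflicts = merge3(base, ours, theirs)
```

The tool can tell *that* line 0 conflicts, and it can merge line 1 by
itself, but it has no way to know that the right answer for line 0 is
the new name plus the new comment.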

 > > It is fairly easy to provide a plugin which then provides project
 > > specific plugins. The main reason we don't, is because in a distributed
 > > system, you have to be wary of untrusted sources.
 > >
 > > If I merge from $JOE_RANDOM, I don't want to be running his untrusted
 > > plugin code.
 > >
 > > In a corporate environ, you are much less likely to merge from $JOE_RANDOM.

That's right, and Mercurial has implemented a trust mechanism: in a
repo accessed by ssh (presumably any repo where the user ids on the
*files* -- not the VCS committer ids -- can be trusted), if the owner
of the repo's .hg/hgrc is not on your trusted list, the settings in
that file (hooks included) are ignored, so its code won't be run.
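
The idea, as opposed to Mercurial's actual code, fits in a few lines
of Python.  The function and parameter names are hypothetical, and the
"*" wildcard and the implicit trust in the current user are
assumptions of this sketch.

```python
def config_is_trusted(owner, current_user, trusted_users):
    """Decide whether a repo-level config file owned by `owner` may be honoured."""
    if owner == current_user:      # assumption: you always trust yourself
        return True
    if "*" in trusted_users:       # assumption: wildcard means trust everybody
        return True
    return owner in trusted_users


def hooks_to_run(config_hooks, owner, current_user, trusted_users):
    """Return the hooks from a repo config, or none at all if it is untrusted."""
    if not config_is_trusted(owner, current_user, trusted_users):
        return []
    return list(config_hooks)
```

The point of the design is that the decision rests on who owns the
configuration file on disk, not on who authored the revisions being
merged.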

 > I understand the security concern but if you're downloading software from
 > $JOE_RANDOM, aren't you already exposing yourself to a security risk through
 > the content itself?

Yes, but *that* risk is much more easily controlled.  The risk in an
automatically executed task is that evil code will erase traces of
itself once it's done its dirty work.  There's the famous
(apocryphal?) story of the "login" virus in AT&T's C compiler (see
Ken Thompson's "Reflections on Trusting Trust").

 > There's just an application for DVCS that you haven't considered:
 > to geographically distribute a centralized repository.

This is probably a bad idea; current DVCSes are optimized in a
different direction (although git's object store, and possibly
Mercurial's or Bazaar's, could be bent in this direction -- Bazaar
would be harder than git, I think).  Anyway, the Bazaar developers
certainly have considered it, and implemented a simple partial version
with a "directed star" architecture (bound branches).

 > CVCS tools work well with small cohesive teams. Such teams tend to avoid
 > outsourcing because they can no longer operate this way with the
 > geographical divide and the cost can outweigh the gains. A DCVCS as I've
 > proposed solves this problem.

I don't think so.  True, some of the costs are due to (unnecessary)
manual coordination among repos in the current DVCS models.  But the
inter-developer coordination problem remains, and cannot be solved by
a magic DVCS (or "DCVCS" if you prefer).

 > A DVCS is only coherent if it always updates to the latest revision before
 > performing a checkout/branch.

That's a misuse of the word "coherent".  In DVCS terminology,
"coherent" may refer to *any* set of revisions, not just to the whole
project.  The whole point of DVCS (to date) has been to find an
appropriate level for the "coherency" guarantee, and it seems to have
settled at the repo level: a repo is coherent if it is *internally*
consistent.  There are no implied guarantees across repos except that
merging their DAGs will be possible and efficient.  This implied
guarantee is implemented via per-repo coherency and globally unique
revision ids (UUIDs).
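
As a rough illustration in Python (a sketch only: a repo here is just
a dict from revision id to parent ids, and the short string ids stand
in for real UUIDs), repo-level coherency and cross-repo merging might
look like this:

```python
def is_coherent(repo):
    """A repo is internally consistent if every parent it mentions is present."""
    return all(p in repo for parents in repo.values() for p in parents)


def merge_dags(a, b):
    """Union two revision DAGs keyed by globally unique revision id."""
    merged = dict(a)
    for rev_id, parents in b.items():
        if rev_id in merged and merged[rev_id] != parents:
            # Only possible if the ids were not globally unique after all.
            raise ValueError(f"revision id {rev_id!r} is not unique")
        merged[rev_id] = parents
    return merged


repo_a = {"r1": (), "r2a": ("r1",)}
repo_b = {"r1": (), "r2b": ("r1",)}
combined = merge_dags(repo_a, repo_b)
```

Because the ids are unique, the union does not depend on the order of
the arguments, and the union of two coherent repos is itself coherent.
Nothing here says that the two repos held the same revisions before
the merge, which is the sense of "coherent" being objected to.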

Please choose another word.  (I proposed "synchronized" elsewhere;
"centralized" might be a better term given your semantics, although
that's pretty awkward English.)

 > My goal, here, was to describe a workflow which, I'll admit has
 > some tradeoffs and is not appropriate for everyone, Bazaar does not
 > currently support but which others would likely be interested in. I
 > would describe it as "Decentralized with distributed shared main
 > line".

Again, I would object to that description.  AIUI, your specification
requires that the distribution of the shared main line should be
*invisible* to the users (and the admins except for bugs in the VCS).
I would describe it as "shared main line with private local branches
and distributed backend database."  (The part about distribution in
fact isn't part of the workflow spec.  It's an optimization of the
implementation that should be invisible to users, but is important
enough to advertise.)