[MERGE/RFC] Userdoc Driven Design on the Bazaar 2.0 UI

Thu Apr 16 01:55:49 BST 2009

On Thu, Apr 16, 2009 at 09:40:46AM +1000 I heard the voice of
Robert Collins, and lo! it spake thus:
> 
> Now, as to how we do that [...]

Random rambling follows.  Recreational chemical dosage advised.

So, for practical purposes, we've got 2 technical constraints here;
the branch needs to be able to find its repository, and the repository
needs to be able to find the branches using it.

An important non-technical constraint of any scheme is that it be
reasonably guessable by an informed user.  Even better by an
uninformed user, but either way, it shouldn't require a lot of
special-case thinking, or archaeology of the details of a specific
case.  I add that I DO consider our current setup to pass this.  The
rules of which pieces require and use which others really are pretty
simple, and (ignoring stacking) don't have any big list of exceptions
or special cases.  We don't seem to always do that great a job
_conveying_ the rules to people, though.  Or people don't seem to do
that great a job apprehending them, one of the two.

The first way to satisfy both is for the repository to be non-shared
and the two to be colocated.  This is trivial, as well as unhelpful
for the case of sharing, so ignorable.  Only shared cases matter here.
So, somehow we need to link the two.

The second is implicit linkage.

 We have that now via the mechanism of a hierarchical filesystem;
 branches are 'below' repos, and repos are 'above' branches, so we can
 know to walk one way or the other to find our missing component[s].
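
 As a rough illustration of that walk in both directions (a sketch,
 not bzrlib's actual code): the function names are made up, and it
 assumes only local paths where a repository is marked by a
 .bzr/repository directory and a branch by a .bzr/branch directory.

```python
import os


def find_repository(branch_dir):
    """Walk upward from branch_dir to the nearest dir holding .bzr/repository."""
    current = os.path.abspath(branch_dir)
    while True:
        if os.path.isdir(os.path.join(current, ".bzr", "repository")):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding a repository.
            return None
        current = parent


def find_branches(repo_dir):
    """Walk downward from repo_dir, collecting dirs that hold .bzr/branch."""
    found = []
    for dirpath, dirnames, _ in os.walk(repo_dir):
        if ".bzr" in dirnames:
            # Never descend into the control directories themselves.
            dirnames.remove(".bzr")
            if os.path.isdir(os.path.join(dirpath, ".bzr", "branch")):
                found.append(dirpath)
    return sorted(found)
```

 Neither side stores anything about the other; the relationship is
 carried entirely by where the directories sit, which is why moving the
 whole tree around doesn't break it.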

 Colocated branches do this in a similar hierarchical way, but with
 differences; most specifically, that branches aren't dealt with as
 filesystem entities, but as bzr concepts.  Of course, there's no
 requirement that the root of the bundle-o-branches actually be
 directly considered as a repository, but it doesn't change the
 concept either way, so we might as well.

  This is, IMO, a highly desirable thing to have, but it really won't
  work well unless, in its context, things are sufficiently well
  switched to thinking about the internal branches as being _named_
  rather than _located_.  This is a significant shift from the current
  bzr worldview, and I don't believe it can be done by papering things
  over, like switch's guessing "Oh, by 'switch X' you really meant
  'switch $CURBRANCH/../X'".  I think it's reasonably doable by
  treating branch refs as names in the context of a colorepo unless
  they're disambiguated as locations.  Anyway.
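
  To make "names unless disambiguated" a bit more concrete, here's a
  hypothetical sketch.  The function name and the disambiguation rule
  (a URL scheme, or a leading "./", "../" or "/") are my assumptions
  for illustration, not anything bzr implements.

```python
import os


def resolve_branch_ref(ref, colorepo_root=None):
    """Classify a user-supplied branch ref as a ("name", ...) or ("location", ...).

    colorepo_root is None when we are not working inside a colorepo.
    """
    is_url = "://" in ref
    is_explicit_path = ref.startswith(("./", "../", "/"))
    if colorepo_root is not None and not (is_url or is_explicit_path):
        # Inside a colorepo a bare ref is a branch *name*, not a path.
        return ("name", ref)
    if is_url:
        return ("location", ref)
    # Outside a colorepo, or when explicitly disambiguated: a location.
    return ("location", os.path.normpath(ref))
```

  So inside a colorepo "switch X" would mean the branch named X, and
  "switch ./X" would mean the directory X, with no guessing in between.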

What other implicit mechanisms can we imagine to go both ways?  I can
think of a number of stupid ones, but no non-stupid ones have come to
mind.

The third is explicit linkage.  This has necessary limitations; it
requires that anything dealing with a branch has to be able to write
into the repository.

This constraint is almost necessarily violated by stacking (or
eventually, shallow cloning, which IMO is the much more interesting
descendant of stacking) unless we declare that such things can't be
done over read-only transports.  And that would be stupid.  So, this
case may be something we just need to declare an exception to the
'repo able to find branch' rule and leave that to human manglement.

 One method of explicit linkage is by using layers underneath bzr.
 The first obvious mechanism is ".bzr/repository/ is a symlink to the
 real repo".  That covers branch -> repo.  For repo -> branch we could
 have symlinks in $REPODIR/.bzr/repository/branches/.  Various other
 sub-bzr mechanisms mostly reduce to variants of this.
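
 A minimal sketch of the symlink variant, assuming a POSIX filesystem
 and local paths.  The helper name is hypothetical, and the branches/
 directory inside the repository is the speculative layout described
 above, not something bzr creates today.

```python
import os


def link_branch_to_repo(branch_dir, repo_dir, name):
    """Create both directions of the branch/repository link as symlinks."""
    repo_store = os.path.join(repo_dir, ".bzr", "repository")
    registry = os.path.join(repo_store, "branches")
    os.makedirs(registry, exist_ok=True)
    os.makedirs(os.path.join(branch_dir, ".bzr"), exist_ok=True)
    # branch -> repo: the branch's .bzr/repository points at the real repo.
    os.symlink(repo_store, os.path.join(branch_dir, ".bzr", "repository"))
    # repo -> branch: the repo keeps one named link per branch using it.
    os.symlink(branch_dir, os.path.join(registry, name))
```

 Note that the second symlink is exactly the write into the repository
 that a read-only transport can't do.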

 The other is inside bzr.  We already have a moral equivalent of this
 in the branch reference in lightweight checkouts.  We'd have to have
 a similar thing going the other way.

A downside of explicit linkage is that it ties both locations down; it
breaks down when somebody moves either the branch or the repository
around.  The implicit linkage handles those cases, as long as the
relationship between the two remains unchanged.  With explicit, we'd
need to come up with a "bzr bzrdir-mv" sort of command.  And that
would break down too if we don't have write access to all the bzrdirs
needed to walk the chain.
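
To show what such a command would have to do, here's a hedged sketch
built on the symlink flavour of explicit linkage.  Everything in it is
hypothetical: the function name, the branches/ registry inside the
repository, and the assumption that links hold absolute local paths.

```python
import os
import shutil


def move_linked_branch(old_dir, new_dir, repo_dir, name):
    """Move a branch directory and re-point the repository's link to it."""
    back_link = os.path.join(repo_dir, ".bzr", "repository", "branches", name)
    if not os.access(os.path.dirname(back_link), os.W_OK):
        # Without write access to the repo side we can't fix the chain.
        raise PermissionError("cannot update the repository's back-link")
    shutil.move(old_dir, new_dir)
    # The back-link still names old_dir and is now dangling; replace it.
    os.remove(back_link)
    os.symlink(new_dir, back_link)
```

A plain mv of the branch would skip the last two steps and leave the
repository pointing at nothing, which is the breakage in question.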

It's worth asking "what does $OTHER_VCS do?".  Depending on the choice
of others, it may require playing a little looser with definitions of
'branch' and 'repo' than the quite well-delineated bzr concepts, but
doing so, I don't think I can come up with a case that anybody else
does anything BUT implicit.  In CVS/SVN and the ilk, branches are
internal objects in the repository.  mtn, practically the same aside
from the distributed capability.  Similarly (totally differently, but
similarly ;) in git.  Even for arch.  Then there's the camp of
branch-primacy VCS's.  darcs and hg, AFAIK, both have the 'repo' as
purely internal to the 'branch', though with the terms muddied and
modified by things like hg named branches, and with the caveat that
the repo can sometimes be semi-shared via hardlink trickery.  Many of
the above support shallow clones, but pretty much devolve into the
obvious exceptions that you're "at your own risk" in such cases.

A related concept is "where is the repo?".  It can be in 3 places;
inside one branch, in a location unrelated to some/any of the
branches, or in a specific location relative to all the branches.

The first case, IMO, loses right off, because the repository is
implicitly 'owned' by that branch then (socially and psychologically,
no matter how hard we might try to make it not technically).

  This is also slightly different from the case of hardlinking the
  repo CONTENTS (e.g., *.pack) between independent repos in each
  branch.  However, as long as we consider files
  immutable-once-written, that case can be considered equivalent to
  the case of branches having totally independent repos altogether.
  It gains us momentary performance and temporary disk space, but it's
  conceptually no model change at all, so ignored here.

The second case requires explicit linkage.  Well, there are two other
options.  One is a hard-defined location for The Repo in any given
environment.  "All your branches use ~/.bazaar/repo".  That sucks.
The other is a rule-defined location.  "Branch /x/y/z uses
~/.bazaar/repo/x/y/z".  That sucks harder.  So explicit it is.  All
the aforementioned problems are inherited, and the fact that nobody
else does this suggests that it may not be a fruitful or desirable
road to explore.

  For an alternate perspective, however, we're also (I think) the only
  branch-centric system that's evolved an explicit mechanism for
  repository-sharing of full branches.  So maybe we SHOULD examine
  this whole different set of rules.

So...   we could be different from everybody else by using explicit
links.  I'm unconvinced that it's a good idea.  That leaves us with
implicit linkage.

We currently do that by nesting the specific [branch] within the
general/shared [repo].  I'm unable to think of a better way,
maintaining a branch-centric focus, that doesn't fail hard at
predictability.  This isn't meant to say that the UI for it all is
perfect as-is, but the general structure I think is about as sound as
it can be.

Losing the branch-centric focus, we can grow the colocated mechanism,
where branches become attributes of a repository.  (Or of some object
not directly tied to the repository, but one that contains
one-and-only-one repository and would UI-wise be the same as what
every other such system calls a repository.  I don't think splitting
that hair would help anything.)

What's the third option?

-- 
Matthew Fuller     (MF4839)   |  fullermd at over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.