[MERGE/RFC] Userdoc Driven Design on the Bazaar 2.0 UI

Thu Apr 16 18:23:12 BST 2009

Matthew D. Fuller <fullermd <at> over-yonder.net> writes:

> Random rambling follows.  Recreational chemical dosage advised.

Randomly assembled response follows.  Same advice.

> So, for practical purposes, we've got 2 technical constraints here;
> the branch needs to be able to find its repository, and the repository
> needs to be able to find the branches using it.
> 
> An important non-technical constraint of any scheme is that it be
> reasonably guessable by an informed user.  Even better by an
> uninformed user, but either way, it shouldn't require a lot of
> special-case thinking, or archaeology of the details of a specific
> case.  I add that I DO consider our current setup to pass this.  The
> rules of which pieces require and use which others really are pretty
> simple, and (ignoring stacking) don't have any big list of exceptions
> or special cases.  We don't seem to always do that great a job
> _conveying_ the rules to people, though.  Or people don't seem to do
> that great a job apprehending them, one of the two.
> 
> One way is for the repository to be non-shared and the two to be
> colocated.  This is trivial, as well as unhelpful for the case of
> sharing, so ignorable.  Only shared cases matter here.  So, somehow we
> need to link the two.
> 
> The second is implicit linkage.
> 
>  We have that now via the mechanism of a hierarchical filesystem;
>  branches are 'below' repos, and repos are 'above' branches, so we can
>  know to walk one way or the other to find our missing component[s].
> 
>  Colocated branches do this in a similar hierarchical way, but with
>  differences; most specifically, that branches aren't dealt with as
>  filesystem entities, but as bzr concepts.  Of course, there's no
>  requirement that the root of the bundle-o-branches actually be
>  directly considered as a repository, but it doesn't change the
>  concept either way, so we might as well.
> 
>   This is, IMO, a highly desirable thing to have, but it really won't
>   work well unless, in its context, things are sufficiently well
>   switched to thinking about the internal branches as being _named_
>   rather than _located_.  This is a significant shift from the current
>   bzr worldview, and I don't believe it can be done by papering things
>   over, like switch's guessing "Oh, by 'switch X' you really meant
>   'switch $CURBRANCH/../X'".  I think it's reasonably doable by
>   treating branch refs as names in the context of a colorepo unless
>   they're disambiguated as locations.  Anyway.
> 
> What other implicit mechanisms can we imagine to go both ways?  I can
> think of a number of stupid ones, but no non-stupid ones have come to
> mind.
> 
> The third is explicit linkage.  This has necessary limitations; it
> requires that anything dealing with a branch has to be able to write
> into the repository.
> 
> This constraint is almost necessarily violated by stacking (or
> eventually, shallow cloning, which IMO is the much more interesting
> descendant of stacking) unless we declare that such things can't be
> done over read-only transports.  And that would be stupid.  So, this
> case may be something we just need to declare an exception to the
> 'repo able to find branch' rule and leave that to human manglement.
> 
>  One method of explicit linkage is by using layers underneath bzr.
>  The first obvious mechanism is ".bzr/repository/ is a symlink to the
>  real repo".  That covers branch -> repo.  For repo -> branch we could
>  have symlinks in $REPODIR/.bzr/repository/branches/.  Various other
>  sub-bzr mechanisms mostly reduce to variants of this.
> 
>  The other is inside bzr.  We already have a moral equivalent of this
>  in the branch reference in lightweight checkouts.  We'd have to have
>  a similar thing going the other way.
> 
> A downside of explicit linkage is that it ties both locations down; it
> breaks down when somebody moves either the branch or the repository
> around.  The implicit linkage handles those cases, as long as the
> relationship between the two remains unchanged.  With explicit, we'd
> need to come up with a "bzr bzrdir-mv" sort of command.  And that
> would break down too if we don't have write access to all the bzrdir's
> needed to walk the chain.

I think that we can clarify our understanding of repositories, branches and
trees in the process of this discussion.  My current working understanding is
that a branch is the fundamental object (a series of snapshots of a tree of
files/directories).  A repository and a working tree are then derivative
things: a tree expresses the state of a branch at a particular snapshot, and a
repository stores the snapshots that make up a branch (or branches).

I think this shows (in agreement with you) that we have to spell out how
branches and repositories (and possibly trees) are linked to one another.  The
map between repositories and branches needs to be bidirectional so that
repositories can clean up unneeded data, while the map from working trees to
branches only needs to be unidirectional, since a branch does not need to know
which working trees point at it.
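
To make the direction of the links concrete, here is a rough Python sketch.
The class names, the single-parent history and the unreferenced() helper are
all made up for illustration; none of this is bzrlib's API.  The repository
keeps back-links to its branches so it can tell which snapshots are still
needed, while a working tree only points at its branch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Repository:
    # Hypothetical storage: revision id -> parent revision id (None for a root).
    revisions: dict = field(default_factory=dict)
    # Back-links from the repository to every branch that uses it.
    branches: list = field(default_factory=list)

    def unreferenced(self):
        """Return revision ids not reachable from any registered branch tip."""
        live = set()
        for branch in self.branches:
            rev = branch.tip
            while rev is not None and rev not in live:
                live.add(rev)
                rev = self.revisions[rev]
        return set(self.revisions) - live


@dataclass(eq=False)
class Branch:
    name: str
    repository: Repository          # forward link: branch -> repository
    tip: Optional[str] = None

    def __post_init__(self):
        # Registering here is what makes the link bidirectional.
        self.repository.branches.append(self)


@dataclass(eq=False)
class WorkingTree:
    branch: Branch                  # one-way link: the branch keeps no list of trees


repo = Repository(revisions={"r1": None, "r2": "r1", "x1": "r1"})
trunk = Branch("trunk", repo, tip="r2")
tree = WorkingTree(trunk)
print(sorted(repo.unreferenced()))
```

Without the back-link, the repository could not know that "x1" is safe to
discard; with it, cleanup is a plain reachability walk from the branch tips.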

> It's worth asking "what does $OTHER_VCS do?".  Depending on the choice
> of others, it may require playing a little looser with definitions of
> 'branch' and 'repo' than the quite well-delineated bzr concepts, but
> doing so, I don't think I can come up with a case that anybody else
> does anything BUT implicit.  In CVS/SVN and the ilk, branches are
> internal objects in the repository.  mtn, practically the same aside
> from the distributed capability.  Similarly (totally differently, but
> similarly ;) in git.  Even for arch.  Then there's the camp of
> branch-primacy VCS's.  darcs and hg, AFAIK, both have the 'repo' as
> purely internal to the 'branch', though with the terms muddied and
> modified by things like hg named branches, and with the caveat that
> the repo can sometimes be semi-shared via hardlink trickery.  Many of
> the above support shallow clones, but pretty much devolve into the
> obvious exceptions that you're "at your own risk" in such cases.

In terms of the above definitions, all of these systems have the same three
concepts, but in many cases they cannot be separated.  CVS/SVN do not allow
repository/branch separation.  In some sense, SVN forces tree/repository
separation (which they call checkouts).  As you mentioned, git keeps
repositories and branches together, but also keeps working trees in the same
place.  (Although that is complicated by the disagreement over what the term
"branch" means.)  I think that the (vaunted and denigrated) flexibility of
Bazaar amounts to its ability to keep repositories, branches and working trees
in separate locations as separate objects.  Some of the simplicity (and with it
some of the usability) of other VCSs comes from keeping two or more of these
things together.

> A related concept is "where is the repo?".  It can be in 3 places;
> inside one branch, in a location unrelated to some/any of the
> branches, or in a specific location relative to all the branches.
> 
> The first case, IMO, loses right off, because the repository is
> implicitly 'owned' by that branch then (socially and psychologically,
> no matter how hard we might try to make it not technically).
> 
>   This is also slightly different from the case of hardlinking the
>   repo CONTENTS (e.g., *.pack) between independent repos in each
>   branch.  However, as long as we consider files
>   immutable-once-written, that case can be considered equivalent to
>   the case of branches having totally independent repos altogether.
>   It gains us momentary performance and temporary disk space, but it's
>   conceptually no model change at all, so ignored here.
> 
> The second case requires explicit linkage.  Well, there are two other
> options.  One is a hard-defined location for The Repo in any given
> environment.  "All your branches use ~/.bazaar/repo".  That sucks.
> The other is a rule-defined location.  "Branch /x/y/z uses
> ~/.bazaar/repo/x/y/z".  That sucks harder.  So explicit it is.  All
> the aforementioned problems are inherited, and the fact that nobody
> else does this suggests that it may not be a fruitful or desirable
> road to explore.
> 
>   For an alternate perspective, however, we're also (I think) the only
>   branch-centric system that's evolved an explicit mechanism for
>   repository-sharing of full branches.  So maybe we SHOULD examine
>   this whole different set of rules.
> 
> So...   we could be different from everybody else by using explicit
> links.  I'm unconvinced that it's a good idea.  That leaves us with
> implicit linkage.
> 
> We currently do that by nesting the specific [branch] within the
> general/shared [repo].  I'm unable to think of a better way,
> maintaining a branch-centric focus, that doesn't fail hard at
> predictability.  This isn't meant to say that the UI for it all is
> perfect as-is, but the general structure I think is about as sound as
> it can be.
> 
> Losing the branch-centric focus, we can grow the colocated mechanism,
> where branches become attributes of a repository (or of some object
> not directly tied to the repository, but that would UI-wise be the
> same as what every other such system calls a repository, and contains
> one-and-only-one repository.  I don't think splitting that hair would
> help anything).

A number of people have argued persuasively
(http://cournape.wordpress.com/2008/10/30/going-away-from-bzr-toward-git/ for
one) that colocated branches are very, very important for power users who
integrate others' work and who keep many branches around.  I think that git's
adoption of this design reflects the needs of the kernel workflow.  Given
the flexibility of Bazaar for working with repositories, branches and trees, I
think that supporting colocated branches should be part of this re-thinking (we
need something to do for 3.0, right?).

If repositories, branches and trees make up the Bazaar universe, then we can
understand all of the proposals here in these terms.  "All checkouts are
lightweight by default" means: all trees are connected to remote branches by
default.  "A branch could reuse a dependent branch's repository" means: a new
branch should point to the existing repository by default.

I guess my main point is that how these three entities are connected determines
what type of version control system you are talking about and which operations
are easy.  My personal preference is for branches as directories inside a shared
repository (the current recommended working model) but I am *not* a power user
and find "cd ../../../../other_branch/bzrlib/tests/blackbox" a reasonable price
to pay for a simple mental model.
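
That nested layout is what makes the implicit linkage work in both directions,
and it is easy to sketch.  The Python below is only an illustration: the
function names are made up, and it assumes a shared repository is marked by a
.bzr/repository directory and a branch by a .bzr/branch directory.  It ignores
stacking and repositories nested inside other repositories.

```python
import tempfile
from pathlib import Path


def find_repository(branch_dir):
    """Walk upward from branch_dir to the nearest directory holding .bzr/repository."""
    start = Path(branch_dir).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / ".bzr" / "repository").is_dir():
            return candidate
    return None


def find_branches(repo_dir):
    """Walk downward from repo_dir and list every directory holding .bzr/branch."""
    repo = Path(repo_dir).resolve()
    return sorted(
        marker.parent.parent
        for marker in repo.rglob("branch")
        if marker.is_dir() and marker.parent.name == ".bzr"
    )


# Build a throwaway layout: one shared repository with two branches below it.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp).resolve() / "project"
    (repo / ".bzr" / "repository").mkdir(parents=True)
    for name in ("trunk", "other_branch"):
        (repo / name / ".bzr" / "branch").mkdir(parents=True)
    deep = repo / "other_branch" / "bzrlib" / "tests" / "blackbox"
    deep.mkdir(parents=True)

    print(find_repository(deep) == repo)
    print([b.name for b in find_branches(repo)])
```

Neither side stores a path to the other, so moving the whole directory keeps
everything connected, which is exactly the property explicit links would lose.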

-Neil