Repository referencing in command lines (was Re: 08:16 < abentley> Better phrasing: [...])

Fri Feb 10 16:11:57 GMT 2006

On Fri, Feb 10, 2006 at 09:04:59AM -0600 I heard the voice of
John A Meinel, and lo! it spake thus:
> 
> $REPO/
>   bzr/ <- this is the release branch
>    bugfixes/
>    dev/
>      mbp/ <- Martin's development branch
>      jam/ <- My integration branch

To be sure I understand, let me spell out three properties of this:

1) "bzr" in this sense is roughly equivalent to CVS's "modules".  (but
   see below)

2) All the sublevels are "equal" branches.  That is, "bugfixes/foo"
   and "dev/bar" are treated just the same as if they were named
   "bugfixes.foo" and "dev.bar".  The naming just provides a
   conceptual hierarchy, it doesn't make 'higher' or 'lower'-level
   branches, and "dev" is by itself not a branch; it's just a part of
   a label.  (this isn't a very clear explanation of what I mean)

I can go with those.

3) This is actually reflected in the physical layout of the
   repository.  i.e., you'd have
   $REPO/
     .bzr/  (actual revision store)
     bzr/
       dev/  (branch-specific data for the "dev" branch of the "bzr"
              module, referencing the revisions in the top-level .bzr)

That I hadn't considered quite so much.  Reflexively, I draw back from
it a little bit, but I can't come up with a good reason why.  And it's
sensible.  So I'll go with that.

[On Modules, from above]

The alternative is not having a 'module' concept at all, so that
"$REPO/foo/bar" and "$REPO/baz/quuz" refer to branches in exactly the
same namespace.  I don't like that quite so much; if all the
branches in a repository are "siblings", I'd personally create a
separate repository for each 'project'.  If I can divide it into
modules, each of which defines a namespace of branches within it, I
can put a bunch of projects in one repo.

I don't s'pose it much _matters_ either way; I'd just tend to work
with it differently depending.

> Then to access a branch you have simply:
> 
> bzr get $REPO/bzr/dev/jam/
> 
> rather than
> bzr get $REPO/bzr/ --branch=dev-jam

Roughly, I dislike this because it's harder to see by eye what the
branch name is.  I mean, if I say
"sftp://foo@bar/depot/repo/nix/run/shazaam", am I referring to a
branch called "shazaam", or one called "run/shazaam", or...  I like it
being obvious what's what.

In the full syntax in my other mail, I'd write that as

% bzr get repo:$REPO/bzr:dev/jam
(or maybe repo:$REPO:bzr/dev/jam, depending on how we handle questions
of modules, and how much of that path actually is the repository).
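As a rough sketch of how that syntax could be pulled apart (only the
"repo:" tag comes from the proposal; BranchSpec and parse_spec are
made-up names for illustration, not anything bzr provides, and it
assumes branch names never contain a colon):

```python
from typing import NamedTuple, Optional


class BranchSpec(NamedTuple):
    repo: Optional[str]  # None means "no repository named in the spec"
    branch: str


PREFIX = "repo:"  # the tag from the proposal; the rest is hypothetical


def parse_spec(spec: str) -> Optional[BranchSpec]:
    """Split 'repo:<repository>:<branch>' into its two parts.

    Returns None for anything without the 'repo:' tag, i.e. a plain
    path or URL that should keep meaning what it means today.
    """
    if not spec.startswith(PREFIX):
        return None
    rest = spec[len(PREFIX):]
    # Split on the LAST colon so the repository part may itself be a
    # URL such as sftp://... (assumption: no ':' in branch names).
    repo, sep, branch = rest.rpartition(":")
    if not sep:
        raise ValueError("missing ':' between repository and branch: %r" % spec)
    return BranchSpec(repo or None, branch)
```

Either placement of the module works with the same split:
"repo:/depot/repo/bzr:dev/jam" gives the branch "dev/jam", and
"repo:/depot/repo:bzr/dev/jam" gives "bzr/dev/jam"; what's the branch
name is visible either way.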

> Right now, a URL (semi-)uniquely defines a branch. It is a nice
> property, that would be nice to keep with repositories.

It does have advantages.

> The biggest problem with $BZR_REPO is the difficulty in saying "bzr
> get path/foo" are you saying that you want the local directory
> path/foo, or are you saying you want $BZR_REPO/path/foo.

Well, that's easy; if you wanted the repo, you'd specify something
like "bzr get repo::branch.name".  The theory is that if you somehow
need a repo (which you wouldn't, unless you specified repo:[...]), the
env variable would be used as the default.

I'm not quite wedded to that syntax, of course; the
"repo::branch.name" form, omitting the repo path in the middle, IS a
little ugly.
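A minimal sketch of that rule, assuming the "repo:" tag and the
$BZR_REPO variable discussed here (resolve_repo is a hypothetical
helper name, not a bzr function): untagged arguments stay plain
paths, and the env variable only comes into play when the repo part
is left empty.

```python
import os


def resolve_repo(spec, environ=None):
    """Return (repository, branch) for 'repo:<repository>:<branch>'.

    An empty repository part, as in 'repo::branch.name', falls back to
    the BZR_REPO environment variable.  Anything without the 'repo:'
    tag returns None, so 'path/foo' is still just a local path.
    """
    environ = os.environ if environ is None else environ
    if not spec.startswith("repo:"):
        return None
    # Last colon separates repository from branch (assumed syntax).
    repo, sep, branch = spec[len("repo:"):].rpartition(":")
    if not sep:
        raise ValueError("missing ':' between repository and branch: %r" % spec)
    if not repo:
        repo = environ.get("BZR_REPO")
        if not repo:
            raise ValueError("no repository given and BZR_REPO is not set")
    return repo, branch
```

So "path/foo" never touches $BZR_REPO, and "repo::path/foo" always
does.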

> We have discussed it, because it would be nice to have shorter paths
> for things. I think we came to the idea that aliases would be a
> better solution, since it handles more cases anyway.

Aliases are certainly more flexible.  I shy away a little from using
markers like ^ for them, since that breaks if you define paths with
foolish characters in them; that's why I had a separate tag repoalias:
(or whatever it was; I already forget ;) in my mention of them.  But
yes; no matter what other shortcuts or tricks we get, I very much want
to be able to use aliases in co and branch and merge and push and
whatever else.
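To make the tag-vs-marker point concrete, here's a sketch of alias
expansion keyed on an explicit tag.  The tag spelling "repoalias:" is
just the half-remembered one from above, and expand_alias and the
alias table are invented for the example:

```python
ALIAS_TAG = "repoalias:"  # spelling as half-remembered above; an assumption


def expand_alias(arg, aliases):
    """Expand 'repoalias:<name>[/rest]' using the `aliases` mapping.

    Arguments without the tag come back unchanged, so a path that
    happens to contain '^' or other odd characters is never misread
    as an alias.
    """
    if not arg.startswith(ALIAS_TAG):
        return arg
    name, slash, rest = arg[len(ALIAS_TAG):].partition("/")
    if name not in aliases:
        raise ValueError("unknown alias: %r" % name)
    # Drop a trailing '/' on the alias target so we don't double it up.
    return aliases[name].rstrip("/") + slash + rest
```

With aliases = {"foo": "sftp://foo@bar/depot/repo/"}, the argument
"repoalias:foo/dev/jam" expands under the alias, while "./^weird/path"
passes through untouched.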

The primary use for the $CVSROOT variable in CVS is really for cases
where, on one machine, you pretty much ALWAYS use a single repository.
In the bzr case, that would make doing a 'bzr co' easy, since you
don't have to specify the repo.  However, it doesn't gain you near as
much on 'branch' and the like, since we have standalones.  So perhaps
it's not quite as useful.

Still, it's handy for wrapping.  Aliases give you a lot more
flexibility, but they require changing in the middle of the line,
which is mentally harder for me than changing at the beginning.  I
have the script wrappers partly because it saves typing long paths,
but also because it's easier for me to switch from

    % foocvs co
  to
    % barcvs co

  than it would be to go from

    % cvs -d^foo co
  to
    % cvs -d^bar co

Not because of the length; just the location within the line.  It's
easier for me mentally to change the beginning or end, than the middle
(and it's easier from the shell, with ^A/^E, too).  Maybe I'm just
weird, and that doesn't matter to anybody else, though.

"branch" is a good example of why I like the env variable over the
alias.  It takes basically two branch descriptions; one to branch
from, the other to branch to.  And either (or both, or neither)
_could_ refer to a repo branch.  If I have a wrapper that sets an env
variable for the repository in question, I can do "abcbzr branch foo
repo::" or "abcbzr branch repo::foo", and it'll pull from/to the
"right" repository.  If that had to be done via aliases, it would mean
the script trying to parse out my command line to figure out where to
insert the alias, which is much harder and more error-prone.
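A sketch of what such a wrapper amounts to (the "/depot/abc" path
used below, and the names wrap and run, are placeholders of mine; the
only real pieces are the BZR_REPO variable and the bzr command
itself).  The point is that the arguments go through untouched:

```python
import os
import subprocess


def wrap(argv, repo, environ=None, tool="bzr"):
    """Return (command, env) for running `tool` with BZR_REPO preset.

    The argument list is passed through as-is; nothing here needs to
    understand which argument names a branch.
    """
    env = dict(os.environ if environ is None else environ)
    env["BZR_REPO"] = repo
    return [tool] + list(argv), env


def run(argv, repo):
    """Actually execute the wrapped command and return its exit status."""
    command, env = wrap(argv, repo)
    return subprocess.call(command, env=env)
```

An "abcbzr" script would then be little more than
run(sys.argv[1:], "/depot/abc"), and both "branch foo repo::" and
"branch repo::foo" work without the wrapper knowing the difference.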

> > SVN does this by the path names.  I really, really hate that, as
> > far as it's reflected in the actual structure of the repository.
> > Think of the effects on revnos and `bzr log`.  As a UI
> > abstraction, I just single-really hate it.  So, I'd out that.
> 
> I would like to hear more about your thoughts. Since this was what
> we had planned. And I don't fully understand your comments, to be
> able to decide whether I agree or not.

Well, as above, I _really_ hate how SVN jams all that stuff into the
file layout namespace (see para below).  That doesn't really affect bzr
though.

I guess mostly, I dislike the inability to see exactly what the name
of the branch is in the URL.  It feels like "what branch this is" is a
separate piece of information from "where the branch is", and jamming
them together does a big thud for me.

[ On 'file layout']  I'm having trouble coming up with a good way to
phrase exactly what I mean by that.  The closest I can get is "the
layout of files in the project", vs. "the layout of files in the
repository".  In SVN, the former includes all these directories that
are meant to describe branches, even though you probably wouldn't
often checkout from the 'top level', but rather from deeper in the
tree.  That's a pretty murky explanation too, isn't it?  Blah.  But
it's perfectly clear in my head what I mean, so if you'll all just
tune your brains to 378.4GHz...

> Meaning, how do you determine what your current state is, and what
> is your use case? If we allowed you to create a glob expression,
> that would set your default repository based on path, would that
> give you what you want? (So stuff in ~/dev/work would use a
> different repository than ~/dev/home).

Well, we're off in the woods a bit here.  That by-dir configurability,
the aliases, and env variables, _can_ pretty much all solve the same
problems.  I think which one is the easy/natural way differs from case
to case (and not everybody would agree on which method is best for
which case).

For me, the env makes a lot of scripting easy, and lets my fingers
stay trained across projects where I need differences (which isn't
always dependent on my path, so the globbing wouldn't entirely cover
it).  The aliasing would be of immense value for projects where I'm
merging from or pushing to several different places.  Setting path
globs as above would be really handy for some relatively "fixed" stuff
(all my "lel" work is in "~/work/lel", so any usage of cvs/bzr in
there should point at the lel repository).
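For the path-glob idea, something like the following would do; it's
only a sketch, with a made-up rule table and repository paths, and
default_repo is not an existing bzr function.  The first rule whose
pattern matches the working directory (or any directory under it)
wins:

```python
import fnmatch


def default_repo(cwd, rules, home):
    """Pick a default repository for `cwd` from (pattern, repo) rules.

    A pattern names a directory (glob characters allowed, a leading
    '~' meaning `home`); it matches that directory and everything
    below it.  Returns None when no rule applies.
    """
    for pattern, repo in rules:
        root = home + pattern[1:] if pattern.startswith("~") else pattern
        root = root.rstrip("/")
        if fnmatch.fnmatchcase(cwd, root) or fnmatch.fnmatchcase(cwd, root + "/*"):
            return repo
    return None
```

For example, with rules = [("~/work/lel", "/depot/lel")] and home
"/home/me", anything under /home/me/work/lel gets the lel repository
and /tmp gets no default at all.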

> I think we can do this in a way that people don't have to write 20
> different shell aliases to get it to work how they want. Though
> maybe we can try and support the shell aliases, since obviously
> someone like you is comfortable with that method.

Well, the usages which lead me to doing it are certainly rather
pathological    8-}

Path-matching would eliminate a lot of them, and aliases would get
most of the rest.  Aliases _could_ probably knock out all the rest,
though I think the env variables would sometimes be an easier or more
natural way to do it.

-- 
Matthew Fuller     (MF4839)   |  fullermd at over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.