Figuring out a workflow (bzr-svn, local changes I never want to push upstream)

Sat Aug 16 16:33:09 BST 2008

Russel Winder wrote:
> Tom,
> 
> First and foremost, I don't think you are doing anything insane.  I have
> also found this problem.
> 
> On Fri, 2008-08-15 at 16:16 -0500, Tom Tobin wrote:
> [ . . . ]
>> cd shared_repo/trunk (this is the bzr-svn branch)
>> bzr pull
>> cd ../local (this is a branch off of trunk with an extra commit for
>> the aforementioned local changes)
>> bzr rebase
>> cd ../feature_branch (this is a branch off of local)
>> bzr rebase
>> [hack hack hack]
>> bzr ci -m "hack hack hack"
>> cd ../trunk
>> bzr merge -r -2..-1 ../feature_branch
>> bzr dpush
>> cd ../local
>> bzr rebase
>> cd ../feature_branch
>> bzr rebase (now I get spurious history, with both the upstream svn
>> commit and the bzr commit for "hack hack hack", as well as the
>> local-change commit showing up twice)
> 
> The problem here is that the dpush causes a renaming of the "changeset"
> so different branches now have different names for the same "changeset"
> -- I know Bazaar doesn't have a notion of changeset but how else to
> label the set of changes between one revision and another?  Thus merging
> two branches with the same changeset leads to it appearing twice.  So
> effectively you cannot branch sanely from a branch that is being used as
> a dpush mirror of a Subversion repository.
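
To make the renaming problem concrete for myself, here is a toy model in
Python. It is only a sketch of the idea, not how bzr or bzr-svn are
implemented: a revision is an (id, change) pair, the ids are made up, and
replay_onto() is a hypothetical stand-in for a rebase that decides what to
replay purely by revision id.

```python
# Toy model only: revisions are (revision_id, change) pairs, and the ids
# below are invented for illustration.

def replay_onto(base, branch):
    """Rebase-like step: keep `base`, then replay revisions whose id `base` lacks."""
    known = {rev_id for rev_id, _ in base}
    return base + [rev for rev in branch if rev[0] not in known]

trunk = [("svn:1", "upstream work")]
feature = trunk + [
    ("bzr:local", "local-only config"),
    ("bzr:hack", "hack hack hack"),
]

# Cherry-pick the hack into trunk and dpush it: the same change comes back
# from Subversion under a new id.
trunk = trunk + [("svn:2", "hack hack hack")]

rebased = replay_onto(trunk, feature)
changes = [change for _, change in rebased]
print(changes)
```

Because "bzr:hack" and "svn:2" are different names for the same change, the
id-based replay cannot tell that the change is already present, and it shows
up twice in the result.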

I'm trying to make sure I understand what his specific goal is, to help
decide what the correct fix is.
He has 3 branches:

trunk => direct mirror of SVN branch, sync'd with svn via pull and dpush
local => a local branch that has some amount of customization that isn't
         appropriate for trunk
feature => a branch for feature hacking. Currently based on local

I'm a bit unclear why you would have feature off of local, unless that is the
only way to get it to work on your machine. As an aside, this feels like a
fragile way to work, no matter what VCS you are using. My quick guess is that
you are versioning a config file, rather than versioning a template for that
config file. How would you do it if you didn't have bzr? Just have a locally
modified file that you never commit? Anyway, to continue...

You periodically bring changes from trunk (bzr pull), and then you propagate
them through your other branches. It seems you are using 'bzr rebase' for
this? (That is, replay your local commit on top of trunk, and ignore the old
revision.) You then rebase your feature branch on top of your new local
branch, and then cherry-pick the tip back into trunk.

I'm a bit curious about how the feature branch works, and whether you are
always cherry-picking the last commit, or multiple commits, or what. If you
are just cherry-picking, why not just use merge rather than rebase? Then the
workflow becomes:

cd trunk
bzr pull
cd ../local
bzr merge ../trunk
bzr commit -m "bring in latest trunk"
cd ../feature
bzr merge  # you could merge trunk or local, as you have enough ancestry
bzr commit -m "bring in latest trunk"
# hack hack
bzr commit -m "new feature"
cd ../trunk
bzr pull # you have to make sure you are in sync
bzr merge -r -2..-1 ../feature
bzr commit
bzr dpush
bzr pull # I don't know if dpush does pull for you

Locally you just use cherry-picking back and forth between the 'trunk' branch
and the feature branch. I don't really see why you need to go through the
'local' branch at that point.

> 
> I claim (but this is from memory of experiments not from recent data)
> that Git doesn't have this problem.  I believe I can have a git-svn
> clone of a Subversion repository and then clone that and "svn rebase/svn
> push" from the Subversion clone and pull/push from the secondary clone
> -- very much in the way you are working above -- and it all works.
> Jelmer tells me that the Git rebasing renames the changesets just as
> Bazaar dpush does.  It seems though that Git's use of repository as the
> cloned thing rather than a branch as the branched thing means that you
> don't get the duplication that you do with Bazaar when you merge.

So, there are a few possibilities.

1) git considers the 'tree' hash as well as the 'commit' hash when figuring
out ancestry. I don't strictly think it does this, but there are at least some
possibilities here.

2) git's equivalent of dpush ('git svn dcommit') rebases all branches in your
local repository that are based on the revisions being rewritten. So if you
have 'local' and 'feature' that are based on top of 'trunk', and you push
trunk to svn (causing it to create new revisions for the 'trunk' branch), it
also goes around and rebases all the other branches that were based on that
tip.

One way to tell the difference is to actually create a separate *clone*, so
you have a separate repository, and see if the workflow stays smooth when
'feature' is not local.

'bzr dpush' could also rebase all appropriate branches in the repository. It
is a bit more work to find them, because we don't collect them side-by-side. I
also would guess that git repos don't tend to accumulate as many branches. I
don't know for sure, but I know my personal bzr repo has 200+ feature branches
in it. And certainly doing "git branch" to list all the branches isn't very
useful when it is multiple-screens long.

Anyway, if you only have a few branches, it certainly wouldn't be hard to
have 'bzr dpush' check for branches in the repository which are based on the
current revisions, and rebase all of them for you. (Not that I know the
internals; I'm just thinking that if it can conceptually do this for 1
branch, it should be able to do it for 3.) If 'dpush' is just "renaming" the
revisions rather than actually changing any content, then you don't have to
worry about getting conflicts in one of your side branches.
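
As a rough sketch of what I mean (again a toy in Python, not bzr-svn
internals): assume dpush can hand back a map from old revision ids to new
ones, and that each branch is just a list of revision ids. Both of those are
assumptions of mine, and rebase_dependents() is a made-up name.

```python
def rebase_dependents(branches, renames):
    """Rewrite every branch whose history contains a renamed revision id.

    branches: dict of branch name -> list of revision ids, oldest first
    renames:  dict of old revision id -> new revision id (from the dpush)
    Returns (updated_branches, names_of_branches_that_changed).
    """
    updated = {}
    touched = []
    for name, history in branches.items():
        new_history = []
        rewritten = False
        for rev_id in history:
            if rev_id in renames:
                new_history.append(renames[rev_id])
                rewritten = True
            elif rewritten:
                # Anything stacked on a renamed revision has a new parent,
                # so it needs a new id too; mark that with a trailing quote.
                new_history.append(rev_id + "'")
            else:
                new_history.append(rev_id)
        if rewritten:
            touched.append(name)
        updated[name] = new_history
    return updated, touched

branches = {
    "trunk": ["svn:1", "bzr:hack"],
    "local": ["svn:1", "bzr:hack", "bzr:local"],
    "unrelated": ["svn:1", "bzr:other"],
}
updated, touched = rebase_dependents(branches, {"bzr:hack": "svn:2"})
print(touched)
print(updated["local"])
```

Only ids change here, never content, which is why I would not expect
conflicts. Finding the branches in the first place is the part this sketch
skips.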

> 
> Of course the experiments need to be done, to ensure I am not just
> mis-remembering things.
> 
>> I *have* to be doing something utterly wrong/insane; can anyone point
>> it out to me?  :-)
> 
> Hopefully, the above is true, accurate and even correct :-)
> 
> I consider the problem you highlight above to be more or less a blocker
> to use of Bazaar as a Subversion client where the Subversion repository
> is the master copy.  Where there is only ever one branch used there is
> no problem -- I know I am doing this every day.  However this is such a
> severe restriction of workflow, especially in the general Bazaar context,
> that it should be treated as a blocker.

I know at one point, Jelmer tried to have 'bzr commit' in a bzr branch bound
to an SVN repository do the commit on the SVN repository first, and then
effectively do a 'pull' into the local repository. (For normal bzr branches,
you do the commit locally, and then wait to update your local branch tip until
it successfully transfers to the master repository.)

I wonder if this would solve some of the issues as well. You would still have
some issues with a dangling merge revision_id.

> 
> NB  If you store the Bazaar branch in the Subversion repository, i.e.
> don't use dpush, use push, then none of the problems happen, arbitrary
> branching is possible.  The downside of this is that if the Subversion
> repository uses svnlook or certain other commit mail hooks, then you get
> increasing and arbitrarily large commit emails -- but this has been the
> topic of previous exchanges on this list.

I also don't specifically know why bzr-svn has to create an ever-increasing
list on the remote side. I believe Jelmer does it for performance. The svn
protocol doesn't make it easy to ask "give me all values for this node over
history", rather you have to keep probing it repeatedly. So instead, Jelmer
decided to make the "current" value be the full list over history. And it
seems that 'svnlook' doesn't do deltas for properties, and thus emits the
whole property for every commit.
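
A back-of-the-envelope illustration of why that makes the emails grow, as a
small Python sketch. The revision ids and their length are invented, and I am
assuming the property is simply a newline-separated list that gets printed
whole on every commit.

```python
def email_payload_sizes(revision_ids):
    """Characters of property text emitted at each commit, assuming the
    property always holds the full list and is printed without deltas."""
    sizes = []
    full_list = []
    for rev_id in revision_ids:
        full_list.append(rev_id)
        sizes.append(len("\n".join(full_list)))
    return sizes

# 1000 made-up revision ids, 8 characters each.
ids = [f"rev-{n:04d}" for n in range(1, 1001)]
sizes = email_payload_sizes(ids)
print(sizes[0], sizes[-1], sum(sizes))
```

Each email is only linearly larger than the last, but the total volume across
all commits grows quadratically. A delta would stay at one id per commit.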

> 
> PS  It is worth remembering that  storing a Bazaar branch in a
> Subversion repository is the primary purpose of bzr-svn, dpush was only
> added as an extra because I moaned long enough and loud enough.  Clearly
> more work needs doing to allow dpush to be part of the workflow you
> highlight above, which I think will be an important transitioning
> workflow.  I think though, it means people need to volunteer to do more
> than just raise the point -- something I have not been able to do to
> date.  Based on the sort of workflow above, the issues of how to not
> create duplicate "changesets" needs to be worked out so that necessary
> changes to bzr-svn can be planned.
>  
> 

It would be helpful (at least for me) to have a more concrete use-case
defined: what changes are actually where, what workflow you would like to see
when you are done, and how tolerant you are of 'cheating'.

For example, one of the really nice things about the current bzr-svn
integration is that you can have *multiple* people all using bzr and
synchronizing through the svn repository. If they have the extra bzr
revisions, everything "lines up". (And 'multiple people' can also be a single
developer working in multiple locations, e.g. desktop, laptop, server.)

One of the problems with 'dpush' (I would imagine) is in a multi-site
scenario. Say I have a feature branch on my laptop, and one on my desktop:
how do you coordinate the 'rebasing'?

That said, if you aren't trying to scale to those sorts of problems, there are
lots of tricks you can do locally, like having a combined bzr *and* svn
checkout. So you would have 2 'branches' co-located in 'trunk': the svn
working copy, and a bzr branch that follows the svn branch, but also tracks
all of your local bzr branches.

I've certainly done something like that without 'bzr-svn'; I did it with CVS
at the time. The workflow ends up looking like:

cd trunk
cvs update
bzr commit -m "bring in latest cvs changes"
cd ..
bzr branch trunk feature
cd feature
# hack hack, commit, hack, commit, etc.
cd ../trunk
cvs update && bzr commit # in case there are cvs changes
bzr merge ../feature
cvs commit -m "merge new feature"
bzr commit -m "bring in latest feature"

So basically you have a bzr branch which stays "in sync" with 'upstream', but
really has its own ancestry.

This is a *much* simpler thing to implement than what 'bzr-svn' does. You can
even have helper functionality for bringing in multiple svn commits as bzr
commits. Effectively they would be "rebased" from svn => your bzr branch.
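
Such a helper could be as small as the following Python sketch. It only builds
the command lines rather than running them, replay_commands() is a name I made
up, and it assumes the svn metadata directories are ignored by bzr and that
one bzr commit per svn revision is what you want.

```python
def replay_commands(svn_revisions):
    """Build the commands that bring svn revisions into bzr one at a time."""
    commands = []
    for rev in svn_revisions:
        # Move the shared working tree to exactly this svn revision.
        commands.append(["svn", "update", "-r", str(rev)])
        # Pick up any files that svn just created.
        commands.append(["bzr", "add"])
        # --unchanged keeps the 1:1 mapping even if nothing changed.
        commands.append(["bzr", "commit", "--unchanged", "-m", f"svn r{rev}"])
    return commands

cmds = replay_commands([101, 102])
for cmd in cmds:
    print(" ".join(cmd))
```

To actually run it you would pass each list to something like
subprocess.check_call() from inside the combined checkout.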

The main downside of this is that you only have 1 master bzr branch. Everyone
who wants to work on bzr needs to be using *your* bzr trunk branch. (As
opposed to bzr-svn where everyone who pulls from svn => bzr gets identical
copies every time.)

But if you are a "lone-wolf" there are much easier ways to get what you want,
and not lose any of bzr's flexibility.

John
=:->