help getting a clue about tracking changes in an integrated library

Thu Sep 23 05:02:57 BST 2010

Chris Hecker writes:

 > > With a good VCS, nothing should scare you (except a disk failure).
 > > Bazaar is a good VCS. :-)

 > Scary is multidimensional; data loss is not the only axis.  Another
 > axis is spending an infinite amount of time on something.

Of course, but a good VCS means you can fool around with something for
the time you can spare, and if the relationship doesn't grow serious
within that period, you hit the rewind button and you're back where
you started in O(1) time.

 > I mean, simply copying the source into the trunk has a well
 > understood set of disadvantages and is bounded in terms of my time
 > invested.  That's getting more appealing as this thread grows.  :)

That's up to your judgment.  There are two mitigating factors here.

First, the *user*, not an admin, owns and manages *each* repository.  On the
one hand that's additional responsibility, but it's mostly in the
initialization phase.  On the other, that means that experience is
widespread, and you get *excellent* support from other users.

The second is that, because your branch is a full repository and
Bazaar is flexible, you can edit out "junk" when you publish your
branches (at additional effort in many cases, granted).  Unlike in a
centralized system, a commit need never be a public, irrevocable
mistake.  This (along with the ability to instantly revert to a known
state) means that experimentation is cheap.

 > > Monolithic: reverting to a version of your code plus the
 > > corresponding version of "lib" is easy, because they share
 > > history.

 > Yeah, this seems like it's a requirement always, no?  I mean, if
 > you can't sync to a version and get something that builds and is
 > correct for that point in time, then it seems like you're not
 > really getting the advantages of using a vcs.  I guess different
 > people have different priorities, but that seems fundamental to me.

Well, the "just copy the code in" model gives you replicability.  But
it's inflexible, and doesn't help you link the history of your modules
to that of externally maintained modules.

Unfortunately, generalizing to true "nested branches" is hard.
Looking around, released versions of bzr don't even try yet.  git's
submodules are an elegant hack, better than nothing, but not good
enough -- and extending to "good enough" will require inspiration;
it's not a SMOP in the git model.  I don't know much about Mercurial's
forests, except that nobody I know actually uses them, so I have to
wonder if they're very useful :-/ .  Complaints about svn "externals"
are legion.

The bzr guys will do a good job, I'm sure, but I doubt the dust will
settle on this area for several years.

 > It definitely sounds like "nested trees" are what I want from
 > reading the design page.  Is that work underway?

FWIW, my impression (and this is an *impression* only, to give you a
(mis?)conception to start from) is that people are working on it
unofficially, but IIRC within the bzr developer group there were two
significantly different designs by senior developers, neither of whom
is employed to work on bzr any more, and the actual work currently
being done is a somewhat ad hoc design by a third party.

That said, IIUC nested branches are #2 on the list of coming big
features, and there are plenty of senior developers capable of and
interested in developing this, both Canonical-employed and volunteer.
Based on past experience, once it's agreed to be top priority, the
late alphas ("may kick your wife and elope with your dog") will be
available about 3 months after the go-sign, with the betas shortly
thereafter; the beta period for something this big will probably be
about 3 months.

The question is better asked in a subthread of the "Roadmap" thread,
and Martin Pool, who is active in that thread, is authoritative on
where Canonical resources will be allocated.

 > Here's another question: let's assume I'm willing to give up having
 > the code in different branches, is there any way to use bzr to help
 > with the merges between unrelated branches?

Not really.  The driving idea in VCS-based merging is "don't apply the
same patch twice".  This requires keeping track of the patches.  The
internal details of bookkeeping are varied: Darcs tracks patches
explicitly, while in the DAG-based systems (bzr, git, hg) a patch is
implicit in the difference between two versions.  But even Darcs
decides whether two patches are the same by comparing "canonical
names", so common history is the sine qua non of VCS-based merging in
practice.
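
To make the "common history" point concrete, here is a toy sketch in
Python.  It is not how bzr implements merging; the revision names and
the PARENTS mapping are made up for illustration.  It models history as
a DAG and looks for revisions shared by two branch tips, which is what
a DAG-based merge needs as its base.  Two branches that were never
branched from one another share nothing, so there is no base to work
from.

```python
def ancestors(parents, rev):
    """Return the set containing rev and every revision reachable from it."""
    seen = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r in seen:
            continue
        seen.add(r)
        stack.extend(parents.get(r, ()))
    return seen


def common_ancestors(parents, a, b):
    """Revisions that appear in the history of both a and b."""
    return ancestors(parents, a) & ancestors(parents, b)


# Hypothetical history: "mod-1" was branched from "orig-1", while
# "trunk-2" holds a plain copy of the code with its own unrelated history.
PARENTS = {
    "orig-1": [],
    "orig-2": ["orig-1"],
    "mod-1": ["orig-1"],
    "trunk-1": [],
    "trunk-2": ["trunk-1"],
}

if __name__ == "__main__":
    # Related branches share a revision that can serve as a merge base.
    print(sorted(common_ancestors(PARENTS, "orig-2", "mod-1")))
    # Unrelated branches share none, so the VCS has nothing to go on.
    print(sorted(common_ancestors(PARENTS, "orig-2", "trunk-2")))
```

The copied-in code is in the position of "trunk-2" here: the content
may be nearly identical, but the tool only sees history, not content.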

 > In other words, assume I'm willing to just copy the "final" code
 > into my trunk, what's the best way to reduce my manual merging?  It
 > sounds like maybe just keeping a separate repo for this code, with
 > lib-orig, lib-mod, lib-mine, and then copy lib-mine into its place
 > in my tree (and vice versa).

This is exactly the "poor man's nested branch" model.  Since you've
invented it for yourself, it's the obvious way for you to go.

A few refinements.  First, don't forget to "bzr ignore lib".

Second, no need for a separate shared repo.  The repo is basically an
object database.  You can store completely unrelated projects there,
and bzr won't care.  It's efficient: when you export a branch from the
repo, only the needed objects are copied into the target branch.
AFAIK, branches need not be direct children of the repo; they can be
descendants at any level, so you can use svn-like directory
structuring for the different components of your project:

    shared-repo
        external-lib-1                        # not a branch
            upstream-release-branch
            local-mod-branch
        external-lib-2                        # not a branch
            upstream-release-branch
            3rd-party-mod-branch
            local-mod-branch
        my-project                            # not a branch
            version-1                         # not a branch
                release-1.0-branch
                release-1.1-branch
            trunk-branch
            not-ready-for-prime-time-branch

(Of course this means that the branch URLs get a little longer.)  The
only technical reason for a separate repository is if the repository
is (or might be) public, and there's code you can't/don't want to
redistribute.  If you just prefer a separate repository, I don't think
there's any big loss there, though.

Third, the right way to "copy lib-mine" is probably "bzr checkout
--lightweight".  This is very flexible -- that workspace can be the
single workspace for *all* of your work on "lib": you can "bzr switch"
that workspace to checkouts of "lib-orig" or "lib-mod" to integrate
upstream changes in the appropriate branch, then merge those changes
to "downstream" branches.
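
As a rough sketch of that workflow, the Python below only *builds* the
sequence of bzr commands for one integration pass and, by default,
prints them rather than running them.  The branch names are the ones
from this thread; the `repo` path, the commit messages, and the helper
names are my own assumptions, and the step that actually brings a new
upstream release into "lib-orig" is left as a comment because it
depends on how upstream publishes.

```python
import shlex
import subprocess


def checkout_command(repo, workspace="lib"):
    """One-time setup: a lightweight checkout of lib-mine as the workspace."""
    return ["bzr", "checkout", "--lightweight", f"{repo}/lib-mine", workspace]


def integration_plan(repo):
    """Return the bzr commands, in order, for one upstream integration pass.

    `repo` is a hypothetical location holding the lib-orig, lib-mod and
    lib-mine branches; use an absolute path or URL when running for real.
    """
    orig, mod, mine = (f"{repo}/{name}" for name in ("lib-orig", "lib-mod", "lib-mine"))
    return [
        # Point the single workspace at the pristine upstream branch.
        ["bzr", "switch", orig],
        # (Bring the new upstream release into lib-orig and commit it here.)
        ["bzr", "switch", mod],
        ["bzr", "merge", orig],
        ["bzr", "commit", "-m", "Merge upstream into lib-mod"],
        # Finally carry the result downstream into the branch you build from.
        ["bzr", "switch", mine],
        ["bzr", "merge", mod],
        ["bzr", "commit", "-m", "Merge lib-mod into lib-mine"],
    ]


def run_plan(plan, workspace="lib", dry_run=True):
    """Print each command, or execute it inside the workspace if not dry_run."""
    for cmd in plan:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, cwd=workspace, check=True)
```

Calling run_plan(integration_plan("/path/to/shared-repo/external-lib-1"))
prints the commands so you can check them by eye before letting anything
touch your branches.  Merges can conflict, so in practice you would stop
and resolve by hand between steps rather than run the whole list blind.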

 > That way I have a copy in the main branch, which is vital for
 > reverting, but also bzr can help with the updates?  The downside
 > there is bzr has no knowledge that lib-mine and the lib in the
 > trunk are related, I guess?

That's right.  As you noticed above, the problem here is that the
version of "lib" integrated into your builds is not tracked by Bazaar.
Eventually this will be solved by true nested branches, but in the
meantime you can use kludges like a build rule that uses
"bzr revision-info" (or "bzr version-info") to report the revision ID
for each of your external libraries, and stores that in a version.py
file or a build summary log.
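
A minimal sketch of such a kludge, in Python.  The function names, the
LIB_REVISIONS variable, and the shape of the generated file are all
made up for illustration.  The lookup function is passed in as a
parameter so you can swap in whatever command works for your setup; the
default assumes that "bzr revision-info", run inside a branch, prints
the revision number followed by the revision ID, which you should
verify against your bzr version.

```python
import subprocess


def bzr_revision_id(branch_dir):
    """Ask bzr for the current revision ID of the branch in branch_dir.

    Assumption: `bzr revision-info` prints "<revno> <revision-id>".
    """
    out = subprocess.run(
        ["bzr", "revision-info"],
        cwd=branch_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return out.split()[-1]


def render_version_module(lib_dirs, get_revision=bzr_revision_id):
    """Return the source text of a version.py recording each library's revision.

    lib_dirs maps a library name to the directory of its checkout.
    """
    lines = ["# Generated at build time; do not edit.", "LIB_REVISIONS = {"]
    for name in sorted(lib_dirs):
        lines.append(f"    {name!r}: {get_revision(lib_dirs[name])!r},")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

A build rule would write the result of
render_version_module({"lib": "lib"}) to version.py and commit that
file along with the rest of the tree, so that each trunk revision at
least records which "lib" revision it was built against.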