help getting a clue about tracking changes in an integrated library

Wed Sep 22 03:35:52 BST 2010

Chris Hecker writes:

 > - lib-1.0 is the released version on the internets.
 > - my friend has modified lib-1.0, so it's kind of a branch of
 >   lib-1.0

It *is* a branch, by definition, once you have both in the VCS.  Maybe
you mean "fork" (which is a "socially-defined" term, meaning a branch
of indefinite expected lifespan maintained outside of the parent
project)?

 > - when lib-1.1 comes out, I'd like to be able to get those fixes

So far, no technical problem.

 > - I want lib to be checked into my main source tree, not some other
 >   directory I have to sync

I assume that by "integrated" you mean this kind of structure:

    top -+- doc
         |
         +- src
         |
         +- lib <== the code maintained outside your project

This is going to be a dance in bzr, but closer to "Flashdance" than a
waltz.  (What you need is "nested branches", and bzr doesn't do that
yet, although it's a frequently requested feature.  See the "Roadmap
for Bazaar..." thread.)

There are two basic approaches available at present: "poor man's
nested branches" and "monolithic branches".  The poor man's nested
branch works like this:

1.  Initialize a shared repository.[1]

2.  Set up a "trunk" branch of your project containing everything but
    the lib subtree.  (How to do this will depend on how much history
    of your project needs to be preserved.)

3.  In that branch, run "bzr ignore lib" (this adds lib to .bzrignore).

4.  As a *sibling* of your branch, create a lib-upstream branch:
    copy the lib-1.0 code there, bzr init, bzr add, bzr commit, bzr
    tag.[2]

5.  As a *sibling* of lib-upstream, branch from lib-upstream into
    lib-modified, apply your friend's patches/copy the code into
    lib-modified, bzr add/rm/mv as appropriate, bzr commit, bzr
    tag.[2]

6.  cd into "trunk", and "bzr branch ../lib-modified lib".
    *Not* "bzr checkout ...": using an independent branch means you
    can hack in ./lib without worrying about "polluting" your
    upstreams.
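For concreteness, steps 1-6 might be scripted roughly as below.  This
is only a sketch under several assumptions: you are starting without
project history, both versions of lib arrive as tarballs (given as
absolute paths) with a single top-level directory, and your tar (GNU
tar or bsdtar) supports "--strip-components", which unpacks straight
into the target instead of untarring and renaming.  The function name
"setup_nested", the commit messages and the tag names are made up for
illustration.

```shell
# Hypothetical helper: "poor man's nested branches" setup, steps 1-6.
# $1 = new shared repository; $2, $3 = absolute paths to the lib-1.0
# tarball and your friend's modified tarball.
# Each step runs only if the previous one succeeded.
setup_nested() (
  bzr init-repo "$1" && cd "$1" &&                     # 1
  bzr init trunk &&                                    # 2 (no history)
  (cd trunk && bzr ignore lib && bzr add &&            # 3
     bzr commit -m "Start project") &&
  bzr init lib-upstream &&                             # 4
  tar -xzf "$2" -C lib-upstream --strip-components=1 &&
  (cd lib-upstream && bzr add &&
     bzr commit -m "Import lib-1.0" && bzr tag lib-1.0) &&
  bzr branch lib-upstream lib-modified &&              # 5
  # Empty lib-modified, keeping only bzr's .bzr control directory.
  find lib-modified -mindepth 1 -maxdepth 1 ! -name .bzr \
       -exec rm -rf {} + &&
  tar -xzf "$3" -C lib-modified --strip-components=1 &&
  # "bzr add" picks up new files; "bzr rm" with no arguments stops
  # tracking files that vanished.  Renames still need "bzr mv".
  (cd lib-modified && bzr add && bzr rm &&
     bzr commit -m "Apply friend's changes" &&
     bzr tag lib-1.0-modified) &&
  cd trunk && bzr branch ../lib-modified lib           # 6
)
```

After sourcing the file, a call might look like setup_nested
"$PWD/work" "$PWD/lib-1.0.tar.gz" "$PWD/lib-1.0-modified.tar.gz".
Because the steps are chained with "&&", the function stops at the
first failing command and leaves the partial state for you to inspect.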

This is probably more intuitive than the "monolithic" approach, but
the relationships among the branches are non-trivial, and depending
on your work habits, it may be
easy to forget to commit your local changes in trunk/lib.  ("cd trunk;
bzr status" won't remind you about them, for example.)  If you're
never going to make such changes, then it's no problem. :-)
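One way to make the forgotten-commit problem less likely is a small
wrapper that checks both trees.  This is a hypothetical helper
("status_all" is not a bzr command), assuming the layout from steps
1-6 with the nested branch at trunk/lib:

```shell
# Hypothetical helper (not a bzr command): show uncommitted changes in
# trunk and in the nested lib branch, which trunk's "bzr status" skips
# because lib is ignored there.  $1 = trunk (default: current dir).
status_all() (
  cd "${1:-.}" &&
  echo "== trunk" && bzr status &&
  echo "== trunk/lib" && cd lib && bzr status
)
```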

Ongoing workflows:

a.  Normal development: work in trunk as usual (hack, commit,
    release).  For experimental work, create feature branches; for
    release maintenance, create maintenance branches.

b.  Friend releases bugfixes etc: apply to "lib-modified" branch, then
    merge to the "lib" branch in trunk, then merge to any descendants
    of trunk.  *Your friend must not be including any upgrades to
    upstream in these patches.*[3]

c.  Upstream releases new versions: apply to "lib-upstream" branch,
    then merge down the cascade via "lib-modified", "trunk", and any
    feature or maintenance branches.
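Workflow c, for example, might look roughly like this as a script.
It is a sketch, not a recipe: the helper name "update_upstream" and
the commit messages are hypothetical, it assumes the directory layout
from steps 1-6 and a new upstream tarball with a single top-level
directory, and it relies on "bzr merge" exiting non-zero when it
leaves conflicts, so the chain stops there for you to resolve them.

```shell
# Hypothetical helper for workflow (c) in the poor man's layout.
# $1 = shared repository, $2 = absolute path to the new upstream
# tarball, $3 = tag for the new version (e.g. lib-1.1).
# Stops at the first failure; a conflicted merge should exit non-zero,
# after which you resolve, commit, and finish the remaining steps.
update_upstream() (
  cd "$1/lib-upstream" &&
  # Replace the pristine code, keeping bzr's .bzr directory.
  find . -mindepth 1 -maxdepth 1 ! -name .bzr -exec rm -rf {} + &&
  tar -xzf "$2" --strip-components=1 &&
  bzr add && bzr rm &&               # renames still need "bzr mv"
  bzr commit -m "Import $3" && bzr tag "$3" &&
  cd ../lib-modified &&
  bzr merge ../lib-upstream &&
  bzr commit -m "Merge $3 from lib-upstream" &&
  cd ../trunk/lib &&
  bzr merge ../../lib-modified &&
  bzr commit -m "Merge $3 via lib-modified"
  # Then merge trunk into any feature or maintenance branches.
)
```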

It is possible to do all the work in the same workspace.[4]

For the "monolithic" approach, you need to set up two, or maybe
three, branches, I think: one used only for syncing to upstream
"lib", one for syncing to your friend's version, and one for your own
work.  If you're starting from scratch (no history in your project),
then I would:

1.  Initialize a shared repository.[1]
2.  Initialize a branch "lib-upstream" in the shared repository.
3.  Untar lib-1.0 in lib-upstream.
4.  Rename the resulting subdirectory as desired.
5.  Add all the files to bzr.
6.  Commit.
7.  Tag the commit.[2]
8.  Branch "lib-modified" from "lib-upstream" in the shared repository.
9.  rm -rf the upstream version of the lib directory.  (I'm assuming a
    tarball for your friend's version.  If not, modify steps 9 and 10
    appropriately.)
10. Untar lib-1.0-modified.
11. Rename it to the name used in step 4.
12. "bzr add" any new files, "bzr rm" any deleted files, "bzr mv" any
    renamed files.
13. Commit.
14. Tag the commit.[2]
15. Branch "lib-modified" to "trunk" (your main development branch).
16. Add the files for your project to trunk.
17. "bzr add" them, and commit.
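Steps 1-17 can be sketched as a shell function like the one below,
with the step numbers in the comments.  It rests on assumptions: the
name "setup_monolithic" and the tag and commit texts are hypothetical,
the tarballs have a single top-level directory and are given by
absolute path, the library lives in "lib", and steps 3-4 and 10-11
are collapsed by unpacking directly into lib with tar's
"--strip-components" (GNU tar and bsdtar) rather than untarring and
renaming.

```shell
# Hypothetical helper: the monolithic setup, steps 1-17.
# $1 = new shared repository; $2, $3 = absolute paths to the lib-1.0
# and modified tarballs; $4 = absolute path to your project's files.
# Each step runs only if the previous one succeeded.
setup_monolithic() (
  bzr init-repo "$1" && cd "$1" &&                          # 1
  bzr init lib-upstream && cd lib-upstream &&               # 2
  mkdir lib &&                                              # 3-4
  tar -xzf "$2" -C lib --strip-components=1 &&
  bzr add && bzr commit -m "Import lib-1.0" &&              # 5-6
  bzr tag lib-1.0 &&                                        # 7
  cd .. && bzr branch lib-upstream lib-modified &&          # 8
  cd lib-modified && rm -rf lib && mkdir lib &&             # 9
  tar -xzf "$3" -C lib --strip-components=1 &&              # 10-11
  bzr add && bzr rm &&          # 12; renames need "bzr mv" by hand
  bzr commit -m "Apply friend's changes" &&                 # 13
  bzr tag lib-1.0-modified &&                               # 14
  cd .. && bzr branch lib-modified trunk && cd trunk &&     # 15
  cp -R "$4"/. . &&                                         # 16
  bzr add && bzr commit -m "Add project files"              # 17
)
```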

Ongoing workflows:

a, b, and c as above, except that when merging to trunk, you merge
directly to trunk, not to the lib subtree.
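The cascade for the monolithic layout can be wrapped in a couple of
small helpers.  The names "merge_into", "cascade_from_upstream" and
"cascade_from_modified" are hypothetical, and the sketch assumes the
branch names above and that each merge actually brings in new
revisions (otherwise "bzr commit" stops because there is nothing to
commit).

```shell
# Hypothetical helpers for the monolithic layout.
# merge_into DIR SOURCE: merge SOURCE (relative to DIR) into the branch
# at DIR and commit; stops if the merge fails or leaves conflicts.
merge_into() (
  cd "$1" && bzr merge "$2" && bzr commit -m "Merge from $2"
)

# Workflow (c): after committing the new version in lib-upstream,
# push it down the cascade.  $1 = shared repository.
cascade_from_upstream() {
  merge_into "$1/lib-modified" ../lib-upstream &&
  merge_into "$1/trunk" ../lib-modified
}

# Workflow (b): your friend's fixes are already committed in
# lib-modified, so only trunk needs the merge.  $1 = shared repository.
cascade_from_modified() {
  merge_into "$1/trunk" ../lib-modified
}
```

Chaining with "&&" rather than "set -e" is deliberate: shells ignore
"set -e" inside a function called from an "&&" list, which would let
a failed merge fall through to the commit.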

Rationale:

For best results in getting help from your friend and/or the "lib"
upstream project, it's essential to know what changes have been made.
Having a separate branch for each version puts this information at
your fingertips, at a slight cost in overhead for each merge from
upstream.

Plausible variants:

1.  If you already have history in your project, you need to import
    it into bzr in the shared repository after step 1.  I suggest
    renaming that branch to "temporary".  Then, instead of initializing
    "lib-upstream", branch it from "temporary" at step 2.  Then remove
    the contents of the lib directory (keeping any local patches, i.e.,
    those not in your friend's modified version, somewhere safe).  Now
    proceed as in steps 3-15, and instead of step 16, restore your
    local patches in trunk.  Finally, remove the temporary branch.
    (Not relevant to "poor man's nested branches.")

2.  Proceed as in variant 1, then do development in "lib-upstream",
    and merge to "lib-modified" and "trunk" for actual use.  This is
    the best of all worlds from the point of view of communicating
    with upstream; you can test and demonstrate bugs with a pristine
    version.  This only works if your friend's modifications preserve
    enough of lib's API that you can actually run such tests.
    (Not relevant to "poor man's nested branches.")

    Disadvantage: you have to do the "cascade merge" after every
    commit to test the version you are producing.

3.  Use "lib-modified" as "trunk".  Very plausible if you make no or
    "almost" no changes to lib-modified.

    Disadvantage: almost none.  If you decide to make a "significant"
    change to lib-modified, you can always branch at that point.  You
    can also branch "retroactively" if you decide
    several commits later.  (This requires a rebase, and if you run
    into that situation you probably want this list's advice at the
    time.  I'm just noting the possibility here so that you don't
    worry -- it's not going to be a big deal if you need to do it.)

    I just don't like this variant because I'm anal-retentive about
    recording history in separate branches. :-)  I also wonder if it's
    a good idea in "poor man's nested branches", but haven't thought
    it out.

4.  There are also "rebasing" workflows that may make sense if changes
    to lib-upstream or lib-modified become frequent.  bzr is not very
    good at rebasing, though, and by that I also mean that
    documentation on these workflows is scarce and that a general
    anti-rebasing philosophy prevails in the Bazaar community.

    Advantage: better separation of your changes from upstream.

5.  There are also "loom"-based workflows that may make sense if
    changes to lib-upstream or lib-modified become frequent.  Looms
    are not very well documented, at least at the tutorial level, but
    I know at least two heavy users of looms who might be willing to
    help: Robert Collins, who maintains the feature in Bazaar, and the
    FLUFL (if you know what that means, you probably know him well
    enough to ask ;-).  Looms are much more in keeping with Bazaar
    philosophy than rebase, but the educational resources are much
    poorer, whereas there are lots of good texts on rebase on the
    Web.  OTOH, Stacked Git is somewhat like looms, but really looms
    are a unique feature of Bazaar, so there's not that much wisdom
    you can borrow from other communities.

    Advantages: better separation of your changes from upstream, and I
    believe much more automatic than the main workflows described
    above.

6.  An alternative to "looms" is "pipelines".  As the names suggest,
    pipelines are more linear than looms, and I'm not sure they would
    apply to your use case.  Aaron Bentley maintains them, IIRC.  Like
    looms, they are minimally documented.

    Advantages: better separation of your changes from upstream, and I
    believe much more automatic than the main workflows described
    above.

Footnotes: 
[1]  "bzr help init-repo" for more information.

[2]  Not essential, but very useful for error recovery if you put
stuff in the branch that shouldn't be there.

[3]  The problem is that bzr cannot know that the upstream changes
included in your friend's patches are the same ones that you are
bringing in when you update the lib-upstream branch.  This is likely
to result in messy merge conflicts.

[4]  "bzr help checkout", especially the --lightweight option, and
"bzr help switch", for more information.  Also Google the wiki for
"colocated branches" (the archives will have tons on this, but mostly
in the form of feature requests and bzr-vs-git discussions; not very
useful in practical application).