Understanding pull

Tue Mar 27 00:05:28 BST 2007

John Arbash Meinel wrote:
> Ian Clatworthy wrote:
> ...
>
>   
>> So trying to summarise the various development models and matching
>> Bazaar 'recipes' in my head, I initially found myself at a loss to
>> understand why pull existed at all:
>>
>> 1. central repository model: checkout + update + commit(local or central)
>> 2. distributed repository model: branch + merge + commit(local) + bundle
>> + email to gatekeeper
>>
>> There are excellent reasons for merging from the 'master' code base and
>> retesting before committing. But there are also plenty of times when the
>> best time to resync your working tree is immediately *after* completing
>> one fix before you start on the next. 'pull' would be ok in the former
>> case but almost always fail in the latter case - given the "as long as
>> local changes aren't committed" rule.
>>
>> So what recipes involve pull? Do developers use it commonly in
>> day-to-day development? I can see its applicability when I want a
>> pristine local mirror of a master repository in order to run/test
>> against when reporting bugs, say. But even in that case, what is pull
>> buying me that merge isn't?
>>
>> Ian C.
>>
>> PS: Apologies if the questions above are dumb ones. I am truly impressed
>> by just how flexible and powerful bazaar is. But with that power comes
>> the need for multiple recipes for beginners where previously just one
>> 'sufficed' most of the time. The barrier to entry is low IMHO except in
>> this area: CVS just has 'update' while bazaar has update/merge/pull to
>> choose from.
>>
>>     
>
>
> No apologies needed. While we may not have exact answers all of the
> time, it is questions like this which should be clarified, so that we
> all understand things a bit better. (It is easy to get used to things,
> and lose track of what it all means).
>
> Having written this, it seems like it should be a FAQ or Wiki, or
> something that we can refer people to.
>
> So as far as "pull" versus "merge"...
>
> One thing bzr lets you do is maintain a "branch", which acknowledges a
> difference between commits that you have merged and ones that you
> generated. We frequently refer to these as "merged" versus "mainline".
>
> One way to see this difference is to do "bzr log --long" on a bzr.dev
> tree, versus "bzr log --short". --long will show you all of the merged
> revisions, while --short only shows you the mainline, which is generally
> a summary of what was merged into bzr.dev.
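
As a rough sketch of the mainline/merged distinction described above (a toy model, not bzrlib code; the revision names and the PARENTS table are made up for illustration), the mainline can be thought of as following only the first parent of each revision, while the full ancestry also includes everything brought in by merges:

```python
# Toy revision graph: each revision maps to its list of parents.
# Assumption for this sketch: the FIRST parent is the mainline parent and
# any further parents are revisions brought in by a merge.
# All revision names are hypothetical.
PARENTS = {
    "main-1": [],
    "feat-1": ["main-1"],
    "feat-2": ["feat-1"],
    "main-2": ["main-1", "feat-2"],  # merge commit summarising feat-1, feat-2
    "main-3": ["main-2"],
}


def mainline(tip):
    """Follow only first parents from tip, newest first."""
    revs = []
    while tip is not None:
        revs.append(tip)
        parents = PARENTS[tip]
        tip = parents[0] if parents else None
    return revs


def ancestry(tip):
    """Every revision reachable from tip, merged ones included."""
    seen = []
    stack = [tip]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.append(rev)
        stack.extend(PARENTS[rev])
    return seen


main = mainline("main-3")
merged = sorted(set(ancestry("main-3")) - set(main))
print("mainline:", main)
print("merged only:", merged)
```

Here "main-2" plays the role of the summary commit: it is the single mainline entry standing in for the two merged revisions.
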
>
> I don't know about other people, but I *really* like 'bzr log -r -10..-1
> --short --forward', enough so that I've aliased it to "bzr log". It
> gives me a nice summary of the last 10 changes on a branch. And usually
> that summary fills about 1 screen-full.
>
> So why the distinction between a mainline and a merge commit? We have a
> few use cases for them...
>
> 1) "These are the patches reviewed by me". I don't review every single
> change someone makes. But I *do* review the merge before I commit it.
>
> 2) "Every commit on the mainline passes the test suite". This is a
> pretty big one for our bzr.dev process.
>
> 3) "Summary of changes". It gives an obvious place to summarize the many
> (potentially hundreds or more) commits someone made. You frequently want
> to keep all of those hundreds of revisions around, because it gives you
> nice, fine grained details about things that have changed. (Useful for
> annotate, or any sort of digging that needs to be done).
>
> But having a single summary revision also lets people who *don't* want
> to wade through 100 revisions understand "Implemented bound
> branches".
>
> 'bzr pull' is generally a statement of "I want an exact copy of the
> other branch", versus 'bzr merge' "I want to include the changes from
> that other branch".
>
>
> There are people who don't really care about maintaining a mainline, or
> about the summary commits, which is why we have "bzr merge --pull". It
> is a statement of "I just want those revisions, pull if you can, if not
> merge, and I'll commit". That is actually (AFAIK) the only real
> workflow that 'git' lets you use, since its merge is always a
> fast-forward if possible. Also, git doesn't seem to prefer the "merge,
> review, commit" workflow, because 'merge' automatically updates your
> branch history (aka commits the changes). Their workflow is
> "merge+commit; maybe review and commit --amend".
>
>
> I hope I've made clear why we need at least 2 commands, so that users
> can give bzr a hint of what their intentions are, and bzr can try to
> choose the best strategy. ('bzr pull' fails if the branches have
> diverged, because we *can't* make an exact copy. People have argued that
> it should fall back to 'merge', but they don't realize the potential
> problems with uncommitted changes.)
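
A minimal sketch of the three intentions described above, assuming a branch can be reduced to a plain list of mainline revisions (oldest first). The names Diverged, pull, merge_and_commit and merge_pull are hypothetical and are not bzrlib's API; uncommitted changes are not modelled at all:

```python
# Toy model only: a branch is just its mainline, oldest revision first.
# Names such as Diverged and merge_pull are hypothetical, not bzrlib API.


class Diverged(Exception):
    """An exact copy is impossible: both sides have their own revisions."""


def pull(local, other):
    """Return local as an exact copy of other, or raise Diverged."""
    if local[:len(other)] == other:
        return list(local)            # already contains everything in other
    if other[:len(local)] != local:
        raise Diverged("local has revisions that other lacks")
    return list(other)


def merge_and_commit(local, other, message):
    """Keep local's mainline and add one summary commit for other's changes.

    Real 'bzr merge' stops before committing so the result can be reviewed;
    the two steps are folded together here for brevity.
    """
    incoming = [rev for rev in other if rev not in local]
    if not incoming:
        return list(local)
    return local + ["%s [merged: %s]" % (message, ", ".join(incoming))]


def merge_pull(local, other, message):
    """Pull if that is possible, otherwise fall back to merge + commit."""
    try:
        return pull(local, other)
    except Diverged:
        return merge_and_commit(local, other, message)


upstream = ["a", "b", "c"]
print(pull(["a"], upstream))                      # plain mirror: exact copy
print(merge_pull(["a", "x"], upstream, "sync"))   # diverged: summary commit
```

In this model, pull on the diverged branch ["a", "x"] raises Diverged, which is the "we *can't* make an exact copy" case; merge_pull is the variant that falls back to a merge instead.
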
>
>
> So, on to why we have 'bzr pull' versus 'bzr update'.
>
> #1 reason... hysterical raisins. (historical reasons). 'bzr pull'
> existed long before 'bzr update' did. Because 'update' really only does
> something useful when you have a checkout (bound branch). And it wasn't
> until even more recently that you could have a checkout of a readonly
> location. So if you wanted to mirror
> 'http://bazaar-vcs.org/bzr/bzr.dev', you had to use 'branch' and 'pull'.
>
> Within the last few versions (0.11 according to NEWS, and 'bzr checkout'
> itself is 0.8), we now have the ability to do a checkout of a readonly
> url. And I've actually switched all of my mirrors of 'bzr.dev' over to a
> checkout. (I have one on every machine that I use, and it is my primary
> 'bzr' command).
>
> I do this because it makes it *very* clear that it is only meant as a
> mirror. I cannot merge or commit in that branch, which means I *know*
> it is always an exact mirror of the upstream (barring local uncommitted
> changes). So 'bzr checkout' + 'bzr update' could very well take the
> place of 'bzr branch' + 'bzr pull'.
>
>
> There is at least one case that it fails for:
>
> Mirroring known public branches.
>
> Now that we have the bazaar.launchpad.net mirroring branches, I don't
> try as hard, but at one point, any branch that someone mentioned I would
> add to:
> http://bzr.arbash-meinel.com/mirrors/
>
> There were a few goals. One was to be able to see if people are making
> changes (I have a cron script to update it a couple times a day, and
> email me any changes). Another was just to have a mirror in case hosts
> disappear. And the third was to have those revisions locally. At one
> point 'pull' was much more expensive (weaves), so that allowed me to
> have an automated script do the downloading, rather than having me wait
> for it.
>
> I still use it in case things disappear (especially for plugins). Though
> I try to recommend people register branches on LP, so that I don't have
> to spend my bandwidth updating mirrors.
>
> Anyway, right now 'bzr update' and 'bzr checkout' only work on working
> trees. We have no way in the UI to update a bound branch that doesn't
> have a working tree. (bzrlib does it fine, and my 'update-mirrors'
> plugin does just that). So I can do "bzr checkout FOO; bzr remove-tree
> FOO" and I have a bound branch but no way to update it. (pull might
> work, but logically it could also fail because you are trying to update
> a readonly branch unless we special cased a 'pull' from the master branch).
>
>
> Some interesting case studies... I have 126 mirrors of just bzr
> branches. If each of those had working trees, it would be about
> 8MB*126=1GB of disk space. If we didn't have shared repositories, it
> would be 53MB*126=6.6GB of space. Though I have 211 of my own bzr
> branches (no mirrors). So I can say that shared repositories with no
> working trees are very important to me. Rather than taking approx
> 60MB*337=20GB, I'm using about 150 MB (I have a couple repositories, and
> some standalone branches in there).
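
For anyone wanting to redo the arithmetic above, here is a small sketch. It assumes the per-branch figures are in megabytes (the message does not state the unit) and uses decimal gigabytes, so the results only roughly match the rounded numbers quoted; the constant and function names are made up for illustration:

```python
# Figures taken from the message above; the MB unit is an assumption.
TREE_MB = 8          # working tree per branch
REPO_MB = 53         # unshared repository data per branch
BOTH_MB = 60         # approximate per-branch total used in the message
MIRRORS = 126
OWN_BRANCHES = 211


def total_gb(per_branch_mb, branches):
    """Total size in decimal gigabytes for a number of identical branches."""
    return per_branch_mb * branches / 1000.0


print("working trees for mirrors: %.1f GB" % total_gb(TREE_MB, MIRRORS))
print("unshared repos for mirrors: %.1f GB" % total_gb(REPO_MB, MIRRORS))
print("everything, all branches: %.1f GB"
      % total_gb(BOTH_MB, MIRRORS + OWN_BRANCHES))
```
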
>
> John
> =:->
>
>   
John,

Thanks for taking the time to explain this all in so much depth. I'm yet 
to digest it all but it's exactly the sort of info and comparison 
between choices I was looking for.

Ian C.