[RFC] history editing vs history presentation

Tue Jun 9 07:12:28 BST 2009

So, I wanted to write several different RFCs to get discussion going
about this. But I think the issues are not sufficiently teased out yet.
In part, I hope to tease out the issues here and perhaps end up with
some suggestions about what we should aim for in order to provide a good
mix of effective tools.

I'm going to skip lots of references to other systems in the meat of
this document: they are useful for figuring out what has been done,
what people have liked, and for inspiration, but they are not useful for
explaining things to our users! It should be clear, though, that there
is lots of prior art surrounding the things I put forward: darcs, looms,
topgit, queues and quilt, for instance.

History editing
+++++++++++++++

Sometimes, at arbitrary times after something is committed to history,
users realise they want a variation of the content and history that
can't be achieved by simply appending a corrective change to the end.

This is implemented in bzr and some other modern version control systems
by creating *new* history with the desired properties and then
[sometimes optionally] removing the old history, or just making it
unreferenced and letting garbage collection take care of it. In summary:
take the old history, apply an edit to it to get a new history, and
then, when desired, discard the old history.

While users want to do these operations, they can conflict with other
constraints placed on bzr: keeping history sizes small, making
collaboration and review of changes easy, and so on.

One of the key side effects of history editing is that edited history
cannot be trivially merged from: it acts as a brand new creation.
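A minimal sketch of "create new history, then discard the old" may make that merge side effect concrete. It is a toy Python model, not bzr's API: `Commit`, `ancestry` and `edit_history` are hypothetical names, and each revision id is derived from a hash of the commit and its parent purely for brevity (bzr does not compute ids this way). Any system that treats revisions as immutable must still give a rewritten revision a new id, so everything from the first edited commit onwards looks brand new to merge.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Tree = Tuple[Tuple[str, str], ...]  # hypothetical: sorted (path, content) pairs


@dataclass(frozen=True)
class Commit:
    parent: Optional["Commit"]
    message: str
    tree: Tree

    @property
    def revid(self) -> str:
        # Toy id: a hash over the parent's id and this commit's content, so
        # changing any ancestor changes the id of every descendant.
        parent_id = self.parent.revid if self.parent else ""
        data = repr((parent_id, self.message, self.tree)).encode()
        return hashlib.sha1(data).hexdigest()[:12]


def ancestry(tip: Optional[Commit]) -> List[Commit]:
    """Return the commits leading to `tip`, oldest first."""
    chain = []
    while tip is not None:
        chain.append(tip)
        tip = tip.parent
    return list(reversed(chain))


def edit_history(tip: Commit,
                 edit: Callable[[Commit], Tuple[str, Tree]]) -> Commit:
    """Build new history by applying `edit` to every old commit.

    The old commits are left untouched; once nothing references them,
    garbage collection can discard them.
    """
    new_tip = None
    for old in ancestry(tip):
        message, tree = edit(old)
        new_tip = Commit(new_tip, message, tree)
    return new_tip


# A tweak-edit: fix a typo in one commit message.
root = Commit(None, "initial", (("README", "hi"),))
second = Commit(root, "add teh docs", (("README", "hi"), ("doc.txt", "x")))
third = Commit(second, "tweak docs", (("README", "hi"), ("doc.txt", "y")))
fixed = edit_history(third, lambda c: (c.message.replace("teh", "the"), c.tree))

old_ids = [c.revid for c in ancestry(third)]
new_ids = [c.revid for c in ancestry(fixed)]
print(old_ids[0] == new_ids[0])  # True: the unedited root keeps its id
print([o == n for o, n in zip(old_ids[1:], new_ids[1:])])  # [False, False]
```

Note that the old chain (`third`) still exists unchanged after the edit; only the new chain (`fixed`) carries the corrected message.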

Some motivations for this turn up frequently:

Splitting or joining code
-------------------------

Version control systems require users to make fairly arbitrary
divisions between code they manage and code that isn't managed. Often
users make decisions that later experience shows would have been better
made differently.

When splitting, only a subset of the changes introduced is wanted: for
example, a slice of some sort through time. This sort of edit is
sometimes very simple and sometimes very complex; consider detangling a
library which was in a subdirectory but wasn't perfectly isolated.
Joining code is similar, but in reverse. Note that doing a merge between
two projects to join their histories is not the same thing as editing
the history so they appear to have had a single origin: in the latter
case you end up with one history root, whereas in the former case you
have two, and there is no clear version of the subproject to use at any
given point in the upper project.

Tweaking
--------

Relatively minor details need to be changed: Authors or committers
altered, encoding of commit messages changed, typos and offensive
language (for example) cleaned up.

History polishing
-----------------

The overall changes made by some set of commits can be built up in a
more structured way that makes the history more accessible. Some
examples are:

* reordering commits
* splitting commits
* combining commits
* altering commits
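The four polishing operations can be sketched on a toy patch series in which each patch just records new file contents. `Patch`, `combine`, `split`, `reorder` and `alter` are hypothetical helpers, not bzr commands, and real patches are diffs rather than whole files; the point of the sketch is that polishing restructures the history while leaving the final tree unchanged.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Set


@dataclass
class Patch:
    message: str
    changes: Dict[str, str]  # hypothetical model: path -> new file content


def apply_series(series: List[Patch]) -> Dict[str, str]:
    """Apply patches in order to an empty tree and return the result."""
    tree: Dict[str, str] = {}
    for patch in series:
        tree.update(patch.changes)
    return tree


def combine(first: Patch, second: Patch, message: str) -> Patch:
    # `second` applies after `first`, so its content wins on shared paths.
    return Patch(message, {**first.changes, **second.changes})


def split(patch: Patch, paths: Set[str], msg_in: str, msg_out: str) -> List[Patch]:
    # Changes to `paths` go into the first new patch, the rest into the second.
    inside = {p: c for p, c in patch.changes.items() if p in paths}
    outside = {p: c for p, c in patch.changes.items() if p not in paths}
    return [Patch(msg_in, inside), Patch(msg_out, outside)]


def reorder(series: List[Patch], order: List[int]) -> List[Patch]:
    # Only content-preserving when the moved patches touch disjoint paths.
    return [series[i] for i in order]


def alter(patch: Patch, message: str) -> Patch:
    return replace(patch, message=message)


# Polish two messy work-in-progress commits into two focused ones.
wip = [Patch("wip", {"a.py": "1", "b.py": "1"}),
       Patch("more wip", {"a.py": "2"})]
polished = split(combine(wip[0], wip[1], "wip"), {"a.py"},
                 "Change a.py", "Change b.py")
polished = reorder(polished, [1, 0])
polished[0] = alter(polished[0], "Update b.py")
print(apply_series(polished) == apply_series(wip))  # True
```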

Cherrypicking
-------------

Cherrypicks, or non-transitive merges, can be considered history edits:
the implementation requirements are very similar, and the impact on
merging is equivalent.

Constraints
+++++++++++

The more reporting needed from a VCS, the more constraints are needed on
its behaviour:

Releases in VCSs
================

When using a VCS for release management, and for tasks where releases
matter such as bug bisection, it is important that the VCS be able to
reproduce exactly the tree state of an earlier point in time. Without
this, verifying when a bug was actually introduced, or that someone
actually has an older release, can be very hard. Accordingly, once
something is 'released' in a VCS, the VCS should not permit it to be
altered. Note that different labels could be applied, and that an
operation which replaces internal VCS metadata but preserves the same
log messages and tree content is not an alteration that matters in this
case.
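The "not an alteration that matters" rule can be phrased as a comparison that ignores internal metadata. This is a hypothetical Python sketch, not a bzr facility: `Revision`, `tree_digest` and `same_release` are illustrative names, and a tree is modelled as a plain map from path to file bytes.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class Revision:
    revision_id: str         # internal VCS metadata: allowed to change
    message: str             # log message: must be preserved
    tree: Dict[str, bytes]   # hypothetical model: path -> file content


def tree_digest(tree: Dict[str, bytes]) -> str:
    """Hash the tree content independently of dict insertion order."""
    h = hashlib.sha256()
    for path in sorted(tree):
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(tree[path]).digest())
    return h.hexdigest()


def same_release(a: Revision, b: Revision) -> bool:
    """True if `b` preserves what matters about release `a`.

    Revision ids are deliberately ignored; log message and tree content
    must match exactly.
    """
    return a.message == b.message and tree_digest(a.tree) == tree_digest(b.tree)


released = Revision("old-format-rev-1", "Release 1.0", {"README": b"hello\n"})
converted = Revision("new-format-rev-1", "Release 1.0", {"README": b"hello\n"})
tampered = Revision("new-format-rev-1", "Release 1.0", {"README": b"hello!\n"})
print(same_release(released, converted))  # True
print(same_release(released, tampered))   # False
```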

Integration
===========

Leading up to releases, code changes are accumulated in a branch [or
branches]. For some projects, which never do releases, this is the end
state for code changes. For projects that do 'releases', the contents of
a series could be much more mutable with no ill effects. For still other
projects, a stability similar to that required by projects that release
from their VCS is desired.

Patch management
================

Code changes are often reviewed in discrete chunks. We can call such a
chunk a 'patch'. Even when a reviewer is given an entire branch to
review, they usually look at the aggregate merge - a single patch. 
Effectively, as developers put a branch together, including merging
changes from other branches, they are editing the patch to be reviewed.

Good user interfaces for history polishing would go a long way to
delivering good patch management facilities (with one commit per patch),
but would not nicely support collaborative patch development due to the
side effects of history editing.

Beyond the history editing aspects though, users will need to be able to
enumerate patches, see whether they are applied to the current tree or
not, and so on.

Which edits are needed in which contexts?
+++++++++++++++++++++++++++++++++++++++++

So we have four broad sets of edits:
a) splitting/joining
b) tweaking
c) polishing
d) cherrypicking

And three main states that a given part of history can be in:
1) released
2) integrated
3) pending-integration

I think that collaboration happens in all three states. Distros
collaborate around released code (by creating patches to it and pulling
in select bits of integrated code), project developers collaborate
around mainly integrated code (by creating patches), and in some
circumstances - large branches, risky changes, big projects - subsets of
project developers will collaborate on pending-integration code.

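' ' - relatively little or no use
'.' - some use
'X' - relatively frequent use

[Data for the table below is needed; ways of getting it are solicited.]
Frequency of need for history editing, by state of code (columns are
the edits a-d above, rows are the states 1-3):

                        a   b   c   d
1) released             X   .       X
2) integrated           .   .   .   X
3) pending-integration      .   X   .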

Beyond this, I assert with no real evidence that most of these
activities are largely confined to particular states, due to the
social/technical impact of doing them: as soon as some code becomes
integrated, nearly all history polishing will stop; as soon as
integrated code is released, the threshold for which a history tweak is
acceptable will go up significantly.

Existing bzr facilities
+++++++++++++++++++++++

This document has tried to lay out some terms for talking about the
possible operations in high-level terms, without bogging down in the
precise capabilities of current tools. But it's useful to see what bzr
currently offers.

bzr has no builtin or readily available facility for splitting/joining
of history. [See 'Splitting or joining code' above for why 'bzr join'
does not count.]

bzr has no mechanism for performing tweak-edits either.

bzr has a rebase plugin that can perform automated replays (a series of
cherrypicks).
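A replay can be sketched as a loop of cherrypicks onto a new base. This is a toy Python model and not the rebase plugin's code: `Change`, `cherrypick` and `replay` are hypothetical, and whole-file matching stands in for the three-way merge a real VCS performs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, str]  # hypothetical model: path -> file content


@dataclass
class Change:
    message: str
    # path -> (content expected before the change, or None if new; new content)
    edits: Dict[str, Tuple[Optional[str], str]]


class Conflict(Exception):
    pass


def cherrypick(tree: Tree, change: Change) -> Tree:
    """Apply one change to a copy of `tree`.

    Whole-file matching stands in for a real three-way merge.
    """
    result = dict(tree)
    for path, (before, after) in change.edits.items():
        if result.get(path) != before:
            raise Conflict(f"{change.message!r} conflicts on {path}")
        result[path] = after
    return result


def replay(base: Tree, changes: List[Change]) -> List[Tree]:
    """Cherrypick each change in order onto `base`, as a rebase does.

    Returns the tree of every new revision; raises Conflict at the first
    change that does not apply.
    """
    trees = []
    tree = base
    for change in changes:
        tree = cherrypick(tree, change)
        trees.append(tree)
    return trees


mine = [Change("add docs", {"doc.txt": (None, "docs")}),
        Change("edit docs", {"doc.txt": ("docs", "better docs")})]
new_base = {"README": "v1", "setup.py": "x"}
print(replay(new_base, mine)[-1])
```

Each replayed tree becomes a new revision, which is why a replay carries the same merge side effects as any other history edit.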

The bzr-loom plugin provides some patch management facilities *for
multiple developers editing the same patches* without doing history
editing, but doesn't offer a complete set of polishing primitives,
limiting its usefulness. A good set of history polishing tools built
into the core of bzr wouldn't eradicate bzr-loom, but they would mean
that loom is much less needed for the very common case of a single
developer polishing their own submissions.

A call to arms
++++++++++++++

The default bzr facilities should be enough to remove the current
cognitive dissonance contributors have: they think in terms of patches,
but bzr works in terms of tree states.

To do this, we should build up a complete set of history editing tools,
in particular the patch management primitives. We should make sure that
they have clear warnings, refuse to operate on released or integrated
code without overrides, and are easy to use and robust.
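The "refuse without overrides" behaviour could sit in front of every history-editing command. A minimal sketch, assuming something can already classify each revision into the three states named earlier; `State`, `check_edit`, `RefusedEdit` and the `override` flag are hypothetical, not existing bzr options.

```python
import warnings
from enum import Enum
from typing import Dict, Iterable


class State(Enum):
    # The three states of history described in this document.
    RELEASED = "released"
    INTEGRATED = "integrated"
    PENDING_INTEGRATION = "pending-integration"


class RefusedEdit(Exception):
    pass


PROTECTED = {State.RELEASED, State.INTEGRATED}


def check_edit(revisions: Iterable[str], states: Dict[str, State],
               override: bool = False) -> None:
    """Refuse to rewrite released or integrated revisions unless overridden.

    `states` is a hypothetical map from revision id to state; revisions
    not in it are assumed to be pending integration.
    """
    protected = [r for r in revisions
                 if states.get(r, State.PENDING_INTEGRATION) in PROTECTED]
    if not protected:
        return
    names = ", ".join(protected)
    if not override:
        raise RefusedEdit(f"refusing to rewrite released/integrated "
                          f"revisions {names}; pass override=True to force")
    # Even with an override, make the consequence visible.
    warnings.warn(f"rewriting released/integrated revisions {names}")
```

A polishing command would call `check_edit` on the revisions it is about to rewrite before creating any new history.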

And then the default work area bzr provides should be extended to track
enough data to make working on patches wonderful again. bzr-loom
demonstrates one possible mode: making that fantastic for single
developers, and only slightly more complex for developers collaborating
on the same patches, is one approach that would leverage a bunch of
existing work. However, bzr-loom would need its UI to be made more
'thread' (patch) centric than it is to achieve this. What we need to
ensure, though, is that we aim for one system that scales from a single
developer editing their own patches up to a group of folk editing a set
of patches; such a system would have massively improved the landing of
brisbane core.

-Rob
