Fix a botched log-message

John Arbash Meinel john at arbash-meinel.com
Wed Apr 2 13:43:59 BST 2008


Martin Pool wrote:
| On Tue, Apr 1, 2008 at 6:53 PM, Stefan Monnier <monnier at iro.umontreal.ca> wrote:
|
|>  There should at least be some way to add post-facto annotations of
|>  some sort.
|>
|>  As for the immutability of History, I happen not to agree with it:
|>  History (in contrast to the *past*) is very much mutable and even varies
|>  with the point of view.
|
| We have talked about having a versioned file that gives the
| "corrected" messages or other information about past commits.  It
| probably solves the common case of wanting to correct a long-past
| commit, and it does not complicate the model or introduce different
| versions of past history, or mean that history is massively duplicated
| by rebasing.
|
| This would be moderately easy to prototype as a plugin that decorates
| the repository to return different revision messages...
|

There are some complications that I'm aware of.

1) Do you version this new information or not?

~   a) If you don't, how do you propagate it and make sure you are using the
~      "correct" version? We have that problem now with Branch tags. A push/pull
~      overwrites the target tags, even though they might be "newer". We don't
~      have a way to tell, because there is no version history to look at.

~      You can choose to prompt to ask the user what they want to do, but you
~      potentially have to prompt for what would have been obvious. (Only one
~      side made a change, but without history you only know that they differ.)

~   b) If you do version, then you still have a potential for conflicts on a
~      location that doesn't have a direct filesystem representation.
~      You can choose to write out a file for them to edit, or you can have an
~      interactive prompt, etc.  So far, bzr has chosen the non-interactive
~      route. The only command that has interaction is 'bzr uncommit' because it
~      used to remove the data from disk, and thus was a data-losing operation.

~      You also need a way to "bzr resolve" the conflict which doesn't have a
~      natural filesystem path.

~      It is even possible for someone who knows nothing about a given log entry
~      to get a conflict that they then need to resolve. (Because I pulled from
~      Joe who changed it, and then merged from Mary who also changed it, but to
~      something else.)
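
To make the difference between 1a and 1b concrete, here is a minimal sketch of
merging a single log-message override with and without a common base. The
function names and the (value, conflict) return shape are hypothetical, chosen
only for illustration; nothing like this exists in bzr.

```python
def merge_without_history(this, other):
    """Case 1a: no base version is known, so any difference is ambiguous."""
    # We can only see that the two sides differ, not which one changed.
    return this, this != other


def merge_override(base, this, other):
    """Case 1b: three-way merge of one revision's message override.

    Returns (value, conflict). None means "no override recorded".
    """
    if this == other:
        return this, False      # both sides agree
    if this == base:
        return other, False     # only the other side changed
    if other == base:
        return this, False      # only this side changed
    return this, True           # both changed, to different values


# Only one side changed: obvious with a base, ambiguous without one.
one_sided_versioned = merge_override("fix tpyo", "fix tpyo", "fix typo")
one_sided_unversioned = merge_without_history("fix tpyo", "fix typo")

# Joe and Mary both changed the same entry to different text.
joe_vs_mary = merge_override("fix tpyo", "fix typo", "fix the typo")
```

The last case is the one that still needs some way to present and resolve a
conflict that has no file on disk.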

2) Scalability. You need to know when doing a fetch/merge/pull which entries
~   you need to propagate. And remember, the whole point is being able to change
~   old revisions, so you can't just propagate the values that point to the newly
~   fetched revisions.

~   a) You can just scan the history looking for any new annotations. However,
~      that scales O(project_history) which we know is bad.

~   b) Slightly better is to just look at the list of modifications that you have
~      (as this is going to grow slower than O(project_history)), however it
~      grows as O(modifications_ever_made) which means it doesn't have an upper
~      bound.

~   c) The best scaling I know is to version the meta-info in another DAG and
~      then use that DAG to decide what needs to be propagated. There are still
~      oddities here, like you may pull from a branch that has no new revisions,
~      but does have new annotations, which now conflict with the target branch.

~   d) You also have to consider the scalability of *using* this information.
~      Specifically when you do "bzr log" for every revision you need to check if
~      there is a potential log modification that you need to use instead of the
~      one you have. This also impacts stuff like propagation, because you need
~      to store the data in such a way that it is fast to retrieve when you need
~      it for log, but also fast to compute what needs to be sent for 'bzr push'.
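
As a rough illustration of 2c, the sketch below keeps annotations in their own
DAG and walks it from the source tip, stopping at nodes the target already has.
The dict layout, the node ids and the function name are all assumptions made up
for this example; they are not bzr data structures.

```python
def missing_annotations(dag, source_tip, target_has):
    """Return annotation nodes reachable from source_tip that the target lacks.

    dag maps node id -> {"parents": [...], "revision": ..., "message": ...}.
    The walk stops at any node the target already stores, so the cost tracks
    the number of new annotation nodes rather than the project history.
    """
    missing = []
    seen = set()
    todo = [source_tip]
    while todo:
        node = todo.pop()
        if node in seen or node in target_has:
            continue
        seen.add(node)
        missing.append(node)
        todo.extend(dag[node]["parents"])
    return missing


# Hypothetical annotation DAG: each node corrects the message of one revision.
annotation_dag = {
    "a1": {"parents": [], "revision": "rev-10", "message": "Fix parser typo"},
    "a2": {"parents": ["a1"], "revision": "rev-3", "message": "Initial import"},
    "a3": {"parents": ["a2"], "revision": "rev-10", "message": "Fix parser bug"},
}

to_send = missing_annotations(annotation_dag, "a3", {"a1"})
nothing_new = missing_annotations(annotation_dag, "a3", {"a1", "a2", "a3"})
```

Note that "a2" annotates an old revision ("rev-3"), so it has to be sent even
if the target already has every revision. That is the oddity mentioned in 2c.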


I don't know of any distributed VCS that lets you modify commit information such
as this. Specifically, both Mercurial and Git include the commit message in the
sha1sum that identifies the commit, so modifying it genuinely creates a new
commit. And because their sha1 hashes are chained (the parent hash is part of
the hash of the child), you have the same need to make the modification and then
rebase all of the commits that descend from it. That generates a completely new
branch, with all of the associated complications for people who built their work
on yours and now find that all of your commits have changed. You can have the
new commits pointed to by the old commits, but it still means that modifying a
single commit message doubles the size of your DAG for all revisions that are
children of that node.
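
The chaining effect can be shown with a toy model. This is only a sketch: the
commit_id function below hashes just the parent id and the message, which is an
assumption made for brevity and not the real object format of Git or Mercurial
(both hash more than this, such as the tree contents and author data).

```python
import hashlib


def commit_id(message, parent_id):
    """Toy commit id: sha1 over the parent id and the commit message."""
    data = (parent_id or "") + "\n" + message
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


def chain_ids(messages):
    """Compute ids for a linear history, oldest commit first."""
    ids = []
    parent = None
    for message in messages:
        parent = commit_id(message, parent)
        ids.append(parent)
    return ids


original = chain_ids(["add parser", "fix tpyo", "add tests"])
amended = chain_ids(["add parser", "fix typo", "add tests"])

# The first commit is untouched; the edited commit and every descendant
# get new ids, even though "add tests" itself did not change.
changed = [old != new for old, new in zip(original, amended)]
```

Here changed comes out as [False, True, True]: one edited message forces new
ids for the whole tail of the history.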

Monotone and darcs *might* allow you to modify them. I don't know how they are
stored exactly. But monotone has some scaling issues with propagating changes if
you are dealing with everything as a set. (I believe they use a Merkle tree to
work out what needs to be propagated, which is efficient in transmission, but
means that both sides need to *compute* O(everything).)

Arch seemed like it let you change the commit logs, as it stored the patch log
as part of the tree. And it propagated because it was just another file-change.
However, sometimes it would use the local log, and sometimes it would use the
one stored in the repository. And storing the files in that way was *horrible*
for performance. (Doing a 'tla status' had to scale with O(files + revisions)
because it had 1 file for each revision, which might have been changed.)


Anyway, we discussed it quite a bit (at least as part of the 'bzr tags'
discussion). I think we have an idea of how to write it so that it is efficient.
The main reason Martin didn't implement it was 1b. You need a simple way to
communicate to a user that you have a conflict in something that isn't a file on
disk. And it complicates the internal model, because now you have 2 DAGs instead
of just 1 that people need to think about (or at least the internal code does).

John
=:->