[PLUGIN] bzr changeset now support rollup changesets (and bzr send-changeset)

Martin Pool mbp at sourcefrog.net
Wed Jun 29 11:06:08 BST 2005

I think the format looks really good, but as Michael says we can try to
make it smaller.  I'd be happy to try taking changes in this format.

I think we could shrink the changeset format in a few ways.  

* Remove or reduce some of the noise about "use bzr apply-changeset".

* Change the data model to make the inventory-id the same by definition
  as the revision-id, so that it doesn't need to be sent separately.

* Get the timestamp and timezone by unpacking the date header (which
  would need to include fractional seconds to get the right XML.)

* Similarly for the committer and message.

* The base is redundant with the first parent and can be removed. 

* File ids and parent ids shouldn't normally need to be transmitted: we
  know precisely which revision this should be applied to, therefore
  precisely which filename has what id.  File-ids only need to be
  given in the changeset for newly added files or directories.

* I suppose the inventory sha1 doesn't really add anything; if we get
  it wrong the revision sha1 will be wrong too.  (But see below.)

What do you think, sirs?

If all these were done the random changeset I'm looking at would
shrink to:

# committer: John Arbash Meinel <john at arbash-meinel.com>
# date: Wed 2005-06-29 01:59:45.974591017 -0500
# message:
#    Updated so that read_changeset is able to parse the output
[lots of diffs here]
# revision: john at arbash-meinel.com-20050629065945-14a14a6514d5fa46
# sha1: 6a0b70af88574a0e2b25388c8910a329474e982e
# parents:
#   john at arbash-meinel.com-20050629011806-7fa2f7368e0dfc1b  bda392119156354edc2b09420d

Not so bad.
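
As a rough illustration of unpacking the date line shown above into a
timestamp and a timezone offset, here is a minimal Python sketch.  The
function name parse_date_header and the exact layout it accepts (weekday,
ISO date, time with optional fractional seconds, numeric offset) are
assumptions taken from the sample header, not bzr's actual code.

```python
import calendar
import re
import time
from decimal import Decimal

# Assumed layout, copied from the sample header:
#   "Wed 2005-06-29 01:59:45.974591017 -0500"
_DATE_RE = re.compile(
    r"^\w{3} (\d{4})-(\d{2})-(\d{2}) "
    r"(\d{2}):(\d{2}):(\d{2})(\.\d+)? "
    r"([+-])(\d{2})(\d{2})$"
)


def parse_date_header(value):
    """Return (utc_timestamp, offset_seconds) for a date header value.

    The timestamp is a Decimal so that the fractional seconds survive
    exactly instead of being rounded through a float.
    """
    m = _DATE_RE.match(value.strip())
    if m is None:
        raise ValueError("unrecognised date header: %r" % (value,))
    year, month, day, hour, minute, second = (
        int(g) for g in m.group(1, 2, 3, 4, 5, 6)
    )
    fraction = Decimal(m.group(7) or "0")
    sign = 1 if m.group(8) == "+" else -1
    offset = sign * (int(m.group(9)) * 3600 + int(m.group(10)) * 60)
    # timegm() reads the tuple as UTC, so subtract the offset to turn
    # local wall-clock time into seconds since the epoch.
    local_as_utc = calendar.timegm(
        (year, month, day, hour, minute, second, 0, 0, 0)
    )
    return Decimal(local_as_utc - offset) + fraction, offset


timestamp, offset = parse_date_header(
    "Wed 2005-06-29 01:59:45.974591017 -0500"
)
utc_stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(int(timestamp)))
```

For the sample header, the UTC time works out to 06:59:45 on 2005-06-29,
which agrees with the 20050629065945 component of the revision id, so the
date line and the revision id are at least consistent with each other.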

Does this always produce the right sha1 for the inventory and revision?
If so that's pretty cool, since we're getting byte-for-byte identical
results of things that are only transmitted in a very abstract form.
We'll need to think about how to preserve that property in the future,
for example when the version of bzr receiving the data differs from the
one sending it, say because it adds an explicit tree root to new
inventories.

To get there it seems we would need:

* To use the Canonical XML rules to make sure that any tree produces
  exactly the same byte stream every time

* To develop the code such that a particular revision or inventory
  always produces the same tree, aside perhaps from major format
  changes.  So if we added, for example, properties on files to support
  permissions or the execute bit, they should not affect the XML unless
  the file actually had properties.
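
The two requirements above can be sketched in Python as follows.  This is
only an illustration under stated assumptions: the element and attribute
names (inventory, entry, file_id, name, executable) and the helper names
are made up for the example, and it relies on
xml.etree.ElementTree.canonicalize (Python 3.8+), which implements
Canonical XML 2.0 rather than whatever rules bzr would adopt.  The
strip_text option goes beyond strict canonicalization by dropping
whitespace around text, so that indentation does not change the hash.

```python
import hashlib
import xml.etree.ElementTree as ET


def inventory_xml(entries):
    """Serialize hypothetical inventory entries to XML.

    Optional properties are written only when an entry actually has
    them, so trees without properties keep their old serialization.
    """
    root = ET.Element("inventory")
    for entry in entries:
        attrs = {"file_id": entry["file_id"], "name": entry["name"]}
        for key, value in sorted(entry.get("properties", {}).items()):
            attrs[key] = value
        ET.SubElement(root, "entry", attrs)
    return ET.tostring(root, encoding="unicode")


def canonical_sha1(xml_text):
    """Hash the canonical form, so equivalent XML hashes identically."""
    canonical = ET.canonicalize(xml_text, strip_text=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Same tree, different attribute order, spacing and empty-tag style.
compact = '<inventory><entry file_id="foo-1" name="foo.py"/></inventory>'
spread = """<inventory>
  <entry name="foo.py"   file_id="foo-1"></entry>
</inventory>"""

plain = inventory_xml([{"file_id": "foo-1", "name": "foo.py"}])
no_props = inventory_xml(
    [{"file_id": "foo-1", "name": "foo.py", "properties": {}}]
)
with_props = inventory_xml(
    [{"file_id": "foo-1", "name": "foo.py",
      "properties": {"executable": "yes"}}]
)
```

Here compact, spread, plain and no_props all give the same sha1, while
with_props gives a different one, which is the behaviour the second
bullet asks for.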

We might want at some point to have a new major version of a file
format.  That could be accommodated by saying that the sha1 of this
object *in format 1* is 129031; even later software which wouldn't
normally write that format could still check the hash.
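
A minimal sketch of that idea, assuming a hash is always recorded
together with the format number it was computed in.  Both serializers
and the revision fields below are invented for illustration and do not
correspond to any real bzr format.

```python
import hashlib


def serialize_v1(rev):
    # Hypothetical "format 1": one field per line, fixed order.
    return "\n".join(
        [rev["revision_id"], rev["committer"], rev["message"]]
    ).encode("utf-8")


def serialize_v2(rev):
    # Hypothetical "format 2": labelled fields plus a format marker.
    lines = ["format: 2"]
    lines += ["%s: %s" % (key, rev[key]) for key in sorted(rev)]
    return "\n".join(lines).encode("utf-8")


# Later software writes only the newest format by default, but keeps
# the older serializers around so that old hashes can still be checked.
SERIALIZERS = {1: serialize_v1, 2: serialize_v2}


def sha1_in_format(rev, fmt):
    return hashlib.sha1(SERIALIZERS[fmt](rev)).hexdigest()


def check_hash(rev, fmt, expected_sha1):
    """Verify a hash that was declared as 'sha1 of this object in fmt'."""
    return sha1_in_format(rev, fmt) == expected_sha1


rev = {
    "revision_id": "example-rev-1",
    "committer": "Example Committer",
    "message": "example message",
}
declared = (1, sha1_in_format(rev, 1))
```

With this layout, check_hash(rev, *declared) succeeds even in a program
that normally writes format 2, while checking the same hash against the
format 2 serialization fails.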

Extending that, it would be nice if changesets sent by mail could
include a gpg signature that covers not the mail but rather the full
revision when it's reconstructed.  If we can get an exactly identical
revision xml file that should be straightforward.  I'd suggest simply
including it, ASCII-armoured, after the changeset.

Perhaps this is not realistic but I'd like to try.

There is of course a risk of mail transports that corrupt their content
by trimming or wrapping lines, but they're out of scope; if you are
unfortunate enough to have one the changeset will just have to be
