[PLUGIN] bzr changeset now supports rollup changesets (and bzr send-changeset)

Wed Jun 29 16:59:31 BST 2005

Martin Pool wrote:

>I think the format looks really good, but as Michael says we can try to
>make it smaller.  I'd be happy to try taking changes in this format.
>
>I think we could shrink the changeset format in a few ways.
>
>* Remove or reduce some of the noise about "use bzr apply-changeset",
>  "BEGIN BZR FOOTER", etc.
>
>
Are you sure you don't want an opening line that at least identifies the
changeset format? It could be as simple as
# Bazaar-ng changeset v0.0.5
Just something that both tracks the file version and gives us a line to
key off of, so that if someone pipes in a whole mail the reader can
discard everything until it finds that line.
I don't know if you like calling it Bazaar-ng, but use whatever name you
like.
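
A minimal sketch of that scan, assuming the header looks like the line
suggested above. The exact wording and version string are not settled, and
HEADER_RE and skip_to_changeset are hypothetical names, not part of the
plugin.

```python
import re

# Hypothetical header pattern; the real wording and version may differ.
HEADER_RE = re.compile(r"^# Bazaar-ng changeset v(\d+(?:\.\d+)*)\s*$")


def skip_to_changeset(lines):
    """Discard lines until the header; return (version, remaining lines)."""
    it = iter(lines)
    for line in it:
        match = HEADER_RE.match(line)
        if match:
            # Everything after the header belongs to the changeset.
            return match.group(1), list(it)
    raise ValueError("no changeset header found")


# A piped mail: headers and chatter first, then the changeset.
mail = [
    "From: someone",
    "Subject: a patch",
    "",
    "# Bazaar-ng changeset v0.0.5",
    "# committer: John Arbash Meinel",
]
version, body = skip_to_changeset(mail)
```

Returning the version alongside the body lets the reader reject a format
it does not understand before parsing anything else.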

>* Change the data model to make the inventory-id the same by definition
>  as the revision-id, so that it doesn't need to be sent separately.
>
>
As long as that is constant, I have no problem.

>* Get the timestamp and timezone by unpacking the date header (which
>  would need to include fractional seconds to get the right XML.)
>
>
Sure. Timezone + timestamp were for ease of coding, not because they
were strictly necessary. Though again, what about rollup changesets
(where each entry has a different timestamp)?
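
For a single revision, unpacking the date header could look roughly like
this. It is only a sketch, assuming the date is written in the form used
in the example later in this mail; DATE_RE and unpack_date are hypothetical
names. The fraction is kept as text, because a float cannot hold nine
fractional digits on top of a ten-digit timestamp, and the XML has to come
out byte-for-byte.

```python
import calendar
import re
import time

# Matches e.g. "Wed 2005-06-29 01:59:45.974591017 -0500".
DATE_RE = re.compile(
    r"^\w{3} (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(\.\d+)? ([+-])(\d{2})(\d{2})$"
)


def unpack_date(text):
    """Return (utc_seconds, fraction_text, tz_offset_seconds)."""
    match = DATE_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognised date: %r" % (text,))
    stamp, fraction, sign, hours, minutes = match.groups()
    offset = int(hours) * 3600 + int(minutes) * 60
    if sign == "-":
        offset = -offset
    # Read the wall-clock time as if it were UTC, then undo the offset.
    local = calendar.timegm(time.strptime(stamp, "%Y-%m-%d %H:%M:%S"))
    return local - offset, fraction or "", offset


seconds, fraction, offset = unpack_date(
    "Wed 2005-06-29 01:59:45.974591017 -0500"
)
```

For that input the UTC time works out to 06:59:45, which is consistent
with the 065945 in the example revision id.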

>* Similarly for the committer and message.
>
>
The issue here is rollup changesets, because conceivably each revision
could be committed by a different person. For completeness, I just
include all of them.
I suppose as an optimization, we could assume they are all the same as
the top-level entry, and only add entries where they differ.
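
That optimization could be sketched like this, assuming each revision is a
plain dict carrying the same set of fields. FIELDS, compress and expand are
hypothetical names, and the revision ids are made up for illustration.

```python
# Fields that may be inherited from the top-level entry (an assumption).
FIELDS = ("committer", "date", "message")


def compress(top, revisions):
    """Keep only the fields of each revision that differ from the top level."""
    out = []
    for rev in revisions:
        entry = {"revision": rev["revision"]}
        entry.update({k: rev[k] for k in FIELDS if rev[k] != top[k]})
        out.append(entry)
    return out


def expand(top, entries):
    """Fill in the omitted fields again from the top-level entry."""
    out = []
    for entry in entries:
        full = {k: top[k] for k in FIELDS}
        full.update(entry)
        out.append(full)
    return out


top = {"committer": "A", "date": "d2", "message": "rollup"}
revisions = [
    {"revision": "rev-1", "committer": "B", "date": "d1", "message": "first"},
    {"revision": "rev-2", "committer": "A", "date": "d2", "message": "rollup"},
]
compact = compress(top, revisions)
```

A revision that matches the top level shrinks to just its id, and
expand(top, compact) gives back the original list.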

>* The base is redundant with the first parent and can be removed.
>
>
As mentioned by Aaron, this is probably not strictly true.

>* File ids and parent ids shouldn't normally need to be transmitted: we
>  know precisely which revision this should be applied to, therefore
>  precisely which filename has what id.  File-ids only need to be
>  given in the changeset for newly added files or directories.
>
>
This assumes that I have exactly the same base that you need. Some
changesets are useful outside of that. Say I have a feature that I
developed on branch A, and we want to apply it to branch Z. We would
need to pull in A, just to get the file ids, so that we could apply it
against Z.

But maybe you don't really care about fuzzy changesets, i.e. ones that
are not applied on exactly their parents.

>* I suppose the inventory sha1 doesn't really add anything; if we get
>  it wrong the revision sha1 will be wrong too.  (But see below.)
>
>What do you think, sirs?
>
>If all these were done the random changeset I'm looking at would
>shrink to:
>
>------------
># committer: John Arbash Meinel <john at arbash-meinel.com>
># date: Wed 2005-06-29 01:59:45.974591017 -0500
># message:
>#    Updated so that read_changeset is able to parse the output
>[lots of diffs here]
># revision: john at arbash-meinel.com-20050629065945-14a14a6514d5fa46
># sha1: 6a0b70af88574a0e2b25388c8910a329474e982e
># parents:
>#   john at arbash-meinel.com-20050629011806-7fa2f7368e0dfc1b  bda392119156354edc2b09420d
>------------
>
>Not so bad.
>
>Does this always produce the right sha1 for the inventory and revision?
>If so that's pretty cool, since we're getting byte-for-byte identical
>results of things that are only transmitted in a very abstract form.
>We'll need to think about how to ensure that property is preserved in
>the future when for example the version of bzr receiving the data
>differs from the one sending it, such as by adding an explicit tree root
>to new inventories.
>
>
I just verified that the current verbose format is able to exactly
reproduce the Revision file. You have to actually apply the changeset
and regenerate an inventory in order to find out if the Inventory will
match. But I think it would be possible.
Naturally, Inventory is harder to get right than Revision, but the fact
that serializing into a changeset, and unserializing from it yields the
same text is a good start.
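
The check itself is small. A sketch, assuming the regenerated Revision
text is available as bytes (matches_sha1 is a hypothetical helper, and the
XML here is a stand-in, not the real layout):

```python
import hashlib


def matches_sha1(regenerated_text, recorded_sha1):
    """True if the regenerated bytes hash to the sha1 from the changeset."""
    return hashlib.sha1(regenerated_text).hexdigest() == recorded_sha1


# Stand-in for a serialized revision; the real XML layout is not shown here.
original = b"<revision committer='someone'/>\n"
recorded = hashlib.sha1(original).hexdigest()

same = matches_sha1(original, recorded)
# A single extra byte is enough to break the match.
different = matches_sha1(original + b" ", recorded)
```

This is why the serialization has to be byte-for-byte stable: any
whitespace or attribute-order difference on the receiving side shows up as
a mismatch.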

>To get there it seems we would need:
>
>* To use the Canonical XML rules to make sure that any tree produces
>  exactly the same byte stream every time
>
>
Certainly.

>* To develop the code such that a particular revision or inventory
>  always produces the same tree, aside perhaps from major format
>  changes.  So if we added for example properties on files to support
>  permissions or the execute bit they should not affect the XML unless
>  the file actually had properties.
>
>
Meaning that once we support properties, a file without any should not
show up with an empty element in the XML?
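
Something like the following, as a sketch only. The element and attribute
names are made up for illustration and are not the actual inventory XML;
entry_to_xml is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def entry_to_xml(file_id, name, properties=None):
    """Serialize one entry; emit <properties> only if there are any."""
    elem = ET.Element("entry", {"file_id": file_id, "name": name})
    if properties:
        props = ET.SubElement(elem, "properties")
        # Sort the keys so the same properties always give the same bytes.
        for key in sorted(properties):
            ET.SubElement(
                props, "property", {"name": key, "value": properties[key]}
            )
    return ET.tostring(elem)


plain = entry_to_xml("id-1", "foo.py")
empty = entry_to_xml("id-1", "foo.py", {})
with_props = entry_to_xml("id-1", "foo.py", {"executable": "yes"})
```

With this shape, plain and empty serialize identically, so a version of
bzr that knows about properties still reproduces the hash of an entry
written by one that does not.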

>We might want at some point to have a new major version of a file
>format.  That could be accommodated by saying that the sha1 of this
>object *in format 1* is 129031; even later software which wouldn't
>normally write that format could still check the hash.
>
>
If you have a version number for the changeset format, that could be
tied to the inventory & revision xml layout.

>Extending that, it would be nice if changesets sent by mail could
>include a gpg signature that covers not the mail but rather the full
>revision when it's reconstructed.  If we can get an exactly identical
>revision xml file that should be straightforward.  I'd suggest to
>simply include that ascii-armoured following the changeset.
>
>
I would tend to add it as:

# revision: john at arbash-meinel.com-20050629065945-14a14a6514d5fa46
# sha1: 6a0b70af88574a0e2b25388c8910a329474e982e
# gpg signature:
#    -----BEGIN PGP SIGNATURE-----
#    Version: GnuPG v1.4.1 (GNU/Linux)
#
#    iD8DBQFCwr6qJdeBCYSNAAMRAvS6AJ95JfFDBQqZbr/4Ly45lgWGodhzxgCgxR8L
#    Xc5aTu9MYL/6ou/yr5+8vfU=
#    =NTE3
#    -----END PGP SIGNATURE-----
#

That could be parsed exactly by the current code (it adds a specific prefix).

Of course, we could also trim the extra gpg info (and I think that,
because it is base64 encoded, the line breaks don't matter):

# revision: john at arbash-meinel.com-20050629065945-14a14a6514d5fa46
# sha1: 6a0b70af88574a0e2b25388c8910a329474e982e
# gpg signature: iD8DBQFCwr6qJdeBCYSNAAMRAvS6AJ95JfFDBQqZbr/4Ly45lgWGodhzxgCgxR8LXc5aTu9MYL/6ou/yr5+8vfU==NTE3

It is long, but it would be only one line. Or maybe just
# gpg signature:
#    iD8DBQFCwr6qJdeBCYSNAAMRAvS6AJ95JfFDBQqZbr/4Ly45lgWGodhzxgCgxR8L
#    Xc5aTu9MYL/6ou/yr5+8vfU=
#    =NTE3
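
Reading that last form back could be sketched as follows, assuming the
"#    " prefix marks continuation lines. extract_signature and rearmor are
hypothetical names, and whether gpg accepts the re-armored text has not
been checked here.

```python
PREFIX = "#    "
KEY = "# gpg signature:"


def extract_signature(lines):
    """Collect the indented lines that follow '# gpg signature:'."""
    body = []
    collecting = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(KEY):
            collecting = True
            continue
        if collecting:
            if not line.startswith(PREFIX):
                break  # first line without the prefix ends the block
            body.append(line[len(PREFIX):])
    return body


def rearmor(body):
    """Put back the BEGIN/END lines that were trimmed for transport."""
    lines = ["-----BEGIN PGP SIGNATURE-----", ""]
    lines += body + ["-----END PGP SIGNATURE-----"]
    return "\n".join(lines) + "\n"


footer = [
    "# sha1: 6a0b70af88574a0e2b25388c8910a329474e982e",
    "# gpg signature:",
    "#    iD8DBQFCwr6qJdeBCYSNAAMRAvS6AJ95JfFDBQqZbr/4Ly45lgWGodhzxgCgxR8L",
    "#    Xc5aTu9MYL/6ou/yr5+8vfU=",
    "#    =NTE3",
    "#",
]
signature = extract_signature(footer)
armored = rearmor(signature)
```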

>Perhaps this is not realistic but I'd like to try.
>
>
I think it is very reasonable.

>There is of course a risk of mail transports that corrupt their content
>by trimming or wrapping lines, but they're out of scope; if you are
>unfortunate enough to have one the changeset will just have to be
>wrapped.
>
>
>
I don't think there are many mail transports which corrupt attachments,
and changesets could always be sent that way. I don't think it is our
problem to make sure that text sent over email doesn't get corrupted.

John
=:->
