DOC: Revsed bundle format 4 & merge directive format 2

Mon Jun 25 17:03:14 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
> I've updated the spec
> 1. to describe both bundles and merge directives, and how they relate to
>    bundle 0.9 format
> 2. to include model info in the info record
> 3. to split "header" and "body" into two separate container records,
>    instead of doing my own splitting within records.
> 4. to specify that format 4 bundles are always bzipped, but the base64
>    encoding comes from the merge directive.
> 5. to change the naming format to be uniformly slash-delimited.
> 6. to remove testament records
> 7. to describe how to handle model differences
> 
> Aaron

I think we might want a little bit more "why" in the document, rather than just
"what". For stuff like this:

> When merging, clients should use the ``base_revision_id`` when it is not
> already present in the ancestry of the ``last_revision`` of the target branch.
> If it is already present, clients should calculate a merge base in the normal
> way.

I believe that is because you want a way for the client has a way to specify a
"cherrypick". But that hasn't been clearly stated.

You do describe it a bit later in "Bundle Section", so maybe just a crossref
for the further info.

It would be nice to have the bundle format be a little more agnostic towards
the diff algorithm (you explicitly require it to be mpdiff). But then again, we
can start with format 4 being mpdiff and see if we want to change from that in
the future.

Another thing that we discussed, and I think would be reasonable. To have a
sha1sum for a whitespace nuked version of the human readable prelude, and the
sha1sum for the "correct" version (as would be generated by the bundle).

At that point, when reading the prelude we can easily generate 2 sha sums of
*just* the prelude, which gives us an idea if whitespace munging has happened,
and how much we should worry about it.

At least a quick pass in my head would be to generate the sha1sum for the patch
as it is, and then the patch with all trailing whitespace removed. Maybe
stripping or replacing leading tabs with plain spaces or something else? I'm
hesitant to just strip leading whitespace, since in code like python that
changes the actual meaning because you can change a line from being inside an
if to being outside, which can certainly be dangerous. Something like:

if geteuid() != 0:
  x = 10
  Dangerous if you are root

Versus
if geteuid() != 0:
  x = 10
Dangerous if you are root

The reason to include these sha1 sums is so that you can quickly verify the
patch itself without having to do anything with the bundle. But that would be
optional.

I'm not sure how to verify the patch from the bundle texts, though. Here are my
issues:

1) The "correct" way is to apply the patch to the source, and make sure that
you get the exact sha1 sum for the target text. Which works fine as long as
there is no munging.

2) A possible way is to generate the target from the bundle, and then create
your own diff, and see if it matches the ones supplied. Possibly munging your
patch to see if it matches. But:

   a) That falls prey to the above attack vector. (The human readable shows it
      indented, the final is not, and the munging shows it as matching.)

   b) It also has problems with just a slightly different diff implementation.
      Because lines may sync up slightly differently.

So another possibility would be to strip all trailing whitespace from the
failing prelude, and apply it to the source. And then take the target from the
bundle, and strip the trailing whitespace for only those lines referenced in
the patch, and then see if the two match.

Overall, I see it being pretty difficult to validate a munged patch. Such that
you don't introduce an attack vector. We could always do
  ERROR: Prelude diff does not match the Bundle data.
  This could be caused by mail clients munging the whitespace.
  Use --ignore-prelude to merge.

Or we could downgrade it to a warning:

  WARNING: Prelude diff does not match the Bundle data.
  This could be caused by a mail client changing whitespace.
  Please review the changes closely using 'bzr diff' to
  ensure that the changes are what you want to merge.

My idea is that the repository data is still validated (through the bundle
installation). So if you are doing 'bzr merge' you just reject the revisions
for now. 'bzr pull' is a bit trickier, because you probably shouldn't move the tip.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGf+dCJdeBCYSNAAMRAioCAJ4vzzKVyupDggsoVJ81cKfEzIx/PgCghylf
NKlbuIV0wqKkD1rt/NHh33A=
=/vit
-----END PGP SIGNATURE-----