DOC: Revsed bundle format 4 & merge directive format 2

Mon Jun 25 17:37:54 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> I think we might want a little bit more "why" in the document, rather than just
> "what". For stuff like this:
> 
>> When merging, clients should use the ``base_revision_id`` when it is not
>> already present in the ancestry of the ``last_revision`` of the target branch.
>> If it is already present, clients should calculate a merge base in the normal
>> way.
> 
> 
> I believe that is because you want a way for the client has a way to specify a
> "cherrypick". But that hasn't been clearly stated.

Sure.  That kind of thing is easy to overlook.

> It would be nice to have the bundle format be a little more agnostic towards
> the diff algorithm (you explicitly require it to be mpdiff).

I don't think it would be.  That would require implementations to
support multiple diff algorithms, and I don't think that's productive at
the moment.  I'm quite happy to support xdelta or whatever else is
appropriate in the next format, perhaps dropping support for mpdiff then.

> Another thing that we discussed, and I think would be reasonable. To have a
> sha1sum for a whitespace nuked version of the human readable prelude, and the
> sha1sum for the "correct" version (as would be generated by the bundle).

Why would this be useful?  I agree that we should verify the prelude,
but that can be done by generating a diff from the bundle,
whitespace-munging it, and comparing it to the prelude.

*not* putting the whitespace-nuked sha1 in there means we can freely
change our whitespace-nuking functionality later, as new munging
patterns become apparent.

> At that point, when reading the prelude we can easily generate 2 sha sums of
> *just* the prelude, which gives us an idea if whitespace munging has happened,
> and how much we should worry about it.

Why worry?

> At least a quick pass in my head would be to generate the sha1sum for the patch
> as it is, and then the patch with all trailing whitespace removed. Maybe
> stripping or replacing leading tabs with plain spaces or something else? I'm
> hesitant to just strip leading whitespace, since in code like python that
> changes the actual meaning because you can change a line from being inside an
> if to being outside, which can certainly be dangerous.

I think we should limit ourselves to whitespace munging we have seen in
the wild.  To me, that means stripping trailing whitespace and doing
line ending conversions.

> I'm not sure how to verify the patch from the bundle texts, though. Here are my
> issues:
> 
> 
> 1) The "correct" way is to apply the patch to the source, and make sure that
> you get the exact sha1 sum for the target text. Which works fine as long as
> there is no munging.

Well, that can't be correct, since we're assuming munging.

> 2) A possible way is to generate the target from the bundle, and then create
> your own diff, and see if it matches the ones supplied. Possibly munging your
> patch to see if it matches. But:
> 
>    a) That falls prey to the above attack vector. (The human readable shows it
>       indented, the final is not, and the munging shows it as matching.)

It depends on how approximate you want the comparison to be.  I suppose
it could even be a user option.  And I think stripping leading
whitespace should not be default behavior.

>    b) It also has problems with just a slightly different diff implementation.
>       Because lines may sync up slightly differently.

I think that, within our ecosystem, this is pretty unlikely.  Just use
the same diff implementation.  We can specify it in the spec if you feel
that's important.

> So another possibility would be to strip all trailing whitespace from the
> failing prelude, and apply it to the source. And then take the target from the
> bundle, and strip the trailing whitespace for only those lines referenced in
> the patch, and then see if the two match.

> Overall, I see it being pretty difficult to validate a munged patch. Such that
> you don't introduce an attack vector.

Theoretically, I agree.  I just think that it's pretty hard to exploit
the stripping of trailing whitespace and line-ending conversion to
produce an actual attack.

But importantly, humans can't see trailing whitespace.  They can't
distinguish between a prelude that contains a trailing-whitespace attack
and one that doesn't.  If the purpose of preludes is review, then we can
promise that the supplied bundle matches what they saw, even if it
doesn't match the prelude contents.  That doesn't open up any *new*
attacks, since we are already vulnerable to trailing-whitespace attacks.

>  We could always do
>   ERROR: Prelude diff does not match the Bundle data.
>   This could be caused by mail clients munging the whitespace.
>   Use --ignore-prelude to merge.
> 
> Or we could downgrade it to a warning:
> 
>   WARNING: Prelude diff does not match the Bundle data.
>   This could be caused by a mail client changing whitespace.
>   Please review the changes closely using 'bzr diff' to
>   ensure that the changes are what you want to merge.

I think it would be okay to only warn for merge, since they must commit
for damage to be done.

> My idea is that the repository data is still validated (through the bundle
> installation). So if you are doing 'bzr merge' you just reject the revisions
> for now. 'bzr pull' is a bit trickier, because you probably shouldn't move the tip.

Yeah, we could have "pull --ignore-whitespace-discrepancies"

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGf+9i0F+nu1YWqI0RAjXLAJwL1WUPiyi7gXQpvarQiG+eUDrXzgCeJBoa
N5rtwziO4yX9VKRXF68FM7k=
=EUki
-----END PGP SIGNATURE-----