[MERGE] pybaz: sanitize patch logs for more tla quirks

Tue Apr 25 10:19:49 BST 2006

On Tue, 2006-04-25 at 08:28 +1000, Robert Collins wrote:
> On Mon, 2006-04-24 at 16:54 +0200, David Allouche wrote:
> 
> > I think fixing that problem is more involved than that.
> 
> ouch.
> 
> > The problem is that tla (or baz) just sticks whatever it thinks is a
> > header into the log message, without any sanity checking. So you can
> > give "Revision" or, in particular "Creator", "Standard-date"
> > "New-patches", "Summary". I was able to create the following patchlog
> > with baz-1.4.2:
> 
> Wow, bustification. What about new-patches? does it let you override
> that ?

It does not let you _override_ anything. But it does let you stick in
anything you want. Yes, you can add a duplicate "New-patches" header.
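
To make the duplicate case concrete, here is a minimal sketch of how the
Python email module exposes a repeated header. The header values are made
up for illustration and are not taken from a real patchlog.

```python
import email

# Hypothetical patchlog header section with a user-supplied duplicate.
raw = (
    "Revision: example--main--0--patch-1\n"
    "New-patches: example--main--0--patch-1\n"
    "Summary: hypothetical log\n"
    "New-patches: something-the-user-typed\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(raw)

# get_all() returns every value for the name, in order of appearance,
# so the caller can see that the header is ambiguous.
values = msg.get_all("New-patches")
print(values)
```

Both values come back, and nothing in the parsed message says which one
the arch client generated.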

> > As you can see, larch put New-patches _after_ Summary, while baz puts it
> > before. So we need to be smarter.
> 
> Not necessarily. Summary was always a user header, so if it gets
> bustified, I think we can survive.

What do you mean?

I discussed the position of Summary, because I expected it to be always
present (even though its presence is not actually required), and
therefore useful to distinguish user-generated and machine-generated
headers.

> baz and tla are by far the most
> common arch clients around, so I would follow whatever they do.

That does not tell us how to resolve possible ambiguities in a reliable
and consistent way.

> I think you are solving a different bug:)
> 
> My bug was 'arch puts things that are -not- rfc2822 headers in the
> header section of the message, which fucks over the python email
> module'. This bug will cause imports to fail because important headers
> are not accessible.
> 
> Your bug is 'arch trusts user headers far too much'. This bug can cause
> imports to differ when run by two different users. This will cause
> consistency errors down the track but no immediate failure [except when
> a user broke the format of the data pybaz wants in one of the
> needed-for-import headers].

Fair enough.

But the bad-rfc2822 bug (is it actually rfc2822, and not just rfc822?) is
connected to the ambiguous headers bug, because the fix you proposed can
create new ambiguous cases, and therefore makes it more likely to hit
the ambiguous headers problem. So, fixing one should involve fixing the
other.
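
As a rough illustration of the bad-header bug, the sketch below feeds a
header section containing a malformed line to the email parser of a
current Python 3. The header values are hypothetical, and the parser in
older Python releases may not behave identically.

```python
import email

# "bad header" contains a space, so it is not a valid field name.
raw = (
    "Revision: example--main--0--patch-1\n"
    "bad header: whatever the user typed\n"
    "Summary: hypothetical log\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(raw)

# The parser stops reading headers at the malformed line and treats
# everything from there on as body text.
revision = msg["Revision"]
summary = msg["Summary"]
payload = msg.get_payload()
print(revision, summary)
```

The Summary header that follows the malformed line ends up in the payload
and is no longer reachable by name.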

I think I can propose a reasonable solution to the problem. The desired
properties of the solution are:

     1. NoRegression: do not regress from existing behaviour
     2. Consistency: give consistent output
     3. HandleInvalid: do not blow up in the presence of invalid headers
     4. FutureProof: can evolve heuristic header matching without
        causing regressions

It is not strictly possible to satisfy NoRegression without studying the
historic behaviour of the Python email parser, and then it might turn
out to conflict with Consistency. I am not prepared to do that unless it
appears to be absolutely necessary. So I invoke the Python Zen and
refuse to guess in the face of ambiguity.

Here is an updated proposal:

      * fix "bad header:" into "invalid-bad-header:"
      * decorate "good-header:" into "valid-good-header:"
      * only match headers decorated as "valid"
      * raise if multiple matching headers are found
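
The four points above could be sketched roughly as follows. This is only
an illustration, not pybaz code: decorate, get_unique and AmbiguousHeader
are hypothetical names, the input is assumed to be already split into
(name, value) pairs, and turning invalid characters into hyphens is my
guess at what the "fix" should do.

```python
import re

# RFC 2822 field names: printable ASCII (33-126) except the colon.
_VALID_NAME = re.compile(r"[!-9;-~]+")


class AmbiguousHeader(Exception):
    """Hypothetical error: several valid headers share the wanted name."""


def decorate(name):
    """Prefix a header name with "valid-" or "invalid-"."""
    if _VALID_NAME.fullmatch(name):
        return "valid-" + name
    # Assumption: runs of characters not allowed in a field name
    # are replaced by a single hyphen.
    fixed = re.sub(r"[^!-9;-~]+", "-", name).strip("-")
    return "invalid-" + fixed


def get_unique(headers, wanted):
    """Return the value of the single valid header called `wanted`.

    `headers` is a list of (name, value) pairs. Returns None when no
    valid header matches, raises AmbiguousHeader when several do.
    """
    key = ("valid-" + wanted).lower()
    matches = [v for n, v in headers if decorate(n).lower() == key]
    if len(matches) > 1:
        raise AmbiguousHeader(wanted)
    return matches[0] if matches else None
```

With this, "bad header" becomes "invalid-bad-header" and can never match
a lookup, while a duplicated "New-patches" raises instead of silently
picking one of the values.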

What do you think?
-- 
                                                            -- ddaa