[MERGE] pybaz: sanitize patch logs for more tla quirks

Robert Collins robertc at robertcollins.net
Mon Apr 24 23:28:23 BST 2006


On Mon, 2006-04-24 at 16:54 +0200, David Allouche wrote:

> I think fixing that problem is more involved than that.

ouch.

> The problem is that tla (or baz) just sticks whatever it thinks is a
> header into the log message, without any sanity checking. So you can
> give "Revision" or, in particular "Creator", "Standard-date"
> "New-patches", "Summary". I was able to create the following patchlog
> with baz-1.4.2:

Wow, bustification. What about new-patches? does it let you override
that ?

> Revision: foo--bar--0--patch-3
> Archive: archive at example.com
> Creator: Foo <foo at example.com>
> Date: Mon Apr 24 16:28:24 CEST 2006
> Standard-date: 2006-04-24 14:28:24 GMT
> New-patches: archive at example.com/foo--bar--0--patch-3
> Summary: summmary
> Keywords: 
> Summary: other summary!
> Creator: foo
> Revision: foo
> Standard-date: foo
> 
> All those fields are critical to baz_import. It looks like the email
> parser in Python 2.4.3 will give the first header matching the given
> name when using email.Message.__getitem__ with a key that matches
> multiple headers, however the API documentation reads:
> 
> http://docs.python.org/lib/module-email.Message.html#l2h-3842
> 
>         Note that if the named field appears more than once in the
>         message's headers, exactly which of those field values will be
>         returned is undefined. Use the get_all() method to get the
>         values of all the extant named headers.
> 
> Undefined, IOW, what pybaz will give is undefined, and baz_import
> behaviour is undefined as well.
> 
> In itself, that would not be terribly serious, because we could use
> Message.items() to get the list of all headers and pick the first or
> last matching header, to be sure to get what was indeed generated.
> 
> The problem is that various versions of Arch seem to disagree on the
> relative ordering some user-supplied and generated headers. For example,
> on of my oldest commits (created by larch) has the following headers:
> 
> Revision: archtools--ddaa--1.0--patch-1
> Archive: david at allouche.net--2003
> Creator: David Allouche <david at allouche.net>
> Date: Thu May 15 15:01:28 CEST 2003
> Standard-date: 2003-05-15 13:01:28 GMT
> Summary: remove uneeded quoting around process expansions
> Keywords: 
> New-files:
> {arch}/archtools/archtools--ddaa/archtools--ddaa--1.0/david at allouche.net--2003/patch-log/patch-1
> Modified-files: append-tag larch-cherrypick larch-mv lib/tempfiles.sh
> patchmon
> New-patches: david at allouche.net--2003/archtools--ddaa--1.0--patch-1
> 
> As you can see, larch put New-patches _after_ Summary, while baz puts it
> before. So we need to be smarter.

Not necessarily. Summary was always a user header, so if it gets
bustified, I think we can survive. baz and tla are by far the most
common arch clients around, so I would follow whatever they do.

> We could use the following logic for header parsing:
> 
>       * use Message.items() and look manually in the list for the named
>         header
>       * Loop until the named header is found, or until the first Summary
>         header is found.
>       * If Summary is found first, start looking for the header from the
>         end of the list
> 
> That assumes that Summary is always the first user-provided header.
> Unfortunately baz does not enforce that (so I expect other
> implementations of Arch did not either). So it's just relying on the
> deeply ingrained habit of Arch users to always put the summary header
> first.
> 
> Another, more conservative, option would be to modify pybaz.Patchlog to
> throw if more than one matching header is found. That might be a
> regression as some imports that worked by chance will stop working, but
> that gives a reasonable guarantee that successful imports will be
> reproducible.
> 
> Finally, maybe those broken headers should be converted into a separate
> namespace, eg "X-pybaz-<broken-header>", to reduces the odds of
> undesired conflict with a Arch header.
> 
> Your thoughts?

I think you are solving a different bug:)

My bug was 'arch puts things that are -not- rfc2822 headers in the
header section of the message, which fucks over the python email
module'. This bug will cause imports to fail because important headers
are not accessible.

Your bug is 'arch trusts user headers far too much'. This bug can cause
imports to differ when run by two different users. This will cause
consistency errors down the track but no immediate failure [except when
a user broke the format of the data pybaz wants in one of the
needed-for-import headers].

Rob



-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060425/9f085461/attachment.pgp 


More information about the bazaar mailing list