[MERGE] pybaz: sanitize patch logs for more tla quirks

David Allouche david at allouche.net
Mon Apr 24 15:54:04 BST 2006


On Mon, 2006-04-24 at 22:14 +1000, Robert Collins wrote:
> > Revision: foo--0--patch-1
> > Archive: foo at example.com
> > Creator: Robert Collins <rbtcollins at hotmail.com>
> > Date: ...
> > Standard-date: ...
> > Summary: ...
> > Keywords:
> > Patches applied:
> > New-files: ...
> > New-directories: ...
> > Modified-files: ...
> > New-patches: ...
> > 
> > Message here

I think fixing that problem is more involved than that.

The problem is that tla (or baz) just sticks whatever it thinks is a
header into the log message, without any sanity checking. So a user can
supply their own "Revision" or, in particular, "Creator", "Standard-date",
"New-patches" or "Summary" header. I was able to create the following
patchlog with baz-1.4.2:

Revision: foo--bar--0--patch-3
Archive: archive at example.com
Creator: Foo <foo at example.com>
Date: Mon Apr 24 16:28:24 CEST 2006
Standard-date: 2006-04-24 14:28:24 GMT
New-patches: archive at example.com/foo--bar--0--patch-3
Summary: summmary
Keywords: 
Summary: other summary!
Creator: foo
Revision: foo
Standard-date: foo

All those fields are critical to baz_import. It looks like the email
parser in Python 2.4.3 returns the first header matching the given
name when email.Message.__getitem__ is used with a key that matches
multiple headers. However, the API documentation reads:

http://docs.python.org/lib/module-email.Message.html#l2h-3842

        Note that if the named field appears more than once in the
        message's headers, exactly which of those field values will be
        returned is undefined. Use the get_all() method to get the
        values of all the extant named headers.

In other words, what pybaz returns is undefined, and so is the
behaviour of baz_import.

In itself, that would not be terribly serious, because we could use
Message.items() to get the list of all headers and pick the first or
last matching header, to be sure to get what was indeed generated.
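
As a rough illustration of the difference between the accessors, here
is a minimal sketch. It uses the current standard-library email package
(email.message_from_string) rather than the Python 2.4 module discussed
above, and the PATCHLOG text is a shortened, made-up variant of the baz
example, so treat the names as assumptions:

```python
from email import message_from_string

# Shortened patchlog modelled on the baz-1.4.2 example, with user-supplied
# duplicates of generated headers.
PATCHLOG = """\
Revision: foo--bar--0--patch-3
Creator: Foo <foo at example.com>
Summary: summmary
Keywords:
Summary: other summary!
Creator: foo
Revision: foo

log body
"""

msg = message_from_string(PATCHLOG)

# Every value of a repeated header, in the order it appears in the log.
all_summaries = msg.get_all("Summary")

# All (name, value) pairs in document order, duplicates included.
header_pairs = msg.items()

# Pick the first or last occurrence explicitly instead of relying on
# msg["Creator"], whose result is documented as undefined for duplicates.
creators = [value for name, value in header_pairs if name.lower() == "creator"]
first_creator, last_creator = creators[0], creators[-1]
```

Both get_all() and items() keep the duplicates visible, which is what
makes an explicit first-or-last choice possible.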

The problem is that various versions of Arch seem to disagree on the
relative ordering of some user-supplied and generated headers. For
example, one of my oldest commits (created by larch) has the following
headers:

Revision: archtools--ddaa--1.0--patch-1
Archive: david at allouche.net--2003
Creator: David Allouche <david at allouche.net>
Date: Thu May 15 15:01:28 CEST 2003
Standard-date: 2003-05-15 13:01:28 GMT
Summary: remove uneeded quoting around process expansions
Keywords: 
New-files:
{arch}/archtools/archtools--ddaa/archtools--ddaa--1.0/david at allouche.net--2003/patch-log/patch-1
Modified-files: append-tag larch-cherrypick larch-mv lib/tempfiles.sh
patchmon
New-patches: david at allouche.net--2003/archtools--ddaa--1.0--patch-1

As you can see, larch put New-patches _after_ Summary, while baz puts it
before. So we need to be smarter.

We could use the following logic for header parsing:

      * Use Message.items() and look manually in the list for the named
        header
      * Loop until the named header is found, or until the first Summary
        header is found.
      * If Summary is found first, start looking for the header from the
        end of the list
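
The steps above could be sketched roughly as follows. This is only an
illustration under stated assumptions: find_generated_header is a
hypothetical helper, not part of pybaz, and it uses the current
standard-library email package. The two sample logs are trimmed
versions of the baz and larch examples.

```python
from email import message_from_string


def find_generated_header(log_text, name):
    """Return the value of the header believed to be generated by Arch.

    Hypothetical helper: scan forward until `name` or the first Summary
    is seen; if Summary comes first, search again from the end.
    """
    items = message_from_string(log_text).items()
    wanted = name.lower()
    for key, value in items:
        key = key.lower()
        if key == wanted:
            return value          # found before any Summary header
        if key == "summary":
            break                 # user-supplied headers may start here
    else:
        return None               # neither the header nor a Summary exists
    # Summary came first: assume the generated header sits near the end.
    for key, value in reversed(items):
        if key.lower() == wanted:
            return value
    return None


BAZ_LOG = """\
Revision: foo--bar--0--patch-3
Creator: Foo <foo at example.com>
New-patches: archive at example.com/foo--bar--0--patch-3
Summary: summmary
Creator: foo

body
"""

LARCH_LOG = """\
Revision: archtools--ddaa--1.0--patch-1
Creator: David Allouche <david at allouche.net>
Summary: remove uneeded quoting around process expansions
New-patches: david at allouche.net--2003/archtools--ddaa--1.0--patch-1

body
"""

baz_creator = find_generated_header(BAZ_LOG, "Creator")
larch_patches = find_generated_header(LARCH_LOG, "New-patches")
```

With these inputs the baz Creator is taken from the top of the log,
while the larch New-patches is picked up by the backward search.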

That assumes that Summary is always the first user-provided header.
Unfortunately baz does not enforce that (so I expect other
implementations of Arch did not either). So it's just relying on the
deeply ingrained habit of Arch users to always put the summary header
first.

Another, more conservative, option would be to modify pybaz.Patchlog to
throw if more than one matching header is found. That might be a
regression as some imports that worked by chance will stop working, but
that gives a reasonable guarantee that successful imports will be
reproducible.
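
A minimal sketch of that stricter behaviour, written as a standalone
function rather than as the actual pybaz.Patchlog code.
DuplicateHeaderError and get_unique_header are hypothetical names, and
the parsing again assumes the current standard-library email package:

```python
from email import message_from_string


class DuplicateHeaderError(ValueError):
    """Raised when a patchlog header appears more than once (hypothetical)."""


def get_unique_header(log_text, name):
    """Return the single value of `name`, or None if it is absent.

    Raises DuplicateHeaderError when the header is ambiguous.
    """
    values = message_from_string(log_text).get_all(name, [])
    if len(values) > 1:
        raise DuplicateHeaderError(
            "%d %r headers found in patchlog" % (len(values), name))
    return values[0] if values else None


# Trimmed sample log with one unique and one duplicated header.
SAMPLE_LOG = """\
Revision: foo--bar--0--patch-3
Summary: summmary
Summary: other summary!

body
"""
```

Here get_unique_header(SAMPLE_LOG, "Revision") returns the revision
name, while asking for "Summary" raises instead of guessing.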

Finally, maybe those broken headers should be converted into a separate
namespace, e.g. "X-pybaz-<broken-header>", to reduce the odds of an
undesired conflict with an Arch header.

Your thoughts?
-- 
                                                            -- ddaa