[MERGE] pybaz: sanitize patch logs for more tla quirks
David Allouche
david at allouche.net
Tue Apr 25 13:56:10 BST 2006
On Tue, 2006-04-25 at 20:26 +1000, Anand Kumria wrote:
> I think this is a highly complex solution that isn't necessary.
I agree, that's annoyingly complex (esp. since I'm going to have to
implement it) but I cannot think of anything simpler that is still
correct.
Something simpler could address all my non-regression concerns: just
ignore invalid headers.
Independently, raise on multiple matching headers because we cannot
decide what to return, and it's critical to bzr data consistency to be
consistent with what other baz-import runs would have produced.
Robert, how does that sound to you?
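To make the "raise on multiple matching headers" half concrete, here is a minimal sketch built on the standard Python email parser. It is not the actual pybaz code: the names `get_unique_header` and `DuplicateHeaderError` are made up for illustration, and the handling of invalid headers is left out.

```python
from email import message_from_string


class DuplicateHeaderError(ValueError):
    """Hypothetical error: a header that must be unique appears more than once."""


def get_unique_header(patchlog_text, name):
    """Return the value of header `name`, or None if it is absent.

    Raises DuplicateHeaderError when several headers match, because there
    is no reliable way to decide which one to return.
    """
    msg = message_from_string(patchlog_text)
    # get_all() returns every matching value; header names are matched
    # case-insensitively by the email package.
    values = msg.get_all(name, [])
    if len(values) > 1:
        raise DuplicateHeaderError(
            "%d %s headers found, cannot pick one" % (len(values), name))
    return values[0] if values else None
```

Failing loudly here means two import runs over the same patchlog either return the same value or both fail. Neither can silently pick a different header.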
> Particularly as this isn't the common case.
Consistent behaviour in uncommon cases is just as important as
consistent behaviour in common cases, especially since VCS users have
an uncanny ability to explore all the possible ways to break their
tools.
> Additionally, you shouldn't be raising on multiple matching headers as,
> if you are trying to follow rfc2822, that isn't an error. It is just
> implementation defined whether you pick first or last match (generally
> first match is recommended for email, since it is typically inserted by
> a human).
We are not dealing with email, nor really dealing with rfc2822 for that
matter. We are dealing with Arch patchlogs, which happen to be almost,
but not quite, entirely different from rfc[2]822 messages. The punning
on rfc822 syntax allows us to use the Python email parsing facilities,
but it's little more than punning.
In Arch patchlogs, there is a set of machine-generated headers which are
always present, which should be distinct from the set of human-generated
headers, and should be reliably retrievable. Maybe pybaz should deal
with anything and pick the first match, but historical data (which is
what a VCS is all about) does not allow us to do that and reliably retrieve
the machine-generated New-patches header.
Now, there's an argument that we should allow some headers to have
duplicates, but that would be even more complicated, so I am willing to
ignore that (esp. since the current behaviour is undefined) until a
clear need is identified and a good patch is provided.
--
-- ddaa