PQM mis-encoding subject line

Robey Pointer robey at lag.net
Thu Jun 29 03:14:37 BST 2006


On 28 Jun 2006, at 6:02, John Arbash Meinel wrote:

> Olivier Grisel wrote:
>> Robey Pointer wrote:
>>> A recent PQM email had a scrambled UTF-8 subject line.  Looks like
>>> it's encoding in UTF-8 but marking it as Latin-1.
>>>
>>>> Subject:
>>>> =?iso-8859-1?q?Rev_1816=3A_=28erik_b=C3=A5gfors=29_remove_duplic?=
>>>>     =?iso-8859-1?q?ate_lines_in_help_and_NEWS_=28bug_=2350561=29_in_/h?=
>>>>     =?iso-8859-1?q?ome/pqm/archives/thelove/bzr/+trunk/?=
>>>
>>>
>>> (Yes, that's insanely ugly, but that is actually the right way to do
>>> it.  The name of the encoding is the bit that's wrong.)
>>
>> It looks like a transfer encoding (quoted printable) that is not decoded.
>>
>
> Well, the submission was in utf-8, and bzr seems to accept it in utf-8,
> since it is recorded correctly in the history.
> I had to fix my 'pqm-submit' plugin to actually do foo.encode('utf8'),
> but I only found out because ASCII doesn't have that character.
> I don't know how the PQM is sending out emails, but it may not realize
> it is dealing with utf-8.
> I didn't do anything special (like set up a special header to indicate
> the encoding was utf-8). I suppose I should, but I'm not that well
> versed in the exact meaning of all SMTP headers.
>
> Is it sufficient to just add 'Content-Type: text/plain; charset=UTF-8'?
>
> That is what my mailer does, and it seems to happen before the Subject
> line.
>
> It seems like we need more than that, since my test email came back with:
> Test =?UTF-8?B?2KzZiNis2Yg=?=
>
> And the Content-Type line actually happened after the Subject this time.

It's kind of outside the scope of this mailing list, but yeah, the
'Content-Type' header describes the body, not the headers.  Headers
which need non-ASCII text have to use the "=?charset?encoding?text?="
encoded-word mess from RFC 2047.
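For illustration, here is a minimal sketch using only Python 3's standard `email` package (the subject and body text are made up, and this is not how PQM or pqm-submit actually build their mail). It shows the two separate mechanisms side by side: `MIMEText` declares the body's charset in `Content-Type`, while the Subject is turned into RFC 2047 encoded words by `email.header.Header`.

```python
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText

# Illustrative (made-up) subject with one non-ASCII character, U+00E5.
subject = "Rev 1816: (erik bågfors) remove duplicate lines"

# Content-Type declares the charset of the *body* only; MIMEText also
# applies a transfer encoding (base64 for utf-8) so the body is ASCII-safe.
msg = MIMEText("Merged.\n", "plain", "utf-8")

# The Subject needs its own charset label: Header() renders the text as
# RFC 2047 encoded words, =?utf-8?q?...?= or =?utf-8?b?...?=.
msg["Subject"] = Header(subject, "utf-8")

wire = msg.as_string()
print(wire)

# Decoding the encoded words recovers the original Unicode subject.
encoded = Header(subject, "utf-8").encode()
roundtrip = str(make_header(decode_header(encoded)))
print(roundtrip)
```

Adding 'Content-Type: text/plain; charset=UTF-8' alone only fixes the body; the Subject line still has to carry its own "=?utf-8?...?=" label.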

What I meant above is that the PQM script is correctly adding the  
"=?" header.  That part all looks right.  It's just using the wrong  
encoding name -- it should be using UTF-8, not ISO-8859-1, because  
the subject was actually encoded in UTF-8.

That's why Erik's name showed up with two garbage characters in the
subject line where the "å" (a with ring) should be: its two UTF-8
bytes, 0xC3 0xA5, were displayed as the Latin-1 characters "Ã¥".
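To see the mislabeling concretely, this short Python 3 sketch (standard library only) takes the first encoded word of the PQM subject quoted earlier in the thread, undoes the quoted-printable step with `email.header.decode_header`, and reads the resulting bytes under both the declared label and UTF-8. It only reproduces the symptom; it assumes nothing about how PQM produced the header.

```python
from email.header import decode_header

# First encoded word of the PQM subject quoted earlier in the thread.
raw = "=?iso-8859-1?q?Rev_1816=3A_=28erik_b=C3=A5gfors=29_remove_duplic?="

# decode_header undoes the q-encoding and returns (bytes, declared charset).
[(data, declared)] = decode_header(raw)
print(declared)  # the label PQM put on the bytes

# Reading the bytes with the declared Latin-1 label turns C3 A5 into "Ã¥"...
print(data.decode(declared))
# ...while reading them as UTF-8 gives back the intended "å".
print(data.decode("utf-8"))
```

The q-encoding itself is intact; only the charset name in front of it is wrong.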

robey




