[BUG] bzr changeset generation fails with non-ascii characters

Robey Pointer robey at lag.net
Sun Jul 17 19:18:32 BST 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On 16 Jul 2005, at 10:42, John A Meinel wrote:

> Aaron Bentley wrote:
>
>> John A Meinel wrote:
>>
>>
>>>> Robey Pointer wrote:
>>>>
>>>>> FWIW, I agree that the cset should be treated as being in no encoding
>>>>> (using whatever encoding is used for each file), and that means being
>>>>> 8-bit clean with no codec.
>>>>>
>> I think this is the best option, but it may not be a great one for files
>> in 16-bit encodings.  The resulting patch would be hard to read, since
>> it would mix ASCII with, say, UCS-2.  AIUI, diff will just say 'binary
>> files differ', because 16-bit files treated as 8-bit files have NULs
>> everywhere.  I don't know what difflib does with binaries.
>>
>> Just thought that point was worth mentioning.
>>
>>
>
> Yeah, well, the changeset format isn't really designed to handle binary
> files. I'm not sure what we should do about it.
>
> But also, we have the problem that some portions of the file might be in
> UNIX line endings, and portions in DOS/MAC line endings. Again because
> diff replicates the exact line endings, and the changeset generation
> will be done in unix line endings.

I just tested and verified that "diff" (on a Mac!) fails to cope with
old-style Mac line endings of '\r'.  (It decided my file had exactly
one line with no terminating linefeed.)  For old DOS line endings
('\r\n') it treats the line as ending at the '\n' and the '\r' as just a
little extra character at the end of each line.  So I think it's fair
game to do the same in our patches: lines end in '\n'.

Notably, diff metadata lines ("@@ -1,2 +1,3 @@") end in '\n' no  
matter what line endings are used by the original files.
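
For what it's worth, Python's difflib appears to behave the same way.
This is only a sketch using difflib.unified_diff with its default
lineterm, not GNU diff itself, and the file names 'a' and 'b' are
placeholders I made up:

```python
import difflib

# Two versions of a file that uses DOS ('\r\n') line endings throughout.
old = ["one\r\n", "two\r\n"]
new = ["one\r\n", "two\r\n", "three\r\n"]

# unified_diff terminates its own control lines with lineterm ('\n' by
# default) and passes the content lines through untouched.
hunk = list(difflib.unified_diff(old, new, fromfile="a", tofile="b"))

for line in hunk:
    print(repr(line))
```

The control lines ('--- a', '+++ b', '@@ -1,2 +1,3 @@') come out ending
in a bare '\n', while the content lines keep their '\r\n'.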

The only sane thing to do (IMHO) is to treat lines as always  
terminated by '\n' but possibly containing binary data (UTF8 bytes,  
'\r' droppings, etc).
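
A minimal sketch of that rule in Python, assuming the file content is
read as raw bytes.  split_lines is a made-up helper name for
illustration, not anything that exists in bzr:

```python
def split_lines(data):
    # Split raw bytes into lines terminated by b"\n" only.  Everything
    # else (b"\r", UTF-8 sequences, NULs) is treated as opaque line
    # content.  bytes.splitlines() is deliberately avoided here because
    # it also breaks lines at b"\r".
    pieces = data.split(b"\n")
    # Every piece except the last was followed by a b"\n"; put it back.
    lines = [piece + b"\n" for piece in pieces[:-1]]
    # A non-empty last piece is a final line with no terminating newline.
    if pieces[-1]:
        lines.append(pieces[-1])
    return lines


dos = split_lines(b"one\r\ntwo\r\n")       # '\r' stays inside each line
mac = split_lines(b"one\rtwo\rthree")      # a single unterminated line
utf8 = split_lines("caf\u00e9\n".encode("utf-8"))
```

Joining the returned lines gives back the original bytes exactly, so
nothing is lost or re-encoded along the way.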

robey

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (Darwin)

iD8DBQFC2qD+QQDkKvyJ6cMRAi28AJ9k8e1lJh+g82KEtF2DIxzwN88+ewCfS4M9
RRn2MSQT+nCcRSq1yd4Z314=
=p3yT
-----END PGP SIGNATURE-----



