[BUG] Unicode string must be always used with encodings

John A Meinel john at arbash-meinel.com
Mon Sep 26 20:11:23 BST 2005


Alexander Belchenko wrote:
> I suppose that:
> 
> * for control files, utf-8 must always be used,
> * for input from the user, input_encoding (sys.stdin.encoding 
> or user_encoding) must be used,
> * for output to the user, output_encoding 
> (sys.stdout.encoding or user_encoding) must be used,
> * for decoding filenames to unicode strings, user_encoding must be used

I'm not sure about this last one. For instance, most Linux systems use 
utf-8 as the filesystem encoding, while Windows uses UTF-16 (which 
Python doesn't seem able to read).

I can't speak for every script, but I know that I'm not able to read 
Arabic filenames in Python (native or Cygwin). Arabic is an extra-hard 
case because of right-to-left vs left-to-right, so other languages may 
be better supported. (But IDLE can print Arabic characters correctly, 
and os.listdir() still shows the files as "??????".)
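A minimal sketch of the filename question, written for present-day 
Python 3 rather than the interpreter discussed in this thread. The 
helper names decode_filename and list_text_names are hypothetical, not 
bzr functions. It assumes that passing a text path to os.listdir() 
yields text names, and that byte names should be decoded with the 
filesystem encoding rather than the user's terminal encoding:

```python
import os
import sys


def decode_filename(raw, encoding=None):
    """Decode a byte filename to text without raising on bad bytes.

    `encoding` defaults to the filesystem encoding, which may differ
    from the encoding of the user's terminal.
    """
    if isinstance(raw, str):
        return raw  # already text, nothing to decode
    if encoding is None:
        encoding = sys.getfilesystemencoding() or "utf-8"
    # "replace" substitutes U+FFFD for bytes that do not fit the encoding
    return raw.decode(encoding, "replace")


def list_text_names(directory):
    """List a directory and return every entry name as text."""
    # A text argument asks the OS layer for text names directly
    return [decode_filename(name) for name in os.listdir(directory)]
```

Undecodable bytes show up as U+FFFD instead of aborting the listing, 
which at least makes the problem visible.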


> 
> If my assumption is right, then I will be able to start implementing these.
> 
> Also I think it would be good to have global options for setting bzr's 
> default encoding for input/output, as docutils does:
> 
> --input-encoding=XXX or -iXXX          -- for input encoding
> --output-encoding=XXX or -oXXX         -- for output encoding
> 
> Are these global option names free in the bzr design?

I think these would be available; just parse them at the same time as 
"--profile".
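A rough sketch of what that early parsing could look like. This is not 
bzr's actual option handling: pop_encoding_options and user_encoding 
are hypothetical names, and the fallback order (explicit option, then 
the stream's own encoding, then the locale encoding) is an assumption 
based on the proposal quoted above.

```python
import locale
import sys


def user_encoding():
    """Best guess at the user's locale encoding."""
    return locale.getpreferredencoding(False) or "utf-8"


def pop_encoding_options(argv):
    """Strip the proposed global encoding options from argv.

    Returns (input_encoding, output_encoding, remaining_args).
    """
    input_enc = None
    output_enc = None
    remaining = []
    for arg in argv:
        if arg.startswith("--input-encoding="):
            input_enc = arg.split("=", 1)[1]
        elif arg.startswith("--output-encoding="):
            output_enc = arg.split("=", 1)[1]
        elif arg.startswith("-i") and len(arg) > 2:
            input_enc = arg[2:]   # short form: -iXXX
        elif arg.startswith("-o") and len(arg) > 2:
            output_enc = arg[2:]  # short form: -oXXX
        else:
            remaining.append(arg)
    # No explicit option: use the stream's encoding, then the locale's
    if input_enc is None:
        input_enc = getattr(sys.stdin, "encoding", None) or user_encoding()
    if output_enc is None:
        output_enc = getattr(sys.stdout, "encoding", None) or user_encoding()
    return input_enc, output_enc, remaining
```

For example, pop_encoding_options(["--input-encoding=latin-1", 
"-ocp1251", "status"]) gives ("latin-1", "cp1251", ["status"]). Note 
that this claims -i and -o globally, so no individual command could 
reuse those short options.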

I think we also need to go through and pay close attention to when we 
use os.sep and when we use "/".

I would say that internally all paths should be "/" separated, and that 
is how they should be referenced in any internal files. Though I believe 
<inventory> doesn't care, since it doesn't write directory lists, it 
just keeps a reference to the parent. And I'm not sure where else full 
paths would be stored. (.bzr/parent could store either form, since it 
shouldn't be copied out of the branch.)

I know I prefer to give bzr commands using forward slashes, so I don't 
think we can assume that all user input comes in with os.sep.
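As a sketch of that convention, assuming conversion happens only at the 
boundary with the user and the filesystem (to_internal_path and 
to_native_path are hypothetical helpers, not existing bzr functions):

```python
import os


def to_internal_path(user_path, sep=os.sep):
    """Convert a user-supplied path to the "/"-separated internal form.

    Input may already use "/" even where os.sep is something else,
    so forward slashes are left alone.
    """
    if sep != "/":
        # Only translate where the native separator differs; on POSIX
        # a backslash is an ordinary filename character.
        return user_path.replace(sep, "/")
    return user_path


def to_native_path(internal_path, sep=os.sep):
    """Convert an internal "/"-separated path for use with the OS."""
    return internal_path.replace("/", sep)
```

With sep="\\", to_internal_path("dir\\sub/file.txt") gives 
"dir/sub/file.txt", so mixed input from the command line ends up in one 
form.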

John
=:->

> 
> Alexander.
> 
> 
> 
