[BUG] Unicode string must be always used with encodings
John A Meinel
john at arbash-meinel.com
Mon Sep 26 20:11:23 BST 2005
Alexander Belchenko wrote:
> I suppose that:
>
> * control files must always use utf-8,
> * input from the user must use input_encoding (sys.stdin.encoding
> or user_encoding),
> * output to the user must use output_encoding
> (sys.stdout.encoding or user_encoding),
> * decoding filenames to unicode strings must use user_encoding
I'm not sure about this last one. For instance, most Linux systems use
utf-8 as the encoding, while Windows uses UTF-16 (which Python doesn't
seem able to read).
I'm not sure about some characters, but I know that I'm not able to read
Arabic filenames in Python (native or Cygwin). Now, Arabic is extra
crazy because of right-to-left vs left-to-right, so there might be
better support for other languages. (But IDLE can print Arabic
characters correctly, and still os.listdir() shows the files as "??????".)
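A minimal sketch of the encoding policy proposed in the quoted list, written for Python 3 rather than the Python 2 of this thread. The names CONTROL_ENCODING, user_encoding(), stream_encoding(), input_encoding(), output_encoding() and encode_for_output() are hypothetical helpers for illustration, not bzr API, and the ascii fallback is an assumption. The last helper also shows how a lossy conversion to a narrow encoding turns characters into "?", which looks similar to the listing output described above.

```python
import locale
import sys

# Control files: always utf-8, independent of the user's locale.
CONTROL_ENCODING = "utf-8"


def user_encoding():
    """Return the locale's preferred encoding (assumed fallback: ascii)."""
    return locale.getpreferredencoding(False) or "ascii"


def stream_encoding(stream):
    """Return the stream's own encoding, or the user encoding if it has none."""
    return getattr(stream, "encoding", None) or user_encoding()


def input_encoding():
    """Encoding used to decode what the user types."""
    return stream_encoding(sys.stdin)


def output_encoding():
    """Encoding used to encode what is shown to the user."""
    return stream_encoding(sys.stdout)


def encode_for_output(text, encoding=None):
    """Encode text for display, replacing unencodable characters with '?'."""
    return text.encode(encoding or output_encoding(), errors="replace")
```

Filenames are deliberately left out of this sketch, since which encoding to use for them is the open question in this message.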
>
> If my assumption is right, then I will be able to start implementing these.
>
> Also I think it would be good to have global options for setting the bzr
> default encoding for input/output, like docutils does:
>
> --input-encoding=XXX or -iXXX -- for input encoding
> --output-encoding=XXX or -oXXX -- for output encoding
>
> Are these global options free in the bzr design?
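The two proposed options could be pulled off the command line before the command itself is dispatched, roughly as sketched here. This uses the standard argparse module purely for illustration; it is not how bzr parses its arguments, and parse_global_options() is a hypothetical name. --profile is included only as an example of an existing global flag.

```python
import argparse


def parse_global_options(argv):
    """Split global options from the rest of the command line.

    Returns (options, remaining_args); anything not recognised here is
    left in remaining_args for the command itself to handle.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-i", "--input-encoding", default=None)
    parser.add_argument("-o", "--output-encoding", default=None)
    parser.add_argument("--profile", action="store_true")
    return parser.parse_known_args(argv)


if __name__ == "__main__":
    opts, rest = parse_global_options(
        ["--input-encoding=cp1251", "-outf-8", "status", "foo"]
    )
    print(opts.input_encoding, opts.output_encoding, rest)
```

Both the long form (--input-encoding=XXX) and the attached short form (-iXXX) are accepted; when an option is absent its value stays None, so the caller can fall back to the terminal or locale encoding.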
I think these would be available, just parse them at the same time as
"--profile".
I think we also need to go through and pay close attention to when we
use os.sep and when we use "/".
I would say that internally all paths should be "/" separated, and that
is how they should be referenced in any internal files. Though I believe
<inventory> doesn't care, since it doesn't write directory lists; it
just keeps a reference to the parent. And I'm not sure what else would
store full paths. (.bzr/parent could store either form, since it
shouldn't be copied out of the branch.)
I know I prefer to give bzr commands using forward slashes, so I don't
think we can assume that all user input comes in with os.sep.
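A small sketch of that convention: accept either separator from the user, store "/" internally, and convert back to the native separator only when touching the filesystem. The function names to_internal_path() and to_native_path() are hypothetical, not bzr functions. The native_sep parameter exists only so the Windows behaviour can be exercised on any platform.

```python
import os


def to_internal_path(path, native_sep=os.sep):
    """Return path using "/" as the separator, whatever the user typed.

    Only the platform's own separator is rewritten: on POSIX a backslash
    is a legal filename character, so it is left untouched there.
    """
    if native_sep != "/":
        path = path.replace(native_sep, "/")
    return path


def to_native_path(path, native_sep=os.sep):
    """Convert an internal "/"-separated path to the platform's separator."""
    if native_sep != "/":
        path = path.replace("/", native_sep)
    return path
```

With this split, user input that already uses forward slashes passes through to_internal_path() unchanged on every platform.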
John
=:->
>
> Alexander.
>
>
>