[win32] non-ascii/non-english file names: internal usage of file names

John Arbash Meinel john at arbash-meinel.com
Tue Nov 29 22:41:06 GMT 2005


Alexander Belchenko wrote:
> John Arbash Meinel wrote:
> 
>> As for the StringIO failing to encode into ASCII, that is something
>> we've discussed.
>> Basically, there are commands which "must be correct" and commands which
>> "shouldn't fail". 'bzr commit' shouldn't fail because it can't display
>> the log correctly (hence it should use encode(foo, 'replace')); other
>> commands must not succeed with bogus output (possibly bzr diff, anything
>> that is writing into a control file, etc.), and those should use the
>> default encode(foo, 'strict'). We just need to do more explicit encoding
>> calls.
> 
> 
> I don't understand your point, sorry.
> 
> I'm talking about the following code in commit, which runs when no
> message and no file are given and an external editor has to be launched
> to enter the commit message:
> 
>         if message is None and not file:
>             catcher = StringIO()
>             show_status(tree.branch, specific_files=selected_list,
>                         to_file=catcher)
>             message = edit_commit_message(catcher.getvalue())
> 
> 
> In this code show_status is unable to print non-ascii filenames to the
> file-like StringIO object due to a limitation of StringIO.

The specific problem is that a StringIO has an encoding of "ascii"
(actually it has None, but that implies ascii), so a lot of things
won't encode to ascii.
What we need to do is change the code that writes, so that it does its
own encoding. If the command isn't critical, it will use
txt.encode(encoding, 'replace'), which substitutes '?' for anything it
can't encode. If it is critical that the output is correct, it will use
txt.encode(encoding, 'strict'), which raises UnicodeEncodeError if a
character cannot be encoded.
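
A minimal sketch of the two policies, written for current Python 3
rather than the Python 2 this thread is about. The helper names
encode_for_display and encode_for_control_file are hypothetical, not
bzrlib functions, and "ascii" is only an example target encoding:

```python
import io


def encode_for_display(text, encoding="ascii"):
    """Best-effort policy: unencodable characters become '?'."""
    return text.encode(encoding, "replace")


def encode_for_control_file(text, encoding="ascii"):
    """Exact policy: raises UnicodeEncodeError on any unencodable character."""
    return text.encode(encoding, "strict")


# A Cyrillic file name, written with escapes to keep this source ASCII-only.
name = "\u0444\u0430\u0439\u043b.txt"

# The writer encodes explicitly, so the buffer only ever holds bytes.
catcher = io.BytesIO()
catcher.write(encode_for_display("added: " + name + "\n"))
status_bytes = catcher.getvalue()

# The strict policy refuses to produce bogus output.
try:
    encode_for_control_file(name)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

With the first helper the commit can still proceed with a degraded
status listing; with the second the caller gets an exception it has to
handle.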

> 
> Later this captured output is passed as a string argument to
> edit_commit_message(). We could probably use a real temporary file here
> instead of the file-like StringIO() to avoid encoding problems. But
> edit_commit_message() then creates another temporary file for editing
> the commit message, and the status output is printed (again!) to that
> file. I think this is a kind of overhead: not in execution time, but in
> the number of actions performed with similar effect.
> 
> Furthermore, I recently sent a patch (#27) that fixes some encoding
> issues in the commit and log commands. There I gave an example where the
> system encoding and the console encoding may differ on a Windows machine
> (due to Windows backward compatibility). That patch needs to be taken
> into consideration when the above code chunk is refactored: show_status
> output should be encoded by default with bzrlib.user_encoding, not with
> sys.stdout.encoding, I guess.

Well, wouldn't the console encoding be the "correct" encoding for output
(sys.stdout), since you are trying to display something? Whereas if you
are reading from a file, you might expect bzrlib.user_encoding.

Now for the commit message, you probably want to put it out with the
system encoding, because the user will edit it with a text editor, save
it, and then we read it back in.
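
Roughly, and only as a sketch in current Python 3: the function name
roundtrip_through_editor_file is hypothetical, and treating
locale.getpreferredencoding() as the "system" encoding is an assumption
about what bzrlib.user_encoding stands for here.

```python
import locale
import os
import sys
import tempfile

# Assumption: the "system" encoding is whatever the locale reports.
system_encoding = locale.getpreferredencoding(False) or "ascii"
# The console encoding is what sys.stdout advertises; it can be missing
# or None (e.g. when output is piped), so fall back to the system one.
console_encoding = getattr(sys.stdout, "encoding", None) or system_encoding


def roundtrip_through_editor_file(template, encoding=system_encoding):
    """Write the template as bytes in `encoding`, then read it back as text."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as f:
            # 'replace' so that preparing the template never aborts a commit.
            f.write(template.encode(encoding, "replace"))
        # A real implementation would launch the user's editor on `path` here.
        with open(path, "rb") as f:
            return f.read().decode(encoding, "replace")
    finally:
        os.remove(path)
```

The point is that the same encoding is used to write the file and to
read it back, independent of whatever the console happens to use.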

John
=:->

> 
> Alexander
> 

