[win32] non-ascii/non-english file names: internal usage of file names

Wed Nov 30 08:52:30 GMT 2005

On Wed, Nov 30, 2005 at 09:07:53 +0100, Jan Hudec wrote:
> On Wed, Nov 30, 2005 at 10:46:18 +1100, Andrew Bennetts wrote:
> > On Wed, Nov 30, 2005 at 01:20:51AM +0200, Alexander Belchenko wrote:
> > [...]
> > > >Or if it is critical that the output is correct, then it will use
> > > >encode(txt, 'error'), which will throw an exception if a character
> > > >cannot be encoded properly.
> > > 
> > > But for the purpose of editing a commit message and showing the user the
> > > status of the tree this approach is bad: neither 'replace' nor 'error' is
> > > what we want.
> > 
> > If some text you want to display cannot be encoded in the console's encoding,
> > you have no choice.  Probably an uncommon situation, but definitely possible.
> > 
> > > The native Python implementation of StringIO (not cStringIO) accepts
> > > ASCII strings or unicode strings, so we can use the latter form.
> > 
> > I don't think it's wise to rely on differences between StringIO vs. cStringIO --
> > they're probably accidental, and likely to change in future versions of Python.
> > 
> > Treat files, including [c]StringIO instances, as byte-streams, and explicitly
> > encode unicode when writing to them.  The codecs.EncodedFile wrapper makes this
> > easy.
> 
> Hm, that sounds like the right way to do all the IO (tcl (for
> ages) and perl (since 5.8) have this capability built into every
> stream and changeable on the fly). Unfortunately the documentation does
> not say what codecs.EncodedFile does on reading. IMHO the correct behaviour
> would be to always return unicode, decoding as necessary.
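
The read side can be checked with a small experiment. The sketch below is
only illustrative: it is written in present-day Python 3 syntax, an in-memory
io.BytesIO stands in for a real file, and the latin-1/utf-8 pair is an
arbitrary choice. It suggests that EncodedFile recodes between two byte
encodings on reading, so the caller gets bytes back rather than unicode.

```python
import codecs
import io

# Illustrative stand-in for a file on disk: "café" stored as latin-1 bytes.
raw = io.BytesIO(b"caf\xe9")

# data_encoding is what the caller sees; file_encoding is what the file holds.
recoded = codecs.EncodedFile(raw, data_encoding="utf-8", file_encoding="latin-1")

# The bytes are decoded from latin-1 and immediately re-encoded as utf-8.
data = recoded.read()
print(type(data).__name__, repr(data))
```

So the wrapper is a transcoder, which is useful when both sides are byte
streams but does not by itself give unicode objects to the program.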

Sorry for replying to myself. Well, it really seems codecs.EncodedFile
does not do what it should. I'd rather suggest composing a
codecs.StreamReaderWriter object roughly like:

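import codecs
import sys

class UnicodeStream(codecs.StreamReaderWriter):
    def __init__(self, stream):
        # Not every stream has an 'encoding' attribute, and it may be None.
        encoding = getattr(stream, 'encoding', None)
        if encoding is None:
            encoding = sys.getdefaultencoding()
        # getreader/getwriter look the codec up by name, so no eval is needed.
        # Explicit parent call: StreamReaderWriter is old-style on Python 2.
        codecs.StreamReaderWriter.__init__(
            self, stream, codecs.getreader(encoding), codecs.getwriter(encoding))

(codecs.getreader and codecs.getwriter take care of loading the codec module,
so there is no need to work that out by hand.)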
Then there would be no string objects, only unicode objects, which would help
overall sanity.
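
To make the intended effect concrete, here is a minimal sketch of a round
trip through such a wrapper. It is written in present-day Python 3 syntax;
the helper name unicode_stream, the io.BytesIO stand-in for a real file or
console, and the choice of utf-8 are all assumptions made for illustration,
not part of any existing code.

```python
import codecs
import io

def unicode_stream(byte_stream, encoding):
    """Hypothetical helper: wrap a byte stream so callers only handle text."""
    return codecs.StreamReaderWriter(
        byte_stream, codecs.getreader(encoding), codecs.getwriter(encoding)
    )

# In-memory stand-in for a file or console opened in binary mode.
backing = io.BytesIO()
stream = unicode_stream(backing, "utf-8")

# Text goes in; the wrapper encodes it before it reaches the byte stream.
stream.write(u"caf\xe9\n")
stored = backing.getvalue()

# Rewind and read: bytes come out of the backing stream, decoded text comes back.
stream.seek(0)
line = stream.readline()
```

The encoding is decided once, where the stream is wrapped, and the rest of
the code never touches encoded bytes.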

--
						 Jan 'Bulb' Hudec <bulb at ucw.cz>