Bazaar-NG traffic #2
John Arbash Meinel
john at arbash-meinel.com
Wed Oct 12 23:56:03 BST 2005
David Allouche wrote:
> On Wed, 2005-10-12 at 13:17 -0700, Robey Pointer wrote:
>
>>I think if a filename comes back as a string instead of unicode, it's
>>because python couldn't decode it using the filesystem's encoding.
>>(AFAIK this is mostly a unix problem.*) In that case if you just
>>pretend the filename is in Latin-1, you will preserve the gibberish
>>filename: Latin-1 defines a unicode char for every possible byte
>>0-255, so it's non-lossy. The gibberish filename can be
>>reconstituted as the same gibberish on the other end.
>
>
> It's lossy.
>
> Because by decoding as latin-1, then encoding to utf-8, you lose the
> information that "this file name is a byte stream, not a unicode
> string".
>
> In other words, you do not know which names would need to be "fixed":
> the computer will no longer be able to tell the difference between the
> gibberish names and the meaningful ones.
>
> If you want to preserve the gibberishness, you need to attach a metadata
> bit to all file names.
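
To make the two positions above concrete, here is a small sketch in Python 3 syntax (the thread predates it; the byte string and variable names are made up for illustration). It shows that the Latin-1 round trip is byte-exact, as Robey says, and also that the stored UTF-8 form is identical to that of a genuinely Unicode name, which is David's point.

```python
# Hypothetical "gibberish" filename: these bytes are not valid UTF-8.
raw_name = b"caf\xe9\xff.txt"

# Latin-1 maps each byte 0-255 to the code point of the same value,
# so decoding never fails and the round trip is exact.
as_text = raw_name.decode("latin-1")
assert as_text.encode("latin-1") == raw_name

# Storing the decoded text as UTF-8 is still reversible...
stored = as_text.encode("utf-8")
assert stored.decode("utf-8").encode("latin-1") == raw_name

# ...but a name that really is Unicode produces exactly the same
# stored bytes, so the stored form alone cannot say which was meant.
real_name = "caf\u00e9\u00ff.txt"
assert real_name.encode("utf-8") == stored
```

So the bytes survive, but the fact that they were undecodable bytes does not.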
So is there a real need to keep files with non-unicode names under
version control? Certainly we could save such a name in a different
format, say some sort of base64 encoding, prefixed with an opening
character which would indicate that this filename should be treated as
a byte stream.
But is it worth the effort? I certainly think it is something to handle
when the time comes, rather than something we have to have right now.
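
For what it's worth, the base64-plus-marker idea could look roughly like the sketch below (Python 3 syntax). Everything here is an assumption for illustration, not anything bzr does: the `MARKER` value and the `encode_name` / `decode_name` helpers are hypothetical. NUL is used as the marker only because it cannot occur in a POSIX filename, so no real name can collide with it; a storage format that cannot hold NUL would need a different marker plus escaping.

```python
import base64

# Hypothetical marker: NUL cannot appear in a POSIX filename,
# so a stored name starting with it is never a real text name.
MARKER = "\x00"


def encode_name(name):
    """Return a text form of a filename.

    `name` is str when it decoded cleanly, or bytes when it did not.
    """
    if isinstance(name, str):
        return name
    # URL-safe alphabet avoids "/" inside the encoded name.
    return MARKER + base64.urlsafe_b64encode(name).decode("ascii")


def decode_name(stored):
    """Invert encode_name: bytes for marked names, str otherwise."""
    if stored.startswith(MARKER):
        return base64.urlsafe_b64decode(stored[len(MARKER):])
    return stored
```

With this, `decode_name(encode_name(b"caf\xe9\xff.txt"))` gives back the original bytes, an ordinary unicode name passes through unchanged, and the two no longer share a stored form. The marker is in effect the "metadata bit" David mentions, folded into the name itself.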
John
=:->