Bazaar-NG traffic #2

John Arbash Meinel john at arbash-meinel.com
Wed Oct 12 23:56:03 BST 2005


David Allouche wrote:
> On Wed, 2005-10-12 at 13:17 -0700, Robey Pointer wrote:
> 
>>I think if a filename comes back as a string instead of unicode, it's  
>>because python couldn't decode it using the filesystem's encoding.   
>>(AFAIK this is mostly a unix problem.*)  In that case if you just  
>>pretend the filename is in Latin-1, you will preserve the gibberish  
>>filename: Latin-1 defines a unicode char for every possible byte  
>>0-255, so it's non-lossy.  The gibberish filename can be  
>>reconstituted as the same gibberish on the other end.
> 
> 
> It's lossy.
> 
> Because by decoding as latin-1, then encoding to utf-8, you lose the
> information that "this file name is a byte stream, not a unicode
> string".
> 
> In other words, you do not know which names would need to be "fixed":
> the computer will no longer be able to tell the gibberish names apart
> from the meaningful ones.
> 
> If you want to preserve the gibberishness, you need to attach a metadata
> bit to all file names.
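
To make both points concrete, here is a small sketch. It is written
against Python 3 (bytes vs. str) rather than the str/unicode split
discussed above, and the filename is made up for illustration. The
byte-level round trip through Latin-1 is exact, but once the result is
stored as UTF-8 it cannot be told apart from a genuine Unicode name
containing the same characters:

```python
# Hypothetical filename whose bytes are not valid UTF-8.
raw_name = b"caf\xe9-\xff.txt"

# Latin-1 maps every byte 0-255 to one code point, so this never fails
# and the bytes can be recovered exactly.
as_text = raw_name.decode("latin-1")
round_trip = as_text.encode("latin-1")

# A genuinely Unicode name that happens to contain the same characters.
real_name = "caf\u00e9-\u00ff.txt"

# Once both are written out as UTF-8, they are indistinguishable: the
# "this was a byte stream" information is gone.
stored_raw = as_text.encode("utf-8")
stored_real = real_name.encode("utf-8")

print(round_trip == raw_name)
print(stored_raw == stored_real)
```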

So is there a real need to control files with non-unicode names?

We could certainly store such a name in a different format, say a
base64 encoding, prefixed with an opening character that indicates
the filename should be treated as a byte stream.
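
Roughly, that idea might look like the sketch below. This is only an
illustration, not anything bzr does: the marker character, the function
names, and the assumption that "proper" names are UTF-8 are all made
up here. Names that already start with the marker are also stored in
the base64 form, so the flag stays unambiguous:

```python
import base64

MARKER = "="  # hypothetical opening character; any agreed-upon value would do


def encode_name(raw: bytes) -> str:
    """Return a text form of a filename, flagging undecodable names."""
    try:
        text = raw.decode("utf-8")  # assumption: proper names are UTF-8
    except UnicodeDecodeError:
        text = None
    if text is not None and not text.startswith(MARKER):
        return text
    # Undecodable, or colliding with the marker: store as a byte stream.
    return MARKER + base64.urlsafe_b64encode(raw).decode("ascii")


def decode_name(stored: str) -> bytes:
    """Recover the exact original bytes of a stored filename."""
    if stored.startswith(MARKER):
        return base64.urlsafe_b64decode(stored[len(MARKER):])
    return stored.encode("utf-8")
```

With this, decode_name(encode_name(raw)) gives back the original bytes
for any name, and the marker plays the role of the metadata bit: only
flagged names are byte streams.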

But is it worth the effort? I think this is something we can handle
when the need actually arises, rather than something we have to
support right now.

John
=:->