[win32] non-ascii/non-english file names: internal usage of file names

Aaron Bentley aaron.bentley at utoronto.ca
Wed Nov 30 15:23:23 GMT 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David Allouche wrote:
> Except when it's not possible. I can trivially create a plausible
> filename in unix that cannot be decoded to unicode in any meaningful
> way.
> 
> For example:
> 
> u'/Utilisateurs/Édouard/'.encode('latin-1') +
> u'docs/thèse.tex'.encode('utf-8')
> 
> Some systems treat file names as character strings (Windows?); others
> treat them as byte streams. You probably cannot get correct and
> reliable behaviour for both if you do not acknowledge the discrepancy.

We can require that all files in a version-controlled directory have
names that decode to meaningful Unicode.  I think there are very few
situations involving source code where totally arbitrary filenames are an
advantage.
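
As a rough illustration of that simple approach, here is a minimal sketch
of such a check in modern Python 3.  It assumes file names arrive as raw
bytes and that UTF-8 is the expected encoding; the helper name
is_unicode_meaningful is hypothetical and not part of any existing API.

```python
def is_unicode_meaningful(raw_name: bytes, encoding: str = "utf-8") -> bool:
    """Return True if raw_name decodes cleanly in the given encoding.

    Hypothetical helper: a strict decode is used as the test for a
    "unicode-meaningful" name. The default of UTF-8 is an assumption.
    """
    try:
        raw_name.decode(encoding)  # strict error handling by default
    except UnicodeDecodeError:
        return False
    return True


# The mixed-encoding path from the quoted example, as Python 3 bytes.
mixed = ("/Utilisateurs/Édouard/".encode("latin-1")
         + "docs/thèse.tex".encode("utf-8"))

# The same path encoded consistently.
clean = "/Utilisateurs/Édouard/docs/thèse.tex".encode("utf-8")

print(is_unicode_meaningful(mixed))  # False
print(is_unicode_meaningful(clean))  # True
```

Note that a single-byte encoding such as latin-1 accepts every byte
sequence, so a check like this only rejects anything when the expected
encoding is a stricter one such as UTF-8.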

And if people scream, we can move to a more complex approach: require
versioned file names to be Unicode, but not the names of unversioned
files in the tree.

And if people still scream, we can find ways to jam binary data into
Unicode, in one of the user-defined (private-use) ranges.
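
One way this last option might look, purely as a sketch in modern
Python 3: undecodable bytes are mapped to code points in a private-use
range and mapped back on the way out.  The base offset U+E000, the
handler name "pua-escape", and the helper names are all assumptions made
for illustration, not an existing scheme.

```python
import codecs

PUA_BASE = 0xE000  # assumed offset: start of the BMP Private Use Area


def _pua_escape(err):
    """Decode error handler: map each undecodable byte to PUA_BASE + byte."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(chr(PUA_BASE + b) for b in bad), err.end


# "pua-escape" is a hypothetical handler name chosen for this sketch.
codecs.register_error("pua-escape", _pua_escape)


def decode_name(raw: bytes) -> str:
    """Decode as UTF-8, smuggling invalid bytes into private-use code points."""
    return raw.decode("utf-8", "pua-escape")


def encode_name(name: str) -> bytes:
    """Inverse of decode_name: turn smuggled code points back into raw bytes."""
    out = bytearray()
    for ch in name:
        cp = ord(ch)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out.extend(ch.encode("utf-8"))
    return bytes(out)


# The mixed-encoding path from the quoted example survives a round trip.
mixed = ("/Utilisateurs/Édouard/".encode("latin-1")
         + "docs/thèse.tex".encode("utf-8"))
assert encode_name(decode_name(mixed)) == mixed
```

The obvious weakness is ambiguity: a name that legitimately contains
characters in U+E000..U+E0FF would not round-trip, so a real scheme
would need to escape or forbid those.  Later Python versions took a
similar route with the "surrogateescape" error handler, which uses lone
surrogates rather than a private-use range.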

But I'd rather start simple.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDjcPr0F+nu1YWqI0RAvR9AJ9phe/mLQ2wxaLfGVNj91yEhXQAiACfRxvm
pIucqNCyDJowzmAPUGd18aY=
=SP5d
-----END PGP SIGNATURE-----
