[win32] non-ascii/non-english file names: internal usage of file names
David Allouche
david at allouche.net
Thu Dec 1 15:34:29 GMT 2005
On Wed, 2005-11-30 at 10:23 -0500, Aaron Bentley wrote:
> David Allouche wrote:
> >
> > u'/Utilisateurs/Édouard/'.encode('latin-1') +
> > u'docs/thèse.tex'.encode('utf-8')
> >
> > Some systems consider file names to be character strings (Windows?);
> > others consider them to be byte streams. You probably cannot get correct
> > and reliable behaviour for both if you do not acknowledge the discrepancy.
>
> We can require that all files in a version-controlled directory have
> unicode-meaningful names. I think that there are very few situations
> involving source code where totally arbitrary filenames are an advantage.
I agree completely.
But system names (e.g. absolute file names) and version-controlled names
(always relative to the tree root) are different things, and I do not
think it's reasonable to make that assumption for system names.
In that example, I was imagining that the locale was UTF8 and that the
version-controlled name was './thèse.tex'.
The user story for this setup would go along these lines: Ed is a PhD
student working on the university computers. The university sysadmin has
an experience-proven system using a latin-1 based locale and has no
desire to change it. But Ed knows enough to burn himself and decided he
wanted to use Unicode, so he put a "LANG=fr_FR.UTF8" statement in his
~/.bash_profile.
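
To make Ed's situation concrete, here is a minimal sketch in Python 3
syntax. The helper name decodes_as is made up for illustration, and the
paths are the ones from my earlier example. It only shows that the
tree-relative name decodes under Ed's locale while the absolute system
name does not.

```python
# The home directory was named under the sysadmin's latin-1 locale;
# the file inside it was named under Ed's UTF-8 locale.
home = '/Utilisateurs/Édouard/'.encode('latin-1')
rel = 'docs/thèse.tex'.encode('utf-8')
path = home + rel


def decodes_as(raw, encoding):
    """Return True if the byte string is valid in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# The absolute (system) name mixes two encodings, so it is not valid UTF-8.
whole_utf8 = decodes_as(path, 'utf-8')

# The tree-relative (version-controlled) part is valid UTF-8 on its own.
rel_utf8 = decodes_as(rel, 'utf-8')
```

So a tool that insists on decoding the whole absolute path would fail
here, even though the version-controlled name itself is perfectly
meaningful.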
> And if people scream, we can go to a more complex approach of requiring
> versioned files to be unicode, but not unversioned files in the tree.
>
> And if people scream, we can find ways to jam binary data into unicode,
> in one of the user-defined sections.
>
> But I'd rather start simple.
I'm all for starting simple, and I did say I thought it was reasonable
to require that version-controlled names be meaningfully decodable to
Unicode.
But I'm not for dropping the meaningfulness requirement by trying to
force-feed names into the system on the assumption that anything weird
has to be latin-1, as Jan suggested.
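
A rough sketch of why the latin-1 fallback loses meaning, again in
Python 3 syntax. Both function names are hypothetical, and KOI8-R is
only a stand-in for "some legacy encoding that is not latin-1". It is
not a description of what bzr does.

```python
def decode_strict(raw, encoding='utf-8'):
    """Return the decoded name, or None if raw is not valid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


def decode_with_latin1_fallback(raw, encoding='utf-8'):
    """Hypothetical fallback: treat anything undecodable as latin-1."""
    name = decode_strict(raw, encoding)
    return name if name is not None else raw.decode('latin-1')


# latin-1 assigns a character to every byte value, so decoding never fails.
every_byte = bytes(range(256)).decode('latin-1')

# A name written under another legacy encoding (KOI8-R is just an example).
original = 'файл.txt'
raw = original.encode('koi8-r')

strict = decode_strict(raw)                 # None: the name is rejected
guessed = decode_with_latin1_fallback(raw)  # decodes, but to the wrong text
```

Because the latin-1 decode cannot fail, a successful decode tells you
nothing about whether the resulting name is the one the user meant.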
--
-- ddaa