default fsenc patch
a.badger at gmail.com
Tue Jan 24 00:37:58 UTC 2012
On Tue, Jan 24, 2012 at 09:52:24AM +1100, Martin Pool wrote:
> bzr already has the approach of treating filenames as real unicode: it
> decodes them when reading them from disk, according to whatever the
> fsenc is.
> The issue I'm talking about here is just what to do when the user has
> no fsenc specified by their locale. Python/glibc effectively assumes
> this means "ascii" and so it errors out when there are non-ascii
> strings. We discussed whether it would be better to assume utf-8
<nod> I was afraid of that when I read "fsenc". ;-)
I think throwing errors is probably the best course here.
When importing data, it helps a little bit as it filters out some invalid,
garbled data. (Note that it's impossible to do this perfectly, though, as
the user could create filenames whose bytes are valid in their own encoding
and also happen to decode as valid utf-8.)
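A minimal sketch of that filtering and of its blind spot, in plain Python
rather than bzr's actual code. The helper names (decode_filename,
is_rejected) are made up for illustration:

```python
def decode_filename(raw: bytes, fsenc: str) -> str:
    """Strictly decode on-disk filename bytes; raises UnicodeDecodeError on bad input."""
    return raw.decode(fsenc, errors="strict")


def is_rejected(raw: bytes, fsenc: str) -> bool:
    """Return True if strict decoding refuses the filename."""
    try:
        decode_filename(raw, fsenc)
    except UnicodeDecodeError:
        return True
    return False


# "caf\u00e9" written by a latin-1 user: the lone 0xE9 byte is not valid utf-8.
latin1_bytes = "caf\u00e9".encode("latin-1")
caught = is_rejected(latin1_bytes, "utf-8")

# These latin-1 bytes also happen to be valid utf-8, so strict decoding
# cannot tell that the user meant something else.
ambiguous_bytes = "\u00c3\u00a9".encode("latin-1")  # b"\xc3\xa9"
missed = not is_rejected(ambiguous_bytes, "utf-8")
```

The first name is caught by the error path; the second slips through and
is silently imported as a different character.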
When writing out data it gets more tricky. Defaulting to utf-8 is a valid
methodology that will allow you to checkout a repo even if the user is
implicitly using another encoding. The user will likely see garbled
filenames with some of their tools but will be able to operate on the
contents of the files. OTOH, the user will never be confronted with the
fact that the filenames have a specific requirement (valid utf-8). They may
create filenames in another encoding, which leads to importing garbled
data, or they may later set their locale to something that has a non-utf8
encoding, and then things will stop working when they go to commit and push.
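To make the "garbled filenames" case concrete, here is a rough sketch that
assumes a fallback to utf-8 when no fsenc is known. encode_filename is a
hypothetical helper, not a bzr function:

```python
from typing import Optional


def encode_filename(name: str, fsenc: Optional[str]) -> bytes:
    """Encode a filename for disk, falling back to utf-8 when fsenc is unset."""
    return name.encode(fsenc or "utf-8")


# Checkout succeeds: the name is written as utf-8 bytes.
on_disk = encode_filename("caf\u00e9", None)

# A tool that implicitly assumes latin-1 shows those two utf-8 bytes
# as two separate characters, i.e. a garbled name.
seen_by_latin1_tool = on_disk.decode("latin-1")
```

The file contents are untouched; only the displayed name differs between
tools that disagree about the encoding.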
You'll also still have to maintain the code for the same type of error
conditions since users may have their locale set to something that
doesn't support the unicode characters that you're attempting to use. So
the code will be (slightly) larger if you default to utf-8.
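As a sketch of why the error path can't go away: even with a utf-8 default,
an explicitly configured locale encoding may be unable to represent a
stored name. try_encode is an illustrative name only:

```python
from typing import Optional


def try_encode(name: str, fsenc: str) -> Optional[bytes]:
    """Encode a filename, returning None where real code would report an error."""
    try:
        return name.encode(fsenc)
    except UnicodeEncodeError:
        return None


japanese_name = "\u65e5\u672c\u8a9e.txt"

# The user's locale says latin-1, which has no mapping for these characters.
under_latin1 = try_encode(japanese_name, "latin-1")

# The same name encodes without trouble under utf-8.
under_utf8 = try_encode(japanese_name, "utf-8")
```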
Saving filenames as abstract unicode is fraught with these types of
issues... throwing errors may be the best of a bad set of choices.