Bazaar-NG traffic #2

Jan Hudec bulb at ucw.cz
Wed Oct 12 08:44:51 BST 2005


On Wed, Oct 12, 2005 at 09:39:43 +1000, Martin Pool wrote:
> On 12/10/05, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> 
> > I think we need to insist that any versioned files have unicode
> > filenames.  Considering that a bytestring filename for one user may be a
> > unicode filename for another, I think we could have really ugly
> > corner-case behaviour if we tried to support bytestring filenames.
> 
> I agree.  It seems far more likely that people will want to have
> filenames that are meaningful in some language than filenames that
> cannot be printed in any language at all.  If they just have filenames
> that are inconsistent with the standard encoding on their system then
> that is a different and potentially soluble problem.

Well, it's pretty hard though. Any conversion you do on a
non-representable name will be non-reversible. So you need to store a
demangle table.

Python is nice in that it can install error handlers for this mangling
into unicode.encode and str.decode respectively. I'd suggest using
'xmlcharrefreplace' (newer Windows should handle that), or, if it turns
out not to work somewhere, creating a handler that would use %xxxxxx%
(6 hex digits) or something like that.
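
For illustration, a minimal sketch of such a handler, written for
Python 3 (where the methods are str.encode and bytes.decode). The
handler name 'percenthex' and the helper mangle() are made up for this
example; codecs.register_error is the standard registration hook.

```python
import codecs


def percent_hex_replace(exc):
    """Replace each unencodable character with %xxxxxx% (6 hex digits)."""
    if not isinstance(exc, UnicodeEncodeError):
        # This sketch only covers the encoding direction.
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("%%%06x%%" % ord(ch) for ch in bad)
    # Resume encoding right after the offending characters.
    return replacement, exc.end


# 'percenthex' is a hypothetical name chosen for this example.
codecs.register_error("percenthex", percent_hex_replace)


def mangle(name, encoding):
    """Encode a repository name for a filesystem using `encoding`."""
    return name.encode(encoding, "percenthex")
```

With an ASCII target, mangle("\u010d.txt", "ascii") gives
b"%00010d%.txt", while the built-in 'xmlcharrefreplace' handler gives
b"&#269;.txt". A file that is literally named that way would collide
with the mangled one, which is why the mapping still has to be recorded.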

The name should be decoded and the mapping between repository name and
local name recorded in some file. Though stat-cache seems like a good
place, I am not so sure as it would no longer be just a cache.
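
A rough sketch of what such a recorded mapping could look like, in
Python 3. The function names and the JSON format are assumptions made
for this example only, not anything bzr defines.

```python
import json


def build_name_map(repo_names, encoding):
    """Map each local (mangled) name back to its repository name."""
    local_to_repo = {}
    for repo_name in repo_names:
        raw = repo_name.encode(encoding, "xmlcharrefreplace")
        local = raw.decode(encoding)
        if local in local_to_repo:
            # Two repository names mangle to the same local name.
            raise ValueError("local name collision: %r" % local)
        local_to_repo[local] = repo_name
    return local_to_repo


def serialize_map(local_to_repo):
    """Turn the mapping into text that can be written to a file."""
    return json.dumps(local_to_repo, sort_keys=True)


def deserialize_map(text):
    """Read back a mapping produced by serialize_map()."""
    return json.loads(text)
```

The table is what makes the mangling reversible: the local name is
looked up in it rather than parsed, so a collision has to be reported
instead of guessed at.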

> We should add some tests for versioning files with non-ascii names.
> 
> Joel's page points out sys.getfilesystemencoding().

And it even seems to return the right thing...
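
As a small sketch of how it could be used (Python 3; the helper name
is_representable is made up for this example), one can check whether a
name needs mangling at all before touching the mapping:

```python
import sys


def is_representable(name, encoding=None):
    """Return True if `name` encodes cleanly in the given encoding.

    When no encoding is passed, the one reported by
    sys.getfilesystemencoding() is used.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True
```

Only names for which this returns False would need an entry in the
demangle table.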

--
						 Jan 'Bulb' Hudec <bulb at ucw.cz>