Unicode Normalization

John Arbash Meinel john at arbash-meinel.com
Thu Jun 29 14:55:23 BST 2006

Robey Pointer wrote:
> On 26 Jun 2006, at 7:32, John Arbash Meinel wrote:
>> Right now, I think the best way to go would be to do something in
>> list_files, similar to how WorkingTree does it now for ignored files.
>> Basically, you go through, and if you know a file is versioned, you just
>> return it. If it doesn't match the inventory, you check if it needs to
>> be normalized. And if the name changes, you then check again if it is
>> versioned, and then go on to check if it is ignored, etc.
>> Does this seem reasonable? It adds an extra function call and an if
>> statement to the list_files loop, which I'm not super keen on (since it
>> affects initial 'add' performance).
>> But I think it has the least impact in the case that most of the files
>> are versioned, and most of them are not fancy unicode, while still
>> correctly handling filenames on all platforms.
> I'm basically +1 on this approach.
> Would this be a good way to handle case normalization too?  On Mac and
> Windows, "README" and "ReadMe" are the same file: case is preserved but
> not significant.  This has actually caused me a problem once or twice
> with files in other VCS.  It'd be nice if bzr went "I don't know about
> ReadMe but README is versioned and you're on a mac so they're the same
> file."
> robey

That is a little bit trickier, since you would have to fix case for both
the inventory and for the filesystem.
But something like that should be possible.

I would like us to stay 'case-preserving' on case-insensitive
filesystems. I'm intentionally not being 'unicode-normalization-preserving'.
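To make 'case-preserving' concrete, here is a minimal sketch in modern Python, not actual bzr code. The function name find_versioned_name and the case_sensitive flag are hypothetical, and the inventory is assumed to be a plain set of names. The point is that the lookup ignores case on a case-insensitive filesystem, but the spelling stored in the inventory is what gets returned.

```python
def find_versioned_name(disk_name, versioned, case_sensitive=True):
    """Return the inventory spelling matching disk_name, or None.

    versioned is assumed to be a set of names taken from the inventory.
    """
    # Exact match: nothing to fix up.
    if disk_name in versioned:
        return disk_name
    # On a case-sensitive filesystem a different case is a different file.
    if case_sensitive:
        return None
    # Case-insensitive filesystem: compare case-folded names, but return
    # the inventory's own spelling so the recorded case is preserved.
    folded = disk_name.casefold()
    for name in versioned:
        if name.casefold() == folded:
            return name
    return None
```

With an inventory containing only "README", looking up "ReadMe" with case_sensitive=False gives back "README", while the default case-sensitive lookup gives None. A real implementation would want a precomputed folded-name index rather than a linear scan.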

I think there might be something we can do. I'll probably work on it a
little this week, and if I get somewhere, we're hoping for it to make it
into 0.9.
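
For reference, the list_files check from the quoted text could look roughly like the sketch below. This is only an illustration under assumptions, not the real WorkingTree code: classify_names, versioned and is_ignored are hypothetical names, and the inventory is assumed to store names in NFC form. Names that already match the inventory take the fast path, and normalization is only paid for on the rest.

```python
import unicodedata


def classify_names(disk_names, versioned, is_ignored):
    """Yield (name, status) for each name found on disk.

    versioned: set of inventory names (assumed to be stored in NFC).
    is_ignored: callable returning True for names matching ignore rules.
    """
    for name in disk_names:
        # Fast path: the on-disk name is already known to the inventory.
        if name in versioned:
            yield name, 'versioned'
            continue
        # Slow path: normalize, then check the inventory a second time
        # if the name actually changed.
        normalized = unicodedata.normalize('NFC', name)
        if normalized != name and normalized in versioned:
            yield normalized, 'versioned'
        elif is_ignored(normalized):
            yield normalized, 'ignored'
        else:
            yield normalized, 'unknown'
```

A filename that the filesystem hands back in decomposed form (as Mac OS X does) would fall through to the slow path and then match its composed spelling in the inventory.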



