Unicode Normalization

Thu Jun 29 07:31:10 BST 2006

On 26 Jun 2006, at 7:32, John Arbash Meinel wrote:

> Right now, I think the best way to go would be to do something in
> list_files, similar to how WorkingTree does it now for ignored files.
>
> Basically, you go through, and if you know a file is versioned, you just
> return it. If it doesn't match the inventory, you check if it needs to
> be normalized. And if the name changes, you then check again if it is
> versioned, and then go on to check if it is ignored, etc.
>
> Does this seem reasonable? It adds an extra function call and an if
> statement to the list_files loop, which I'm not super keen on (since it
> affects initial 'add' performance).
> But I think it has the least impact in the case that most of the files
> are versioned, and most of them are not fancy unicode, while still
> correctly handling filenames on all platforms.
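A minimal Python 3 sketch of the loop described above follows. It is not bzr's actual list_files code. The names `classify_files`, `versioned` and `is_ignored` are hypothetical stand-ins for the inventory and the ignore check. It also assumes the inventory stores paths in NFC, while the filesystem may hand back another form (Mac OS X HFS+, for example, stores a decomposed, NFD-like form). Only names that miss the inventory pay for the normalization call.

```python
import unicodedata

def classify_files(names, versioned, is_ignored):
    """Yield (name, status) for each on-disk filename.

    Hypothetical sketch: `versioned` is a set of inventory paths,
    assumed to be stored in NFC; `is_ignored` is a callable that
    stands in for the ignore-pattern check.
    """
    for name in names:
        # Fast path: most files are versioned and plain ASCII,
        # so a single set lookup settles them with no extra work.
        if name in versioned:
            yield name, "versioned"
            continue
        # Slow path: no match, so check whether the name needs normalizing.
        normalized = unicodedata.normalize("NFC", name)
        if normalized != name:
            # The name changed, so check the inventory again.
            if normalized in versioned:
                yield normalized, "versioned"
                continue
            name = normalized
        # Still not versioned: fall through to the ignore check.
        if is_ignored(name):
            yield name, "ignored"
        else:
            yield name, "unknown"
```

For example, a decomposed `"cafe\u0301"` read from disk matches a versioned `"caf\u00e9"` only after the second lookup. Files that are already versioned never reach `unicodedata.normalize`.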

I'm basically +1 on this approach.

Would this be a good way to handle case normalization too?  On Mac  
and Windows, "README" and "ReadMe" are the same file: case is  
preserved but not significant.  This has actually caused me a problem  
once or twice with files in other VCS.  It'd be nice if bzr went "I  
don't know about ReadMe but README is versioned and you're on a mac  
so they're the same file."
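One way the same "miss, then retry" idea might extend to case is sketched below in Python 3. It is not an existing bzr API. `build_case_index`, `find_versioned` and the `case_index` parameter are hypothetical. `str.casefold()` is also only an approximation of what HFS+ and NTFS treat as equal, because each filesystem uses its own case-folding table. When two versioned paths fold to the same key, the sketch declines to guess.

```python
def build_case_index(versioned):
    """Map each case-folded path to the versioned paths that fold to it.

    Hypothetical helper, built once per inventory so lookups stay O(1).
    """
    index = {}
    for path in versioned:
        index.setdefault(path.casefold(), []).append(path)
    return index

def find_versioned(name, versioned, case_index):
    """Return the versioned path matching `name`, or None.

    Pass case_index=None on case-sensitive filesystems to skip folding.
    """
    # Exact match first, the common case on every platform.
    if name in versioned:
        return name
    if case_index is None:
        return None
    # Case-preserving but insensitive filesystem: "ReadMe" may be "README".
    candidates = case_index.get(name.casefold(), [])
    # Only accept an unambiguous match.
    return candidates[0] if len(candidates) == 1 else None
```

With an index built from `{"README"}`, looking up `"ReadMe"` returns `"README"`. With `case_index=None` the same lookup returns `None`, as it should on a case-sensitive filesystem.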

robey