come play with dirstate!

Robert Collins robertc at robertcollins.net
Tue Feb 13 22:09:28 GMT 2007


On Tue, 2007-02-13 at 08:22 -0600, John Arbash Meinel wrote:
> Robert Collins wrote:
> ...
> 
> > A related project that would be great to do for 0.15, would be to use
> > utf8 everywhere we currently use unicode strings within bzrlib, except
> > for:
> 
> I have a patch which I'm submitting now, which changes revision_id to be
> utf-8 everywhere internally. (At least that I can find).
> 
> file_ids would be the next step. I'm not quite as sure about file paths.
> Because we can easily get invalid utf-8 paths on Linux if we never
> decode them. Some might consider this a feature, as then you can version
> any path, but I think we stated long ago that we don't really want to
> version paths that are not unicode compatible.

We can explicitly decode and check during add, and perhaps also at
commit. But having added them, why should we decode them all the time?
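
As a rough sketch of that idea: validate once at add time, then keep
the raw bytes and never decode again on the hot path. The name
check_path_on_add is hypothetical, not an existing bzrlib function.

```python
def check_path_on_add(raw_path: bytes) -> bytes:
    """Check once, at add time, that a path from disk is valid UTF-8.

    Hypothetical helper for illustration only. The decoded value is
    thrown away; callers keep working with the original bytes.
    """
    try:
        raw_path.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            "refusing to version non-UTF-8 path %r" % (raw_path,)
        ) from exc
    return raw_path
```

A path such as b"caf\xc3\xa9.txt" passes through unchanged, while a
Latin-1 encoded b"caf\xe9.txt" is rejected before it is ever versioned.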

> And while we could just support any character string, I think it breaks
> on other platforms. Windows will get really screwy because it will be in
> whatever OEM encoding is active right now (so lots of character sets
> will get really messed up, I think.) And Mac OS X will start renaming
> files for us. We still need to work out how we want to handle that one,
> since it has actually started cropping up. We could just auto-normalize
> on all platforms, but it means calling an expensive function for every
> path on disk. Maybe we could do it when we get an inventory miss.

I think that paths are inherently unicode and we should treat them as
such. However, we should neither store nor process the stored form
using Python unicode strings; this should be purely a performance
consideration, not a semantic change.
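
A minimal sketch of that separation, assuming a toy in-memory index
(PathIndex and its methods are made up for illustration and are not
bzrlib API): paths are unicode at the boundary, but are stored and
compared as UTF-8 bytes. Sorting the bytes gives the same order as
sorting the unicode strings by code point, so the semantics do not
change.

```python
class PathIndex:
    """Toy index keyed by UTF-8 encoded paths (illustrative only)."""

    def __init__(self):
        # Maps UTF-8 encoded path -> file id, both held as bytes.
        self._entries = {}

    def add(self, path: str, file_id: bytes) -> None:
        # Encode once at the boundary; everything internal is bytes.
        self._entries[path.encode("utf-8")] = file_id

    def lookup_raw(self, raw_path: bytes):
        # Hot path: no decoding, just a bytes dictionary lookup.
        return self._entries.get(raw_path)

    def paths_for_display(self):
        # Decode only when the paths have to be shown to a user.
        return [p.decode("utf-8") for p in sorted(self._entries)]
```

Only add() and paths_for_display() touch unicode strings; lookups
against names read from disk stay in bytes throughout.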

Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.