come play with dirstate!

John Arbash Meinel john at
Tue Feb 13 14:22:49 GMT 2007


Robert Collins wrote:

> A related project that would be great to do for 0.15, would be to use
> utf8 everywhere we currently use unicode strings within bzrlib, except
> for:

I have a patch, which I'm submitting now, that changes revision_id to be
utf-8 everywhere internally (at least everywhere I could find).

file_ids would be the next step. I'm not quite as sure about file paths,
because we can easily get invalid utf-8 paths on Linux if we never
decode them. Some might consider this a feature, as then you can version
any path, but I think we stated long ago that we don't really want to
version paths that are not unicode compatible.
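
To illustrate the invalid-path concern, here is a minimal sketch of the kind
of check that decoding gives us. It is written in present-day Python 3, not
taken from bzrlib, and is_versionable_path is a hypothetical name used only
for this example.

```python
def is_versionable_path(raw_path: bytes) -> bool:
    """Return True if the raw on-disk path bytes decode cleanly as UTF-8.

    Hypothetical helper for illustration; not part of bzrlib.
    """
    try:
        raw_path.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A path written by a UTF-8 locale decodes fine.
utf8_name = "caf\u00e9.txt".encode("utf-8")
# The same name written by a Latin-1 locale is not valid UTF-8.
latin1_name = "caf\u00e9.txt".encode("latin-1")

print(is_versionable_path(utf8_name))
print(is_versionable_path(latin1_name))
```

If paths were passed around as raw bytes and never decoded, the second name
would slip through unnoticed until something finally tried to treat it as
text.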

And while we could just support any character string, I think it breaks
on other platforms. Windows will get really screwy because the paths will
be in whatever OEM encoding is active right now (so lots of character sets
will get really messed up, I think). And Mac OS X will start renaming
files for us. We still need to work out how we want to handle that one,
since it has actually started cropping up. We could just auto-normalize
on all platforms, but it means calling an expensive function for every
path on disk. Maybe we could do it only when we get an inventory miss.
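
A rough sketch of the "normalize only on an inventory miss" idea, in
present-day Python 3. It assumes the inventory is a plain dict keyed by
NFC-normalized paths, which is an assumption made for this example and not
how bzrlib actually stores things; lookup_path is a hypothetical name.

```python
import unicodedata


def lookup_path(inventory: dict, path: str):
    """Look up a path, normalizing only when the direct lookup misses.

    Assumes (for this sketch) that inventory keys are stored in NFC form.
    """
    entry = inventory.get(path)
    if entry is not None:
        # Fast path: no normalization cost for paths that already match.
        return entry
    normalized = unicodedata.normalize("NFC", path)
    if normalized != path:
        # Slow path: the on-disk name was in a different normal form.
        return inventory.get(normalized)
    return None


# Inventory stores the composed form (e with acute as one code point).
inventory = {"caf\u00e9.txt": "file-id-1"}

# A file system that decomposes names hands back "e" + combining acute.
decomposed = "cafe\u0301.txt"
print(lookup_path(inventory, decomposed))
```

The point of the design is that paths which already match pay nothing extra;
the expensive normalize call only runs for names that failed the direct
lookup.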



