[merge] Mac- only allow unicode normalized filenames

John Arbash Meinel john at arbash-meinel.com
Sun Jul 2 07:40:11 BST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I took some time today to work through some of the unicode normalization
issues on Mac OS X.
As a quick summary, Mac doesn't preserve filenames that contain non-ascii
characters; it always normalizes them. Unfortunately, it chose a bad
normalization (a decomposed form, where most other tools expect composed
characters).
This patch modifies WorkingTree.list_files() and .extras() so that after
listing a directory, if a file isn't found, it normalizes the filename
and tries again.
It also modifies Inventory.make_entry() so that we won't create
non-normalized inventory entries.
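
A minimal sketch of the two behaviours described above, not the code
from the patch. The helper names (normalized_filename, lookup_with_retry,
check_entry_name) are hypothetical, and using NFC as the canonical form
is an assumption; it uses only the standard unicodedata module.

```python
import unicodedata

# Assumption: NFC (composed) is the canonical form we want to store.
CANONICAL_FORM = "NFC"


def normalized_filename(name):
    """Return (normalized_name, was_already_normalized)."""
    normalized = unicodedata.normalize(CANONICAL_FORM, name)
    return normalized, normalized == name


def lookup_with_retry(name, known_names):
    """Look up a name as returned by the filesystem.

    If the raw name is not known, retry once with the normalized form.
    Returns the matching known name, or None if neither form matches.
    """
    if name in known_names:
        return name
    normalized, already_normalized = normalized_filename(name)
    if not already_normalized and normalized in known_names:
        return normalized
    return None


def check_entry_name(name):
    """Hypothetical stand-in for the check in Inventory.make_entry():
    refuse to create an entry whose name is not normalized."""
    _, already_normalized = normalized_filename(name)
    if not already_normalized:
        raise ValueError("filename is not normalized: %r" % (name,))
    return name
```

With this, a decomposed name coming back from the filesystem can still
be matched against a composed name that is already versioned, while an
attempt to create an entry under the decomposed name raises an error.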

I would rather do the checking at a higher level, but for a first run at
this it felt like the safest place to do it, because both
WorkingTree.add() and smart_add go through this layer.

The other reason to do it at a higher level is so that smart_add can
default to ignoring those paths unless explicitly asked to add them.

I haven't really investigated all of the performance impacts of these
changes yet. On trees with lots of unknown files, it probably has an
effect (since we have to try normalizing all of those names), though in
a stable tree it shouldn't have much effect, because the files are
either already known or ignored.
I guess on Mac itself, if you have a lot of unicode filenames, it would
impact performance, because they will look unknown until they are
normalized.

At present, because only Mac needs the translation to get a match, we
could make WorkingTree.list_files() skip the double lookup on other
platforms. Ultimately, though, I was thinking of having a different
return code for invalidly normalized files (rather than I-ignored and
?-unknown).

But add should always perform the check, to make sure we don't add a
file with the wrong normalization.
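
A rough sketch of that idea, under stated assumptions: the function
names are hypothetical, NFC is assumed as the canonical form, and the
'N' status code is invented here purely for illustration (the patch
does not define one). The double lookup only runs on Mac, but the
check before adding runs everywhere.

```python
import sys
import unicodedata


def needs_double_lookup(platform=None):
    """Only Mac OS X rewrites filenames, so only it needs the retry."""
    if platform is None:
        platform = sys.platform
    return platform == "darwin"


def classify(name, versioned, platform=None):
    """Return a one-letter status for a name from a directory listing.

    'V' = versioned, '?' = unknown, and 'N' (hypothetical) = the name
    only matches a versioned file after normalization.
    """
    if name in versioned:
        return "V"
    if needs_double_lookup(platform):
        normalized = unicodedata.normalize("NFC", name)
        if normalized != name and normalized in versioned:
            return "N"
    return "?"


def can_add(name):
    """Performed on every platform: only normalized names may be added."""
    return unicodedata.normalize("NFC", name) == name
```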

There are still a few (7) tests that are failing, but mostly because
they are expecting a specific result from 'os.listdir()', which changes
on Mac.
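
One possible way to make such tests platform-independent (a sketch
only, not what the patch does) is to normalize both the expected names
and the os.listdir() result before comparing. The helper name
same_listing is hypothetical, and NFC is again an assumed choice.

```python
import unicodedata


def same_listing(expected, actual, form="NFC"):
    """Compare two collections of filenames, ignoring differences in
    unicode normalization and in ordering."""
    def canonical(names):
        return sorted(unicodedata.normalize(form, n) for n in names)
    return canonical(expected) == canonical(actual)
```

A test could then call same_listing(expected_names, os.listdir(path))
instead of comparing the raw lists directly.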

I'm attaching a diff rather than a bundle, because it is much smaller.

I'll probably spend a little more time to get the last tests to pass,
but not tonight. :)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEp2pLJdeBCYSNAAMRApdSAJ4wAclRf7Ms7m9nZOHmSzbyzk1+EgCfZLFO
iK6bpQBV3EbvVpuewvphArw=
=tMry
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mac-normalized-filenames.diff
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20060702/655012ba/attachment.diff 

