[merge] win32 cleanups
John Arbash Meinel
john at arbash-meinel.com
Sun Jun 11 05:08:20 BST 2006
I took some time today to do some housekeeping with my win32 branch. The
attached patch shows what I've gotten to so far. It doesn't make all
tests pass on win32, but it does make a whole bunch more pass than did
before.
To hit the highlights:
1) bzrlib.osutils._posix_abspath was using os.path.abspath rather than
posixpath.abspath. I went ahead and cleaned up the code so that it is a
copy of posixpath.abspath, except that it uses os.getcwdu() instead of
os.getcwd().
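A minimal sketch of the idea in point 1, not the code from the patch. It
assumes Python 3, where os.getcwd() already returns text and so plays
the role os.getcwdu() had on Python 2. The name posix_abspath and the
optional cwd parameter are made up for illustration.

```python
import os
import posixpath


def posix_abspath(path, cwd=None):
    """Return an absolute, normalized POSIX-style path.

    Follows the same steps as posixpath.abspath (join with the working
    directory if needed, then normalize), but never goes through os.path.
    """
    if not posixpath.isabs(path):
        if cwd is None:
            # Python 3: os.getcwd() returns text, like os.getcwdu() on Python 2.
            cwd = os.getcwd()
        path = posixpath.join(cwd, path)
    return posixpath.normpath(path)
```

Passing cwd explicitly is only there to make the behaviour easy to
check without depending on the current directory.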
2) Windows has a completely separate API for unicode strings versus
plain strings. As such, paths are not necessarily reachable by the
plain-string APIs. This is very different from Linux, where the
plain strings are bytestreams which happen to be utf8 encoded, so you
can access all paths either way. (For example, os.listdir('.') will still
return valid paths under Linux, whereas on Win32 it will return '???'
style filenames for names the plain-string API cannot represent.)
This means that our handling of paths may be completely inverted between
Linux and Win32. Robert mentioned that it may be better to access things
as bytestreams, rather than assuming they can be cast into Unicode
paths. And there are some good reasons for that (especially when running
the test suite with LANG=C).
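To make the two access styles in point 2 concrete, here is a rough
sketch. It assumes Python 3, where passing a str or a bytes path to
os.listdir selects the text or the bytestream view of the same
directory (the unicode versus plain-string split described above). The
helper name list_both_ways is hypothetical.

```python
import os
import tempfile


def list_both_ways(directory):
    """List a directory through the text API and through the bytes API.

    Returns (text_names, byte_names), each sorted.
    """
    text_names = sorted(os.listdir(directory))               # str names
    byte_names = sorted(os.listdir(os.fsencode(directory)))  # bytes names
    return text_names, byte_names


with tempfile.TemporaryDirectory() as tmp:
    # Create the file through the bytes API; the name is UTF-8 for U+03A9.
    with open(os.path.join(os.fsencode(tmp), b"\xce\xa9.txt"), "wb"):
        pass
    text_names, byte_names = list_both_ways(tmp)
```

On a Linux filesystem the bytes view returns the name exactly as it was
written, and os.fsencode() maps each text name back to those same bytes.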
3) SimpleHTTPServer grabs files as bytestream paths (i.e. it doesn't
interpret the path as any sort of encoding). This means that all of our
tests which create files using utf-8 succeed in getting them as utf-8
paths under Linux. But it fails under Windows, so I wrote an override
for the default translate_path code, so that it would require utf-8
paths. This would probably break running in LANG=C mode, so for now, I
only use it if sys.platform == 'win32'.
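The kind of translation described in point 3 might look roughly like
the following. This is a standalone sketch, not the override from the
patch: the function name translate_utf8_path and its root parameter are
assumptions, and it is written for Python 3's urllib.parse rather than
the SimpleHTTPServer module of the time.

```python
import os
import posixpath
from urllib.parse import unquote_to_bytes, urlsplit


def translate_utf8_path(url_path, root):
    """Map a URL path to a local path, requiring UTF-8 percent-encoding.

    Raises UnicodeDecodeError if the unquoted bytes are not valid UTF-8.
    """
    raw = unquote_to_bytes(urlsplit(url_path).path)
    text = raw.decode("utf-8")  # strict: reject paths that are not UTF-8
    # Normalize, then drop empty, '.' and '..' components before joining.
    parts = [p for p in posixpath.normpath(text).split("/")
             if p and p not in (os.curdir, os.pardir)]
    return os.path.join(root, *parts)
```

A handler would call something like this from its translate_path
method, and only when sys.platform == 'win32', as noted above.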
4) Win32 has a separate code page for the preferred encoding versus the
actual encoding of the terminal. I don't know what crack that is, but
that is why we had the 'use sys.stdout.encoding if available'. Anyway,
it means I added a function to osutils for 'get_terminal_encoding',
which just does what we used to do, but now as an independent function
which is unit-tested, and can be used elsewhere.
5) One place it is used is the blackbox/test_non_ascii.py tests, where
we need to skip the utf8 tests on Windows, since stdout cannot print out
the paths
that can be created on the filesystem. I'm happy to say that all
non_ascii tests pass on my latin-1 machine (locale cp1252, console
cp437). I'm guessing this will also help Alexander. I also tested some
extra characters that aren't latin-1 (omega & epsilon). But I decided
that since the test suite was already taking forever, latin-1 was good
enough for now (I don't use any latin-1 characters that can't be encoded
in cp437).
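Points 4 and 5 together can be sketched as below. This is a guess at the
shape of such a helper, not the bzrlib implementation: only the name
get_terminal_encoding comes from the description above, while the
stream parameter, the "ascii" fallback, and the can_print helper are
assumptions added for illustration.

```python
import locale
import sys


def get_terminal_encoding(stream=None):
    """Guess the encoding to use when writing text to the terminal."""
    if stream is None:
        stream = sys.stdout
    # Prefer what the stream itself reports (e.g. the console code page).
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        # Otherwise fall back to the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False) or "ascii"
    return encoding


def can_print(text, encoding):
    """Return True if text can be encoded in the given encoding."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True
```

A test could then skip itself when can_print(filename,
get_terminal_encoding()) is false, which is the situation described for
the utf8 tests on a cp437 console.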
6) SFTP tests have not been addressed yet. For now, I'm commenting out
paramiko when I run the full test suite. They need to be addressed, but
right now they fail, and each one takes 30s to notice. That adds up to
an *awfully* long time when you have 100+ tests.
7) For some reason, the test suite runs *way* slower in native win32
python. For example, 'test_added' in test_non_ascii takes 6+ seconds rather
than < 1s. Probably it would be worthwhile to do some win32 profiling
just to see if there is anything obviously horrible that we are doing.
Probably doing ".replace('\\', '/')" is happening far too often, and is
just being used for absolute consistency, rather than just sanitizing
the inputs at the appropriate time.
Anyway, I wanted to let people know these fixes were out there, and
maybe get a review/+1. Even though not all tests are passing, more tests
pass now, and it shouldn't break Linux stuff.
John
=:->
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: win32-encoding-1.diff
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20060610/e9925035/attachment.diff