[MERGE] Unit tests for win32 glob expansion

John Arbash Meinel john at arbash-meinel.com
Tue Jul 17 16:54:11 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Belchenko wrote:
> Kuno Meyer ?8H5B:
>> On 16.07.2007 15:08, Alexander Belchenko wrote:
>>> Looks good. Just one question.
>>>
>>> Looking at test data:
>>>
>>> [['*/'], ['c\\', 'd\\']],
>>>
>>> I wonder, should we do normalization from '\\' to '/' as in osutils.py
>>> codebase?
>>>
>>> [µ]
>> I'm not familiar enough with the Bazaar code base to judge whether this
>> normalization would have a benefit, or would just be redundant, or even
>> counterproductive. I had the impression that internally bazaar converts
>> everything to '/'.
> 
> I'd like to hear John's opinion here.
> AFAIK, '/' normalization is mostly needed for proper cross-platform testing.

I think the biggest reason is because we were using the paths-in-memory
when we wrote things out to disk, and you don't want to have mixed
"path/to/foo" and "path\to\foo" written in your inventory files.

At the moment, all of the osutils.py functions re-normalize the path
(every time they are used, which is a bit much). So when you call:
"osutils.abspath()" or "osutils.realpath()" or "osutils.pathjoin()" it
forces all '\' => '/'.

There is also some need in testing, so that you can get consistent paths
in your list.

It would be nice to have it done at a very obvious layer, and then we
don't have to worry about it within the rest of the codebase. It also
would allow us to version paths with '\' in their name, on platforms
that support it. One complaint about bzr is that we are a bit stricter
than the OS about what filenames we will allow, so people can [and
unfortunately *do*] create files that cannot be versioned. If only in
legacy systems (when converting from CVS, etc). Early on in the project
we decided to only version things that could be represented as Unicode.
Especially because it means I can checkout something on Win32 and get a
very good representation of what you checked in on Linux. (Using utf-8
would break windows which uses OEM encoding for 8-bit filenames, or a
UTF-16 interface).


> 
>> However, please take into account that win32utils.glob_expand() just
>> gets invoked for the "add" command, and all other commands have to cope
>> with direct user input. I'm wondering why the implementation is this
>> way, as there are a lot of shortcomings: The command "bzr rm *.txt" for
>> example returns with the error message "*.txt does not exist", even
>> after "bzr add some.txt".
> 
>> My intention was to move the the win32 glob expansion down to the
>> beginning of run_bzr as soon as this patch was accepted. Then we would
>> have a quite similar behaviour as under Linux for /all/ commands. If it
>> makes sense to perform path normalisation there as well (either to '/'
>> or to '\\'), then we should implement it. The only issue I see with that
>> solution is, that the behaviour of "bzr ignore *.pyc" potentially would
>> not be understood by Windows users.
> 
> I have very strong objections against using glob_expand for *all* commands,
> because you have no way to disable glob on win32.
> AFAIK, quoting (in single quotes) on Linux prevents for glob expanding,
> but on win32 you simply can't do this.

Double quotes also disables globbing, but still allows variables $foo.

On Windows the single quote character *doesn't* cause quoting to happen
(bzr commit -m 'foo bar baz' will commit the files "bar" and "baz'" with
a message "'foo").

> 
>> Let me know what you think, and whether I should continue without
>> waiting for the current patch to be applied.
> 
> John, can you also look on this patch, please?
> 

I would actually prefer it if the glob matcher returned paths with '/'
rather than '\\'. So tests like:

[['*/'], ['c/', 'd/']]
[['*\\'], ['c/', 'd/']]

We need to handle user input of things like '*\\' but since the returned
paths are internal, I would like to see them use '/'. But honestly, I
would actually rather see them returned as:

[['*/'], ['c', 'd']]
[['*\\'], ['c', 'd']]

Not removing trailing slashes could confuse portions of the code that
just treat the paths as the names to version the.

At the very least it seems inconsistent that you have a test like:

+            [['d'], ['d']],
+            [['d/'], ['d/']],
+            [['d\\'], ['d\\']],

and then
+            [['*/'], ['c\\', 'd\\']],
+            [['*\\'], ['c\\', 'd\\']]])


(if x/ returns x/ shouldn't */ return x/ ?)

Overall, I think this is a good place to take paths given by the user,
and then make them nice paths for use internally.

The only other thing I would recommend testing is that you can go deeper
than 1 level of directories. (have a c/d/e, and figure out that */*
should match c/d but not c/d/e)

Oh, and you may want to test passing multiple levels of slashes. So if
someone types:

c//d

it still matches 'c/d' (or equivalently typing c\\d matches c/d).

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGnOYjJdeBCYSNAAMRAqnPAKCtU5dmCVkAF6PtEJvnm72f5YL6hACffFeJ
i6tdP6rN/+tsgFlb5c7BH6I=
=2dVn
-----END PGP SIGNATURE-----



More information about the bazaar mailing list