glob-semantics on win32: windows or unix semantics?

Martin Pool mbp at sourcefrog.net
Fri Jul 13 03:31:10 BST 2007


On 7/13/07, Kuno Meyer <kuno.meyer at gmx.ch> wrote:
> On 12.07.2007 08:43, Martin Pool wrote:
> > On 7/12/07, Kuno Meyer <kuno.meyer at gmx.ch> wrote:
> >
> >> > There do not seem to be any explicit tests already.  I think it would
> >> > be good to add two different tests.
> >> >
> >> > 1- When glob_expand_for_win32 is called, it has the right effect:
> >> > expanding things that are globs and can match something.  The easiest
> >> > thing is probably to make it a TestCaseInTempDir and then use
> >> > build_tree to make some files to test against.
> >> >
> >> > 2- When we do a blackbox test on bzr add, this method does get
> >> > correctly invoked.  This is a bit tricky as it's only supposed to be
> >> > active on Windows.
> >> > I can see a couple of options:
> >> >
> >> > 2a - use run_bzr_subprocess, which will give the shell a chance to do
> >> > the expansion on Unix (I think).
> >> >
> >> > 2b - change run_bzr and rearrange the layering so that if you try to
> >> > run bzr in-process with wildcards it does the expansion through this
> >> > method, even on unix.
> >>
> >> Ok. I will provide some tests at least for case 1-, but in a separate
> >> patch. Thank you for a hint how to implement it.
> >
> > Thanks, that would be great.
> >
>
> <skip>
>
> After writing some initial tests, I think the current implementation of
> win32utils.glob_expand trying to imitate the Unix shell expansion is not
> correct.
>
> - case-insensitivity
>    (win32: pattern 'a' is expected to match with filename 'A')
>
>
> In the cases of the '.' (extension separator) and the '?' wildcard I am
> not sure whether implementing the Windows semantics is the right thing,
> because this behaviour is very strange and users might be surprised:
>
> - '?' matches with 'zero or one char, but not "."'
>    (win32: pattern 'a?' matches with 'a', 'a1')
>    (unix: pattern 'a?' matches just with 'a1')
>
>    (win32: pattern 'a??' matches with 'a', 'a1', 'a11', but not 'a.1')
>    (unix: pattern 'a??' matches with 'a11' *and* 'a.1')
>
> - '*.*' is an equivalent to '*'
>    (win32: pattern '*.*' matches with 'a')
>    (unix: no match)
>
> - '*.' matches with anything without extension
>    (win32: pattern '*.' matches with 'a',
>            as 'a' and 'a.' are identical names)
>
>
> In my opinion,

> 1) we have to care about the case-insensitivity (try "touch a; bzr add
> A; bzr st" on Windows, and you see what I mean).

Yes, and this is broader than just wildcard expansion -- to properly
fix it probably means that

> 2) Patterns ending with "." seem to be treated correctly even in the
> current code base.
> 3) For all other cases, implementing the Unix semantics seems to be the
> better solution.
>
> What is your opinion?

I think we should first of all just merge some tests that the current
code behaves as we expect in the 'uncontroversial' cases.  We should
also look into making sure that run_bzr() can test wildcard handling
correctly.
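
A minimal sketch of what such a test for the 'uncontroversial' cases
might look like. It uses plain unittest and tempfile rather than
bzrlib's TestCaseInTempDir and build_tree, and expand_globs() is a
hypothetical stand-in for the real helper, not its actual code. The
pass-through of non-matching patterns is an assumption here.

```python
import glob
import os
import tempfile
import unittest


def expand_globs(file_list):
    """Hypothetical stand-in for the expansion helper under test.

    Patterns that match something are replaced by their sorted matches;
    anything that matches nothing is passed through unchanged (assumed).
    """
    expanded = []
    for item in file_list:
        matches = sorted(glob.glob(item))
        expanded.extend(matches if matches else [item])
    return expanded


class TestGlobExpansion(unittest.TestCase):

    def setUp(self):
        # Work in a fresh temporary directory holding a few known files.
        self._old_cwd = os.getcwd()
        self._tmp = tempfile.TemporaryDirectory()
        os.chdir(self._tmp.name)
        for name in ('a1', 'a11', 'b.txt'):
            open(name, 'w').close()

    def tearDown(self):
        # Leave the directory before deleting it (required on Windows).
        os.chdir(self._old_cwd)
        self._tmp.cleanup()

    def test_matching_glob_is_expanded(self):
        self.assertEqual(['a1', 'a11'], expand_globs(['a*']))

    def test_non_matching_glob_is_kept(self):
        self.assertEqual(['z*'], expand_globs(['z*']))

    def test_plain_name_is_kept(self):
        self.assertEqual(['b.txt'], expand_globs(['b.txt']))
```

Only patterns whose result is the same under both Windows and Unix
semantics are used, so the sketch does not take a side on the
disputed cases.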

I'm slightly surprised that a trailing dot has the Windows behaviour,
but that's not such an important case.  I think keeping the Unix
behaviour in general is probably pretty reasonable -- arguably a
feature for people using it across platforms and certainly easier to
test.
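
To make the Unix behaviour concrete, here is a small sketch using
Python's standard fnmatch module on the patterns from Kuno's list.
It is an illustration only, not the bzr code. fnmatchcase() is used
because plain fnmatch.fnmatch() normalises case on Windows, which
would blur the comparison.

```python
import fnmatch


def unix_match(name, pattern):
    # fnmatchcase() compares case-sensitively on every platform.
    return fnmatch.fnmatchcase(name, pattern)


# (name, pattern) -> whether Unix-style matching accepts it.
UNIX_CASES = {
    ('a', 'a?'): False,     # '?' needs exactly one character
    ('a1', 'a?'): True,
    ('a11', 'a??'): True,
    ('a.1', 'a??'): True,   # '?' may match '.'
    ('a', '*.*'): False,    # a literal '.' is required
    ('a.txt', '*.*'): True,
    ('A', 'a'): False,      # case-sensitive
}

for (name, pattern), expected in UNIX_CASES.items():
    assert unix_match(name, pattern) == expected, (name, pattern)
```

The win32 results quoted above differ in the 'a?', 'a??', '*.*' and
case rows, which are exactly the rows a cross-platform test would
want to pin down.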

I'd like us to add a developer document explaining how to improve
case-insensitivity.  I don't think wildcards are the most important
case (but I might be wrong.)  I think we probably need to look at
every interface that goes to the local disk (workingtree, plus others
like glob expansion), and consider what will happen if the system is
case insensitive.
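
As one hedged illustration of the kind of check such an interface
might need, the sketch below looks up the on-disk spelling of a name
while ignoring case. The function name is hypothetical and
str.lower() is only an approximation of what a real case-insensitive
filesystem does.

```python
import os


def find_on_disk_names(directory, name):
    """Return the entries in `directory` equal to `name` ignoring case.

    Hypothetical helper; lower() only approximates filesystem folding.
    """
    wanted = name.lower()
    return sorted(entry for entry in os.listdir(directory)
                  if entry.lower() == wanted)
```

With a file created as 'a', asking for 'A' returns ['a'], which is
the spelling that code touching the working tree would want to record
in the "touch a; bzr add A" case.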

-- 
Martin


