[patch] improved ignore pattern matching (#57637)
Jan Hudec
bulb at ucw.cz
Sun Nov 26 15:57:38 GMT 2006
On Sun, Nov 26, 2006 at 08:35:36AM -0600, John Arbash Meinel wrote:
> [...]
> My personal preference would be to support the following syntax:
>
> ? => Match any single character except '/' ie '[^/]'
> * => Match 0 or more characters except '/' ie '[^/]*'
> ** => Match 0 or more characters ie '.*'
> RE: => A prefix which lets you get as crazy as you want, except for
> group matches as mentioned by Jan.
>
> This isn't 100% compatible with shell, but shell is trying to do
> *inclusion*, while we are trying to do *exclusion*.
>
> So when you do 'ls *', you typically don't want to see dot files. But if
> you did "bzr ignore '*'" you typically *would* want to ignore dot files.
>
> It is slightly different than other syntaxes, but having to do:
>
> bzr ignore 'foo/*' 'foo/.*'
>
> So that you could make sure that all files in 'foo' default to being
> ignored is ugly.
>
> I realize some people would prefer:
>
> bzr ignore foo/**/*.txt
>
> versus
>
> bzr ignore foo/**.txt
>
> But both forms work in my proposed translations, and I think it is
> mentally simpler to understand than the complexities of zsh globbing.
Actually, the former does not: 'foo/**/*.txt' would not match 'foo/a.txt',
because the two slashes in '/**/' will not match a single '/' (in zsh
style they do, because the token there is actually '**/', which can match
the empty string). For extensions (the more common case), the latter can
be used instead. But if you wanted to ignore e.g. CVS dirs, under your
proposal you would have to say:
foo/CVS
foo/**/CVS
while under zsh style it is just:
foo/**/CVS
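To make the difference concrete, here is a minimal sketch of the translation
as I read the proposal ('?' to '[^/]', '*' to '[^/]*', '**' to '.*'). The
names translate_simple and matches_simple are made up for illustration; this
is not the code from the patch, and the 'RE:' prefix is left out.

```python
import re

def translate_simple(pattern):
    """Translate a glob to a compiled regex using the proposed rules.

    Hypothetical helper for illustration only, not bzr code.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith('**', i):
            out.append('.*')        # '**' may cross '/'
            i += 2
        elif pattern[i] == '*':
            out.append('[^/]*')     # '*' stays within one component
            i += 1
        elif pattern[i] == '?':
            out.append('[^/]')      # '?' is one non-slash character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    # Anchor at the end so the whole path has to match.
    return re.compile(''.join(out) + r'\Z')

def matches_simple(pattern, path):
    return translate_simple(pattern).match(path) is not None

# Both slashes around '**' are literal, so one slash is not enough.
print(matches_simple('foo/**/CVS', 'foo/CVS'))
print(matches_simple('foo/**/CVS', 'foo/sub/CVS'))
```

With these rules 'foo/**/CVS' matches 'foo/sub/CVS' but not 'foo/CVS', which
is why the second pattern line is needed.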
The reason zsh globs work the way they do is that, when used for filename
generation, they are actually matched component by component. And then
treating ** as a whole component is quite a bit easier to do.
On a side note, I just checked, and zsh really does require splitting the
pattern on / to yield correct patterns:
bulb@efreet:~$ ls -d (a/b)#.txt
zsh: bad pattern: (a/b)#.txt
(the # is a Kleene star)
> My proposed changes are:
>
> 1) Forward compatible, except for '*' not matching '/', but it has
> already been decided that that was a bug, so we don't have to keep
> compatibility there.
>
> 2) Simple for users to understand. And I think this is key, if you
> really need complexity in your ignores, you can use RE: and go to town.
> But 99.9% of the time ignore patterns are simple. They can even be
> overly generic because if someone explicitly 'bzr add's a file, it
> becomes versioned.
>
> 3) Handles all the common cases.
>
> I think the proposed patch gets very close to this. And I'm sorry I
> haven't spent more time looking over it. Last week was Thanksgiving here
> in the US, and the week before was a conference. I'll try and look
> closer on Monday.
Well, I fear that users will tend to miss that 'foo/**/bar' does not
match 'foo/bar' (and I think it is not so uncommon to not want
'foo/**bar' instead). This can be fixed either by making the pattern
'**/' (like zsh does), which would expand to '(|.+/)', or by recognizing
both '**/' and '**', translating them to '(|.+/)' and '.*' respectively.
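A rough sketch of the second variant, recognizing both '**/' and '**', might
look like the following. The function names are hypothetical, and I use the
non-capturing form '(?:|.+/)' instead of '(|.+/)' so the translation does not
add a group; for simplicity it also does not check that '**/' starts at a
component boundary.

```python
import re

def translate_zsh_like(pattern):
    """Translate a glob to a compiled regex, treating '**/' as one token.

    Hypothetical helper for illustration only, not bzr code.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith('**/', i):
            out.append('(?:|.+/)')  # zero or more whole directories
            i += 3
        elif pattern.startswith('**', i):
            out.append('.*')        # any characters, including '/'
            i += 2
        elif pattern[i] == '*':
            out.append('[^/]*')
            i += 1
        elif pattern[i] == '?':
            out.append('[^/]')
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    # Anchor at the end so the whole path has to match.
    return re.compile(''.join(out) + r'\Z')

def matches_zsh_like(pattern, path):
    return translate_zsh_like(pattern).match(path) is not None

print(matches_zsh_like('foo/**/CVS', 'foo/CVS'))
print(matches_zsh_like('foo/**/CVS', 'foo/a/b/CVS'))
```

Here a single 'foo/**/CVS' covers both 'foo/CVS' and the nested cases, while
'foo/**.txt' keeps working for extensions.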
--------------------------------------------------------------------------------
- Jan Hudec `Bulb' <bulb at ucw.cz>