Alternate glob matcher for .bzrignore

Martin Pool mbp at sourcefrog.net
Sun Jan 8 23:14:47 GMT 2006


On Sat, 2006-01-07 at 13:40 -0600, John Arbash Meinel wrote:
> John Arbash Meinel wrote:
> > Now with the actual attachments :)
> > 
> > John
> > =:->
> > 
> > 
> > John Arbash Meinel wrote:
> > 
> >>In my encodings branch, I found that fnmatch doesn't match unicode
> >>characters.
> >>
> >>So if you do:
> >>$ echo 'test' > Bågfors.txt
> >>$ bzr unknowns
> >>"Bågfors.txt"
> >>$ bzr ignore ./Bågfors.txt
> >>$ cat .bzrignore
> >>./Bågfors.txt
> >>$ bzr unknowns #This is what fails
> >>"Bågfors.txt"
> >>
> >>We had discussed in the past changing the matcher so that it would
> >>create one big pattern, and then from that, we would check all paths one
> >>time, instead of checking each file many times. (This should help with
> >>paths with a large number of ignored files and patterns).
> >>
> >>I did some work to implement it, basically creating a new translator
> >>from glob patterns into regular expressions. I also changed the
> >>behaviour so that "*" doesn't match directory separators. (It would be
> >>nice if we didn't have to worry about backslash being a directory
> >>separator.)
> >>
> >>Anyway, attached is the glob_matcher, and the test suite I wrote for it.
> >>They are present in my encoding branch.
> >>
> >>To replace our current "is_ignored" check, we would have to do:

That looks good.

> 
> In doing some more testing, ('**/' + pat) may not work, because it
> probably wants at least one directory separator to exist. In regular
> expression terms we want '(.*/)?'.
> I could write another globs_to_matcher() which would understand that if
> there is no '/' in the pattern, it needs to prepend the above to the
> regular expression.
> Or we could break the matching into 2 styles of patterns, one with, and
> one without paths. And then just check 2 regular expressions, one with
> just the trailing part of the path, and the other with the full path.
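
A minimal sketch of the first option, assuming Python's standard
fnmatch.translate as the glob translator rather than the attached
glob_matcher; pattern_to_regex is a hypothetical name used only for
illustration:

```python
import fnmatch
import re


def pattern_to_regex(pat):
    """Compile a glob; a pattern without '/' may match at any depth."""
    # Note: with fnmatch.translate, '*' can still match a '/'.
    body = fnmatch.translate(pat)
    if '/' not in pat:
        # Optional leading directories, so zero separators is also fine.
        body = '(?:.*/)?' + body
    return re.compile(body)
```

With this, 'foo.c' would match both 'foo.c' and 'bar/foo.c', but not
'barfoo.c', which is the difference from naively prepending '**/'.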

The zsh manpage says

       A pathname component of the form ‘(foo/)#’ matches
       a path consisting of zero or more directories
       matching the pattern foo.

       As a shorthand, ‘**/’ is equivalent to ‘(*/)#’;
       note that this therefore matches files in the
       current directory as well as subdirectories.  [...]

       This form does not follow symbolic links; the
       alternative form ‘***/’ does, but is otherwise
       identical.  Neither of these can be combined with
       other forms of globbing within the same path
       segment; in that case, the ‘*’ operators revert to
       their usual effect.

So note that **foo.c matches 'barfoo.c' (with the star just matching
characters), but *not* bar/foo.c or bar/barfoo.c.  (Or at least it does
in zsh, and I think also in rsync, so keeping the same behaviour is
probably good.)

Perhaps you should split the glob on / separators, then translate each
of them into a RE part, handling '**/' as a special case.

(I'm not suggesting to add the # syntax; it's just used in the
explanation.)
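
A rough sketch of that splitting approach, under some assumptions: only
'*' and '?' are handled (no '[...]' classes), '/' is the only separator,
and the names segment_to_regex and glob_to_regex are made up for this
example, not taken from the attached glob_matcher:

```python
import re


def segment_to_regex(seg):
    """Translate one path segment; '*' and '?' never cross a '/'."""
    out = []
    for ch in seg:
        if ch == '*':
            out.append('[^/]*')
        elif ch == '?':
            out.append('[^/]')
        else:
            out.append(re.escape(ch))
    return ''.join(out)


def glob_to_regex(glob):
    """Split on '/' and treat a segment that is exactly '**' specially."""
    segments = glob.split('/')
    parts = []
    for i, seg in enumerate(segments):
        last = (i == len(segments) - 1)
        if seg == '**' and not last:
            # '**/' stands for zero or more whole directories.
            parts.append('(?:[^/]+/)*')
        else:
            # Anywhere else, '**' is just two ordinary stars.
            parts.append(segment_to_regex(seg) + ('' if last else '/'))
    return re.compile(''.join(parts) + r'\Z')
```

Used with re.match, '**/foo.c' should then match 'foo.c' and
'bar/foo.c', while '**foo.c' matches 'barfoo.c' but not 'bar/foo.c',
in line with the zsh behaviour described above.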

-- 
Martin
