Alternate glob matcher for .bzrignore
Martin Pool
mbp at sourcefrog.net
Sun Jan 8 23:14:47 GMT 2006
On Sat, 2006-01-07 at 13:40 -0600, John Arbash Meinel wrote:
> John Arbash Meinel wrote:
> > Now with the actual attachments :)
> >
> > John
> > =:->
> >
> >
> > John Arbash Meinel wrote:
> >
> >>In my encodings branch, I found that fnmatch doesn't match unicode
> >>characters.
> >>
> >>So if you do:
> >>$ echo 'test' > Bågfors.txt
> >>$ bzr unknowns
> >>"Bågfors.txt"
> >>$ bzr ignore ./Bågfors.txt
> >>$ cat .bzrignore
> >>./Bågfors.txt
> > >>$ bzr unknowns  # This is what fails
> >>"Bågfors.txt"
> >>
> >>We had discussed in the past changing the matcher so that it would
> >>create one big pattern, and then from that, we would check all paths one
> >>time, instead of checking each file many times. (This should help with
> >>paths with a large number of ignored files and patterns).
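A minimal sketch of the "one big pattern" idea, assuming Python 3 and using the standard library's fnmatch.translate rather than the attached glob_matcher (which is not reproduced here). The function names are hypothetical. Note that fnmatch's "*" still matches "/", so this only illustrates the combining step, not the directory handling discussed below.

```python
import fnmatch
import re


def compile_ignore_patterns(patterns):
    """Combine all glob patterns into a single compiled regex (hypothetical helper)."""
    if not patterns:
        return None  # an empty alternation would match every path
    # fnmatch.translate anchors each glob at the end (\Z); wrapping each one
    # in a non-capturing group keeps the alternatives independent.
    combined = "|".join("(?:%s)" % fnmatch.translate(p) for p in patterns)
    return re.compile(combined)


def filter_ignored(paths, patterns):
    """Return the paths that match any pattern, using one regex match per path."""
    matcher = compile_ignore_patterns(patterns)
    if matcher is None:
        return []  # no patterns, so nothing is ignored
    return [p for p in paths if matcher.match(p)]
```

For example, filter_ignored(["foo.o", "src/bar.c", "Bågfors.txt", "README"], ["*.o", "Bågfors.txt"]) keeps only "foo.o" and "Bågfors.txt". Each path is tested once against the combined expression instead of once per pattern.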
> >>
> >>I did some work to implement it. Basically creating a new translator
> > >>from glob patterns into regular expressions. I also changed it so
> > >>that "*" doesn't match across directories. (It would be nice if we didn't have
> >>to worry about backslash being a directory separator.)
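A rough sketch of such a translator, assuming '/' is the only directory separator. It handles only '*', '?' and literal characters (no [...] classes, no '**'), and the names are hypothetical rather than taken from the attached module.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a regex string where '*' and '?' never match '/'."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append("[^/]*")  # any run of characters within one path component
        elif ch == "?":
            parts.append("[^/]")   # exactly one character, never a separator
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts) + r"\Z"


def glob_matches(pattern, path):
    """True if the whole path matches the glob (re.match anchors the start)."""
    return re.match(glob_to_regex(pattern), path) is not None
```

With this, "*.txt" matches "Bågfors.txt" but not "doc/notes.txt", while "doc/*.txt" matches "doc/notes.txt".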
> >>
> >>Anyway, attached is the glob_matcher, and the test suite I wrote for it.
> >>They are present in my encoding branch.
> >>
> >>To replace our current "is_ignored" check, we would have to do:
That looks good.
>
> In doing some more testing, ('**/' + pat) may not work, because it
> probably wants at least one directory separator to exist. In regular
> expression terms we want '(.*/)?'.
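A small illustration of the difference, with the glob "*.o" translated by hand so that '*' does not cross '/'. The variable names are hypothetical.

```python
import re

# Hand-translated regex for the glob "*.o" ('*' stays within one component).
STAR_DOT_O = r"[^/]*\.o\Z"

# A naive '**/' translation that demands at least one directory separator.
needs_dir = re.compile(r".*/" + STAR_DOT_O)
# The '(.*/)?' form: the leading directories are optional.
any_depth = re.compile(r"(?:.*/)?" + STAR_DOT_O)

for path in ["foo.o", "src/foo.o", "src/lib/foo.o", "src/foo.c"]:
    print(path, bool(needs_dir.match(path)), bool(any_depth.match(path)))
```

For "foo.o" only any_depth matches; for "src/foo.o" and "src/lib/foo.o" both do, and neither matches "src/foo.c".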
> I could write another globs_to_matcher() which would understand that if
> there is no '/' in the pattern, it needs to prepend the above to the
> regular expression.
> Or we could break the matching into 2 styles of patterns, one with, and
> one without paths. And then just check 2 regular expressions, one with
> just the trailing part of the path, and the other with the full path.
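A minimal sketch of the two-regex approach, assuming patterns without '/' apply to the last path component and patterns with '/' apply to the whole relative path (with a leading './' meaning the tree root, as in the .bzrignore example above). The class and helper names are hypothetical, not bzr's API.

```python
import re


def _translate(glob):
    # Minimal translation: '*' and '?' never cross '/'; everything else is literal.
    out = []
    for ch in glob:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def _combine(globs):
    # One alternation per style; None when there are no patterns of that style.
    if not globs:
        return None
    return re.compile(r"(?:%s)\Z" % "|".join(_translate(g) for g in globs))


class TwoStyleMatcher:
    """Hypothetical matcher: one regex for basename patterns, one for path patterns."""

    def __init__(self, globs):
        # Patterns without '/' are checked against the last path component only.
        self.base_re = _combine([g for g in globs if "/" not in g])
        # Patterns with '/' are checked against the whole relative path;
        # a leading './' is stripped so the pattern is anchored at the root.
        self.path_re = _combine([g[2:] if g.startswith("./") else g
                                 for g in globs if "/" in g])

    def matches(self, path):
        basename = path.rsplit("/", 1)[-1]
        if self.base_re is not None and self.base_re.match(basename):
            return True
        return self.path_re is not None and self.path_re.match(path) is not None
```

Usage: TwoStyleMatcher(["*.o", "./Bågfors.txt", "doc/*.html"]) ignores "src/foo.o" (basename match) and "Bågfors.txt" (path match), but not "sub/Bågfors.txt".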
The zsh manpage says:

  A pathname component of the form ‘(foo/)#’ matches
  a path consisting of zero or more directories
  matching the pattern foo.
  As a shorthand, ‘**/’ is equivalent to ‘(*/)#’;
  note that this therefore matches files in the
  current directory as well as subdirectories. [...]
  This form does not follow symbolic links; the
  alternative form ‘***/’ does, but is otherwise
  identical. Neither of these can be combined with
  other forms of globbing within the same path
  segment; in that case, the ‘*’ operators revert
  to their usual effect.
So note that **foo.c matches 'barfoo.c' (with the star just matching
characters), but *not* bar/foo.c or bar/barfoo.c. (Or at least it does
in zsh, and I think also in rsync, so keeping the same behaviour is
probably good.)
Perhaps you should split the glob on '/' separators, then translate each
component into part of the regular expression, handling '**/' as a special case.
(I'm not suggesting we add the # syntax; it's just used in the
explanation.)
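A rough sketch of that suggestion, assuming patterns are relative paths using only '/'. A component that is exactly '**' (that is, '**/' in the pattern) becomes '(?:.*/)?', zero or more whole directories; any other '**' behaves like '*', following the zsh description quoted above. The function names are hypothetical, and this is not the attached glob_matcher.

```python
import re


def _component_to_regex(component):
    # Within a single component, '*' (and therefore '**') never crosses '/'.
    out = []
    for ch in component:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def compile_glob(pattern):
    """Translate a '/'-separated glob component by component into a compiled regex."""
    components = pattern.split("/")
    parts = []
    for i, comp in enumerate(components):
        last = i == len(components) - 1
        if comp == "**" and not last:
            parts.append("(?:.*/)?")  # '**/': zero or more directories
        else:
            parts.append(_component_to_regex(comp) + ("" if last else "/"))
    return re.compile("".join(parts) + r"\Z")
```

With this, compile_glob("**foo.c") matches "barfoo.c" but not "bar/foo.c" or "bar/barfoo.c", while compile_glob("**/foo.c") matches "foo.c", "bar/foo.c" and "a/b/foo.c".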
--
Martin
More information about the bazaar mailing list