[MERGE] Improvements to is_ignored, factored out glob stuff.

Jan Hudec bulb at ucw.cz
Wed Jan 18 07:49:46 GMT 2006


On Tue, Jan 17, 2006 at 16:44:32 -0600, John Arbash Meinel wrote:
> Jan Hudec wrote:
> > Hello All,
> > 
> > Commited http://drak.ucw.cz/~bulb/bzr/bzr.ignore revision 1527, which factors out
> > the glob stuff into a separate file.
> > 
> 
> ...
> > +class replacer(object):
> > +    """Do a multiple-pattern substitution.
> > +
> > +    The patterns and substitutions are combined into one, so the result of
> > +    one replacement is never substituted again. Add the patterns and
> > +    replacements via the add method and then call the object. The patterns
> > +    must not contain capturing groups.
> > +    """
> > +    _expand = re.compile(ur'\\&')
> > +    def __init__(self):
> > +        self._pat = None
> > +        self._pats = []
> > +        self._funs = []
> > +
> 
> I think PEP8 says there should be a blank line between the final """ of
> the doc string, and the _expand, and then another one before def. But
> Robert is the PEP8 guru/grammer nazi :)

Ok. Fixing. Also renamed the class to Replacer to follow naming
convention.

[...]
> > +_sub_glob.add(ur'^RE:.*', _sub_re) # RE:<anything> is a regex.
> 
> Interesting. I like the idea, I'm not sure if it will ever be used.
> Since colons are uncommon, I doubt people will try to match "RE:*" normally.

... plus one can always escape it as [R]E: if needed. Maybe I should
also add a GLOB: escape.

> > +_sub_glob.add(ur'(?:(?<=/)|^)(?:\.?/)+', u'') # Canonicalize
> > +_sub_glob.add(ur'\\.', ur'\&') # keep anything backslashed...
> > +_sub_glob.add(ur'[(){}|^$+.]', ur'\\&') # escape specials...

Note, that I am escaping () and |, but it would really be easy to
instead not escape these and convert # to * and ## to *?, like zsh does
(ok, we'd need to covert ( to (?: because of the no-capture requirement
of is_ignored_by). Would have to define * and ? in a more complicated,
but more robust, way.

> > +_sub_glob.add(ur'(?:(?<=/)|^)\*\*\*/', ur'(?:[^/]+/)*') # ***, includes .*
> > +_sub_glob.add(ur'(?:(?<=/)|^)\*\*/', ur'(?:[^./][^/]*/)*') # **, zsh-style
> > +_sub_glob.add(ur'(?:(?<=/)|^)\*\.', ur'(?:[^./][^/]*)\.') # *. after / or at start
> > +_sub_glob.add(ur'(?:(?<=/)|^)\*', ur'(?:[^./][^/]*)?') # * after / or at start
> > +_sub_glob.add(ur'\*', ur'[^/]*') # * elsewhere
> > +_sub_glob.add(ur'(?:(?<=/)|^)\?', ur'[^./]') # ? after / or at start
> > +_sub_glob.add(ur'\?', ur'[^/]') # ? elsewhere
> > +def translate_list(pats, wrap=u'(?:%s)'):
> > +    """Convert a list of unix globs to a regular expression.
> > +
> > +    The pattern is returned as string. The wrap is % format applied to each
> > +    individual glob pattern. It has to apply group.
> > +
> > +    See translate for glob semantics.
> > +    """
> > +    return u'|'.join([wrap % translate(x) for x in pats])
> > +
> > +    """
> > +    Patterns containing '/' need to match whole path; others match
> > +    against only the last component - as per requirement of
> > +    WorkingTree.is_ignored().
> > +    """
> > +
> 
> Why is this docstring, code docstring? Did you mean to write this as a
> comment, or should it be part of the rest of the doc string?

... because I meant to write the anchor_globs function around it and
then wrote doc for it separately. Deleted.

> > +def anchor_globs(pats):
> > +    """Convert file-globs to path globs as used in ignore patterns.
> > +
> > +    If a pattern contains '/' or starts with './', it should match whole path
> > +    from root (optional './' is stripped), otherwise it should match the
> > +    filename only. Thus such patterns are prefixed with '***/'.
> > +
> > +    """
> > +    return [anchor_glob(pat) for pat in pats]
> > 
> 
> You probably don't need to say 'If a pattern contains '/' or starts with
> './' since './' contains '/'. Just
> 'If a pattern contains '/' it should match the whole path from root'
> 
> I'm a little curious that you say './' is stripped when as near as I can
> tell, no such thing is done.

I rewrote the comment from scratch. As for './' it is stripped, though
not here, but rather during compilation. I first thought I will have to
strup it here and then I realized I should normalize the pattern anyway.

> > +    def assertMatch(self, glob, names):
> > +        rx = compile(anchor_glob(glob))
> > +        for name in names:
> > +            if not rx.match(name):
> > +                raise AssertionError(repr(
> > +                        u'name "%s" does not match glob "%s" (rx="%s")' %
> > +                        (name, glob, rx.pattern)))
> > +
> > +    def assertNotMatch(self, glob, names):
> > +        rx = compile(anchor_glob(glob))
> > +        for name in names:
> > +            if rx.match(name):
> > +                raise AssertionError(repr(
> > +                        u'name "%s" does match glob "%s" (rx="%s")' %
> > +                        (name, glob, rx.pattern)))
> 
> For clarity, I wrote an assertMatch that takes a list of matching
> patterns and a list of non-matching patterns.
> So that instead of writing:
> 
> self.assertMatch('pattern', [a, b, c])
> self.assertNotMatch('pattern', [d, e, f])
> You could just write
> self.assertMatch('pattern', [a, b, c], [d, e, f])
> 
> I also wouldn't raise AssertionError. I would tend to write it as:
> 
> def assertMatching(self, glob, matching, non_matching):
>   rx = compile(anchor_glob(glob))
>   for path in matching:
>     self.failUnless(rx.match(path),
>         'name %r does not match glob %r (rx=%r)'
>         % (path, glob, rx.pattern))
>   for path in non_matching:
>     self.failIf(rx.match(path),
>         'name %r does match glob %r (rx=%r)'
>         % (path, glob, rx.pattern))
> 
> I'm not going to require you to rewrite all of the
> assertMatching/assertNotMatching, but I would like to see the
> 'self.failIf' rather than raise AssertionError.

Did both. "Oh mighty regular expression", combined with "Oh also mighty
:global command" made that rather simple ;-).

> > +    def test_end_anchor(self):
> > +        self.assertMatch(u'*.333', [u'foo.333'])
> > +        self.assertNotMatch(u'*.3', [u'foo.333'])
> > +
> > +class TestBzrignore(TestCaseInTempDir):
> > +
> > +    shape = None
> > +
> 
> PEP8: Two whitespaces between classes.

Ok.

> > +    def setUp(self):
> > +        super(TestBzrignore, self).setUp()
> > +        self.branch = Branch.initialize(u'.')
> > +        self.wt = self.branch.working_tree()
> > +
> > +    def putIgnores(self, ignores):
> > +        bzrignore = file(u'.bzrignore', 'wb')
> > +        bzrignore.write(ignores)
> > +
> 
> The rest looks good to me.

Thanks. I Will add some more tests (for the RE: thing and ***/ at least)
and resubmit.

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060118/ad0ef473/attachment.pgp 


More information about the bazaar mailing list