[MERGE] Improvements to is_ignored, factored out glob stuff.
Jan Hudec
bulb at ucw.cz
Wed Jan 18 07:49:46 GMT 2006
On Tue, Jan 17, 2006 at 16:44:32 -0600, John Arbash Meinel wrote:
> Jan Hudec wrote:
> > Hello All,
> >
> > Commited http://drak.ucw.cz/~bulb/bzr/bzr.ignore revision 1527, which factors out
> > the glob stuff into a separate file.
> >
>
> ...
> > +class replacer(object):
> > + """Do a multiple-pattern substitution.
> > +
> > + The patterns and substitutions are combined into one, so the result of
> > + one replacement is never substituted again. Add the patterns and
> > + replacements via the add method and then call the object. The patterns
> > + must not contain capturing groups.
> > + """
> > + _expand = re.compile(ur'\\&')
> > + def __init__(self):
> > + self._pat = None
> > + self._pats = []
> > + self._funs = []
> > +
>
> I think PEP8 says there should be a blank line between the final """ of
> the doc string, and the _expand, and then another one before def. But
> Robert is the PEP8 guru/grammer nazi :)
Ok. Fixing. Also renamed the class to Replacer to follow naming
convention.
[...]
> > +_sub_glob.add(ur'^RE:.*', _sub_re) # RE:<anything> is a regex.
>
> Interesting. I like the idea, I'm not sure if it will ever be used.
> Since colons are uncommon, I doubt people will try to match "RE:*" normally.
... plus one can always escape it as [R]E: if needed. Maybe I should
also add a GLOB: escape.
> > +_sub_glob.add(ur'(?:(?<=/)|^)(?:\.?/)+', u'') # Canonicalize
> > +_sub_glob.add(ur'\\.', ur'\&') # keep anything backslashed...
> > +_sub_glob.add(ur'[(){}|^$+.]', ur'\\&') # escape specials...
Note, that I am escaping () and |, but it would really be easy to
instead not escape these and convert # to * and ## to *?, like zsh does
(ok, we'd need to covert ( to (?: because of the no-capture requirement
of is_ignored_by). Would have to define * and ? in a more complicated,
but more robust, way.
> > +_sub_glob.add(ur'(?:(?<=/)|^)\*\*\*/', ur'(?:[^/]+/)*') # ***, includes .*
> > +_sub_glob.add(ur'(?:(?<=/)|^)\*\*/', ur'(?:[^./][^/]*/)*') # **, zsh-style
> > +_sub_glob.add(ur'(?:(?<=/)|^)\*\.', ur'(?:[^./][^/]*)\.') # *. after / or at start
> > +_sub_glob.add(ur'(?:(?<=/)|^)\*', ur'(?:[^./][^/]*)?') # * after / or at start
> > +_sub_glob.add(ur'\*', ur'[^/]*') # * elsewhere
> > +_sub_glob.add(ur'(?:(?<=/)|^)\?', ur'[^./]') # ? after / or at start
> > +_sub_glob.add(ur'\?', ur'[^/]') # ? elsewhere
> > +def translate_list(pats, wrap=u'(?:%s)'):
> > + """Convert a list of unix globs to a regular expression.
> > +
> > + The pattern is returned as string. The wrap is % format applied to each
> > + individual glob pattern. It has to apply group.
> > +
> > + See translate for glob semantics.
> > + """
> > + return u'|'.join([wrap % translate(x) for x in pats])
> > +
> > + """
> > + Patterns containing '/' need to match whole path; others match
> > + against only the last component - as per requirement of
> > + WorkingTree.is_ignored().
> > + """
> > +
>
> Why is this docstring, code docstring? Did you mean to write this as a
> comment, or should it be part of the rest of the doc string?
... because I meant to write the anchor_globs function around it and
then wrote doc for it separately. Deleted.
> > +def anchor_globs(pats):
> > + """Convert file-globs to path globs as used in ignore patterns.
> > +
> > + If a pattern contains '/' or starts with './', it should match whole path
> > + from root (optional './' is stripped), otherwise it should match the
> > + filename only. Thus such patterns are prefixed with '***/'.
> > +
> > + """
> > + return [anchor_glob(pat) for pat in pats]
> >
>
> You probably don't need to say 'If a pattern contains '/' or starts with
> './' since './' contains '/'. Just
> 'If a pattern contains '/' it should match the whole path from root'
>
> I'm a little curious that you say './' is stripped when as near as I can
> tell, no such thing is done.
I rewrote the comment from scratch. As for './' it is stripped, though
not here, but rather during compilation. I first thought I will have to
strup it here and then I realized I should normalize the pattern anyway.
> > + def assertMatch(self, glob, names):
> > + rx = compile(anchor_glob(glob))
> > + for name in names:
> > + if not rx.match(name):
> > + raise AssertionError(repr(
> > + u'name "%s" does not match glob "%s" (rx="%s")' %
> > + (name, glob, rx.pattern)))
> > +
> > + def assertNotMatch(self, glob, names):
> > + rx = compile(anchor_glob(glob))
> > + for name in names:
> > + if rx.match(name):
> > + raise AssertionError(repr(
> > + u'name "%s" does match glob "%s" (rx="%s")' %
> > + (name, glob, rx.pattern)))
>
> For clarity, I wrote an assertMatch that takes a list of matching
> patterns and a list of non-matching patterns.
> So that instead of writing:
>
> self.assertMatch('pattern', [a, b, c])
> self.assertNotMatch('pattern', [d, e, f])
> You could just write
> self.assertMatch('pattern', [a, b, c], [d, e, f])
>
> I also wouldn't raise AssertionError. I would tend to write it as:
>
> def assertMatching(self, glob, matching, non_matching):
> rx = compile(anchor_glob(glob))
> for path in matching:
> self.failUnless(rx.match(path),
> 'name %r does not match glob %r (rx=%r)'
> % (path, glob, rx.pattern))
> for path in non_matching:
> self.failIf(rx.match(path),
> 'name %r does match glob %r (rx=%r)'
> % (path, glob, rx.pattern))
>
> I'm not going to require you to rewrite all of the
> assertMatching/assertNotMatching, but I would like to see the
> 'self.failIf' rather than raise AssertionError.
Did both. "Oh mighty regular expression", combined with "Oh also mighty
:global command" made that rather simple ;-).
> > + def test_end_anchor(self):
> > + self.assertMatch(u'*.333', [u'foo.333'])
> > + self.assertNotMatch(u'*.3', [u'foo.333'])
> > +
> > +class TestBzrignore(TestCaseInTempDir):
> > +
> > + shape = None
> > +
>
> PEP8: Two whitespaces between classes.
Ok.
> > + def setUp(self):
> > + super(TestBzrignore, self).setUp()
> > + self.branch = Branch.initialize(u'.')
> > + self.wt = self.branch.working_tree()
> > +
> > + def putIgnores(self, ignores):
> > + bzrignore = file(u'.bzrignore', 'wb')
> > + bzrignore.write(ignores)
> > +
>
> The rest looks good to me.
Thanks. I Will add some more tests (for the RE: thing and ***/ at least)
and resubmit.
--
Jan 'Bulb' Hudec <bulb at ucw.cz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060118/ad0ef473/attachment.pgp
More information about the bazaar
mailing list