[MERGE] Improvements to is_ignored, now with new glob converter and tests

Mon Jan 16 19:50:40 GMT 2006

On Mon, Jan 16, 2006 at 09:10:28 -0600, John A Meinel wrote:
> Jan Hudec wrote:
> > +    def test_leading_dotslash(self):
> > +        self.assertMatch(u'./foo', [u'foo'])
> > +        self.assertNotMatch(u'./foo', [u'\u8336/foo', u'barfoo', u'x/y/foo'])
> 
> I'm a little curious about './foo' matching 'foo'. But it does seem to
> do the rest correctly (it doesn't match foo in another directory)
> 
> But what about:
> self.assertMatch(u'./foo', [u'foo', u'./foo'])
> 
> If './foo' doesn't match './foo' I would definitely be surprised by the
> system.

No, ./foo really does NOT match ./foo and in this respect the behaviour did
not change. The semantics of patterns in .bzrignore always was:
 - pattern not containing / is matched against filename only
 - pattern containing / is matched agains full path from root
 - to request matching only entries in the tree root, start the pattern with
   ./

Since the filenames are passed to is_ignored without leading ./, we need to
strip it from the pattern as well.

I'll check how it's documented in the source.

> > === modified file 'bzrlib/workingtree.py'
> > --- bzrlib/workingtree.py
> > +++ bzrlib/workingtree.py
> > @@ -46,6 +46,7 @@
> >  import os
> >  import stat
> >  import fnmatch
> > +import re
> >   
> >  from bzrlib.branch import (Branch,
> >                             is_control_file,
> > @@ -75,7 +76,7 @@
> >                              rename)
> >  from bzrlib.textui import show_status
> >  import bzrlib.tree
> > -from bzrlib.trace import mutter
> > +from bzrlib.trace import mutter, warning
> >  import bzrlib.xml5
> >  
> >  
> > @@ -85,7 +86,6 @@
> >      This should probably generate proper UUIDs, but for the moment we
> >      cope with just randomness because running uuidgen every time is
> >      slow."""
> > -    import re
> >      from binascii import hexlify
> >      from time import time
> >  
> > @@ -111,6 +111,106 @@
> >  def gen_root_id():
> >      """Return a new tree-root file id."""
> >      return gen_file_id('TREE_ROOT')
> 
> PEP8 comment:
> Documentation strings come *after* the object they are documenting, not
> before.
> 
> > +
> > +
> > +"""Do a multiple-pattern substitution.
> > +
> > +The patterns and substitutions are combined into one, so the result of
> > +one replacement is never substituted again. Add the patterns and
> > +replacements via the add method and then call the object. The patterns
> > +must not contain capturing groups.
> > +"""
> > +class _replacer(object):
> 
> So this should be:
> 
> class _replacer(object):
>     """Do a multiple-pattern substitution.
> 
>     The patterns and substitutions are combined into one, so the result
>     of one replacement is never substituted again. Add the patterns and
>     replacements via the add method and then call the object. The
>     patterns must not contain capturing groups.
>     """

Ok. Fixing.

> > +    _expand = re.compile(ur'\\&')
> > +    def __init__(self):
> > +        self._pat = None
> > +        self._pats = []
> > +        self._funs = []
> > +
> > +    r"""Add a pattern and replacement.
> > +
> > +    The pattern must not contain capturing groups.
> > +    The replacement might be either a string template in which \& will be
> > +    replaced with the match, or a function that will get the matching text as
> > +    argument. It does not get match object, because capturing is forbidden
> > +    anyway.
> > +    """
> > +    def add(self, pat, fun):

Fixing here as well ;-).

> > +def _glob_to_regex(pat):
> > +def _glob_list_to_regex(pats, wrap=u'(?:%s)'):
> 
> I would like to see _replacer, _glob_to_regex, and _glob_list_to_regex
> put into a new file, and then made public. (maybe not _replacer)

Can be. It will need factoring the ./ logic out somehow though if the code is
to be generic. I'd fix it by having all patterns anchored in the generic code
and add preprocess the ignore patterns. The preprocessing would be public
too, just separate if someone wanted differnt preprocessing.

> > +    def _get_ignore_by_regex_list(self):
> > +        """Return regex list for is_ignored_by method.
> > +
> > +        Cached in the Tree object after the first call.
> > +
> > +        The return is a list of lists, each having pattern as the first
> > +        element, followed by list of globs it is composed from.
> > +        """
> > +        if not hasattr(self, '_ignore_by_regex_list'):
> > +            pats = list(self.get_ignore_list()) # So we can shift...
> > +            self._ignore_by_regex_list = []
> > +            while pats:
> > +                self._ignore_by_regex_list.append(
> > +                        [re.compile(_glob_list_to_regex(pats[0:50],
> > +                                    wrap=u'(%s)'))]
> > +                        + pats[0:50])
> > +                pats[0:50] = ()
> > +        return self._ignore_by_regex_list
> 
> I think the more common way of doing this with python is to do:
> pats = pats[50:]
> 
> Then you also don't have to worry about passing it through list() since
> you aren't modifying it in place.

Ok. Will do it that way. I wonder which one is faster...

> I'm +1 on the contents, provided that at a minimum the documentation for
> _replacer is fixed. And preferably all the matching code is moved into
> its own file. It really shouldn't be in workingtree.py

Ok. I'll create one patch with the small fixes and then another moving the
glob code out to a separate file, so it can be reused by other places wanting
patterns.

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060116/2ab8f600/attachment.pgp