[PATCH][Bug 86451]Normalise path separators in ignore patterns
John Arbash Meinel
john at arbash-meinel.com
Thu Mar 1 14:35:13 GMT 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Kent Gibson wrote:
>
>
> Aaron Bentley wrote:
>>> It would be much better to fix the glob-to-regex conversion, and
>>> normalize those slashes.
>>>
>>>
> OK, here is a patch that normalises the path separator in ignore
> patterns, rather than just chopping trailing backslashes.
>
> I had a go at making the normalisation part of the glob-to-regex
> conversion proper, but that turned out to be impossible since the
> Replacer only does a single match/replace pass over each pattern, and
> you need to match backslashes twice - once for normalisation and once
> for canonicalisation. Rather than change the Replacer, I added
> normalisation as a separate first pass.
>
> There are also another couple of trivial fixed in there.
> Firstly, cmd_ignore now checks for DOS style absolute paths as well as
> Unix style, just to be consistent across platforms.
>
> Secondly, globbing now checks for and removes trailing backslashes in
> REs that can escape the bracket added when aggregating patterns. That
> is another corner case that can crash the pattern matcher, so I
> thought it was worth fixing it while I was there.
>
>
> Cheers,
> Kent.
>
Thanks for updating your patch. I'd really like to get this in 0.15.
There are a few small things, though.
...
v- Your spacing here is a little odd, I think it is better as:
if (len(m) % 2) != 0:
Also, while not strictly required, it would be good to have a simple
docstring. Maybe "check for trailing backslashes in the pattern"
+def _trailing_backslashes_regex(m):
+ if len(m) %2 != 0:
+ warning(u"Regular expressions cannot end with an odd number of
'\\'. "
+ "Dropping the final '\\'.")
+ return m[:-1]
+ return m
+
+
_sub_re = Replacer()
_sub_re.add(u'^RE:', u'')
_sub_re.add(u'\((?!\?)', u'(?:')
_sub_re.add(u'\(\?P<.*>', _invalid_regex(u'(?:'))
_sub_re.add(u'\(\?P=[^)]*\)', _invalid_regex(u''))
+_sub_re.add(ur'\\+$', _trailing_backslashes_regex)
_sub_fullpath = Replacer()
@@ -169,6 +178,7 @@
base_patterns = []
ext_patterns = []
for pat in patterns:
+ pat = normalize_pattern(pat)
if pat.startswith(u'RE:') or u'/' in pat:
path_patterns.append(pat)
elif pat.startswith(u'*.'):
@@ -200,3 +210,15 @@
return patterns[match.lastindex -1]
return None
+
+_normalize_re = re.compile(ur'\\(?!x\d\d|\d\d\d|u\d\d\d\d)')
+
+
+def normalize_pattern(pattern):
+ """Converts backslashes in path patterns to forward slashes.
+
+ Avoids converting escape backslashes.
+ """
+ if not pattern.startswith('RE:'):
+ pattern = _normalize_re.sub('/', pattern)
+ return pattern.rstrip('/')
^- If I understand correctly, you are allowing backslashes for character
codes (hex) \x00 or (octal) \000 (unicode decimal) \u0000. There are a
few problems, though.
1) I don't think you need to do this. I'm guessing you misunderstood the
tests. But the data in .bzrignore is meant to be utf-8 data. So
non-ascii characters are expected to be present, but not escapes.
2) If we do need this, then both "\x**" and "\u****" should allow
hexidecimal, not just decimal characters. For example "\xb5" and
"\uabcd" are both valid characters. (I have no idea what \uabcd is, but
u'\u03bc' == µ (mu)).
3) If we do leave this in, we need to make it a little bit clearer in
the docstring. "Avoids converting escape backslashes (for numerical
codes such as \xaa)". Or something like that. It took me a little while
to figure out what you were doing. Also \d{3} might be nicer than \d\d\d.
My guess is that you were confused by some of the tests, specifically these:
In these tests we use "u'\u8336'" because we are checking against a
Unicode character string: 茶/foo
Because when we are reading the filesystem, we end up with Unicode
strings in memory. So we want to make sure that .\foo doesn't match foo
in a subdirectory, even if it the parent directory is not an ascii string.
I might also be happier if we emitted warnings that ignore patterns
should use forward slashes. Rather than silently translating. But I'm
not going to be picky on that part.
+ def test_backslash(self):
+ self.assertMatch([
+ (u'.\\foo',
+ [u'foo'],
+ [u'\u8336/foo', u'barfoo', u'x/y/foo']),
+ (u'.\\f*',
+ [u'foo'],
+ [u'foo/bar', u'foo/.bar', u'x/foo/y']),
+ (u'foo\\**\\x',
+ [u'foo/x', u'foo/bar/x'],
+ [u'foox', u'foo/bax', u'foo/.x', u'foo/bar/bax']),
+ ])
+
+ def test_trailing_slash(self):
+ self.assertMatch([
+ (u'./foo/',
+ [u'foo'],
+ [u'\u8336/foo', u'barfoo', u'x/y/foo']),
+ (u'.\\foo\\',
+ [u'foo'],
+ [u'foo/', u'\u8336/foo', u'barfoo', u'x/y/foo']),
+ ])
+
def test_leading_asterisk_dot(self):
self.assertMatch([
(u'*.x',
So +1 (conditional). I think you need to fix the whitespace in one
place, and unless I'm misunderstanding something you probably want to
take out the special case for numerical escapes.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFF5uSgJdeBCYSNAAMRAj18AJ9zL0t1vpr0uumlHna6atE9i04ZFwCgkqif
Rjg5Ob9Oik4v6Qb5ZZbLyiE=
=RFZj
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list