[PATCH][MERGE] Improvements to is_ignored

John Arbash Meinel john at arbash-meinel.com
Wed Jan 11 21:27:36 GMT 2006


Jan Hudec wrote:
> On Wed, Jan 11, 2006 at 07:00:19 -0600, John Arbash Meinel wrote:
> 
>>Jan Hudec wrote:
>>
>>>On Tue, Jan 10, 2006 at 22:35:07 -0600, John A Meinel wrote:
>>>
>>>>>>I also found an interesting problem if you don't use (?:), specifically:
>>>>>>bzr: ERROR: exceptions.AssertionError: sorry, but this version only
>>>>>>supports 100 named groups
>>>>>> at /usr/lib/python2.4/sre_compile.py line 506
>>>>>> in compile
>>>
>>>
>>>Ouch.
>>>
>>>Well, at least we need to call is_ignored before calling is_ignored_by, so we
>>>don't iterate over anything when we are not going to find anything
>>>(especially since that is the worst case).
>>>
>>
>>Sure. I think most code doesn't call is_ignored_by until it has an
>>idea the entry is ignored.
>>But I would definitely, at some point, call is_ignored before iterating
>>over the is_ignored_by patterns.
> 
> 
> Well, there are just 2 users of is_ignored_by. One is smart_add_tree and
> the other is cmd_ignored. So I'll make sure they only call is_ignored_by
> on ignored entries. Maybe I'll rename the method and specify it must not
> be called on non-ignored entries.
> 

We could do it either way. We could have is_ignored_by call is_ignored
first, and then do the loop, or we could just expect clients to do it. I
don't really care.
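For illustration, here is a minimal sketch of the first option. It uses Python 3 rather than the 2.4 of this thread. `IgnoreMatcher` is a hypothetical class, not bzrlib's API, and it assumes the ignore globs have already been translated to regex strings. Each pattern is wrapped in a non-capturing `(?:...)` group, so the combined regex has no capturing groups however many patterns there are. Python 3.5 and later dropped the 100-group limit anyway. `is_ignored_by` only walks the individual patterns after the single combined match succeeds.

```python
import re


class IgnoreMatcher:
    """Hypothetical stand-in for the tree's ignore logic (not bzrlib's API).

    `regexes` are ignore patterns already translated from globs to regex
    strings.
    """

    def __init__(self, regexes):
        self.regexes = list(regexes)
        # (?:a)|(?:b)|... adds no capturing groups, unlike (a)|(b)|...,
        # which is what hit Python 2.4's 100-group limit.
        alternation = "|".join("(?:%s)" % rx for rx in self.regexes)
        self.combined = re.compile(r"(?:%s)\Z" % alternation)
        self._each = [re.compile(r"(?:%s)\Z" % rx) for rx in self.regexes]

    def is_ignored(self, path):
        """Fast path: a single match against the combined regex."""
        return self.combined.match(path) is not None

    def is_ignored_by(self, path):
        """Return the first pattern that ignores `path`, or None."""
        # Most paths are not ignored, so skip the per-pattern loop for them.
        if not self.is_ignored(path):
            return None
        for rx, compiled in zip(self.regexes, self._each):
            if compiled.match(path) is not None:
                return rx
        return None


# 151 hypothetical patterns: too many for capturing groups under Python 2.4.
patterns = [r"(?:.*/)?[^/]*\.o"] + [r".*\.ext%d" % i for i in range(150)]
matcher = IgnoreMatcher(patterns)
print(matcher.combined.groups)               # 0
print(matcher.is_ignored_by("./sub/foo.o"))  # (?:.*/)?[^/]*\.o
print(matcher.is_ignored_by("./foo.c"))      # None
```

The other option, making callers check is_ignored themselves, would just move the `if not self.is_ignored(path)` test out to smart_add_tree and cmd_ignored.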

> 
>>>>>So we should add a test with enough patterns to provoke this, calling
>>>>>both bzr status and bzr ignored.
>>>
>>>Ok, I'll look at it tomorrow.
>>
>>Thanks. It would also be good to write some tests for matching behavior,
>>to make sure that the (?:.*/) pattern always matches. I think paths
>>always start with at least ./, but I won't guarantee that without some
>>tests.
> 
> 
> I will look into it later when I try to implement better pattern
> translation.
> 

How are you thinking of doing it? I don't know if you saw my posts, but I
have implemented one, with a bunch of tests, in my encoding branch. At
the very least you might want to grab my test cases.
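Here is a minimal sketch of such a translation, with test-style checks. This is not the translator from my encoding branch, and `glob_to_regex` and `matches` are hypothetical names. The sketch makes these assumptions:

- Paths look like `./dir/name`, which is the open question above.
- `*` and `?` never cross a `/`.
- A glob without `/` matches the basename at any depth, through an optional `(?:.*/)` prefix.
- `\` is normalized to `/` first, so backslash cannot escape glob characters.

```python
import re


def glob_to_regex(pat):
    """Translate one ignore glob into a regex string (hypothetical helper).

    Assumptions: tree paths look like './dir/name'; '*' and '?' never match
    '/'; a glob without '/' matches the basename at any depth.
    """
    # Normalize Windows-style separators first (so '\' cannot escape globs).
    pat = pat.replace('\\', '/')
    rooted = '/' in pat
    if pat.startswith('./'):
        pat = pat[2:]
    parts = []
    for ch in pat:
        if ch == '*':
            parts.append('[^/]*')
        elif ch == '?':
            parts.append('[^/]')
        else:
            parts.append(re.escape(ch))
    # Non-capturing prefixes keep the group count at zero.
    prefix = r'(?:\./)?' if rooted else r'(?:.*/)?'
    return prefix + ''.join(parts) + r'\Z'


def matches(pat, path):
    """True if ignore glob `pat` matches tree path `path`."""
    return re.match(glob_to_regex(pat), path) is not None


for path in ['./foo.o', './sub/dir/foo.o', './foo.c']:
    print(path, matches('*.o', path))
# ./foo.o True
# ./sub/dir/foo.o True
# ./foo.c False
```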

> 
>>(I also can't guarantee that paths don't have \, but at that point they
>>shouldn't have \.)
>>I would also do a check for:
>>if '\\' in pat:
>>    pat = pat.replace('\\', '/')
>>
>>because I believe the file paths have already been normalized (fnmatch
>>might translate it correctly, but I don't think it does).
> 
> 
> fnmatch sucks big time, unfortunately. It certainly does not translate
> anything.
> 
> 
>>By the way, in general I like what you've done. I'm not sure if we want
>>to add re.UNICODE to the re.compile() call. It supposedly only
>>changes the meaning of \w, etc., so it may not be necessary.
> 
> 
> Well, the important thing is that it should be possible to have a
> pattern like:
> *.fň
> and have files with that extension match it. And they should continue
> to match if I branch, say, from an iso-8859-2 system to a utf-8 one.
> 

In my testing, if you do:

    >>> import re
    >>> x = re.compile(u'fň')
    >>> x.match(u'fň') is not None
    True

So you don't have to use re.UNICODE; it only changes the meaning of
escapes such as \w and \b. We write utf-8 encoded entries to .bzrignore,
so we probably shouldn't be telling people to 'echo foo >> .bzrignore',
since 'foo' is likely to be in the user's encoding rather than utf-8.
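Here is a small check of both points, written for Python 3. In Python 3, `str` patterns are Unicode-aware by default, and `re.ASCII` plays the inverse role of the Python 2 `re.UNICODE` flag. `literal_matches` is a hypothetical helper. The check shows three things: literal non-ASCII characters match with or without the flag, only `\w` changes, and a name written as iso-8859-2 bytes is not valid UTF-8.

```python
import re


def literal_matches(pattern, name, flags=0):
    """True if `pattern` matches all of `name` (hypothetical helper)."""
    return re.compile(pattern + r'\Z', flags).match(name) is not None


# A literal non-ASCII character matches itself, with or without flags.
print(literal_matches('fň', 'fň'))              # True
print(literal_matches('fň', 'fň', re.ASCII))    # True

# Flags only change escape classes such as \w.
print(literal_matches(r'f\w', 'fň'))            # True
print(literal_matches(r'f\w', 'fň', re.ASCII))  # False

# Why 'echo foo >> .bzrignore' is risky: a name typed on an iso-8859-2
# terminal is written as iso-8859-2 bytes, which are not valid UTF-8.
legacy_bytes = 'fň'.encode('iso-8859-2')
try:
    legacy_bytes.decode('utf-8')
    legacy_is_utf8 = True
except UnicodeDecodeError:
    legacy_is_utf8 = False
print(legacy_bytes, legacy_is_utf8)             # b'f\xf2' False
```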

> 
>>I ended up looking deeper, and found that the problem with fnmatch() not
>>matching unicode filenames is that we encode('utf-8') when we write
>>.bzrignore, but we don't decode() when we read it.
> 
> 
> The questions are how the re engine will deal with unicode patterns, and
> whether all the manipulation properly uses unicode strings to avoid
> Python's broken default conversion. I'll at least make sure of the
> latter.
> 
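
For the latter, here is a minimal Python 3 sketch that decodes .bzrignore with the same codec used to write it. That way fnmatch only ever sees unicode `str` patterns and names. `write_ignores` and `read_ignores` are hypothetical helpers, not bzrlib functions, and the file lives in a temporary directory.

```python
import fnmatch
import os
import tempfile


def write_ignores(path, patterns):
    """Append patterns to an ignore file as UTF-8 (hypothetical helper)."""
    with open(path, 'a', encoding='utf-8', newline='\n') as f:
        for pat in patterns:
            f.write(pat + '\n')


def read_ignores(path):
    """Read patterns back, decoding with the same codec used for writing."""
    with open(path, 'rb') as f:
        data = f.read()
    # Decode once, up front, so all later pattern handling sees str
    # (unicode) and never relies on an implicit default conversion.
    text = data.decode('utf-8')
    # Skip blank lines; keep other whitespace, which may be part of a name.
    return [line for line in text.splitlines() if line]


with tempfile.TemporaryDirectory() as tmp:
    ignore_file = os.path.join(tmp, '.bzrignore')
    write_ignores(ignore_file, ['*.o', '*.fň'])
    with open(ignore_file, 'rb') as f:
        raw = f.read()          # b'*.o\n*.f\xc5\x88\n'
    patterns = read_ignores(ignore_file)

print(patterns)                                          # ['*.o', '*.fň']
print(fnmatch.fnmatchcase('./src/foo.fň', patterns[1]))  # True
```
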

John
=:->



More information about the bazaar mailing list