[PATCH][MERGE] Improvements to is_ignored

John Arbash Meinel john at arbash-meinel.com
Wed Jan 11 13:00:19 GMT 2006


Jan Hudec wrote:
> On Tue, Jan 10, 2006 at 22:35:07 -0600, John A Meinel wrote:
> 
>>Martin Pool wrote:
>>
>>>On 10 Jan 2006, John A Meinel <john at arbash-meinel.com> wrote:
>>>
>>>>Jan Hudec wrote:
>>>
>>>>Here are my results:
>>>>time bzr status
>>>>real    0m8.374s
>>>>user    0m7.840s
>>>>sys     0m0.404s
>>>>
>>>>time bzr.ignore/bzr status
>>>>real    0m5.497s
>>>>user    0m4.988s
>>>>sys     0m0.412s
>>>
>>>Very nice.
>>
>>>>So it was able to shave a lot of the time off. This is with 71 ignore
>>>>patterns, and 805 ignored files.
>>>>
>>>>I also found an interesting problem if you don't use (?:), specifically:
>>>>bzr: ERROR: exceptions.AssertionError: sorry, but this version only
>>>>supports 100 named groups
>>>>  at /usr/lib/python2.4/sre_compile.py line 506
>>>>  in compile
> 
> 
> Ouch.
> 
> Well, at least we need to call is_ignored before calling is_ignored_by, so we
> don't iterate over anything unless when we are not going to find anything
> (especially since that is the worst case).
> 

Sure. I think most code doesn't call is_ignored_by until they have an
idea it is ignored.
But I would definitely at some point call is_ignored, before iterating
over the is_ignored_by patterns.

> 
>>>So we should add a test with enough patterns to provoke this, calling
>>>both bzr status and bzr ignored.
> 
> 
> Ok, I'll look at it tomorrow.

Thanks. It would also be good to write some tests for matching behavior,
to make sure that the (?:.*/) pattern always matches. I think paths
always start with at least ./, but I won't guarantee that without some
tests.
(I also can't guarantee that paths don't have \, but at that point they
shouldn't have \.)
I would also do a check for:
if '\\' in pat:
  pat = pat.replace('\\', '/')

Because I do believe the file paths have been normalized. (fnmatch might
translate it correctly, but I don't think it does).


> 
> 
>>I agree.
>>
>>Also, I found a small advantage to using the pattern (a$)|(b$)|(c$)
>>versus doing ((a)|(b)|(c))$.
>>Which is nice, because it is easier to do.
> 
> 
> ... and which it does because it still uses fnmatch.translate, which
> always ends it's result with $.
> 

Yeah, I know that. I played with it, and wrote a loop to strip off the
$, and then add a global one, thinking it would be better not to repeat
$ all over the place. But I guess it makes it easier for the re to
compile it. (The difference was negligible, but consistent).

By the way, in general I like what you've done. I'm not sure if we want
to add: re.UNICODE to the re.compile() command. It supposedly only
changes the meaning of \w, etc, so it may not be necessary.

I ended up looking deeper, and found that the problem with fnmatch() not
matching unicode filenames is that we encode('utf-8') when we write
.bzrignore, but we don't decode() when we read it.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 256 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060111/0871ffcf/attachment.pgp 


More information about the bazaar mailing list