[MERGE] is_ignored improvements...

John Arbash Meinel john at arbash-meinel.com
Thu May 18 20:14:02 BST 2006


Jan Hudec wrote:
> On Thu, May 18, 2006 at 11:21:17 -0500, John Arbash Meinel wrote:

...

>> Out of these requests 3 use the return value, and 6 just check for
>> true/false.
> 
> ... so I just tried a simple, not at all representative test. I took patterns
>   ['.*\\.%i' % i for i in range(0, 999) and matched them:
> 1. Non-capturing groups, all at once.
> 2. Non-capturing groups, 50 at a time, until match is found.
> 3. Capturing groups, 50 at a time, until match is found.
> 4. No groups, each separately, until match is found.
> against a pattern corresponding to 789th group ('foo.789'). And the times for
> 100000 runs on my machine (Athlon64 3000 (2.2GHz)) were about:
> 1. 12.5
> 2. 13.7
> 3. 18.9
> 4. 70.3
> 
> So the groups slows it down by about a half, but compared to matching one
> pattern at a time it's not /that/ significant.
> 
> Now I have to say, that when I doubled the pattern legnth and matched against
> 'foo.1789', the times doubled even for the single pattern. That means that
> the pattern compiler does not compile to a DFA. So it will be N^2 in all
> cases until python gets a better regex compiler.
> 

I'm a little curious about what syntax you used for the final
compilation. Because I tried:
>>> patterns = ['.*\\.%i' % i for i in range(0,1999)]
>>> all = re.compile('|'.join(patterns))
>>> all.match('foo.789').group()
'foo.7'
>>> all = re.compile('(' + '|'.join(patterns) + ')$')
>>> all.match('foo.789').group()
'foo.789'

So without using the (patterns)$ form, it was actually matching the
wrong pattern. (it was matching *.7 instead of *.789).

And I'm not positive, but I thought a DFA was not possible when you had
a pattern with '^' or '$'.  At least that's what I remember reading from
the tla source code. I'm not really a regex expert.

Anyway, just to say, because we need to anchor at the beginning and end
of the pattern I don't think it is possible to create a DFA.

John
=:->

PS> Do you have the test code that you used, I'd like to play around
with it.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060518/8d7a88ca/attachment.pgp 


More information about the bazaar mailing list