[MERGE] is_ignored improvements...
John Arbash Meinel
john at arbash-meinel.com
Thu May 18 20:14:02 BST 2006
Jan Hudec wrote:
> On Thu, May 18, 2006 at 11:21:17 -0500, John Arbash Meinel wrote:
...
>> Out of these requests 3 use the return value, and 6 just check for
>> true/false.
>
> ... so I just tried a simple, not at all representative test. I took patterns
> ['.*\\.%i' % i for i in range(0, 999) and matched them:
> 1. Non-capturing groups, all at once.
> 2. Non-capturing groups, 50 at a time, until match is found.
> 3. Capturing groups, 50 at a time, until match is found.
> 4. No groups, each separately, until match is found.
> against a pattern corresponding to 789th group ('foo.789'). And the times for
> 100000 runs on my machine (Athlon64 3000 (2.2GHz)) were about:
> 1. 12.5
> 2. 13.7
> 3. 18.9
> 4. 70.3
>
> So the groups slows it down by about a half, but compared to matching one
> pattern at a time it's not /that/ significant.
>
> Now I have to say, that when I doubled the pattern legnth and matched against
> 'foo.1789', the times doubled even for the single pattern. That means that
> the pattern compiler does not compile to a DFA. So it will be N^2 in all
> cases until python gets a better regex compiler.
>
I'm a little curious about what syntax you used for the final
compilation. Because I tried:
>>> patterns = ['.*\\.%i' % i for i in range(0,1999)]
>>> all = re.compile('|'.join(patterns))
>>> all.match('foo.789').group()
'foo.7'
>>> all = re.compile('(' + '|'.join(patterns) + ')$')
>>> all.match('foo.789').group()
'foo.789'
So without using the (patterns)$ form, it was actually matching the
wrong pattern. (it was matching *.7 instead of *.789).
And I'm not positive, but I thought a DFA was not possible when you had
a pattern with '^' or '$'. At least that's what I remember reading from
the tla source code. I'm not really a regex expert.
Anyway, just to say, because we need to anchor at the beginning and end
of the pattern I don't think it is possible to create a DFA.
John
=:->
PS> Do you have the test code that you used, I'd like to play around
with it.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060518/8d7a88ca/attachment.pgp
More information about the bazaar
mailing list