[MERGE] is_ignored improvements...
Jan Hudec
bulb at ucw.cz
Thu May 18 19:40:01 BST 2006
On Thu, May 18, 2006 at 11:21:17 -0500, John Arbash Meinel wrote:
> Did you try performance testing with and without group matching? I would
> have guessed that it was faster to match without the groups.
> The reason I originally preferred Jan's double api, is because *most* of
> the callers of is_ignored() didn't actually care about what pattern was
> matched. So why spend the time if you are just throwing the information
> away.
>
> Remember, status also uses 'is_ignored', and that is actually where *I*
> ran into the major slowdown. (add, status, ignored, info all use it)
>
> If you do grep "\<is_ignored\>", you get:
> add.py:158: ignore_glob = tree.is_ignored(subp)
> builtins.py:1425: pat = tree.is_ignored(path)
> info.py:203: if working.is_ignored(path):
> tree.py:242: if new_tree.is_ignored(filename):
> workingtree.py:672: elif self.is_ignored(filename):
> workingtree.py:710: elif self.is_ignored(fp):
> workingtree.py:889: if not self.is_ignored(subp):
> workingtree.py:973: pat = self.is_ignored(subp)
> workingtree.py:994: def is_ignored(self, filename):
> workingtree.py:1160: if self.is_ignored(f):
>
> Out of these requests 3 use the return value, and 6 just check for
> true/false.
... so I just tried a simple, not at all representative test. I took patterns
['.*\\.%i' % i for i in range(0, 999) and matched them:
1. Non-capturing groups, all at once.
2. Non-capturing groups, 50 at a time, until match is found.
3. Capturing groups, 50 at a time, until match is found.
4. No groups, each separately, until match is found.
against a pattern corresponding to 789th group ('foo.789'). And the times for
100000 runs on my machine (Athlon64 3000 (2.2GHz)) were about:
1. 12.5
2. 13.7
3. 18.9
4. 70.3
So the groups slows it down by about a half, but compared to matching one
pattern at a time it's not /that/ significant.
Now I have to say, that when I doubled the pattern legnth and matched against
'foo.1789', the times doubled even for the single pattern. That means that
the pattern compiler does not compile to a DFA. So it will be N^2 in all
cases until python gets a better regex compiler.
--
Jan 'Bulb' Hudec <bulb at ucw.cz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060518/834076d2/attachment.pgp
More information about the bazaar
mailing list