[MERGE] is_ignored improvements...
Jan Hudec
bulb at ucw.cz
Sat May 20 21:11:29 BST 2006
On Sat, May 20, 2006 at 11:57:15 -0500, John Arbash Meinel wrote:
> Jan Hudec wrote:
> > On Sat, May 20, 2006 at 09:35:28 -0500, John A Meinel wrote:
>
>
> ...
>
> >> Well, I think the #1 pattern format is *.foo, where we are just looking
> >> for some sort of extension. And looking in all directories, etc. I
> >> almost wonder if we wouldn't be better off with some sort of translation
> >> that changes all of the *.foo into 'path.endswith()' calls.
> >
> > You can try timing it, but I doubt it will help. I have more faith in
> > stripping the '*.' from them, converting to regexps, ORing them together and
> > then wrapping the result in r'.*\.(?:%s)'. That would make a fourth case in
> > the conversion switch.
>
> Yeah, as I saw, endswith() was actually slower than the separate regexes.
> regex.match() is actually faster than endswith() (with a compiled regex).
Strangely enough, if I just try:

    import re
    from timeit import Timer
    bar = re.compile('.*bar$')
    Timer('bar.match("foobar")', 'from __main__ import bar').timeit()

it is significantly slower than

    Timer('"foobar".endswith("bar")', '').timeit()

But yes, I saw your results. Maybe the difference was due to the method calls.
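
For anyone who wants to repeat the comparison, here is a minimal, self-contained
sketch of it. It assumes Python 3.5 or later (for the globals argument of Timer);
the function names and the repetition count are made up for illustration, and the
third timing is only a guess at how to isolate the attribute lookup. No particular
result is implied.

```python
import re
from timeit import Timer

# Compiled pattern from the example above.
bar = re.compile(r'.*bar$')

def time_regex(number=100000):
    # Look up bar.match on every iteration, as in the original snippet.
    return Timer('bar.match("foobar")', globals={'bar': bar}).timeit(number)

def time_bound_match(number=100000):
    # Pre-bind the method so the attribute lookup is not part of the loop.
    return Timer('match("foobar")', globals={'match': bar.match}).timeit(number)

def time_endswith(number=100000):
    # Plain string method for comparison.
    return Timer('"foobar".endswith("bar")').timeit(number)

if __name__ == '__main__':
    print('regex:       %.3f' % time_regex())
    print('bound match: %.3f' % time_bound_match())
    print('endswith:    %.3f' % time_endswith())
```

The absolute numbers depend on the interpreter version and the machine, so only
the ratios between the three lines are worth comparing.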
> >> I also tried this:
> >> compPrefix = [re.compile('.*\\.(?:' +
> >> '|'.join(['(?:%s$)' % i for i in range(0, max)])
> >> + ')' )]
> >> (Factoring out the '.*\.' prefix), and I found that it doubles the
> >> performance:
> >>
> >> # For foo.19
> >> $ python ,time-matches.py
> >> NoMatch: 0.572
> >> NoMatchSplit: 0.568
> >> MatchSplit: 0.740
> >> Separate: 2.792
> >> Prefix: 0.292
> >> PrefixSplit: 0.288
> >> Endswith: 3.509
> >>
> >> So I think there is stuff worth looking into.
> >
> > Yes. Seems to make quite a difference.
> >
>
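
The factored-prefix conversion discussed above could look roughly like the
following sketch. The function name compile_extension_patterns and the check for
which patterns count as "simple" are assumptions made for illustration; this is
not the code in bzr.

```python
import re

def compile_extension_patterns(patterns):
    """Combine simple '*.ext' globs into one regex with '.*\\.' factored out."""
    extensions = []
    for pattern in patterns:
        ext = pattern[2:]
        # Only handle plain '*.ext' globs; anything else needs the general path.
        if not pattern.startswith('*.') or any(c in ext for c in '*?[/'):
            raise ValueError('not a simple extension pattern: %r' % pattern)
        # Escape the extension so characters like '+' are matched literally.
        extensions.append(re.escape(ext))
    if not extensions:
        raise ValueError('no patterns given')
    # One shared prefix, then an alternation of all the extensions.
    return re.compile(r'.*\.(?:%s)$' % '|'.join(extensions))

if __name__ == '__main__':
    matcher = compile_extension_patterns(['*.pyc', '*.o', '*.tmp'])
    for path in ['build/foo.pyc', 'foo.o', 'foo.py']:
        print(path, bool(matcher.match(path)))
```

The point of the single alternation is that the regex engine scans the '.*\.'
prefix once instead of once per pattern.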
> Yep. I would also say that it would be nice if we put commonly hit
> patterns early, since I also saw a big difference between foo.19 and
> foo.999 (more of a difference than even the difference between 100, 1k
> and 2k patterns).
Yes, but I can't really think of a good way to do this, given how the
patterns are specified. Also, the worst case is a path that matches nothing,
which is unfortunately the most common case :-(
--
Jan 'Bulb' Hudec <bulb at ucw.cz>