[MERGE] is_ignored improvements...

John Arbash Meinel john at arbash-meinel.com
Thu May 18 17:21:17 BST 2006


Robert Collins wrote:
> On Thu, 2006-05-18 at 14:34 +0200, Jan Hudec wrote:
>> On Thu, May 18, 2006 at 20:46:37 +1000, Robert Collins wrote:
>>> On Wed, 2006-05-17 at 22:25 +0200, Jan Hudec wrote:
>>>> Hello,
>>>>
>>>> On Tue, May 16, 2006 at 19:03:40 +1000, Robert Collins wrote:
>>>>> Jan Hudec was working on is_ignored improvements... they account for a
>>>>> nontrivial amount of the time of a 'bzr add' call.
>>>>>
>>>>> I was wondering if we can get the compatible replacement in - leaving
>>>>> the format changing etc stuff for a later time.
>>>> Ok, here it is. All tests pass and there are some new tests for globs and
>>>> ignore. Please review. For merging please use
>>>> http://drak.ucw.cz/~bulb/bzr/bzr/matcher/ branch.
>>>>
>>>> It splits the WorkingTree.is_ignored method to ignored, which returns just
>>>> a boolean and ignored_by, which returns the pattern. is_ignored is deprecated
>>>> forwarder to ignored_by (to remain compatible).
>>>>
>>>> I fixed the WorkingTree.ignored_files method, but it's neither used anywhere
>>>> nor tested. Perhaps it should be deprecated.
>>>>
>>>> There is no zero_nine in bzrlib.symbol_versioning yet, so I just used
>>>> zero_eight (just one deprecated method -- WorkingTree.is_ignored). Please fix
>>>> as appropriate for the version that will eventually get the changes.
>>> Thanks for this. I've had a good look at it, and I'm going to -1 it as
>>> it stands.
>>>
>>> The code is very complex compared to the current code, and we end up
>>> with two apis rather than one. I'm working on a variation derived from
>>> your work, which will combine the regexes but only need the single API.
>>> I'll post that up shortly, and I'd appreciate your review on this, as I
>>> know you put a lot of thought into the is_ignored work in your branch.
>> The two interfaces are there because python has arbitrary limit of 100
>> match groups in a pattern. So if I wanted to tell which pattern matched
>> (the user interface displays that), I had to do the match bunch at a
>> time rather than all at once. But in many places just a boolean is
>> wanted, so all patterns can be matched at once, without captures.
>>
>> The code is more complex than necessary because I didn't remove all the
>> stuff needed for switching between different glob types -- which is not
>> needed when there is just one.
> 
> Heres my alternative patch, finished up:
> 
> I currently prefer this over the original patch you prepared because of
> the preservation of API. I dont think many users will have more than 100
> regexs, and it should perform just fine.
> 
> What do you think? Does this make the future improvements you are
> working on harder? Is it a problem to take this variation?
> 
> Rob
> 

Did you try performance testing with and without group matching? I would
have guessed that it was faster to match without the groups.
The reason I originally preferred Jan's double api, is because *most* of
the callers of is_ignored() didn't actually care about what pattern was
matched. So why spend the time if you are just throwing the information
away.

Remember, status also uses 'is_ignored', and that is actually where *I*
ran into the major slowdown. (add, status, ignored, info all use it)

If you do grep "\<is_ignored\>", you get:
add.py:158:                    ignore_glob = tree.is_ignored(subp)
builtins.py:1425:            pat = tree.is_ignored(path)
info.py:203:        if working.is_ignored(path):
tree.py:242:        if new_tree.is_ignored(filename):
workingtree.py:672:        elif self.is_ignored(filename):
workingtree.py:710:                elif self.is_ignored(fp):
workingtree.py:889:            if not self.is_ignored(subp):
workingtree.py:973:            pat = self.is_ignored(subp)
workingtree.py:994:    def is_ignored(self, filename):
workingtree.py:1160:                if self.is_ignored(f):

Out of these requests 3 use the return value, and 6 just check for
true/false.

John
=:->

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 254 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060518/109a31c6/attachment.pgp 


More information about the bazaar mailing list