'bzr status' stats each file multiple times

Robert Collins robertc at robertcollins.net
Thu Dec 15 01:41:45 GMT 2005


On Mon, 2005-12-05 at 08:31 +0100, Jan Hudec wrote:
> On Sun, Dec 04, 2005 at 15:28:51 -0600, John A Meinel wrote:
> > Actually, the thing that seems to take a really long time for me is the
> > unknowns check. I have a rather large .bzrignore, because this project
> > likes to build inside the tree. It's a rather large project with 1600
> > source files, and about 50 output executables.
> > So I think bzr is trying to match each file it finds against all entries
> > in .bzrignore.
> > Since a lot of them are absolute paths, I'm thinking bzr should use a
> > dictionary for the absolute paths, then it can just say "is path in
> > ignored_paths" rather than doing a fnmatchcase against each one. 50x500
> > files takes a while.
> 
> Looking at the fnmatch module, it does the matching via regexps internally:
> 
> def fnmatchcase(name, pat):
>     if not pat in _cache:
>         res = translate(pat)
>         _cache[pat] = re.compile(res)
>     return _cache[pat].match(name) is not None
> 
> 
> (plus some imports and a docstring, which are irrelevant here).
> 
> While it caches the patterns, it:
> 1) Does the hash lookup each time through the loop.
> 2) Matches the patterns independently.
> 
> Thus I'd suggest:
> 1) Convert all the patterns to regexes. Using fnmatch.translate is
>    possible, though I am not sure whether the '$' it appends would be a
>    problem in the next step. A custom translator would also have the
>    advantage of allowing the syntax to be extended (though I'd rather see
>    an option to put regexps in the .bzrignore directly).
> 2) Join all the patterns with '|'.
> 3) Compile the one long pattern.
> 4) Match each filename just once against this pattern.
> 
> I am not sure how well python optimizes the regular expression, but it
> will certainly do a better job than matching against all of them
> separately.

Right, I agree. To optimise the matcher, it should broadly:
* Build a single composite rule from .bzrignore.
* Apply this once to each unversioned file.
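
A minimal sketch of that approach, assuming the ignore rules are plain
fnmatch-style globs. This is not bzr's actual code: the names
compile_ignore_patterns and unknown_files are made up for illustration.
Each translated pattern is wrapped in its own non-capturing group so that
the end-of-string anchor fnmatch.translate appends stays local to its
alternative.

```python
import fnmatch
import re


def compile_ignore_patterns(patterns):
    """Build one compiled regex that matches if any glob pattern matches.

    Hypothetical helper, not part of bzr.
    """
    if not patterns:
        # An empty alternation would match every name, so return a
        # regex that can never match instead.
        return re.compile(r"(?!)")
    # Wrap each translated glob in its own group so the anchor that
    # fnmatch.translate appends applies only to that alternative.
    parts = ["(?:%s)" % fnmatch.translate(pat) for pat in patterns]
    return re.compile("|".join(parts))


def unknown_files(paths, patterns):
    """Return the paths that no ignore pattern matches.

    Hypothetical helper: each path is tested once against the
    composite regex rather than once per pattern.
    """
    matcher = compile_ignore_patterns(patterns)
    return [path for path in paths if matcher.match(path) is None]
```

Note that fnmatch globs let '*' match '/' as well, so a real ignore
matcher may want its own translator, as Jan suggests above.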

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.