'bzr status' stats each file multiple times

Jan Hudec bulb at ucw.cz
Mon Dec 5 07:31:07 GMT 2005


On Sun, Dec 04, 2005 at 15:28:51 -0600, John A Meinel wrote:
> Actually, the thing that seems to take a really long time for me is the
> unknowns check. I have a rather large .bzrignore, because this project
> likes to build inside the tree. It's a rather large project with 1600
> source files, and about 50 output executables.
> So I think bzr is trying to match each file it finds against all entries
> in .bzrignore.
> Since a lot of them are absolute paths, I'm thinking bzr should use a
> dictionary for the absolute paths, then it can just say "is path in
> ignored_paths" rather than doing a fnmatchcase against each one. 50x500
> files takes a while.
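
To make the quoted idea concrete, here is a rough sketch, assuming "absolute paths" means entries with no wildcard characters. The helper names split_ignore_entries and is_ignored_fast are made up for illustration and are not part of bzr:

```python
import fnmatch


def split_ignore_entries(entries):
    """Separate literal paths from glob patterns (hypothetical helper)."""
    literals = set()
    globs = []
    for entry in entries:
        # Entries containing glob metacharacters still need pattern matching.
        if any(ch in entry for ch in "*?["):
            globs.append(entry)
        else:
            literals.add(entry)
    return literals, globs


def is_ignored_fast(path, literals, globs):
    """Check the literal set first, then fall back to the glob patterns."""
    if path in literals:
        return True
    return any(fnmatch.fnmatchcase(path, g) for g in globs)
```

With this split, only the entries that really are patterns cost a match per file; the literal entries cost one set lookup in total.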

The fnmatch module does its matching via regexps internally:

def fnmatchcase(name, pat):
    if not pat in _cache:
        res = translate(pat)
        _cache[pat] = re.compile(res)
    return _cache[pat].match(name) is not None


(plus some imports and a docstring, which are irrelevant here).

While it caches the compiled patterns, it:
1) Does a hash lookup for every pattern on each pass through the loop.
2) Matches the patterns independently of one another.

Thus I'd suggest:
1) Convert all the patterns to regexes. Using fnmatch.translate is
   possible, though I am not sure whether the '$' it appends causes a
   problem in the next step. A custom translator would also have the
   advantage of allowing the syntax to be extended (though I'd rather
   see an option to put regexps in the .bzrignore directly).
2) Join all the patterns with '|'.
3) Compile the one long pattern.
4) Match each filename just once against this pattern.
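
The four steps above could look roughly like this. It is only a sketch: the helper names compile_ignore_patterns and is_ignored are invented for illustration, and the sample patterns are made up. Each translated pattern is wrapped in its own group so that whatever end anchor fnmatch.translate appends stays local to that alternative:

```python
import fnmatch
import re


def compile_ignore_patterns(patterns):
    """Combine glob patterns into one compiled regex (hypothetical helper)."""
    # Steps 1 and 2: translate each glob, group it so its end anchor only
    # applies to that alternative, and join the alternatives with '|'.
    alternatives = ["(?:%s)" % fnmatch.translate(p) for p in patterns]
    # Step 3: compile the one long pattern.
    return re.compile("|".join(alternatives))


def is_ignored(combined, path):
    """Step 4: match the filename just once against the combined pattern."""
    return combined.match(path) is not None


# Made-up sample patterns, not taken from any real .bzrignore.
combined = compile_ignore_patterns(["*.o", "*.pyc", "build/output.exe"])
```

The combined regex would be built once when .bzrignore is read and then reused for every file, so the per-file cost is a single match call. I have not measured this.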

I am not sure how well Python optimizes such a regular expression, but
it will certainly do a better job than matching against each pattern
separately.

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>