[MERGE] add performance three

Martin Pool mbp at canonical.com
Mon May 22 05:55:29 BST 2006


On 22 May 2006, Robert Collins <robertc at robertcollins.net> wrote:
> This makes add faster again by using the inventory objects and by
> tweaking the elementree xml serialiser to be faster in the inner loop.
> 
> Rob
> -- 
> GPG key available at: <http://www.robertcollins.net/keys.txt>.

(btw, did you know about gpg --search-keys robert.collins@ ?)

+1 with some suggestions.

>  def smart_add(file_list, recurse=True, action=add_action_add):
> @@ -104,16 +111,25 @@
>      inv = tree.read_working_inventory()
>      added = []
>      ignored = {}
> -    user_files = set(file_list)
> -
> -    for filepath in file_list:
> -        # convert a random abs or cwd-relative path to tree relative.
> +    user_files = list(file_list)
> +    file_list = []
> +
> +    # validate user file paths and convert all paths to tree 
> +    # relative : its cheaper to make a tree relative path an abspath
> +    # than to convert an abspath to tree relative.
> +    for filepath in user_files:
>          rf = tree.relpath(filepath)
> -
> +        file_list.append((rf, None))
>          # validate user parameters. Our recursive code avoids adding new files
>          # that need such validation 
> -        if filepath in user_files and tree.is_control_filename(rf):
> +        if tree.is_control_filename(rf):
>              raise ForbiddenFileError('cannot add control file %s' % filepath)
> +
> +    user_files = set([path for path, parent_ie in file_list])

The way the variables are set up here seems a bit convoluted; at one
point file_list holds the user files and then later it holds all the
files, and user_files changes from being a list to a set.  Instead how
about this:

 - keep file_list as just the parameter that was originally passed
   (i think it's slightly better not to rebind parameters unless you
   need to)

 - then the original loop is 'for filepath in file_list'

 - make user_files an empty set up front and then add the tree-relative 
   path to as you walk file_list

 - build up (relpath, parent_ie) in, a new list, say files_to_traverse

> -def __add_one(tree, inv, path, kind, action):
> +def __add_one(tree, inv, parent_ie, path, kind, action):

Changing the parameters is a good opportunity to try to think of a
better name for the method, even for private methods where changing the
name isn't required.  (But I suppose this name is not so bad at the
moment.)

This doesn't seem to use the tree parameter so you might as well remove
it.

>      """Add a file or directory, automatically add unversioned parents."""

    """Add a new entry to the inventory and automatically add unversioned parents.

    Actual adding of the entry is delegated to the action callback.

    :param inv: Inventory which will receive the new entry.

    :param parent_ie: Parent inventory entry if known, or None.  If
    None, the parent is looked up by name and used if present, otherwise
    it is recursively added.

    :param kind: Kind of new entry (file, directory, etc)

    :param action: callback(inv, parent_ie, path, kind); return ignored.

    :returns: A list of paths which have been added.
    """

There might be scope for more improvement by having __add_one and the
callback optionally return the newly-added ie.

> === modified file 'bzrlib/xml_serializer.py'
> --- bzrlib/xml_serializer.py	
> +++ bzrlib/xml_serializer.py	
> @@ -71,3 +71,78 @@
>  
>      def _read_element(self, f):
>          return ElementTree().parse(f)
> +
> +

I guess it's fairly obvious but I'd like a comment up here to say that
these get patched into ElementTree when this module is loaded.

Do you think they're general enough to contribute to upstream
elementree?

> +# before in bench_add_kernel_like
> +# 10831        10824   9384.4890   1847.5270   elementtree.ElementTree:662(_write)
> +#+10824            0   9295.0140   1761.0460   +elementtree.ElementTree:662(_write)
> +#+32471            0   4585.8950   1331.2060   +elementtree.ElementTree:812(_escape_attrib)
> +#after switching to text.replace rather than string.replace.
> +# 10831        10824   7486.1120   1832.2340   elementtree.ElementTree:662(_write)
> +#+10824            0   7397.3120   1745.6300   +elementtree.ElementTree:662(_write)
> +#+32471            0   2762.3760   1300.3990   +bzrlib.xml_serializer:85(_escape_attrib)
> +
> + 
> +import elementtree.ElementTree
> +import re
> +escape_re = re.compile("&'\"<>")
> +escape_map = {
> +    "&":'&amp;',
> +    "'":"&apos;", # FIXME: overkill
> +    "\"":"&quot;",
> +    "<":"&lt;",
> +    ">":"&gt;",
> +    }


-- 
Martin




More information about the bazaar mailing list