[MERGE] add performance three
Martin Pool
mbp at canonical.com
Mon May 22 05:55:29 BST 2006
On 22 May 2006, Robert Collins <robertc at robertcollins.net> wrote:
> This makes add faster again by using the inventory objects and by
> tweaking the elementree xml serialiser to be faster in the inner loop.
>
> Rob
> --
> GPG key available at: <http://www.robertcollins.net/keys.txt>.
(btw, did you know about gpg --search-keys robert.collins@ ?)
+1 with some suggestions.
> def smart_add(file_list, recurse=True, action=add_action_add):
> @@ -104,16 +111,25 @@
> inv = tree.read_working_inventory()
> added = []
> ignored = {}
> - user_files = set(file_list)
> -
> - for filepath in file_list:
> - # convert a random abs or cwd-relative path to tree relative.
> + user_files = list(file_list)
> + file_list = []
> +
> + # validate user file paths and convert all paths to tree
> + # relative : its cheaper to make a tree relative path an abspath
> + # than to convert an abspath to tree relative.
> + for filepath in user_files:
> rf = tree.relpath(filepath)
> -
> + file_list.append((rf, None))
> # validate user parameters. Our recursive code avoids adding new files
> # that need such validation
> - if filepath in user_files and tree.is_control_filename(rf):
> + if tree.is_control_filename(rf):
> raise ForbiddenFileError('cannot add control file %s' % filepath)
> +
> + user_files = set([path for path, parent_ie in file_list])
The way the variables are set up here seems a bit convoluted; at one
point file_list holds the user files and then later it holds all the
files, and user_files changes from being a list to a set. Instead how
about this:
- keep file_list as just the parameter that was originally passed
(i think it's slightly better not to rebind parameters unless you
need to)
- then the original loop is 'for filepath in file_list'
- make user_files an empty set up front and then add the tree-relative
path to as you walk file_list
- build up (relpath, parent_ie) in, a new list, say files_to_traverse
> -def __add_one(tree, inv, path, kind, action):
> +def __add_one(tree, inv, parent_ie, path, kind, action):
Changing the parameters is a good opportunity to try to think of a
better name for the method, even for private methods where changing the
name isn't required. (But I suppose this name is not so bad at the
moment.)
This doesn't seem to use the tree parameter so you might as well remove
it.
> """Add a file or directory, automatically add unversioned parents."""
"""Add a new entry to the inventory and automatically add unversioned parents.
Actual adding of the entry is delegated to the action callback.
:param inv: Inventory which will receive the new entry.
:param parent_ie: Parent inventory entry if known, or None. If
None, the parent is looked up by name and used if present, otherwise
it is recursively added.
:param kind: Kind of new entry (file, directory, etc)
:param action: callback(inv, parent_ie, path, kind); return ignored.
:returns: A list of paths which have been added.
"""
There might be scope for more improvement by having __add_one and the
callback optionally return the newly-added ie.
> === modified file 'bzrlib/xml_serializer.py'
> --- bzrlib/xml_serializer.py
> +++ bzrlib/xml_serializer.py
> @@ -71,3 +71,78 @@
>
> def _read_element(self, f):
> return ElementTree().parse(f)
> +
> +
I guess it's fairly obvious but I'd like a comment up here to say that
these get patched into ElementTree when this module is loaded.
Do you think they're general enough to contribute to upstream
elementree?
> +# before in bench_add_kernel_like
> +# 10831 10824 9384.4890 1847.5270 elementtree.ElementTree:662(_write)
> +#+10824 0 9295.0140 1761.0460 +elementtree.ElementTree:662(_write)
> +#+32471 0 4585.8950 1331.2060 +elementtree.ElementTree:812(_escape_attrib)
> +#after switching to text.replace rather than string.replace.
> +# 10831 10824 7486.1120 1832.2340 elementtree.ElementTree:662(_write)
> +#+10824 0 7397.3120 1745.6300 +elementtree.ElementTree:662(_write)
> +#+32471 0 2762.3760 1300.3990 +bzrlib.xml_serializer:85(_escape_attrib)
> +
> +
> +import elementtree.ElementTree
> +import re
> +escape_re = re.compile("&'\"<>")
> +escape_map = {
> + "&":'&',
> + "'":"'", # FIXME: overkill
> + "\"":""",
> + "<":"<",
> + ">":">",
> + }
--
Martin
More information about the bazaar
mailing list