[MERGE] add performance tweaks.

Robert Collins robertc at robertcollins.net
Tue Jun 13 12:59:16 BST 2006


On Mon, 2006-06-12 at 15:55 -0400, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> This looks basically good, but I have some questions.
> 
> Robert Collins wrote:
> > My replacement xml serialiser halves the serialisation cost, but is not
> > compatible with 0.8 [unless the bug 49224 bugfix is applied], and I've
> > not yet profiled the cost of doing entity replacement to make a
> > ascii-safe bytestream.
> 
> This sounds like something we should do before deciding whether we need
> to bugfix 0.8.  In theory, doing entity replacement instead of utf-8
> encoding shouldn't be a big loss-- might even be faster.

Well, 0.8 is currently double-handling the data, so I think fixing it is
good either way :). That said, I agree we should benchmark the various
approaches.

> > +    def test_make_10824_inv_entries(self):
> > +        """Making 10824 inv entries should be quick."""
> > +        entries = []
> > +        def make_10824_entries():
> > +            for counter in xrange(10000):
> > +                bzrlib.inventory.make_entry('file', 'foo',
> > +                    bzrlib.inventory.ROOT_ID)
> 
> This set off my unique_root_id alarm.  Even though it's okay, I'd prefer
> using 'bar' instead of bzrlib.inventory.ROOT_ID.  We should discourage
> use of ROOT_ID.

Sure, I'll change that to 'somerandomparentid'.

> >      prepared_list = _prepare_file_list(file_list)
> >      mutter("smart add of %r, originally %r", prepared_list, file_list)
> >      inv = tree.read_working_inventory()
> >      added = []
> >      ignored = {}
> > -    user_files = set()
> > -    files_to_add = []
> > +    dirs_to_add = []
> > +    user_dirs = set()
> 
> Where are you distinguishing between user-specified files and files
> discovered by listing directories?  I can't see anywhere where you're
> raising BadFileKind.

There are now two loops:
the first one checks for this with user files:
line 159 and on:
        abspath = tree.abspath(rf.raw_path)
        kind = bzrlib.osutils.file_kind(abspath)
        if kind == 'directory':
            # schedule the dir for scanning
            user_dirs.add(rf.raw_path)
        else:
            if not InventoryEntry.versionable_kind(kind):
                raise BadFileKindError("cannot add %s of type %s" %
(abspath, kind))

The second loop just ignores unversionable types as it no longer needs
to throw.

> > +        added = __add_one_and_parent(tree, inv, None, FastPath(dirname(path.raw_path)), 'directory', action)
> > +        parent_id = inv.path2id(dirname(path.raw_path))
> > +        if parent_id is None:
> > +            parent_ie = inv[inv.path2id(dirname(path.raw_path))]
> 
> Huh? If inv.path2id(dirname(path.raw_path)) returns None, why do you do
> it again?

Nice catch. Actually I was enforcing that parent_id would never be None,
so it should just be a straight lookup - and thats why no tests were
failing. Changed.

> >  class RootEntry(InventoryEntry):
> >  
> > +    __slots__ = ['text_sha1', 'text_size', 'file_id', 'name', 'kind',
> > +                 'text_id', 'parent_id', 'children', 'executable', 
> > +                 'revision', 'symlink_target']
> > +
> >      def _check(self, checker, rev_id, tree):
> >          """See InventoryEntry._check"""
> 
> Drat.  This will be a conflict for the nested-trees work.  Necessary, I
> guess.

Yah, sorry.

> >  def file_kind(f, _lstat=os.lstat, _mapper=file_kind_from_stat_mode):
> > +    try:
> > +        return _mapper(_lstat(f).st_mode)
> > +    except OSError, e:
> > +        if getattr(e, 'errno', None) == errno.ENOENT:
> > +            raise bzrlib.errors.NoSuchFile(f)
> > +        raise
> > +
> >      return _mapper(_lstat(f).st_mode)
> 
> The last line looks like it will never execute.

Another good catch. I suspect I was going wall-eyed.

> > -def walkdirs(top):
> > +def walkdirs(top, prefix=""):
> >      """Yield data about all the directories in a tree.
> >      
> >      This yields all the data about the contents of a directory at a time.
> > @@ -813,13 +820,14 @@
> >      The data yielded is of the form:
> >      [(relpath, basename, kind, lstat, path_from_top), ...]
> >  
> > +    :param prefix: Treat prefix as the initial root of the tree, rather than ""
> >      :return: an iterator over the dirs.
> >      """
> 
> I don't understand the difference between the top and the prefix.  Why
> would I specify the top and then specify the prefix?  Is it so that the
> path is still relative to top?

Yes, How about this docstring fragment:

:param prefix: Prefix the relpaths that are yielded with 'prefix'. This 
  allows one to walk a subtree but get paths that are relative to a tree
  rooted higher up.

> > === modified file 'bzrlib/tests/test_smart_add.py'
> > --- bzrlib/tests/test_smart_add.py	2006-06-06 12:06:20 +0000
> > +++ bzrlib/tests/test_smart_add.py	2006-06-11 12:04:22 +0000
> > @@ -59,7 +59,7 @@
> >          wt = self.make_branch_and_tree('.')
> >          branch = wt.branch
> >          child_tree = self.make_branch_and_tree('original/child')
> > -        smart_add_tree(wt, (u".",))
> > +        smart_add_tree(wt, (".",))
> 
> What motivates this?

Removing bias in the tests - in our thread some months back we decided
that all apis which need unicode should use safe_unicode to upcast,
rather than forcing users of the api to know that they must pass in
unicode to get the right result. So our tests should always pass in
ascii strings unless we are testing the acceptance of unicode paths, so
that we uncover places this is not done on platforms that need it.

Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20060613/036e5c13/attachment.pgp 


More information about the bazaar mailing list