[MERGE] Nested trees: CompositeTree

Aaron Bentley aaron at aaronbentley.com
Wed Apr 15 20:49:59 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thanks for your continuing work on this review.

Here's an updated patch that should address your concerns.  It also
removes the _subtrees_by_compositepath, which simplifies the code somewhat.

Ian Clatworthy wrote:
> Aaron Bentley wrote:
> Also, code quality *feels* a little below the standard I usually
> see in your patches.

Not really surprising, as this was old code with multiple authors, and
hadn't been submitted for merging even when it was new.  My goal was to
get it up to snuff and submit it.  I was also trying to minimize the
size of the patch.  So if there's more work needed to get it to up to
our normal standards, that's fine.

> To be more specific, the code needs more
> explanatory comments in exception cases and docstrings need
> some attention. In particular, I'd like to see "path" parameters
> clearly documented w.r.t. absolute vs relative

I don't think there are any absolute paths in the API.  The API is
general enough to handle RevisionTrees, and they don't have any absolute
paths.

> and when relative,
> whether they are relative to the top of the base tree or relative
> to the subtree.

I have tried to update it accordingly.

>> +    def __init__(self, root_tree, root_branch):
>> +        self.root_tree = root_tree
>> +        self.root_branch = root_branch
>> +        self._clear_cached_trees()
>> +        self.lock_count = 0
>> +        self.lock_type = None
>> +
>> +    def _clear_cached_trees(self):
>> +        self._subtrees_by_relpath = {}
>> +        self._subtrees_by_compositepath = {}
>> +        self._register_tree_compositepath(self.root_tree, '')
>> +        self._all_trees_scanned = False
>> +        self.id_tree_branch = {}
>> +        self._locked_trees = []
> 
> Is there a good reason for lock_count & lock_type to be set/reset in
> one place while _locked_trees is set/reset inside _clear_cached_trees()?

We can't uncache any trees while we're locked, because if there's a
write lock, we would hold on to that lock, and then re-open and re-lock
the tree, causing lock failures.  (And we certainly can't unlock any
trees while we're locked.)

The list of locked trees is invalidated by the fact that the cached
trees have been cleared, not by the unlocking of the tree.  If we didn't
call _clear_cached_trees and reacquired the lock instead, we could just
re-lock all the entries in _locked_trees.  This works only because the
cache ensures we would use those same re-locked trees when calling

Of course, that's an invitation to races-- once the trees are unlocked,
the subtree paths can be changed by an external process.  That's why we
drop all the cached tree data-- because it may no longer be correct once
we unlock.

So there is a tight association between unlocking the NestedTrees and
clearing its caches.  Unlocking requires dropping cached data (to
prevent races), and dropping cached data requires being unlocked (to
prevent locking two instances of the same tree).  But there is a
stronger association between locked_trees and the other cached data than
there is between locked_trees and lock_type/lock_count.

> I'd probably feel more comfortable if clear_cached_trees() didn't
> touch the locking-related attributes, but resetting all of them would
> be OK as well.

What if I just raised an AssertionError if self._lock_type was not None?

>> +    def paths_info(self, paths):
>> +        """Return path, relpath, inventory entry, and tree for input paths
>> +
>> +        :return: a list of (path, relpath, entry, tree) tuples
>> +        """
>> +        self._must_be_locked()
>> +        return [self._path_info(p) for p in paths]
>> +
>> +    def _path_info(self, path):
>> +        """Find the relpath, tree and inventory entry for a path."""
> 
> These are examples of docstrings which should be clearer IMO. Which paths
> are relative and relative to what exactly? entry can actually be returned
> as None but when that happens isn't documented and ought to be. The ordering
> implied by the second docstring is wrong as well.

Updated.

>> +            if node is None:
>> +                return path, osutils.pathjoin(*relpath), None, tree
>> +            try:
>> +                kind = tree.kind(node.file_id)
>> +            except errors.NoSuchFile:
>> +                pass
> 
> I'm probably missing something obvious here but I don't see how
> NoSuchFile can be raised here.

You are "missing" the possibility of missing files :-)  We are
traversing the inventory, starting from tree_root.inventory.root.  So
some of the file-ids we encounter may correspond with missing files.

> If it is, why is it OK to ignore it?

I think it's actually not okay, just close enough that we haven't
noticed the problem.  If we ignore a missing directory or subtree, we
don't change the node, and we look for the children of the missing
directory/subtree in its parent (or grandparent, etc.)  If there happens
to be a child in the parent with that name, we can wind up attempting to
recurse into the wrong tree.  Otherwise, that child comes back None and
we return.

I'll add tests for this case, and fix it if I'm right.

>> +    def get_path_tree(self, path):
>> +        """Determine the tree containing a path and the tree's pathname."""
> 
> Some of the method names feel overly short. In particular, get_tree_path()
> and get_path_tree() could be confused and don't return simple values like I
> first expected they would on seeing their names. I don't have great
> alternatives but as a starting point, maybe something like:
> 
> * get_tree_path() -> get_tree_and_treepath()

Okay.

> * get_path_tree() -> get_pathinfo_for_tree_containing()

Your proposed name is misleading, because it suggests only information
about the tree is being returned.  In fact, it is information about the
supplied path, since it includes the relpath within that tree.

I suggest calling it get_pathinfo().

> 
>> +            if path == '':
>> +                raise AssertionError('Subpath cannot be empty.')
> 
> You want subpath in this test, not path.

Thanks.

> 
>> +    def iter_paths_by_tree(self, paths):
>> +        tree_path = None
> 
> Needs a docstring.

Done

> 
>> +    def _scan_subtrees(self):
>> +        self._must_be_locked()
>> +        def do_scan(prefix, relpath, tree, containing_tree, branch):
>> +            self._register_both(tree, prefix, containing_tree, relpath)
>> +            for path, file_id in tree.iter_references():
>> +                tree_path = tree.id2path(file_id)
> 
> Can tree ever be None in these last 2 lines?
> 
>> +                do_scan(composite_path, tree_path, subtree, tree, subbranch)
>> +        do_scan('', '', self.root_tree,  None, self.root_branch)

No, I don't see a way tree could ever be None.  It starts of as
self.root_tree, and then recurses into the tree references via
get_tree_branch, which doesn't return None AFAICT.

> One too many spaces before None.

Fixed.

> 
>> +    def _register_both(self, tree, compositepath, containing_tree, relpath):
>> +        self._register_tree_compositepath(tree, compositepath)
>> +        # special case root
>> +        if tree is not None:
>> +            self._register_tree_relpath(tree, containing_tree, relpath)
>> +
> 
> Why is it OK to register_tree_compositepath here if tree is None?

I've nuked this entire function.

>> +        subbranch = branch.reference_parent(file_id, path)
>> +        try:
>> +            subtree = self._get_tree_by_relpath(tree, path)
>> +        except KeyError:
>> +            subtree = tree.get_nested_tree(file_id, branch)
> 
> There's a lot of code (snipped) under that KeyError and I don't understand
> what's it doing or why it's there. Please add some comments explaining that.

I've done so.  Lemme know if it's still unclear.

>> +class CompositeInventory(object):
>> +    """An inventory that combines the inventories of several subtrees.
>> +
>> +    Intended for use as CompositeTree.inventory, not for direct instantiation.
>> +    """
> 
> This class isn't tested directly and ought to be private. Otherwise, I'm yet
> to review it.

Made private.

> 
>> +class CompositeTree(object):
>> +    """Implements parts of Tree interface in terms of subtrees.
>> +
>> +    Essentially, this is a by-value representation of a set of nested
>> +    subtrees.
>> +    Care should be taken in its use-- not suitable for every problem.
>> +    """
> 
> Can you elaborate here as to which problems it is suited for vs the
> ones where it is not? (A link to an email thread or IRC chat would be
> better than nothing.)

There are two issues.
1. Some operations need to know where the subtrees are.  Commit and
pull, for example, need to commit to each subtree.

2. This is not a complete implementation of the Tree interface.  If
there is any question whether a given operation uses unimplemented
functionality, the operation should be tested against it.

> 
>> +    def get_file_text(self, file_id, path=None):
>> +        """See Tree.get_file_text."""
>> +        tree, relpath = self._find_tree_path(file_id, path)
>> +        return tree.get_file_text(file_id, relpath)
>> +
> 
> WorkingTree.get_file_text() now has a filtered=True optional parameter. I
> suspect you need it as well for content filtering to work as expected
> on nested trees.

It seems we don't have any operations that do content filtering yet.

>> +    def path2id(self, path):
>> +        """See Tree.path2id."""
>> +        # FIXME -- shouldn't return missing files
>> +        return self.inventory.path2id(path)
> 
> Is this FIXME still relevant? I couldn't see why it was there.

Yes.  The contract of Tree.path2id is that it returns None for missing
files.  The concept of 'missing files' can't be formulated in the
context of Inventory.path2id, so it always returns a file-id if it knows
one.

>> +    def conflicts(self):
>> +        """See WorkingTree.conflicts."""
>> +        # FIXME adjust paths in conflicts
>> +        conflicts = []
>> +        for prefix, tree in self._nested_trees.all_trees().iteritems():
>> +            conflicts.extend(tree.conflicts())
>> +        return _mod_conflicts.ConflictList(conflicts)
> 
> It looks like this FIXME is still required. It might be worth adding
> a test for this.

Done.

> 
>> +    def kind(self, file_id):
>> +        """See Tree.kind"""
>> +        # Doing a lookup on file_id twice? Hmm.
>> +        return self._nested_trees.get_tree_path(file_id)[0].kind(file_id)
> 
> I think get_tree() is better than get_tree_path()[0] here.
> The comment can go as well?

Sure.  The double-lookups are a fundamental property of CompositeTrees.

>> +    def filter_unversioned_files(self, paths):
>> +        """See Tree.filter_unversioned_files."""
>> +        unversioned = set()
>> +        # Reverse sort should guarantee that all subtree paths are processed
>> +        # before tree paths.  So if path.startswith(tree_path), we're in
>> +        # the same tree.
>> +        for tree, tree_path, subpaths in \
>> +            self._nested_trees.iter_paths_by_tree(paths):
> 
> That comment ought to be moved into the iter_paths_by_tree() method
> I suspect.

Yep, thanks.

> 
>> +    def _comparison_data(self, entry, path):
>> +        tree, prefix, path = self._nested_trees.get_path_tree(path)
>> +        kind, executable, stat = tree._comparison_data(entry, path)
>> +        if kind == 'tree-reference':
>> +            kind = 'directory'
> 
> When can that last if test succeed? (I couldn't think of a case.)

Any case where path is a tree reference and get_path_tree returns the
tree containing the tree reference.

> 
>> +    def get_file_sha1(self, file_id, path=None, stat_value=None):
>> +        """See Tree.get_file_sha1."""
>> +        return self._nested_trees.get_tree(file_id).get_file_sha1(
>> +            file_id, None, stat_value)
>> +
>> +    def get_file_mtime(self, file_id, path=None):
>> +        """See Tree.get_file_mtime."""
>> +        return self._nested_trees.get_tree(file_id).get_file_mtime(
>> +            file_id, None)
> 
> John has been promoting a policy of explicitly naming optional parameters
> rather than relying on their position being unchanged.

I don't think I've ever heard that suggested before.  Is it not an API
break to change the position of any parameter?  If not, then shouldn't
we specify the names of all parameters, not just optional ones?

> I think you should
> follow that convention in these methods.
> 
>> +    def id2path(self, file_id):
>> +        """See Tree.id2path."""
>> +        return self._nested_trees.get_tree_path(file_id)[1]
> 
> This is wrong isn't it? get_tree_path()[1] is the path of the tree
> when we want to return the path of the file-id, perhaps within a tree, here.

Fixed.

> 
>> +    def apply_inventory_delta(self, changes):
>> +        """Apply an inventory delta to a set of subtrees.
> 
> Do we want/need to use lock decorators on methods like this?
> If not, we probably ought to explain why in the class docstring.

We generally don't need to use lock decorators if we're calling out to
methods which, themselves, have lock decorators.

>> +                if new_tree_path != '':
>> +                    new_path = new_path[len(new_tree_path)+1:]
>> +                new_root_id = new_tree.get_root_id()
>> +            else:
>> +                new_tree_path = None
>> +            if old_path is not None:
>> +                old_tree, old_tree_path = \
>> +                    self._nested_trees.get_tree_path(file_id)
>> +                if old_tree_path != '':
>> +                    old_path = old_path[len(old_tree_path) + 1:]
> 
> Very minor grumble - "+1:]" vs " + 1:]". I'm OK either way but let's
> be consistent.

Fixed.

>> +        for delta, tree in deltas.itervalues():
>> +            tree.apply_inventory_delta(delta)
> 
> While a little tricky, I couldn't see anything wrong with the
> apply_inventory_delta() code. But it does scare me a bit as to
> what edge cases it might not be handling. For example, I wonder
> whether a kind change from directory to tree-reference or vice-versa
> would be handled correctly, where the pre and post set of
> tree-references changes. Maybe we just need to be extra diligent
> in ensuring we have enough unit tests for stuff like that?

Okay.  I've added a test case for introducing tree references.  Since
CompositeTree won't output tree references, it seems consistent to
refuse to introduce tree references.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknmOmYACgkQ0F+nu1YWqI0SjgCcCILE75w/RP58t3yabHrsGOcS
PRIAn2UaFX1n7TQWW32KOWlPQidUTgQj
=AI7d
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: composite2.patch
Type: text/x-patch
Size: 170354 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090415/10804fa1/attachment-0001.bin 


More information about the bazaar mailing list