[MERGE] Nested trees: CompositeTree

Ian Clatworthy ian.clatworthy at internode.on.net
Mon Apr 20 07:12:19 BST 2009


Aaron Bentley wrote:

> So if there's more work needed to get it to up to
> our normal standards, that's fine.

Well done. This new version is up to the quality I was looking for.
Let's land this baby.

>> I'd probably feel more comfortable if clear_cached_trees() didn't
>> touch the locking-related attributes, but resetting all of them would
>> be OK as well.
> 
> What if I just raised an AssertionError if self._lock_type was not None?

That would be good.

>>> +            if path == '':
>>> +                raise AssertionError('Subpath cannot be empty.')
>> You want subpath in this test, not path.
> 
> Thanks.

Still needs tweaking btw.

>>> +        subbranch = branch.reference_parent(file_id, path)
>>> +        try:
>>> +            subtree = self._get_tree_by_relpath(tree, path)
>>> +        except KeyError:
>>> +            subtree = tree.get_nested_tree(file_id, branch)
>> There's a lot of code (snipped) under that KeyError and I don't understand
>> what's it doing or why it's there. Please add some comments explaining that.
> 
> I've done so.  Lemme know if it's still unclear.

Much better. Thank-you.

>>> +    def get_file_text(self, file_id, path=None):
>>> +        """See Tree.get_file_text."""
>>> +        tree, relpath = self._find_tree_path(file_id, path)
>>> +        return tree.get_file_text(file_id, relpath)
>>> +
>> WorkingTree.get_file_text() now has a filtered=True optional parameter. I
>> suspect you need it as well for content filtering to work as expected
>> on nested trees.
> 
> It seems we don't have any operations that do content filtering yet.

We have eol support and keyword templating, the latter via a plugin.
The fix is simple: added filtered=True to the parameter list and
propagate the setting to the tree.get_file_text() call. Without
this, I promise you that eol conversion and keywords will not work
in nested trees.

Reviewing the code again, it now seems much cleaner. Part of that is
because I'm looking at it a second time, but I'm sure a large part of
it is because of the improvements you've made since the first review.

Here are some more tweaks ...

> +    def get_tree_and_treepath(self, file_id, skip_references=False):
> +        """Get a tree containing file_id and path relative to the root tree.
> +
> +        The path is the path to the subtree, not the file in the subtree.
> +
> +        :skip_references: if True, only results where the file_id is not
> +            a tree reference will be supplied.
> +        """
> +        try:
> +            tree, branch, path = self.id_tree_branch[file_id]
> +        except KeyError:
> +            for path, tree in self.all_trees().iteritems():
> +                if file_id in tree.inventory:
> +                    if skip_references:
> +                        try:
> +                            if tree.kind(file_id) == 'tree-reference':
> +                                continue
> +                        except errors.NoSuchFile:
> +                            pass
> +                    return tree, path
> +            else:
> +                raise errors.NoSuchId(self, file_id)
> +        else:
> +            return tree, path

I think you want to be caching the results in id_tree_branch after finding
them, i.e. just before the return in the except clause. Otherwise, when does
id_tree_branch ever get populated?

> +            if path == '':
> +                raise AssertionError('Subpath cannot be empty.')

As mentioned above, s/path/subpath/.

> +    def iter_pathinfo_by_tree(self, paths):
> +        """Iterate through the path information for provided paths.
> +
> +        iterates through (tree, tree_path, relpaths) tuples, where

Very minor - s/iterates/Iterates/ at the start of sentence.

> +    def all_trees(self):
> +        """Return a dict of composite_path to tree for all subtrees."""
> +        if not self._all_trees_scanned:
> +            self._scan_subtrees()
> +        self._all_trees_scanned = True

While not strictly necessary, better to put that last statement inside the
if block.

> +    def __init__(self, root_tree, root_branch):
> +        self._nested_trees = NestedTrees(root_tree, root_branch)
> +        self.inventory = _CompositeInventory(self._nested_trees)

I think it might be nice to make root_branch default to root_tree.branch
here. We could also do the same in NestedTree.__init__().

> +            pairs = []
> +
> +            changes = []
> +            for num, (from_path, from_relpath, entry, from_tree) in\
> +                enumerate(paths_info[:-1]):
> +                from_inv = from_tree.inventory

I'd prefer the blank line before pairs = [], not after it.
from_inv is never used so the last line can be deleted.

> +            if not after:
> +                trans_id_tree_path = {}

I'd like to see everything inside this if statement moved into a separate
method. move() is too large to be easily understandable and that looks as
good a place as any to partition it.

> +    def get_file_text(self, file_id, path=None):
> +        """See Tree.get_file_text."""
> +        tree, relpath = self._find_tree_path(file_id, path)
> +        return tree.get_file_text(file_id, relpath)

As mentioned above, still need to define and propagate a parameter called
filtered.

> +    def get_file_sha1(self, file_id, path=None, stat_value=None):
> +        """See Tree.get_file_sha1."""
> +        return self._nested_trees.get_tree(file_id).get_file_sha1(
> +            file_id, None, stat_value)
> +
> +    def get_file_mtime(self, file_id, path=None):
> +        """See Tree.get_file_mtime."""
> +        return self._nested_trees.get_tree(file_id).get_file_mtime(
> +            file_id, None)

Wouldn't it be better to use find_tree_path() in these two routines as
you do for is_executable?

> +        for old_path, new_path, file_id, new_entry in changes:
> +            if new_path is not None:
> +                if new_entry.kind == 'tree-reference':
> +                    raise ValueError('Cannot introduce a tree-reference in'
> +                         ' CompositeTree.apply_inventory_delta.')


IIUIC, this test will also stop changing a tree-reference, not just
introducing one. I suspect we want to allow changes. If so, you need
to move this test. If not, we need to reword the error to say
"Cannot introduce or change ...".

> +                new_root_id = new_tree.get_root_id()

Never used and can be deleted.

Moving on ...

> === added file 'bzrlib/tests/test_composite_tree.py'

> +class TestNestedTrees(tests.TestCaseWithTransport):
> +
> +    def make_branch_and_tree(self, path, format='dirstate-with-subtree'):

Any reason not to use a more recent format here, e.g. pack-0.92-subtree?

> +            fullpath, relpath, found_entry, found_tree = \
> +                nested.paths_info(['sub/subb'])[0]
> +        finally:
> +            nested.unlock()
> +        self.assertIs(None, found_entry)
> +        self.assertIsNot(None, found_tree)

I'd like a one-line comment above this bit explaining that you're now
testing a missing path. When I first looked at it, I thought it was
a redundant copy of the previous line a few lines above it. Also,
I'd prefer the 2 asserts to come directly after the call to path_info,
not after the unlock.

> +    def test_move_after(self):
> +        tree = self.make_branch_and_tree('.', format='dirstate-with-subtree')
> +        self.build_tree(['file'])
> +        tree.add(['file'], ['file-id'])
> +        os.unlink('file')
> +        nested = composite_tree.NestedTrees(tree, tree.branch)
> +        nested.lock_read()
> +        try:
> +            full_path, relpath, entry, found_tree = \
> +                nested.paths_info('file')[0]
> +        finally:
> +            nested.unlock()

This test looks like junk that ought to go, given "move" is now
implemented at the CompositeTree level, not the NestedTree level.
If you were planning to test something else, the format parameter to
make_branch_and_tree can go and you need some asserts in there
somewhere. :-)

While I'm at it, I'd like a follow-on patch please covering some,
but not all, of the public methods in NestedTree that don't have
unit tests yet. This patch is already big enough to make them tweaks!
To be explicit, I'd like to see these things tested:

* get_tree_and_treepath() - the tree_references param in particular
* iter_pathinfo_by_tree() - the ordering issue in particular
* get_tree_branch() - the locking logic in particular.

> +    def test_iter_entries_by_dir(self):

> +    def test_iter_entries_by_dir_follow_subtree(self):

> +    def test_inventory(self):

> +    def test_comparison_data(self):

All of these tests can be simplified by using get_locked_composite().

> +    def test_apply_inventory_delta(self):

This is a large test and I suspect reviewers like Vincent would
ask for it to be broken up into smaller tests. I'm less of a
TDD guru and I'm happy for one test to cover multiple things and
for it to leverage evolving state as you're doing. However, I do
like to see whitespace and one-line comments used to break it up
logically with a brief explanation about what each batch of
statements is testing. Please.

So great job on this overall. I'm looking forward to seeing the
various patches come together! I'm on leave for the rest of this
week so I'm really hoping others step up and review some of those
other nested tree patches.

Ian C.



More information about the bazaar mailing list