[MERGE] Remove the basis_tree parameter to record_iter_changes.

Thu Mar 19 21:50:30 GMT 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> I'm reasonably happy with this now, its probably not perfect, and it
> needs profiling in a CHK environment, but I'd at least like it reviewed
> now. (Updated to remove an unused parameter).

You sent 2 messages within 30min with the same text but different
subjects. This one is slightly newer and slightly bigger, so I assume it
is the one you want reviewed.

=== modified file 'bzrlib/inventory.py'
- --- bzrlib/inventory.py	2009-03-12 08:01:41 +0000
+++ bzrlib/inventory.py	2009-03-13 02:25:46 +0000
@@ -996,6 +996,24 @@
                         child_dirs.append((child_relpath+'/', child_ie))
             stack.extend(reversed(child_dirs))

+    def make_delta(self, old):


^- this looks like almost identical to the the "Inventory._make_delta"
function, only it doesn't look at self._byid, which means it has to go
through Inventory.__iter__.

I'm guessing this code pre-dates the _make_delta() code (coming in from
different code paths).

I would rather you just remove this one, and use the one we already have.

=== modified file 'bzrlib/repository.py'
...
+    def record_iter_changes(self, tree, basis_revision_id, iter_changes,
+        _entry_factory=entry_factory):

^- My understanding here is that this 'iter_changes' is going to be
something like WT.iter_changes(WT.basis_tree(), specific_files=XXX) etc,
right?

...

+        if len(self.parents) > 1:
+            if basis_revision_id != self.parents[0]:
+                raise Exception(
+                    "arbitrary basis parents not yet supported with
merges")

^- I feel like in the short-to-medium term, we should have this check on
at all times, whether there is more than one parent or not.


v- slightly incorrect spacing here
+        # Setup the changes from the tree:
+        changes= {}
+        for change in iter_changes:

v- This comment doesn't really seem correct, given that it only looks at
items that are marked 'changed'.

Also, whenever dealing with the output from 'iter_changes', it is
helpful to add code comments like:
# change[1][0] is the path in the basis inventory, which if None, means
# means the content did not exist.
# change[0] is the file_id

+            # This probably looks up in basis_inv way to much.
+            if change[1][0] is not None:
+                head_candidate = [basis_inv[change[0]].revision]
+            else:
+                head_candidate = []
+            changes[change[0]] = change, merged_ids.get(change[0],
+                head_candidate)

... coming back to the more-than-one-parent case:

+        if len(self.parents) > 1:
+            if basis_revision_id != self.parents[0]:
+                raise Exception(
+                    "arbitrary basis parents not yet supported with
merges")
+            repo_basis = revtrees[0]
+            for revtree in revtrees[1:]:
+                for change in revtree.inventory.make_delta(basis_inv):

^- another comment here for:
# change: (old-path, new-path, file_id, new_ie)

+                    if change[1] is None:
+                        # Deleted

^- not exactly accurate, it just means the file_id isn't present in this
parent. It may be deleted, or it may just be never added.
  # not present in this parent
Is probably a better comment.

+                        continue
+                    if change[2] not in merged_ids:
+                        if change[0] is not None:
+                            merged_ids[change[2]] = [
+                                repo_basis.inventory[change[2]].revision,
+                                change[3].revision]

^- Calling the object "repo_basis" is a bit odd, since it is a
"revision_tree". How about "basis_tree" instead?
Also, since we already unpacked "basis_inv = revtrees[0]" it seems best
to just use that object, rather than repo_basis.inventory[].


+                        else:
+                            merged_ids[change[2]] = [change[3].revision]
+                        parent_entries[change[2]] =
{change[3].revision:change[3]}

...
+            # NB:XXX: We are reconstructing path information we had, this
+            # should be preserved instead.

^- It seems like you could trivially do this by changing the
'parent_entries' dict, to instead be:
parent_entries[file_id] = {revision:(ie, path)}



+        unchanged_merged = set(merged_ids) - set(changes)
+        # Extend the changes dict with synthetic changes to record
merges of
+        # texts.

^- I wonder if this wouldn't be better as:
unchanged_merged = set(merged_ids).difference(changes)

Especially for the case when there is only 1 parent. Because merged_ids
will then be empty, and can be a trivial no-op to return an empty set to
'unchanged_merged'. The way you have it written, we have to build up a
set for all changes (which may be a lot for say the first-commit), even
though we don't care about it.

...

+            basis_entry = repo_basis.inventory[file_id]
+            change = (file_id,
+                (basis_inv.id2path(file_id), tree.id2path(file_id)),
+                False, (True, True),
+                (basis_entry.parent_id, basis_entry.parent_id),
+                (basis_entry.name, basis_entry.name),
+                (basis_entry.kind, basis_entry.kind),
+                (basis_entry.executable, basis_entry.executable))
+            changes[file_id] = (change, merged_ids[file_id])
+        # changes contains tuples with the change and a set of inventory
+        # candidates for the file.

^- maybe make it clearer that these are candidate revisions, and not
inventory objects themselves?

# changes maps {file_id: (change, [parent_revision_ids])}

I'm not sure if that is better or not..


+        # inv delta is:
+        # old_path, new_path, file_id, new_inventory_entry
+        seen_root = False # Is the root in the basis delta?
+        inv_delta = self._basis_delta
+        modified_rev = self._new_revision_id
+        for change, head_candidates in changes.values():
+            if change[3][1]: # versioned in target.
+                # Several things may be happening here:
+                # We may have a fork in the per-file graph
+                #  - record a change with the content from tree
+                # We may have a change against < all trees
+                #  - carry over the tree that hasn't changed
+                # We may have a change against all trees
+                #  - record the change with the content from tree
+                kind = change[6][1]
+                file_id = change[0]
+                entry = _entry_factory[kind](file_id, change[5][1],
+                    change[4][1])
+                head_set = self._heads(change[0], set(head_candidates))
+                heads = []
+                # Preserve ordering.
+                for head_candidate in head_candidates:
+                    if head_candidate in head_set:
+                        heads.append(head_candidate)
+                        head_set.remove(head_candidate)
+                carried_over = False
+                if len(heads) == 1 and len(head_candidates) > 1:

^- I'm not sure if the 'len(head_candidates) > 1' is valid to restrict
this from being a carry-over. Consider the case where a file is added in
a branch, and that addition is then merged. Doesn't that mean that
"len(head_candidates)" will be 1 (the file didn't exist in the basis),
but that carry-over is valid (because it is exactly the same content).

I *think* this check should just be "if len(heads) == 1:" which is the
check we have now.

+                if kind == 'file':
+                    # XXX: There is still a small race here: If someone
reverts the content of a file
+                    # after iter_changes examines and decides it has
changed,
+                    # we will unconditionally record a new version even
if some
+                    # other process reverts it while commit is running
(with
+                    # the revert happening after iter_changes did it's
+                    # examination).
+                    if change[7][1]:
+                        entry.executable = True
+                    else:
+                        entry.executable = False
+                    if (carry_over_possible and
+                        parent_entry.executable == entry.executable):
+                            # Check the file length, content hash after
reading
+                            # the file.
+                            nostore_sha = parent_entry.text_sha1
+                    else:
+                        nostore_sha = None

^- you take the time to skip "nostore_sha" in the case that
entry.executable changed, but why not also set it in the case that the
entry.size has changed?

Also, if you are concerned about the race condition, shouldn't you
potentially set "entry.executable" based on the 'file_obj, stat_value'
you get back from 'get_file_with_stat' ?
(I realize you end up needing to be careful about win32 and executable
bits, etc. But it seems like it might be less 'racy').

...

+                elif kind == 'tree-reference':
+                    raise AssertionError('unknown kind %r' % kind)

^- I'm not 100% sure what needs to be done here, but since we want to
have 'tree-references' mostly supported in brisbane-core formats, it
seems like we need to do something here.

...

+    def test_abort_record_iter_changes(self):
+        tree = self.make_branch_and_tree(".")
+        tree.lock_write()
+        try:
+            builder = tree.branch.get_commit_builder([])
+            try:
+                basis = tree.basis_tree()
+                last_rev = tree.last_revision()
+                changes = tree.iter_changes(basis)
+                builder.record_iter_changes(tree, last_rev, changes)
+                builder.finish_inventory()
+            finally:
+                builder.abort()
+        finally:
+            tree.unlock()
+

^- shouldn't this test some sort of side-effect after 'builder.abort()'
? {like nothing new committed, etc.) Or is it just testing that abort
exists and doesn't raise an exception?


I didn't look as closely at the test cases.

...

+    def test_last_modified_revision_after_rename_link_changes_ric(self):
+        # renaming a link changes the last modified.
+        self.requireFeature(tests.SymlinkFeature)
+        tree = self.make_branch_and_tree('.')
+        os.symlink('target', 'link')
+        self._add_commit_renamed_check_changed(tree, 'link',
+            mini_commit=self.mini_commit_record_iter_changes)
+

^- This certainly smacks of using a "multiply_tests" function, rather
than lots of test_foo..._ric().

Or even just simple inheritance and override self.mini_commit. (I know
how much you like inheritance based test
adaptation/multiplication/parameterization :)


BB:tweak

John
=:->


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknCviYACgkQJdeBCYSNAAPA8gCeNnqp32vszPnY+23rIDcUt8kH
bPwAn39Gq8i1if+ohiWlJ7AE/OYGObLA
=fYwS
-----END PGP SIGNATURE-----