[MERGE] Remove the basis_tree parameter to record_iter_changes.

Fri Mar 20 06:32:47 GMT 2009

On Thu, 2009-03-19 at 16:50 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Robert Collins wrote:
> > I'm reasonably happy with this now, its probably not perfect, and it
> > needs profiling in a CHK environment, but I'd at least like it reviewed
> > now. (Updated to remove an unused parameter).
> 
> You sent 2 messages within 30min with the same text but different
> subjects. This one is slightly newer and slightly bigger, so I assume it
> is the one you want reviewed.

It is (thus the 'updated' commend in the body). Sorry for the confusion.

> I'm guessing this code pre-dates the _make_delta() code (coming in from
> different code paths).
> 
> I would rather you just remove this one, and use the one we already have.

Sure, I missed that this had fallen in.

> === modified file 'bzrlib/repository.py'
> ...
> +    def record_iter_changes(self, tree, basis_revision_id, iter_changes,
> +        _entry_factory=entry_factory):
> 
> ^- My understanding here is that this 'iter_changes' is going to be
> something like WT.iter_changes(WT.basis_tree(), specific_files=XXX) etc,
> right?

yes.

> ...
> 
> +        if len(self.parents) > 1:
> +            if basis_revision_id != self.parents[0]:
> +                raise Exception(
> +                    "arbitrary basis parents not yet supported with
> merges")
> 
> ^- I feel like in the short-to-medium term, we should have this check on
> at all times, whether there is more than one parent or not.

Yes, the check is wrong, fixing.

> v- slightly incorrect spacing here
> +        # Setup the changes from the tree:
> +        changes= {}
> +        for change in iter_changes:
> 
> v- This comment doesn't really seem correct, given that it only looks at
> items that are marked 'changed'.

I'm not sure what you mean about correctness.

> Also, whenever dealing with the output from 'iter_changes', it is
> helpful to add code comments like:
> # change[1][0] is the path in the basis inventory, which if None, means
> # means the content did not exist.
> # change[0] is the file_id

Sure, I can do that. It is rather redundant though :(. I certainly won't
claim to remember to do this all the time.

> ... coming back to the more-than-one-parent case:
> 
> +        if len(self.parents) > 1:
> +            if basis_revision_id != self.parents[0]:
> +                raise Exception(
> +                    "arbitrary basis parents not yet supported with
> merges")
> +            repo_basis = revtrees[0]
> +            for revtree in revtrees[1:]:
> +                for change in revtree.inventory.make_delta(basis_inv):
> 
> ^- another comment here for:
> # change: (old-path, new-path, file_id, new_ie)

Actually, this is a terrible smell. Whats the point of docstrings on
make_delta etc if we are reinventing them with comments everywhere.

> +                    if change[1] is None:
> +                        # Deleted
> 
> ^- not exactly accurate, it just means the file_id isn't present in this
> parent. It may be deleted, or it may just be never added.
>   # not present in this parent
> Is probably a better comment.

Sure.

> +                        continue
> +                    if change[2] not in merged_ids:
> +                        if change[0] is not None:
> +                            merged_ids[change[2]] = [
> +                                repo_basis.inventory[change[2]].revision,
> +                                change[3].revision]
> 
> ^- Calling the object "repo_basis" is a bit odd, since it is a
> "revision_tree". How about "basis_tree" instead?

This was while I had a basis_tree already, so I'll do an audit and clear
that up.

> +                        else:
> +                            merged_ids[change[2]] = [change[3].revision]
> +                        parent_entries[change[2]] =
> {change[3].revision:change[3]}
> 
> ...
> +            # NB:XXX: We are reconstructing path information we had, this
> +            # should be preserved instead.
> 
> ^- It seems like you could trivially do this by changing the
> 'parent_entries' dict, to instead be:
> parent_entries[file_id] = {revision:(ie, path)}

Thanks, I'll look at that.

> 
> 
> +        unchanged_merged = set(merged_ids) - set(changes)
> +        # Extend the changes dict with synthetic changes to record
> merges of
> +        # texts.
> 
> ^- I wonder if this wouldn't be better as:
> unchanged_merged = set(merged_ids).difference(changes)
> 
> Especially for the case when there is only 1 parent. Because merged_ids
> will then be empty, and can be a trivial no-op to return an empty set to
> 'unchanged_merged'. The way you have it written, we have to build up a
> set for all changes (which may be a lot for say the first-commit), even
> though we don't care about it.

I've seen set(foo).difference(Not-set-bar) be much slower than
settifying bar first. Thats why I spell it with a set() case on changes.
We can profile, of course.

> ^- maybe make it clearer that these are candidate revisions, and not
> inventory objects themselves?
>
> # changes maps {file_id: (change, [parent_revision_ids])}
> 
> I'm not sure if that is better or not..

Sure. I'll do something like that.

> +                carried_over = False
> +                if len(heads) == 1 and len(head_candidates) > 1:
> 
> ^- I'm not sure if the 'len(head_candidates) > 1' is valid to restrict
> this from being a carry-over. Consider the case where a file is added in
> a branch, and that addition is then merged. Doesn't that mean that
> "len(head_candidates)" will be 1 (the file didn't exist in the basis),
> but that carry-over is valid (because it is exactly the same content).
> 
> I *think* this check should just be "if len(heads) == 1:" which is the
> check we have now.

Well, it passes all our torture tests for this. So either they are
insufficient or its right :P.
Anyhow, in the case proposed:
basis has no foo
tree has foo, unaltered from parent1
parent 1 has foo
heads() == 1 (parent1.revid)
head_candidates will be:
[parent1.revid]
because of this:
            if change[1][0] is not None:
                head_candidate = [basis_inv[change[0]].revision]
            else:
                head_candidate = []
            changes[change[0]] = change, merged_ids.get(change[0],
                head_candidate)

(we only get a head candidate for the current tree if it had the file).
so yes, I agree, and I think it exposes a test hole. I'll look for that
as this is an area that comprehensive testing is extremely important
for.

> +                if kind == 'file':
> +                    # XXX: There is still a small race here: If someone
> reverts the content of a file
> +                    # after iter_changes examines and decides it has
> changed,
> +                    # we will unconditionally record a new version even
> if some
> +                    # other process reverts it while commit is running
> (with
> +                    # the revert happening after iter_changes did it's
> +                    # examination).
> +                    if change[7][1]:
> +                        entry.executable = True
> +                    else:
> +                        entry.executable = False
> +                    if (carry_over_possible and
> +                        parent_entry.executable == entry.executable):
> +                            # Check the file length, content hash after
> reading
> +                            # the file.
> +                            nostore_sha = parent_entry.text_sha1
> +                    else:
> +                        nostore_sha = None
> 
> ^- you take the time to skip "nostore_sha" in the case that
> entry.executable changed, but why not also set it in the case that the
> entry.size has changed?

If the size has changed the sha has to have too.

> Also, if you are concerned about the race condition, shouldn't you
> potentially set "entry.executable" based on the 'file_obj, stat_value'
> you get back from 'get_file_with_stat' ?
> (I realize you end up needing to be careful about win32 and executable
> bits, etc. But it seems like it might be less 'racy').

The race is that we may record a change that didn't happen. iter_changes
detects that NEWS has altered. someone concurrently does a manual
'revert NEWS' (e.g. saves the file in vim).

> ...
> 
> +                elif kind == 'tree-reference':
> +                    raise AssertionError('unknown kind %r' % kind)
> 
> ^- I'm not 100% sure what needs to be done here, but since we want to
> have 'tree-references' mostly supported in brisbane-core formats, it
> seems like we need to do something here.

Indeed, currently no tests are excercising this, so we need to improve
on that. normal record isn't tested for that either.

> ...
> 
> +    def test_abort_record_iter_changes(self):

> +
> 
> ^- shouldn't this test some sort of side-effect after 'builder.abort()'
> ? {like nothing new committed, etc.) Or is it just testing that abort
> exists and doesn't raise an exception?

Exactly. Its the mirror of the test_abort_record test case.

> 
> I didn't look as closely at the test cases.
> 
> ...
> 
> +    def test_last_modified_revision_after_rename_link_changes_ric(self):
> +        # renaming a link changes the last modified.
> +        self.requireFeature(tests.SymlinkFeature)
> +        tree = self.make_branch_and_tree('.')
> +        os.symlink('target', 'link')
> +        self._add_commit_renamed_check_changed(tree, 'link',
> +            mini_commit=self.mini_commit_record_iter_changes)
> +
> 
> ^- This certainly smacks of using a "multiply_tests" function, rather
> than lots of test_foo..._ric().

Yes, new features old code  :P. I'll get the critical issues dealt with
and look at doing this.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090320/06a20a2e/attachment.pgp