[MERGE] Remove the basis_tree parameter to record_iter_changes.
Robert Collins
robert.collins at canonical.com
Fri Mar 20 06:32:47 GMT 2009
On Thu, 2009-03-19 at 16:50 -0500, John Arbash Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Robert Collins wrote:
> > I'm reasonably happy with this now, its probably not perfect, and it
> > needs profiling in a CHK environment, but I'd at least like it reviewed
> > now. (Updated to remove an unused parameter).
>
> You sent 2 messages within 30min with the same text but different
> subjects. This one is slightly newer and slightly bigger, so I assume it
> is the one you want reviewed.
It is (thus the 'updated' commend in the body). Sorry for the confusion.
> I'm guessing this code pre-dates the _make_delta() code (coming in from
> different code paths).
>
> I would rather you just remove this one, and use the one we already have.
Sure, I missed that this had fallen in.
> === modified file 'bzrlib/repository.py'
> ...
> + def record_iter_changes(self, tree, basis_revision_id, iter_changes,
> + _entry_factory=entry_factory):
>
> ^- My understanding here is that this 'iter_changes' is going to be
> something like WT.iter_changes(WT.basis_tree(), specific_files=XXX) etc,
> right?
yes.
> ...
>
> + if len(self.parents) > 1:
> + if basis_revision_id != self.parents[0]:
> + raise Exception(
> + "arbitrary basis parents not yet supported with
> merges")
>
> ^- I feel like in the short-to-medium term, we should have this check on
> at all times, whether there is more than one parent or not.
Yes, the check is wrong, fixing.
> v- slightly incorrect spacing here
> + # Setup the changes from the tree:
> + changes= {}
> + for change in iter_changes:
>
> v- This comment doesn't really seem correct, given that it only looks at
> items that are marked 'changed'.
I'm not sure what you mean about correctness.
> Also, whenever dealing with the output from 'iter_changes', it is
> helpful to add code comments like:
> # change[1][0] is the path in the basis inventory, which if None, means
> # means the content did not exist.
> # change[0] is the file_id
Sure, I can do that. It is rather redundant though :(. I certainly won't
claim to remember to do this all the time.
> ... coming back to the more-than-one-parent case:
>
> + if len(self.parents) > 1:
> + if basis_revision_id != self.parents[0]:
> + raise Exception(
> + "arbitrary basis parents not yet supported with
> merges")
> + repo_basis = revtrees[0]
> + for revtree in revtrees[1:]:
> + for change in revtree.inventory.make_delta(basis_inv):
>
> ^- another comment here for:
> # change: (old-path, new-path, file_id, new_ie)
Actually, this is a terrible smell. Whats the point of docstrings on
make_delta etc if we are reinventing them with comments everywhere.
> + if change[1] is None:
> + # Deleted
>
> ^- not exactly accurate, it just means the file_id isn't present in this
> parent. It may be deleted, or it may just be never added.
> # not present in this parent
> Is probably a better comment.
Sure.
> + continue
> + if change[2] not in merged_ids:
> + if change[0] is not None:
> + merged_ids[change[2]] = [
> + repo_basis.inventory[change[2]].revision,
> + change[3].revision]
>
> ^- Calling the object "repo_basis" is a bit odd, since it is a
> "revision_tree". How about "basis_tree" instead?
This was while I had a basis_tree already, so I'll do an audit and clear
that up.
> + else:
> + merged_ids[change[2]] = [change[3].revision]
> + parent_entries[change[2]] =
> {change[3].revision:change[3]}
>
> ...
> + # NB:XXX: We are reconstructing path information we had, this
> + # should be preserved instead.
>
> ^- It seems like you could trivially do this by changing the
> 'parent_entries' dict, to instead be:
> parent_entries[file_id] = {revision:(ie, path)}
Thanks, I'll look at that.
>
>
> + unchanged_merged = set(merged_ids) - set(changes)
> + # Extend the changes dict with synthetic changes to record
> merges of
> + # texts.
>
> ^- I wonder if this wouldn't be better as:
> unchanged_merged = set(merged_ids).difference(changes)
>
> Especially for the case when there is only 1 parent. Because merged_ids
> will then be empty, and can be a trivial no-op to return an empty set to
> 'unchanged_merged'. The way you have it written, we have to build up a
> set for all changes (which may be a lot for say the first-commit), even
> though we don't care about it.
I've seen set(foo).difference(Not-set-bar) be much slower than
settifying bar first. Thats why I spell it with a set() case on changes.
We can profile, of course.
> ^- maybe make it clearer that these are candidate revisions, and not
> inventory objects themselves?
>
> # changes maps {file_id: (change, [parent_revision_ids])}
>
> I'm not sure if that is better or not..
Sure. I'll do something like that.
> + carried_over = False
> + if len(heads) == 1 and len(head_candidates) > 1:
>
> ^- I'm not sure if the 'len(head_candidates) > 1' is valid to restrict
> this from being a carry-over. Consider the case where a file is added in
> a branch, and that addition is then merged. Doesn't that mean that
> "len(head_candidates)" will be 1 (the file didn't exist in the basis),
> but that carry-over is valid (because it is exactly the same content).
>
> I *think* this check should just be "if len(heads) == 1:" which is the
> check we have now.
Well, it passes all our torture tests for this. So either they are
insufficient or its right :P.
Anyhow, in the case proposed:
basis has no foo
tree has foo, unaltered from parent1
parent 1 has foo
heads() == 1 (parent1.revid)
head_candidates will be:
[parent1.revid]
because of this:
if change[1][0] is not None:
head_candidate = [basis_inv[change[0]].revision]
else:
head_candidate = []
changes[change[0]] = change, merged_ids.get(change[0],
head_candidate)
(we only get a head candidate for the current tree if it had the file).
so yes, I agree, and I think it exposes a test hole. I'll look for that
as this is an area that comprehensive testing is extremely important
for.
> + if kind == 'file':
> + # XXX: There is still a small race here: If someone
> reverts the content of a file
> + # after iter_changes examines and decides it has
> changed,
> + # we will unconditionally record a new version even
> if some
> + # other process reverts it while commit is running
> (with
> + # the revert happening after iter_changes did it's
> + # examination).
> + if change[7][1]:
> + entry.executable = True
> + else:
> + entry.executable = False
> + if (carry_over_possible and
> + parent_entry.executable == entry.executable):
> + # Check the file length, content hash after
> reading
> + # the file.
> + nostore_sha = parent_entry.text_sha1
> + else:
> + nostore_sha = None
>
> ^- you take the time to skip "nostore_sha" in the case that
> entry.executable changed, but why not also set it in the case that the
> entry.size has changed?
If the size has changed the sha has to have too.
> Also, if you are concerned about the race condition, shouldn't you
> potentially set "entry.executable" based on the 'file_obj, stat_value'
> you get back from 'get_file_with_stat' ?
> (I realize you end up needing to be careful about win32 and executable
> bits, etc. But it seems like it might be less 'racy').
The race is that we may record a change that didn't happen. iter_changes
detects that NEWS has altered. someone concurrently does a manual
'revert NEWS' (e.g. saves the file in vim).
> ...
>
> + elif kind == 'tree-reference':
> + raise AssertionError('unknown kind %r' % kind)
>
> ^- I'm not 100% sure what needs to be done here, but since we want to
> have 'tree-references' mostly supported in brisbane-core formats, it
> seems like we need to do something here.
Indeed, currently no tests are excercising this, so we need to improve
on that. normal record isn't tested for that either.
> ...
>
> + def test_abort_record_iter_changes(self):
> +
>
> ^- shouldn't this test some sort of side-effect after 'builder.abort()'
> ? {like nothing new committed, etc.) Or is it just testing that abort
> exists and doesn't raise an exception?
Exactly. Its the mirror of the test_abort_record test case.
>
> I didn't look as closely at the test cases.
>
> ...
>
> + def test_last_modified_revision_after_rename_link_changes_ric(self):
> + # renaming a link changes the last modified.
> + self.requireFeature(tests.SymlinkFeature)
> + tree = self.make_branch_and_tree('.')
> + os.symlink('target', 'link')
> + self._add_commit_renamed_check_changed(tree, 'link',
> + mini_commit=self.mini_commit_record_iter_changes)
> +
>
> ^- This certainly smacks of using a "multiply_tests" function, rather
> than lots of test_foo..._ric().
Yes, new features old code :P. I'll get the critical issues dealt with
and look at doing this.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20090320/06a20a2e/attachment.pgp
More information about the bazaar
mailing list