[MERGE] Remove the basis_tree parameter to record_iter_changes.

Fri Mar 20 14:12:18 GMT 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
...

>>
>> ^- another comment here for:
>> # change: (old-path, new-path, file_id, new_ie)
> 
> Actually, this is a terrible smell. Whats the point of docstrings on
> make_delta etc if we are reinventing them with comments everywhere.

We chose to go with tuples for performance, but they are awful for
maintaining code. It is a real pain to have to go look at the doc string
of iter_changes() and _make_delta() every time I'm trying to verify
"what is foo[4] again?"

I agree there is the the potential for skew, but if we get skew in a
tuple-based api, things are rather difficult anyway. I don't really
*like* putting the comments in, I just like it more than having to keep
extra windows open so I can remind myself what the things I'm looking at
really are.

> 
>> +                    if change[1] is None:
>> +                        # Deleted
>>
>> ^- not exactly accurate, it just means the file_id isn't present in this
>> parent. It may be deleted, or it may just be never added.
>>   # not present in this parent
>> Is probably a better comment.
> 
> Sure.
> 
>> +                        continue
>> +                    if change[2] not in merged_ids:
>> +                        if change[0] is not None:
>> +                            merged_ids[change[2]] = [
>> +                                repo_basis.inventory[change[2]].revision,
>> +                                change[3].revision]
>>
>> ^- Calling the object "repo_basis" is a bit odd, since it is a
>> "revision_tree". How about "basis_tree" instead?
> 
> This was while I had a basis_tree already, so I'll do an audit and clear
> that up.
> 
>> +                        else:
>> +                            merged_ids[change[2]] = [change[3].revision]
>> +                        parent_entries[change[2]] =
>> {change[3].revision:change[3]}
>>
>> ...
>> +            # NB:XXX: We are reconstructing path information we had, this
>> +            # should be preserved instead.
>>
>> ^- It seems like you could trivially do this by changing the
>> 'parent_entries' dict, to instead be:
>> parent_entries[file_id] = {revision:(ie, path)}
> 
> Thanks, I'll look at that.
> 
>>
>> +        unchanged_merged = set(merged_ids) - set(changes)
>> +        # Extend the changes dict with synthetic changes to record
>> merges of
>> +        # texts.
>>
>> ^- I wonder if this wouldn't be better as:
>> unchanged_merged = set(merged_ids).difference(changes)
>>
>> Especially for the case when there is only 1 parent. Because merged_ids
>> will then be empty, and can be a trivial no-op to return an empty set to
>> 'unchanged_merged'. The way you have it written, we have to build up a
>> set for all changes (which may be a lot for say the first-commit), even
>> though we don't care about it.
> 
> I've seen set(foo).difference(Not-set-bar) be much slower than
> settifying bar first. Thats why I spell it with a set() case on changes.
> We can profile, of course.
> 
> 
>> ^- maybe make it clearer that these are candidate revisions, and not
>> inventory objects themselves?
>>
>> # changes maps {file_id: (change, [parent_revision_ids])}
>>
>> I'm not sure if that is better or not..
> 
> Sure. I'll do something like that.
> 
> 
>> +                carried_over = False
>> +                if len(heads) == 1 and len(head_candidates) > 1:
>>
>> ^- I'm not sure if the 'len(head_candidates) > 1' is valid to restrict
>> this from being a carry-over. Consider the case where a file is added in
>> a branch, and that addition is then merged. Doesn't that mean that
>> "len(head_candidates)" will be 1 (the file didn't exist in the basis),
>> but that carry-over is valid (because it is exactly the same content).
>>
>> I *think* this check should just be "if len(heads) == 1:" which is the
>> check we have now.
> 
> Well, it passes all our torture tests for this. So either they are
> insufficient or its right :P.
> Anyhow, in the case proposed:
> basis has no foo
> tree has foo, unaltered from parent1
> parent 1 has foo
> heads() == 1 (parent1.revid)
> head_candidates will be:
> [parent1.revid]
> because of this:
>             if change[1][0] is not None:
>                 head_candidate = [basis_inv[change[0]].revision]
>             else:
>                 head_candidate = []
>             changes[change[0]] = change, merged_ids.get(change[0],
>                 head_candidate)
> 
> (we only get a head candidate for the current tree if it had the file).
> so yes, I agree, and I think it exposes a test hole. I'll look for that
> as this is an area that comprehensive testing is extremely important
> for.
> 

...

>> +                    if (carry_over_possible and
>> +                        parent_entry.executable == entry.executable):
>> +                            # Check the file length, content hash after
>> reading
>> +                            # the file.
>> +                            nostore_sha = parent_entry.text_sha1
>> +                    else:
>> +                        nostore_sha = None
>>
>> ^- you take the time to skip "nostore_sha" in the case that
>> entry.executable changed, but why not also set it in the case that the
>> entry.size has changed?
> 
> If the size has changed the sha has to have too.

I agree that is true for correctness, but you wouldn't *have* to sha if
the size changed. (I guess ultimately you have to do it at least 1 time,
since the inventory wants to record the current value.)

> 
>> Also, if you are concerned about the race condition, shouldn't you
>> potentially set "entry.executable" based on the 'file_obj, stat_value'
>> you get back from 'get_file_with_stat' ?
>> (I realize you end up needing to be careful about win32 and executable
>> bits, etc. But it seems like it might be less 'racy').
> 
> The race is that we may record a change that didn't happen. iter_changes
> detects that NEWS has altered. someone concurrently does a manual
> 'revert NEWS' (e.g. saves the file in vim).
> 

Wouldn't that hit the "nostore_sha" path? I guess not if you set the
executable bit? But it looks like entry.executable is fixed at this
point? (It should probably be fixed from the stat of file_obj that you
get back if possible.)

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknDpEIACgkQJdeBCYSNAAO14gCgsb1ODvNQbgk1wHESbgn+FuhX
NmsAoJFNl7/2qWXR1jvB1fTENQtQFaMS
=89zB
-----END PGP SIGNATURE-----