[MERGE] Refactor commit to prepare for population by tree walking

Thu Jul 5 03:46:39 BST 2007

Aaron Bentley wrote:
> Ian Clatworthy wrote:

>> +        populator = self._populate_from_inventory
>> +        #if len(self.parents) == 1:
>> +        #    populator = self._populate_from_tree
>> +        populator(specific_files, root_added_already)
> 
> ^^ this is sort of ugly.  It would be much nicer to restore
> populate_from_tree first.

What do you mean by "restore"? Normally I'd set the preferred case and
override it if we detected we couldn't use it. In this case,
"populate_from_tree" is immature new code while
"populate_from_inventory" is simply the old code refactored.

>> +        # Unversion IDs that were found to be deleted
>> +        self.work_tree.unversion(deleted_ids)
> 
> ^^^ Doing this last disturbs me.  You can't normally do unversioning
> last, because you may need to put something else into the same position.
>  This one is okay, I guess.

Interesting. That popped up here thanks to an earlier refactoring that
merged the two "working inventory" loops into one. Previously, they were
unversioned at the end of _remove_deleted() and it was called before
_populate_new_inv() started.

Are you suggesting we need to go back to two loops? If so, the currently
committed code in bzr.dev (not this patch) is broken.

> 
>> +        # FIXME: be more comprehensive here:
>> +        # this works when both trees are in --trees repository,
>> +        # but when both are bound to a different repository,
>> +        # it fails;
> 
> So the concern here is about the case where they are using the same
> repository as bound branches?

To be honest, I haven't thought about that comment. My contribution was
simply to refactor the existing code, where that comment was present,
into a method so that:

1. it was easier to understand
2. the same code could be called from both populate methods

>> +    def _record_entry(self, path, file_id, specific_files, kind, name,
>> +                      parent_id, definitely_changed, existing_ie=None):
> 
> ^^^ I think this should be building up an inventory delta.  That's meant
> to be the new way of manipulating tree contents.  Basically, we
> shouldn't be touching Tree.inventory at all.
> 
>> +        "Record the new inventory entry for a path if any."
>> +        # mutter('check %s {%s}', path, file_id)
>> +        if (not specific_files or
>> +            is_inside_or_parent_of_any(specific_files, path)):
>> +                # mutter('%s selected for commit', path)
>> +                if definitely_changed or existing_ie is None:
>> +                    ie = inventory.make_entry(kind, name, parent_id, file_id)
>> +                else:
>> +                    ie = existing_ie.copy()
>> +                    ie.revision = None
>> +        else:
>> +            # mutter('%s not selected for commit', path)
>> +            if self.basis_inv.has_id(file_id):
> 
> ^^^ I think there's a tree method, but anyway, this is data you already
> know from iter_changes.
> 
>> +    def _report_change(self, ie, path):
>> +        """Report a change to the user.
>> +
>> +        The change that has occurred is described relative to the basis
>> +        inventory.
>> +        """
>> +        if (self.basis_inv.has_id(ie.file_id)):
>> +            basis_ie = self.basis_inv[ie.file_id]
>> +        else:
>> +            basis_ie = None
>> +        change = ie.describe_change(basis_ie, ie)
>> +        if change in (InventoryEntry.RENAMED,
>> +            InventoryEntry.MODIFIED_AND_RENAMED):
>> +            old_path = self.basis_inv.id2path(ie.file_id)
>> +            self.reporter.renamed(change, old_path, path)
>> +        else:
>> +            self.reporter.snapshot_change(change, path)
> 
> I think it would be nice to reuse delta._ChangeReporter here.
> Consistency FTW.  Also not using inventories FTW.

I don't doubt you're correct in all these points. My goal though, for
better or worse, is progressive transformation of the existing code.
These methods are simply the existing code refactored into readable
methods. I'd therefore like to keep them in the short term please.

My broader plan for refactoring goes something like this:

1. get the existing loop behind "populate_from_inventory".
2. get "populate_from_tree" working for the non-merge case.
3. make "populate_from_tree" use a new iterator (iter_commitable
   say) on working tree that gives just the right stuff back
4. use that information to refactor the interaction between
   inventory, repository and CommitBuilder. *This* is the
   really big performance win but it needs 3 to happen.
5. make "populate_from_tree" work for the merge case and
   drop "populate_from_inventory" out of the code altogether.
6. fix the above methods now that "populate_from_inventory" is
   dead and "populate_from_tree" is the only game in town.

Robert has been mentoring me on lower level changes to inventory to help
clean up responsibilities/APIs, but I'm typically feeling more
comfortable working from the outside in right now. My approach is
undoubtedly a slow one compared to how others might do it but I really
think a low risk strategy is wise given my relatively limited experience
with the code base and the importance of commit never breaking.

Does the above refactoring roadmap sound a good one in your opinion?

>> +    def _emit_progress_set_stage(self, name, entries_title=None):
>>          """Set the progress stage and emit an update to the progress bar."""
> 
> ^^^ does this need to be emitting progress *and* setting stage?

No quite sure what you mean by that question. The method sets the stage
and updates the progress bar with that information.

Ian C.