[MERGE] Refactor commit to prepare for population by tree walking

Thu Jul 5 12:19:07 BST 2007

Aaron Bentley wrote:

> Well, what it looks like is gobs and gobs of new code that does things
> the wrong way.  If it's not actually new, that's good to know.
> 
> Partly what's going on is you're taking code out of loops into
> functions, and that's changing its indenting, so it looks different.

Aaron,

Thanks for the review. I'll put together a revised patch and resubmit.

For the record, the *only* new code is populate_from_tree. All the rest
is existing code refactored. populate_from_tree was included, but not
called, to show where I was up to and to (partly) justify moving the
blocks of code I did into methods.

> Of course, when you're trying to speed something up, the last thing you
> want is more function calls. :-)

Agreed. But function calls vs inline makes absolutely no change
profiling wise on commit right now because we're doing some many other
inefficient things. I'll be *delighted* when I reach the point that's an
issue. :-)

> I do see at least one bit of crack in populate_from_tree:
> 
>         # Unversion IDs that were found to be deleted
>         # Note: the logic above collects some IDs already gone so
>         # we walk the list and trap the exception
>         for id in deleted_ids:
>             try:
>                 self.work_tree.unversion([id])
>             except NoSuchId:
>                 pass
> 
> iter_changes *always* tells you whether a file is currently versioned,
> so there's no reason to stick an unversioned file in deleted_ids.

That's good to know. I expected as much and only added this block of
code when 2 unit tests failed to pass without it. :-( I'll dig deeper ...

>> My broader plan for refactoring goes something like this:
> 
>> 1. get the existing loop behind "populate_from_inventory".
>> 2. get "populate_from_tree" working for the non-merge case.
>> 3. make "populate_from_tree" use a new iterator (iter_commitable
>>    say) on working tree that gives just the right stuff back
>> 4. use that information to refactor the interaction between
>>    inventory, repository and CommitBuilder. *This* is the
>>    really big performance win but it needs 3 to happen.
> 
> I think that if you're not using the best available interfaces by stage
> 3, your analysis in stage 4 is going to be flawed.
> 
>> 5. make "populate_from_tree" work for the merge case and
>>    drop "populate_from_inventory" out of the code altogether.
> 
> I think that doing stage 5 could also have a dramatic impact on your
> analysis.  In fact, it's not clear to me that you need iter_commitable
> before 5.  I think iter_changes would work.  Certainly, I've used it in
> the non-merge case.

Thanks for the feedback. The reason I think iter_commitable is warranted
earlier is because iter_changes doesn't provide me the SHA value, just a
'changed' flag. That means later code (InventoryFile._read_tree_state
currently) has to look up the SHA value separately and that hurting
performance a considerable amount according to the profiler. The
CommitBuilder code needs the SHA value in order to test for some special
cases - 'changed' isn't enough.

I like your idea re InventoryDelta so I'll make sure my refactoring
plans head that way.

Ian C.