[MERGE] Refactor commit to prepare for population by tree walking

Aaron Bentley aaron.bentley at utoronto.ca
Thu Jul 5 06:17:16 BST 2007


Ian Clatworthy wrote:
> Aaron Bentley wrote:
>> Ian Clatworthy wrote:
> 
>>> +        populator = self._populate_from_inventory
>>> +        #if len(self.parents) == 1:
>>> +        #    populator = self._populate_from_tree
>>> +        populator(specific_files, root_added_already)
>> ^^ this is sort of ugly.  It would be much nicer to restore
>> populate_from_tree first.
> 
> What do you mean by "restore"? 

I mean un-comment it, and get it working.

> Normally I'd set the preferred case and
> override it if we detected we couldn't use it. In this case,
> "populate_from_tree" is immature new code while
> "populate_from_inventory" is simply the old code refactored.

I don't like to see dead code being added to the codebase.  If it's not
ready to be used, then we shouldn't merge it.

> Are you suggesting we need to go back to two loops? If so, the currently
> committed code in bzr.dev (not this patch) is broken.

No.  I'm saying that I've spent a lot of time on the general case,
where there are deletes, moves, and adds.  In the general case, deletes
must be done before adds and moves, so they typically don't come last,
and that is what looks weird to me here.

But since this is not the general case, we can get away with it.

>>> +        # FIXME: be more comprehensive here:
>>> +        # this works when both trees are in --trees repository,
>>> +        # but when both are bound to a different repository,
>>> +        # it fails;
>> So the concern here is about the case where they are using the same
>> repository as bound branches?
> 
> To be honest, I haven't thought about that comment. My contribution was
> simply to refactor the existing code

Okay.  But you should generally consider whether a refactoring has made
the existing comments obsolete.

> I don't doubt you're correct in all these points. My goal though, for
> better or worse, is progressive transformation of the existing code.

I don't think anything I suggested was a radical change.

> These methods are simply the existing code refactored into readable
> methods. I'd therefore like to keep them in the short term please.

Well, what it looks like is gobs and gobs of new code that does things
the wrong way.  If it's not actually new, that's good to know.

Part of what's going on is that you're moving code out of loops and
into functions, which changes its indentation, so it looks different.

Of course, when you're trying to speed something up, the last thing you
want is more function calls. :-)

I do see at least one bit of crack in populate_from_tree:

        # Unversion IDs that were found to be deleted
        # Note: the logic above collects some IDs already gone so
        # we walk the list and trap the exception
        for id in deleted_ids:
            try:
                self.work_tree.unversion([id])
            except NoSuchId:
                pass

iter_changes *always* tells you whether a file is currently versioned,
so there's no reason to stick an unversioned file in deleted_ids.
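
Roughly what I have in mind (untested; I'm assuming the usual
iter_changes tuple layout and guessing at the work_tree/basis_tree
attribute names):

    # Untested sketch.  'versioned' and 'kind' are (basis, working)
    # pairs in the iter_changes tuples.
    deleted_ids = []
    for (file_id, paths, changed_content, versioned, parent, name,
         kind, executable) in self.work_tree.iter_changes(self.basis_tree):
        # Only queue ids still versioned in the working tree but gone
        # from disk; anything already unversioned never gets collected.
        if versioned[1] and kind[1] is None:
            deleted_ids.append(file_id)
    # One call, and no NoSuchId to trap.
    self.work_tree.unversion(deleted_ids)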

And if you were using apply_inventory_delta, you wouldn't have to check
whether each file was inside a previously deleted directory, because
apply_inventory_delta doesn't care about that.
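
For the deletions, that could look something like this (again untested,
and assuming the (old_path, new_path, file_id, new_entry) delta format,
with None for removals):

    # Untested sketch: express the deletions as an inventory delta.
    delta = []
    for file_id in deleted_ids:
        # A removal is (old_path, None, file_id, None).
        delta.append((self.work_tree.id2path(file_id), None, file_id, None))
    self.work_tree.apply_inventory_delta(delta)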

> My broader plan for refactoring goes something like this:
> 
> 1. get the existing loop behind "populate_from_inventory".
> 2. get "populate_from_tree" working for the non-merge case.
> 3. make "populate_from_tree" use a new iterator (iter_commitable
>    say) on working tree that gives just the right stuff back
> 4. use that information to refactor the interaction between
>    inventory, repository and CommitBuilder. *This* is the
>    really big performance win but it needs 3 to happen.

I think that if you're not using the best available interfaces by stage
3, your analysis in stage 4 is going to be flawed.

> 5. make "populate_from_tree" work for the merge case and
>    drop "populate_from_inventory" out of the code altogether.

I think that doing stage 5 could also have a dramatic impact on your
analysis.  In fact, it's not clear to me that you need iter_commitable
before 5.  I think iter_changes would work.  Certainly, I've used it in
the non-merge case.

> Robert has been mentoring me on lower level changes to inventory to help
> clean up responsibilities/APIs, but I'm typically feeling more
> comfortable working from the outside in right now. My approach is
> undoubtedly a slow one compared to how others might do it but I really
> think a low risk strategy is wise given my relatively limited experience
> with the code base and the importance of commit never breaking.
> 
> Does the above refactoring roadmap sound a good one in your opinion?

It's not how I would do it.  I might refactor, but I would transform
populate_from_inventory into populate_from_tree piece by piece.

I'd probably also start by putting in an iterator and gradually moving
things into it.  These is_inside_any calls look like they could be used
for early filtering.
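
Something like this, say (the helper name and the specific_files
handling here are just illustrative):

    from bzrlib import osutils

    def _iter_interesting_entries(self, specific_files):
        """Yield only the (path, entry) pairs commit cares about,
        filtering on specific_files up front."""
        for path, entry in self.work_tree.inventory.iter_entries():
            if specific_files and not osutils.is_inside_any(
                    specific_files, path):
                continue
            yield path, entry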

>>> +    def _emit_progress_set_stage(self, name, entries_title=None):
>>>          """Set the progress stage and emit an update to the progress bar."""
>> ^^^ does this need to be emitting progress *and* setting stage?
> 
> Not quite sure what you mean by that question.

I mean, can this method only set the stage?  Because we already have a
method to emit progress.  The long name suggested that the method was
doing too much.

> The method sets the stage
> and updates the progress bar with that information.

Even just changing it to _set_stage_emit_progress would be better,
because it would explain why progress was being emitted.

Aaron