[MERGE] TreeTransform avoids many renames when constructing trees
Aaron Bentley
aaron.bentley at utoronto.ca
Mon Jun 4 14:44:00 BST 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ian Clatworthy wrote:
> build-tree() returns a _TransformResults object so I'd suggest 'tr' as
> the result variable name instead of 'tt'.
Good catch; it used to return the TreeTransform.
>> + # A mapping of transform ids to a set of the transform ids of children
>> + # their limbo directory has
>
> s/their/that/
Actually, the sense of my sentence was "...children that their limbo
directory has"
The distinction is that a transform id may have a limbo directory, but
the limbo directory is not the transform id. In particular, a transform
id might have children which are not stored in its limbo directory.
>> - def _limbo_name(self, trans_id):
>> + def _limbo_name(self, trans_id, from_scratch=False):
>> """Generate the limbo name of a file"""
>> - return pathjoin(self._limbodir, trans_id)
>> + if trans_id in self._limbo_files and not from_scratch:
>> + return self._limbo_files[trans_id]
>> + parent = self._new_parent.get(trans_id)
>> + # if the parent directory is already in limbo (e.g. when building a
>> + # tree), choose a limbo name inside the parent, to reduce further
>> + # renames.
>> + use_direct_path = False
>> + if self._new_contents.get(parent) == 'directory':
>> + filename = self._new_name.get(trans_id)
>> + if filename is not None:
>> + if parent not in self._limbo_children:
>> + self._limbo_children[parent] = set()
>> + self._limbo_children_names[parent] = {}
>> + use_direct_path = True
>> + elif (self._limbo_children_names[parent].get(filename)
>> + in (trans_id, None)):
>> + use_direct_path = True
>
> This routine is all good to me bar this elif branch. I kind of expected
> the 'None' case being the default and guessed that the 'trans_id' case
> might happen if the filename was already there??? And being in that
> dictionary with a different value is possible as well given you're
> explicitly guarding against that?
The None case will be the common case. The children_names come from an
external source, and may contain duplicates at this stage. Those
duplicates must be resolved before apply is called, but they are
perfectly legal at this point.
This behavior is tested in the "test_reuse_name" test case.
> That's a long way of saying that a comment would help here IMO.
How about this:
# the direct path can only be used if no other file has already taken
# this pathname, i.e. if the name is unused, or if it is already
# associated with this trans_id.
>> @@ -1219,10 +1291,11 @@
>> wt.add_conflicts(conflicts)
>> except errors.UnsupportedOperation:
>> pass
>> - tt.apply()
>> + result = tt.apply()
>> finally:
>> tt.finalize()
>> top_pb.finished()
>> + return result
>>
>>
>
> This looks like a potential bug to me. result is only set as the last
> thing in a long try/finally block.
I don't think that's a problem. If we return, that means we didn't
encounter an exception in the try block, which means we did assign a
value to result.
An alternative would be to return *in* the try/finally block; would you
prefer that?
> I think it needs to be initialised to
> a benign value (None?) prior to the start of the try/finally block.
It would be an error to return anything other than a _TransformResult
here. If it somehow became possible to reach that return statement
without assigning a value to result, I think it would be much better to
fail because "result" was undefined in local scope than to silently,
incorrectly, continue. If it became possible to return None in some
circumstances, then we'd need to check the return type in the calling
code. I think that just adds complexity.
> I'd
> also like to see the docstring for build_tree() improved to explicitly
> explain that a _TransformResult object is returned.
Sure.
> While not directly related to your patch (and therefore not necessary to
> land it), here are some other thoughts on some bits of transform.py I've
> read over today:
>
> 1. finalize(). Putting try/except around os.rmdir() and os.unlink()
> feels more defensive to me than assuming, however good the assumption
> is, that they can't fail. If one rmdir/unlink does fails, do we
> really want to implicitly abort?
I do. If we lose track of our limbo files, everything should come to a
screeching halt.
> With nested files now, will children always get nuked before parents?
Oh, that's one I hadn't thought about. I'll get back to you on it.
> I'd suggest making that explicit in the apply() docstring in that
> case. __init__ makes it clear that finalize() needs to be called
> so making it clear that apply() does this clarifies the caller
> contract IMO.
Okay.
> 3. _build_tree large internal comment: spaces missing in kindof, partof
> and hereis.
I agree. I didn't write that.
> 4. revert()'s change_reporter parameter. This should be initialised to
> *False*, not *None*. Only the boolean value is tested in the code
> and only a boolean value is passed in from the one place I could
> find calling it with that parameter (workingtree.py).
You will find many part of the codebase that take advantage of the fact
that None evaluates to False. The default of None is in fact
well-tested in the test suite, e.g. in bzrlib/test_revert.py
I would much rather change the WorkingTree default.
> 5. _alter_files() trivial stuff. No need for if/else for setting
> skip_root - the value of the if test is fine.
This is a bone of contention. I find code like
skip_root = target_tree.inventory.root is None
to be unintuitive. I've proposed doing
skip_root = bool(target_tree.inventory.root is None)
but people seemed to prefer
skip_root = (target_tree.inventory.root is None)
which to me is not much more readable than the first example.
> Also a comment
> spelling mistake: lazyily -> lazily.
Okay.
> Once again, apologies for only a partial review.
Well, that does help. It just doesn't get it merged :-)
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGZBcg0F+nu1YWqI0RAieDAJkBOspP8HFPbIKKxp9LIlPl/4HjFACfUqBa
sPEO5CGoZ5Z8ie2FUgUj97o=
=SgSM
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list