[MERGE] TreeTransform avoids many renames when constructing trees

Tue Jun 5 06:10:27 BST 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
> John Arbash Meinel has voted +1.
> Status is now: Approved
> Comment:
> Adding self.rename_count doesn't seem strictly necessary. Is it just for
> testing?

Yes.  I thought that was a planned direction for future work, e.g. to
make sure we don't open too many transports, take too many locks, etc.
Do I misunderstand?

> I find names like "frexpar" to be more confusing than just saying
> "first_dir" or "second_dir". Having more meaningful names would make it
> easier to figure out what the tests are doing.

Yeah, I've moved away from the cutesy test suites, but most style guides
recommend retaining the style of surrounding code, which in this case
was wizard-of-oz-based.  I can make it more straightforward.

> This test seems a little odd:
> +    def test_change_parent(self):
> +        transform, root = self.get_transform()
> +        frexpar = transform.new_directory('frexpar', root)
> +        elphaba = transform.new_file('elphaba', frexpar, 'contents')
> +        oz = transform.new_directory('oz', root)
> +        transform.adjust_path('elphaba', oz, elphaba)
> +        transform.apply()
> +        self.failIfExists(self.wt.abspath('frexpar/elphaba'))
> +        self.failUnlessExists(self.wt.abspath('oz/elphaba'))
> +        # rename libbo/new1 => frexpar, rename limbo/new3 => oz
> +        # no rename for elphaba
> +        self.failUnlessEqual(2, transform.rename_count)
> 
> In that you create the file in one directory *and* rename it into
> another path within one .apply(). It would make more sense to me if you
> created it in one .apply() and then renamed in another.

This sort of thing is done in conflict resolution for example, where you
have two files named "foo" and one of them becomes "foo.moved".

> Is this just because you want the api to support any set of odd changes
> in one go?

Yes, the API is meant to support that sort of behavior, so I added tests
to make sure it does that it continues to support it, even with this
optimization.

> I also don't understand how the 'new_file()' call wouldn't have created
> 'elphaba' in 'frexpar' to start with, and then renamed it into 'oz'
> after the adjust_path() call. Though maybe you aren't counting renames
> outside of the '.apply()' call.

Yes, I'm only counting renames inside the apply.

> At the very least, it should be "limbo/new1 =>" not 'libbo'.

Okay.

> +        transform, root = self.get_transform()
> +        frexpar = transform.new_directory('frexpar', root)
> +        elphaba = transform.new_file('elphaba', frexpar, 'contents')
> +        nessarose = transform.new_file('nessarose', frexpar, 'contents')
> +        try:
> +            transform.cancel_creation(frexpar)
> +        except OSError:
> +            self.fail('Failed to move elphaba before deleting frexpar')
> +        transform.cancel_creation(nessarose)
> +        transform.create_directory(frexpar)
> +        try:
> +            transform.cancel_creation(frexpar)
> +        except OSError:
> +            self.fail('Transform still thinks nessarose is a child of
> frexpar')
> +        oz = transform.new_directory('oz', root)
> +        transform.adjust_path('elphaba', oz, elphaba)
> +        transform.apply()
> +        self.failIfExists(self.wt.abspath('frexpar'))
> +        self.failUnlessExists(self.wt.abspath('oz/elphaba'))
> +        # rename limbo/new-3 => oz, rename limbo/new-2 => elphaba
> +        self.failUnlessEqual(2, transform.rename_count)
> 
> The try/except clauses here don't seem to match what I think the code is
> actually doing.

The second test is whether the transform has correctly noted that
nessarose is no longer a child of frexpar.  If it still believes
nessarose is in frexpar's limbo directory, it will attempt to move
nessarose and get an OSerror, because limbo/new-1/nessarose doesn't exist.

> test_adjust_and_cancel()
> 
> doesn't this need to check that 2 directories are created, but nothing
> else happens?

What's crucial is that, like the above test, it does not attempt to move
elphaba.

> test_noname_contents
> 
> I don't understand how "create_directory()" can work without giving it a
> final name for that path. Is it just that TT doesn't need it at create_*
> time, just at finalize/apply time?

That's right.  I think merge works this way.

> Maybe just a docstring to indicate you are checking exactly that
> "create_directory()" doesn't need a final name yet.

Okay.

> I would be more intrigued if you could create a target directory which
> claims to be a child of oz-id and neither one had a name yet.

I'm not sure what you mean.  You mean a triply-nested path, like
foo/bar/baz?  Creating the contents for all three first, then assigning
the name?

> Just doing
> it at the top level means you haven't tried to do any nesting yet. (It
> may not be possible to create a child in a directory that doesn't have a
> name).

Well, it's possible to create a child that has a name in a directory
that has no name.

> The more I look at it, I don't understand why "test_change_parent()" can
> have the final file in the correct directory, but "test_reuse_name()"
> can't.

In theory it could, but creating elphaba2 while elphaba1 is occupying
that limbo path causes elphaba2 to be created at the top level of limbo.
 I don't consider it worthwhile to rename top-level limbo files into
their parents before Transform.apply().  If we did that, we would be
renaming a file before Transform.apply() to avoid renaming the file
during Transform.apply().

> Also, we don't really need the top-level directories to have their final
> names until the rename. Since we have to "os.rename('limbo/*', 'wt/*')"
> anyway. I don't know whether this simplifies your code or not. But you
> could use "oz == 'new-2'" until you go to "rename limbo/new-2 => oz"

Yes, that's what I do.

> You don't actually have to build up the temporary list here:
> -            for trans_id, kind in self._new_contents.iteritems():
> -                path = self._limbo_name(trans_id)
> +            entries = [(self._limbo_name(t), t, k) for t, k in
> +                       self._new_contents.iteritems()]
> +            for path, trans_id, kind in sorted(entries, reverse=True):
> 
> You could do it as a generator, which saves a bit of memory. Or you
> could do it as:
> entries = sorted((self._limbo_name(t), t, k)
>                  for t, k in self._new_contents.iteritems(),
>                  reverse=True)
> 
> I don't think it is strictly a big deal, but this is O(tree) so it is
> nice to avoid extra copies when possible.

AIUI, it is faster to iterate through a list than a generator.

> I'm guessing TT._limbo_name() could be optimized a bit (like have it do
> 1 lookup with a try/except KeyError rather than 2 with a "x in y: return
> y[x]". But that sort of depends on whether x is usually present or not.
> And from what I could see, it is a 50/50 mix. (You use it the first time
> you create something, and then when you are renaming everything into place)

Yeah, could be.  I wanted to defer optimization until these changes were
merged, to reduce the size of this patch.

Thanks for your review.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGZPAr0F+nu1YWqI0RAqeVAJ9jVfEuCSQefB+cZhENDDzZcLKE0QCfeRuX
TRipOpfMBmG0z7n8/bFvjOw=
=HfYV
-----END PGP SIGNATURE-----