[MERGE] TreeTransform avoids many renames when constructing trees

Mon Jun 4 23:32:09 BST 2007

John Arbash Meinel has voted +1.
Status is now: Approved
Comment:
Adding self.rename_count doesn't seem strictly necessary. Is it just for 
testing?

I find names like "frexpar" to be more confusing than just saying 
"first_dir" or "second_dir". Having more meaningful names would make it 
easier to figure out what the tests are doing.

This test seems a little odd:
+    def test_change_parent(self):
+        transform, root = self.get_transform()
+        frexpar = transform.new_directory('frexpar', root)
+        elphaba = transform.new_file('elphaba', frexpar, 'contents')
+        oz = transform.new_directory('oz', root)
+        transform.adjust_path('elphaba', oz, elphaba)
+        transform.apply()
+        self.failIfExists(self.wt.abspath('frexpar/elphaba'))
+        self.failUnlessExists(self.wt.abspath('oz/elphaba'))
+        # rename libbo/new1 => frexpar, rename limbo/new3 => oz
+        # no rename for elphaba
+        self.failUnlessEqual(2, transform.rename_count)

In that you create the file in one directory *and* rename it into 
another path within one .apply(). It would make more sense to me if you 
created it in one .apply() and then renamed in another.
Is this just because you want the api to support any set of odd changes 
in one go?

I also don't understand how the 'new_file()' call wouldn't have created 
'elphaba' in 'frexpar' to start with, and then renamed it into 'oz' 
after the adjust_path() call. Though maybe you aren't counting renames 
outside of the '.apply()' call.

At the very least, it should be "limbo/new1 =>" not 'libbo'.

+        transform, root = self.get_transform()
+        frexpar = transform.new_directory('frexpar', root)
+        elphaba = transform.new_file('elphaba', frexpar, 'contents')
+        nessarose = transform.new_file('nessarose', frexpar, 
'contents')
+        try:
+            transform.cancel_creation(frexpar)
+        except OSError:
+            self.fail('Failed to move elphaba before deleting frexpar')
+        transform.cancel_creation(nessarose)
+        transform.create_directory(frexpar)
+        try:
+            transform.cancel_creation(frexpar)
+        except OSError:
+            self.fail('Transform still thinks nessarose is a child of 
frexpar')
+        oz = transform.new_directory('oz', root)
+        transform.adjust_path('elphaba', oz, elphaba)
+        transform.apply()
+        self.failIfExists(self.wt.abspath('frexpar'))
+        self.failUnlessExists(self.wt.abspath('oz/elphaba'))
+        # rename limbo/new-3 => oz, rename limbo/new-2 => elphaba
+        self.failUnlessEqual(2, transform.rename_count)

The try/except clauses here don't seem to match what I think the code is 
actually doing. They may have been correct at one point, but it could 
also be because 'elphaba' was considered a child of 'frexpar'.
Maybe it should just say "Transform still thinks frexpar has children". 
And it should probably print out the OSError just in case it is 
something else.

test_adjust_and_cancel()

doesn't this need to check that 2 directories are created, but nothing 
else happens?

test_noname_contents

I don't understand how "create_directory()" can work without giving it a 
final name for that path. Is it just that TT doesn't need it at create_* 
time, just at finalize/apply time?
Maybe just a docstring to indicate you are checking exactly that 
"create_directory()" doesn't need a final name yet.

I would be more intrigued if you could create a target directory which 
claims to be a child of oz-id and neither one had a name yet. Just doing 
it at the top level means you haven't tried to do any nesting yet. (It 
may not be possible to create a child in a directory that doesn't have a 
name).

The more I look at it, I don't understand why "test_change_parent()" can 
have the final file in the correct directory, but "test_reuse_name()" 
can't. Since it seems whatever trick TT uses to know that adjusting the 
path from frexpar/elphaba => oz/elphaba is sufficient to track that 
oz/elphaba (2) => oz/galinda. (or effectively new-3=>oz/galinda before 
the .apply happens).

Also, we don't really need the top-level directories to have their final 
names until the rename. Since we have to "os.rename('limbo/*', 'wt/*')" 
anyway. I don't know whether this simplifies your code or not. But you 
could use "oz == 'new-2'" until you go to "rename limbo/new-2 => oz"

You don't actually have to build up the temporary list here:
-            for trans_id, kind in self._new_contents.iteritems():
-                path = self._limbo_name(trans_id)
+            entries = [(self._limbo_name(t), t, k) for t, k in
+                       self._new_contents.iteritems()]
+            for path, trans_id, kind in sorted(entries, reverse=True):

You could do it as a generator, which saves a bit of memory. Or you 
could do it as:
entries = sorted((self._limbo_name(t), t, k)
                  for t, k in self._new_contents.iteritems(),
                  reverse=True)

I don't think it is strictly a big deal, but this is O(tree) so it is 
nice to avoid extra copies when possible.

I'm guessing TT._limbo_name() could be optimized a bit (like have it do 
1 lookup with a try/except KeyError rather than 2 with a "x in y: return 
y[x]". But that sort of depends on whether x is usually present or not. 
And from what I could see, it is a 50/50 mix. (You use it the first time 
you create something, and then when you are renaming everything into 
place)

Anyway, this is mostly just nit-picking. I'm more than happy to have 
this merged as is. But if you want to address some of this, feel free. 
:)

For details, see: 
http://bundlebuggy.aaronbentley.com/request/%3C466465DE.30108%40utoronto.ca%3E