[MERGE][#230567] Faster (local) branch

Tue May 27 18:23:39 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
| John Arbash Meinel wrote:
|
|> I think the reason we had the slow route, is that people wanted to be
|> able to
|> 'bzr co URL .' into a directory that already had files. I don't know who
|> asked
|> for that, etc, but I remember a conversation about it.
|
| Ah. It seems that is supported. 'branch' OTOH fails if the directory
| already exists.
|
|> I can say that if it is costing us a lot, then we might want to
|> reconsider how
|> we do it.
|
| On large trees like Mozilla and OOo, it certainly is costing a lot.
| The bigger the tree, the higher the cost.
|
|> Other comments, though...
|>
|> 1) You set the executability after renaming the files. It seems
|> backward, but
|> maybe it is only changing ones that already existed?
|
| That's just a reflection of how TreeTransform works now. I'm pretty
| sure there's a TODO in there re setting the execute bit when the
| files are created (in limbo) rather than afterwards, but I was
| keeping the logic sequence as it is today.
|
|> 2) If it is the wt.abspath call, I know of ways to make 'pathjoin'
|> faster.
|
| This isn't an issue. One issue is walking 100K files to find that we
| only need to rename the top few from limbo across. The new _apply_for_...
| method is reduced from nearly 40% to under 10% of cpu time by this
| change. And most of that 10% is taken up in apply_inventory_delta now.

I thought that 'needs_rename' was meant to handle that. (So that when TT is
populating the limbo directory it knows what it could put in a semi-final
location, and what it had to put at the top level.)

|
|> 3) If this really is the best route, how about we turn it on if we know
|> we just
|> created the target directory. (Hence we *know* it is empty.) Or even
|> doing a
|> "os.listdir('target_dir')" and seeing if it is empty, or only contains a
|> '.bzr'
|> directory. That way 'bzr co .' can still benefit from these code paths
|> as long
|> as you don't actually have files around that it should worry about.
|
| That's *sounds* right. I've done that in the attached patch and I've seen
| a nice reduction in time for checkouts now. HOWEVER, I'm not 100% sure if
| doing that is always safe? In particular, I'd had to tweak some bzrdir
| tests and I really can't understand why? Any ideas? (Maybe I'll need to
| drop the checkout support and only get faster branching but that seems
| a shame.)

Well, 'bzr co --lightweight && bzr switch' would likely be a good way to work
with Moz sized trees. So I would hate for it to be dramatically slower than 'bzr
branch'.

If the change you made is just adding the '.bzr/checkout/conflicts' file, then
it is probably just representative of a different code path. (One would set
no_conflicts=True, I imagine.)

I'm not worried about having an empty control file there.

~     def _set_executability(self, path, entry, trans_id):
~         """Set the executability of versioned files """
~         new_executability = self._new_executability[trans_id]
- -        entry.executable = new_executability
+        if entry is not None:
+            entry.executable = new_executability
~         if supports_executable():
~             abspath = self._tree.abspath(path)
~             current_mode = os.stat(abspath).st_mode

^- It seems like an API violation to be passing 'entry=None' here. But maybe I'm
missing how you are using it.

I guess it is from this bit in _apply_operations_for_build_tree:
+        # Set the execute bits for the required files and clean-up
+        for trans_id in self._new_executability:
+            path = fp.get_path(trans_id)
+            self._set_executability(path, None, trans_id)
+        self._new_contents = {}
+

Don't you need to set the flags on the Inventory entries? Certainly
apply_inventory_delta would need to know them. (Otherwise on Win32, the dirstate
won't have the 'this should have the exec bit' set, and a 'bzr co && bzr st'
would show property modifications for all the executable files.

+    # If there are existing files outside of .bzr, we need to use a
+    # safer but slower algorithm. (This only happens when checkout
+    # is run in a directory with existing files.)
+    existing_files = False
+    for num, _unused in enumerate(wt.list_files()):
+        if num > 0:
+            existing_files = True
+            break

I thought I wouldn't like this, but it seems that list_files() is lazy about
descending into directories, so it should end up in a single os.listdir() call.

That is bzr.dev. Follow up runs show the writelines() time dropping a lot. Maybe
the disk was just spinning up.

Oh, we should also be careful about a few other things...

1) By default we do a 'status' in the source branch (if it has a working tree)
and then copy files from there that are unmodified. This saves us from unpacking
them from the repository. This probably has a large influence on some
benchmarks.  Especially if you

2) do/don't have a source working tree or
3) have/haven't run bzr status to have hot cache and/or populated dirstate cache.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkg8Q5sACgkQJdeBCYSNAAONwACgj1ZSIkv3kI1zSR2UvRAzOny2
7k0An2jrlFsXMO5BCcv29dcJXXYbP4gt
=HcLZ
-----END PGP SIGNATURE-----