[MERGE][#230567] Faster (local) branch
John Arbash Meinel
john at arbash-meinel.com
Tue May 27 18:23:39 BST 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Ian Clatworthy wrote:
| John Arbash Meinel wrote:
|
|> I think the reason we had the slow route, is that people wanted to be
|> able to
|> 'bzr co URL .' into a directory that already had files. I don't know who
|> asked
|> for that, etc, but I remember a conversation about it.
|
| Ah. It seems that is supported. 'branch' OTOH fails if the directory
| already exists.
|
|> I can say that if it is costing us a lot, then we might want to
|> reconsider how
|> we do it.
|
| On large trees like Mozilla and OOo, it certainly is costing a lot.
| The bigger the tree, the higher the cost.
|
|> Other comments, though...
|>
|> 1) You set the executability after renaming the files. It seems
|> backward, but
|> maybe it is only changing ones that already existed?
|
| That's just a reflection of how TreeTransform works now. I'm pretty
| sure there's a TODO in there re setting the execute bit when the
| files are created (in limbo) rather than afterwards, but I was
| keeping the logic sequence as it is today.
|
|> 2) If it is the wt.abspath call, I know of ways to make 'pathjoin'
|> faster.
|
| This isn't an issue. One issue is walking 100K files to find that we
| only need to rename the top few from limbo across. The new _apply_for_...
| method is reduced from nearly 40% to under 10% of cpu time by this
| change. And most of that 10% is taken up in apply_inventory_delta now.
I thought that 'needs_rename' was meant to handle that. (So that when TT is
populating the limbo directory it knows what it could put in a semi-final
location, and what it had to put at the top level.)
|
|> 3) If this really is the best route, how about we turn it on if we know
|> we just
|> created the target directory. (Hence we *know* it is empty.) Or even
|> doing a
|> "os.listdir('target_dir')" and seeing if it is empty, or only contains a
|> '.bzr'
|> directory. That way 'bzr co .' can still benefit from these code paths
|> as long
|> as you don't actually have files around that it should worry about.
|
| That's *sounds* right. I've done that in the attached patch and I've seen
| a nice reduction in time for checkouts now. HOWEVER, I'm not 100% sure if
| doing that is always safe? In particular, I'd had to tweak some bzrdir
| tests and I really can't understand why? Any ideas? (Maybe I'll need to
| drop the checkout support and only get faster branching but that seems
| a shame.)
Well, 'bzr co --lightweight && bzr switch' would likely be a good way to work
with Moz sized trees. So I would hate for it to be dramatically slower than 'bzr
branch'.
If the change you made is just adding the '.bzr/checkout/conflicts' file, then
it is probably just representative of a different code path. (One would set
no_conflicts=True, I imagine.)
I'm not worried about having an empty control file there.
~ def _set_executability(self, path, entry, trans_id):
~ """Set the executability of versioned files """
~ new_executability = self._new_executability[trans_id]
- - entry.executable = new_executability
+ if entry is not None:
+ entry.executable = new_executability
~ if supports_executable():
~ abspath = self._tree.abspath(path)
~ current_mode = os.stat(abspath).st_mode
^- It seems like an API violation to be passing 'entry=None' here. But maybe I'm
missing how you are using it.
I guess it is from this bit in _apply_operations_for_build_tree:
+ # Set the execute bits for the required files and clean-up
+ for trans_id in self._new_executability:
+ path = fp.get_path(trans_id)
+ self._set_executability(path, None, trans_id)
+ self._new_contents = {}
+
Don't you need to set the flags on the Inventory entries? Certainly
apply_inventory_delta would need to know them. (Otherwise on Win32, the dirstate
won't have the 'this should have the exec bit' set, and a 'bzr co && bzr st'
would show property modifications for all the executable files.
+ # If there are existing files outside of .bzr, we need to use a
+ # safer but slower algorithm. (This only happens when checkout
+ # is run in a directory with existing files.)
+ existing_files = False
+ for num, _unused in enumerate(wt.list_files()):
+ if num > 0:
+ existing_files = True
+ break
I thought I wouldn't like this, but it seems that list_files() is lazy about
descending into directories, so it should end up in a single os.listdir() call.
|
|> What is the benefit of:
|> + new_paths = [(fp.get_path(t), t) for t in self._needs_rename]
|> + for path, trans_id in new_paths:
|>
|> versus
|> for trans_id in self._needs_rename:
|> path = fp.get_path(trans_id)
|>
|>
|> You have the same pattern later on for "exec_paths".
|
| There's no benefit. It was just reflecting the pattern used in the old
| code. I've improved this as you suggested in the updated patch.
|
|> I would rather not add another code path if we can help it, but I didn't
|> do the profiling you did.
|>
|> On windows, doing 'bzr co --lightweight' I see 36s total, and 23.8 of
|> that is in 'writelines' and the other 7.8s is in 'open()'.
|>
|> Only leaving 6s (under lsprof) for the rest, which includes extracting
|> texts from packs, etc. Though it is only 8s without lsprof, so I'm a bit
|> surprised to see writelines() being that slow.
|
| What code base is that? On large trees on Linux, I'm consistently seeing
| a large saving by using this patch. Maybe the IO performance on Windows
| just swamps everything else?
|
| Ian C.
|
That is bzr.dev. Follow up runs show the writelines() time dropping a lot. Maybe
the disk was just spinning up.
Oh, we should also be careful about a few other things...
1) By default we do a 'status' in the source branch (if it has a working tree)
and then copy files from there that are unmodified. This saves us from unpacking
them from the repository. This probably has a large influence on some
benchmarks. Especially if you
2) do/don't have a source working tree or
3) have/haven't run bzr status to have hot cache and/or populated dirstate cache.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkg8Q5sACgkQJdeBCYSNAAONwACgj1ZSIkv3kI1zSR2UvRAzOny2
7k0An2jrlFsXMO5BCcv29dcJXXYbP4gt
=HcLZ
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list