[MERGE][#230567] Faster (local) branch

Tue May 27 15:36:06 BST 2008

John Arbash Meinel wrote:

> I think the reason we had the slow route, is that people wanted to be
> able to
> 'bzr co URL .' into a directory that already had files. I don't know who
> asked
> for that, etc, but I remember a conversation about it.

Ah. It seems that is supported. 'branch' OTOH fails if the directory
already exists.

> I can say that if it is costing us a lot, then we might want to
> reconsider how
> we do it.

On large trees like Mozilla and OOo, it certainly is costing a lot.
The bigger the tree, the higher the cost.

> Other comments, though...
> 
> 1) You set the executability after renaming the files. It seems
> backward, but
> maybe it is only changing ones that already existed?

That's just a reflection of how TreeTransform works now. I'm pretty
sure there's a TODO in there re setting the execute bit when the
files are created (in limbo) rather than afterwards, but I was
keeping the logic sequence as it is today.

> 2) If it is the wt.abspath call, I know of ways to make 'pathjoin'
> faster.

This isn't an issue. One issue is walking 100K files to find that we
only need to rename the top few from limbo across. The new _apply_for_...
method is reduced from nearly 40% to under 10% of cpu time by this
change. And most of that 10% is taken up in apply_inventory_delta now.

> 3) If this really is the best route, how about we turn it on if we know
> we just
> created the target directory. (Hence we *know* it is empty.) Or even
> doing a
> "os.listdir('target_dir')" and seeing if it is empty, or only contains a
> '.bzr'
> directory. That way 'bzr co .' can still benefit from these code paths
> as long
> as you don't actually have files around that it should worry about.

That's *sounds* right. I've done that in the attached patch and I've seen
a nice reduction in time for checkouts now. HOWEVER, I'm not 100% sure if
doing that is always safe? In particular, I'd had to tweak some bzrdir
tests and I really can't understand why? Any ideas? (Maybe I'll need to
drop the checkout support and only get faster branching but that seems
a shame.)

> What is the benefit of:
> +        new_paths = [(fp.get_path(t), t) for t in self._needs_rename]
> +        for path, trans_id in new_paths:
> 
> versus
> for trans_id in self._needs_rename:
>   path = fp.get_path(trans_id)
> 
> 
> You have the same pattern later on for "exec_paths".

There's no benefit. It was just reflecting the pattern used in the old
code. I've improved this as you suggested in the updated patch.

> I would rather not add another code path if we can help it, but I didn't
> do the profiling you did.
> 
> On windows, doing 'bzr co --lightweight' I see 36s total, and 23.8 of
> that is in 'writelines' and the other 7.8s is in 'open()'.
> 
> Only leaving 6s (under lsprof) for the rest, which includes extracting
> texts from packs, etc. Though it is only 8s without lsprof, so I'm a bit
> surprised to see writelines() being that slow.

What code base is that? On large trees on Linux, I'm consistently seeing
a large saving by using this patch. Maybe the IO performance on Windows
just swamps everything else?

Ian C.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: faster-branch-3.patch
Type: text/x-diff
Size: 20706 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080528/639a4e25/attachment-0001.bin