[MERGE][#230567] Faster (local) branch
ian.clatworthy at internode.on.net
Tue May 27 15:36:06 BST 2008
John Arbash Meinel wrote:
> I think the reason we had the slow route, is that people wanted to be
> able to
> 'bzr co URL .' into a directory that already had files. I don't know who
> for that, etc, but I remember a conversation about it.
Ah. It seems that is supported. 'branch' OTOH fails if the directory
> I can say that if it is costing us a lot, then we might want to
> reconsider how
> we do it.
On large trees like Mozilla and OOo, it certainly is costing a lot.
The bigger the tree, the higher the cost.
> Other comments, though...
> 1) You set the executability after renaming the files. It seems
> backward, but
> maybe it is only changing ones that already existed?
That's just a reflection of how TreeTransform works now. I'm pretty
sure there's a TODO in there re setting the execute bit when the
files are created (in limbo) rather than afterwards, but I was
keeping the logic sequence as it is today.
> 2) If it is the wt.abspath call, I know of ways to make 'pathjoin'
This isn't an issue. One issue is walking 100K files to find that we
only need to rename the top few from limbo across. The new _apply_for_...
method is reduced from nearly 40% to under 10% of cpu time by this
change. And most of that 10% is taken up in apply_inventory_delta now.
> 3) If this really is the best route, how about we turn it on if we know
> we just
> created the target directory. (Hence we *know* it is empty.) Or even
> doing a
> "os.listdir('target_dir')" and seeing if it is empty, or only contains a
> directory. That way 'bzr co .' can still benefit from these code paths
> as long
> as you don't actually have files around that it should worry about.
That's *sounds* right. I've done that in the attached patch and I've seen
a nice reduction in time for checkouts now. HOWEVER, I'm not 100% sure if
doing that is always safe? In particular, I'd had to tweak some bzrdir
tests and I really can't understand why? Any ideas? (Maybe I'll need to
drop the checkout support and only get faster branching but that seems
> What is the benefit of:
> + new_paths = [(fp.get_path(t), t) for t in self._needs_rename]
> + for path, trans_id in new_paths:
> for trans_id in self._needs_rename:
> path = fp.get_path(trans_id)
> You have the same pattern later on for "exec_paths".
There's no benefit. It was just reflecting the pattern used in the old
code. I've improved this as you suggested in the updated patch.
> I would rather not add another code path if we can help it, but I didn't
> do the profiling you did.
> On windows, doing 'bzr co --lightweight' I see 36s total, and 23.8 of
> that is in 'writelines' and the other 7.8s is in 'open()'.
> Only leaving 6s (under lsprof) for the rest, which includes extracting
> texts from packs, etc. Though it is only 8s without lsprof, so I'm a bit
> surprised to see writelines() being that slow.
What code base is that? On large trees on Linux, I'm consistently seeing
a large saving by using this patch. Maybe the IO performance on Windows
just swamps everything else?
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 20706 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080528/639a4e25/attachment-0001.bin
More information about the bazaar