[MERGE][#230567] Faster (local) branch

Fri May 30 02:12:14 BST 2008

Aaron Bentley wrote:

> So far, the high-level changes seem to be
> 1. not calling id2path
> 2. better chmod handling
> 3. not calling resolve_conflicts
> 
> 3 is not acceptable.  1 and 2 don't seem to require a new code path.  So
> I think this idea of adopting a fast path when applicable isn't
> necessary.  Are there other details I'm missing?

OK. I'll put back the call to resolve_conflicts.

I'm also not building inventory entries - I'm using the entries from the
source inventory instead. Again, that's only possible when the tree is
being created from scratch.

> I'll freely admit I've left branch creation speed on the back burner,
> because I was waiting for a new implementation of iter_files_bytes.
> These numbers are a good kick in the pants, because they show room for
> improvement even without it.
> 
> They've stimulated new thinking for me, and I hope my feedback is
> helpful to you.  But yes, I am very much wedded to the conceptual
> simplicity of a single codepath.  I want our efficiency wins to apply
> to merge and revert as well.  I want our behavior to be the same for all
> operations.  And of course, less code is usually better.

Thanks for your insight and feedback here. I'm not comfortable changing
this area without your input/review.

> Using a single codepath is a strategy that's worked well so far-- the
> single-codepath implementation is only 1.6x slower than the fast-path
> implementation.  Let's see how much farther we can push it!

I'll take a fresh look at how I can make the single codepath faster.
Having said that, I'd like to request that we keep an open mind about
multiple code paths here. The faster code path I've introduced will
apply to 100% of branch commands and almost 100% of checkout commands.
*One* way to get a single code path is to drop support for checkout
into a directory already containing files. :-)

I'm also not convinced that branch/checkout should be treated as
special cases of merge/revert w.r.t. internal logic. It's perfectly
acceptable IMO to maximise code reuse, but a 50-60% performance
hit for doing so unnecessarily isn't justified. In fact, I suspect
there's even more room for making the 'create from scratch' code path
faster still, e.g. copying across the dirstate from the source tree
instead of building it from scratch (in the common case where the
source branch is a local pristine mirror - the practice heavily
recommended by our User Guide). As a point of reference, I can
install Ubuntu from scratch using the installer much faster than I
can upgrade my install using Update Manager. That's OK - the latter
has a far more complex job to do.

To put some context around this work, it's been bugging me for some
time that our local branching isn't faster. With our shared repository
architecture, there's simply no good reason why Bazaar can't beat
Git and Hg every time in benchmarks in this area. In the case of
OpenOffice.org in particular, I know this is an area where the
competition are claiming we're too slow. That isn't true, but these
changes improve our time from 1m15s to 40+s. If we're serious about
performance, we ought to be taking those sorts of gains IMO by using
a faster code path, particularly if it applies in the majority of
cases as this one does.

Ian C.