Accelerate build_tree by using working tree files

Aaron Bentley aaron.bentley at utoronto.ca
Thu Dec 20 03:36:41 GMT 2007



Ian Clatworthy wrote:
> Aaron Bentley wrote:
>> Ian Clatworthy wrote:
> 
>>> * creating a local branch: 63.9 -> 33.1
>> Could you send me the callgrind.out for that?  I would have expected
>> better.  This is still using a shared-repo, right?  It's barely reading
>> or writing any files, so what's it doing?
> 
> Attached (compressed).

Thank you.

So the bulk of the time is spent in build_tree.  Nearly equal time is
spent in _iter_files_bytes_accelerated, which retrieves the file
contents, and TreeTransform.apply, which moves the files into place.
Third place goes to TreeTransform.create_file, which writes the files to
disk.
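The split between create_file (write the bytes) and apply (move them into place) is a two-phase approach: stage every new file somewhere temporary, then rename each one to its final path. Below is a minimal sketch of that idea. It assumes a simple stream of (path, bytes) pairs standing in for what _iter_files_bytes_accelerated yields. build_tree_staged and the ".limbo-" directory name are hypothetical and not bzrlib's API.

```python
import os
import shutil
import tempfile
from typing import Iterable, Tuple


def build_tree_staged(target: str, files: Iterable[Tuple[str, bytes]]) -> None:
    """Hypothetical sketch: write all files into a staging ("limbo")
    directory first, then move each one to its final location.

    `files` yields (relative_path, contents) pairs.
    """
    os.makedirs(target, exist_ok=True)
    # Stage inside the target so os.replace is a same-filesystem rename.
    limbo = tempfile.mkdtemp(prefix=".limbo-", dir=target)
    staged = []
    try:
        # Phase 1 (like create_file): write contents to disk under limbo.
        for i, (relpath, data) in enumerate(files):
            tmp = os.path.join(limbo, str(i))
            with open(tmp, "wb") as f:
                f.write(data)
            staged.append((tmp, relpath))
        # Phase 2 (like apply): rename each staged file into place.
        for tmp, relpath in staged:
            dest = os.path.join(target, relpath)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            os.replace(tmp, dest)
    finally:
        # Remove the (now empty, or partially filled on error) staging dir.
        shutil.rmtree(limbo, ignore_errors=True)
```

Both phases touch every file, which is consistent with fetching, writing and moving showing up as roughly comparable costs in the profile.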

I think these results are sub-optimal, because AFAICT you're running
against a dirty dirstate.  _iter_changes should have no reason to call
osutils.sha_file_by_name, because the hashes should all be in dirstate.
But in fact, it's being called 37,667 times!
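The reason a clean dirstate avoids hashing is that it can keep stat data next to each file's SHA-1 and only re-read the file when that stat data no longer matches. Here is a minimal sketch of such a stat cache. CachedEntry, get_sha1 and the (size, mtime) comparison are illustrative assumptions, not dirstate's actual record format.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedEntry:
    """Hypothetical per-file record: stat fingerprint plus known SHA-1."""
    size: int
    mtime_ns: int
    sha1: str


def sha_file_by_name(path: str) -> str:
    """Hash a file's contents; this is the expensive call to avoid."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def get_sha1(path: str, cache: Dict[str, CachedEntry],
             hasher: Callable[[str], str] = sha_file_by_name) -> str:
    """Return the file's SHA-1, hashing only if its stat data changed."""
    st = os.stat(path)
    entry = cache.get(path)
    if (entry is not None and entry.size == st.st_size
            and entry.mtime_ns == st.st_mtime_ns):
        return entry.sha1  # cache hit: the file is not read at all
    sha1 = hasher(path)
    cache[path] = CachedEntry(st.st_size, st.st_mtime_ns, sha1)
    return sha1
```

If the cache is stale or was never saved (a "dirty" state), every lookup falls through to the hasher, which matches the symptom above. A production cache also has to distrust very recent mtimes, since a file rewritten within the filesystem's timestamp resolution can keep the same size and mtime. This sketch skips that.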

Possible options for reducing time further:
- Make a version of Tree._iter_changes (17%) that does less work and
  runs faster.
- There are some write-locked operations that should probably be
  investigated.  I suspect they may be related to dirstate<=>inventory
  translation.
- set_parent_trees (9%) doesn't seem to be able to take advantage of a
  DirStateRevisionTree; we can probably do something more efficient.
- apply_inventory_delta (7%) could probably be optimized with a native
  dirstate implementation.
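The appeal of a native implementation is editing flat, path-keyed records directly instead of round-tripping through a full inventory. Below is a toy sketch of that. It assumes delta items of the form (old_path, new_path, file_id, sha1), loosely modelled on bzrlib's inventory deltas. apply_delta and the {path: (file_id, sha1)} table are hypothetical, and directory renames that implicitly move children are not handled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical delta item: (old_path, new_path, file_id, new_sha1).
# old_path is None for an add; new_path is None for a removal.
DeltaItem = Tuple[Optional[str], Optional[str], str, Optional[str]]
Table = Dict[str, Tuple[str, Optional[str]]]


def apply_delta(table: Table, delta: List[DeltaItem]) -> None:
    """Apply a delta in place to a flat {path: (file_id, sha1)} table.

    No rollback: on error the table may be partially modified.
    """
    # Pass 1: remove every old path, so renames that swap paths don't collide.
    for old_path, _new_path, file_id, _sha1 in delta:
        if old_path is not None:
            existing = table.pop(old_path, None)
            if existing is None or existing[0] != file_id:
                raise ValueError(f"delta does not match table at {old_path!r}")
    # Pass 2: insert every new path.
    for _old_path, new_path, file_id, sha1 in delta:
        if new_path is not None:
            if new_path in table:
                raise ValueError(f"path {new_path!r} already present")
            table[new_path] = (file_id, sha1)
```

Only the touched rows are visited, which is the kind of saving a native dirstate version would be after.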

>>> Note that checkout and lightweight checkout performance drop slightly
>>> though: 64 -> 66 seconds.
>> Are you using --files-from?  (You're not branching or checking out
>> *from* a lightweight checkout, are you?  That case wasn't handled by
>> that version.)
> 
> I'm not currently. Firstly, --files-from is not in 1.0 (right?) and I
> want the benchmark to be useful there, in the short term at least.
> Secondly, I can't really think of a Use Case for making a local checkout
> (even though I'm benchmarking it).

Me.  I use local checkouts almost exclusively.  I have a tree-less repo
at ~/bzrrepo, but I keep all my checkouts in ~/bzr, because that appeals
to my notions of cleanliness.  When I'm done working on a tree, I can
just erase it.  So my ~/bzr has about 16 checkouts in it, while my repo
has 111 branches in it.

I really can't imagine doing it any other way.

When I create a new branch, I use "bzr cbranch bzr.dev FOO", which is
equivalent to:
$ bzr branch ~/bzrrepo/others/bzr.dev ~/bzrrepo/FOO
$ bzr checkout ~/bzrrepo/FOO

I'd like to change that to:
$ bzr branch ~/bzrrepo/others/bzr.dev ~/bzrrepo/FOO
$ bzr checkout ~/bzrrepo/FOO --files-from bzr.dev
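Here is one way a --files-from style option could work: for each file the new tree needs, reuse the copy in an existing working tree when its SHA-1 matches, and only fall back to the repository otherwise. This is a minimal sketch under that assumption. build_with_accelerator and fetch_from_repo are hypothetical names, not bzrlib's build_tree signature.

```python
import hashlib
import os
from typing import Callable, Dict


def build_with_accelerator(
    target: str,
    wanted: Dict[str, str],                  # relpath -> expected SHA-1 (hex)
    accelerator_root: str,                   # existing tree, e.g. a bzr.dev checkout
    fetch_from_repo: Callable[[str], bytes], # slow path: extract text from repository
) -> Dict[str, int]:
    """Hypothetical sketch: prefer matching files from an existing tree."""
    stats = {"copied": 0, "fetched": 0}
    for relpath, expected_sha1 in wanted.items():
        dest = os.path.join(target, relpath)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        src = os.path.join(accelerator_root, relpath)
        if os.path.isfile(src):
            with open(src, "rb") as f:
                data = f.read()
            # Reuse the working-tree copy only if it is byte-for-byte right.
            if hashlib.sha1(data).hexdigest() == expected_sha1:
                with open(dest, "wb") as f:
                    f.write(data)
                stats["copied"] += 1
                continue
        # Missing or modified in the accelerator tree: use the repository.
        with open(dest, "wb") as f:
            f.write(fetch_from_repo(relpath))
        stats["fetched"] += 1
    return stats
```

Hashing every candidate file costs time too. A real implementation would rather trust the accelerator tree's cached SHA-1s, which is another reason a clean dirstate matters.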

> If there is one, I'm ok with adding
> the option but otherwise, I'd prefer the local checkout benchmark to be
> a lower bound of the common remote checkout case. Hmm - I guess you
> could easily have a few remote checkouts in the one repo and genuinely
> apply --files-from then?

I'm not really following you here.

> Real shame we can't auto-detect a useful tree
> though like you must be doing for branch ...

We can, but for people who are using local checkouts the way I do, it
won't do anything.

Aaron


