Accelerate build_tree by using working tree files
Aaron Bentley
aaron.bentley at utoronto.ca
Thu Dec 20 03:36:41 GMT 2007
Ian Clatworthy wrote:
> Aaron Bentley wrote:
>> Ian Clatworthy wrote:
>
>>> * creating a local branch: 63.9 -> 33.1
>> Could you send me the callgrind.out for that? I would have expected
>> better. This is still using a shared-repo, right? It's barely reading
>> or writing any files, so what's it doing?
>
> Attached (compressed).
Thank you.
So the bulk of the time is spent in build_tree. Nearly equal time is
spent in _iter_files_bytes_accelerated, which retrieves the file
contents, and TreeTransform.apply, which moves the files into place.
Third place goes to TreeTransform.create_file, which writes the files to
disk.
I think these are sub-optimal results because, AFAICT, you're running
against a dirty dirstate. _iter_changes should have no reason to call
osutils.sha_file_by_name, because the hashes should all be in the
dirstate. But in fact, it's being called 37,667 times!
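To illustrate why a clean dirstate should make those hashing calls unnecessary, here is a minimal sketch of a stat-validated hash cache. It is not bzrlib code: StatCache, get_sha1 and hash_calls are made-up names, and the fingerprint (size plus mtime) is a simplification of whatever the real dirstate records.

```python
import hashlib
import os


class StatCache:
    """Hypothetical sha1 cache, validated by a stat fingerprint (not bzrlib)."""

    def __init__(self):
        self._entries = {}   # path -> (fingerprint, sha1 hex digest)
        self.hash_calls = 0  # number of times file contents were actually read

    @staticmethod
    def _fingerprint(path):
        # Assumption: size and mtime are enough to detect a change.
        st = os.stat(path)
        return (st.st_size, st.st_mtime_ns)

    def _sha_file_by_name(self, path):
        # The expensive step: read the whole file and hash it.
        self.hash_calls += 1
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    def get_sha1(self, path):
        fingerprint = self._fingerprint(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == fingerprint:
            # Clean entry: the stored hash is trusted, no file read needed.
            return cached[1]
        # Missing or stale entry: re-hash and record the new fingerprint.
        sha1 = self._sha_file_by_name(path)
        self._entries[path] = (fingerprint, sha1)
        return sha1
```

With a cache like this, a second pass over unchanged files costs one stat per file and no reads. Seeing tens of thousands of hashing calls suggests the cached entries were missing or considered stale.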
Possible options for reducing time further:
- make a version of Tree._iter_changes (17%) that does less work,
faster.
- There are some write-locked operations that probably should be
investigated. I suspect they may be related to dirstate<=>inventory
translation.
- set_parent_trees (9%) doesn't seem to be able to accelerate using a
DirStateRevisionTree-- we can probably do something more efficient.
- apply_inventory_delta (7%) could probably be optimized with a native
dirstate implementation.
>>> Note that checkout and lightweight checkout performance drop slightly
>>> though: 64 -> 66 seconds.
>> Are you using --files-from? (You're not branching or checking out
>> *from* a lightweight checkout, are you? That case wasn't handled by
>> that version.)
>
> I'm not currently. Firstly, --files-from is not in 1.0 (right?) and I
> want the benchmark to be useful there, in the short term at least.
> Secondly, I can't really think of a Use Case for making a local checkout
> (even though I'm benchmarking it).
Me. I use local checkouts almost exclusively. I have a tree-less repo
at ~/bzrrepo, but I keep all my checkouts in ~/bzr, because that appeals
to my notions of cleanliness. When I'm done working on a tree, I can
just erase it. So my ~/bzr has about 16 checkouts in it, while my repo
has 111 branches in it.
I really can't imagine doing it any other way.
When I create a new branch, I use "bzr cbranch bzr.dev FOO", which is
equivalent to:
$ bzr branch ~/bzrrepo/others/bzr.dev ~/bzrrepo/FOO
$ bzr checkout ~/bzrrepo/FOO
I'd like to change that to:
$ bzr branch ~/bzrrepo/others/bzr.dev ~/bzrrepo/FOO
$ bzr checkout ~/bzrrepo/FOO --files-from bzr.dev
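The two-step sequence above can be sketched as a small helper that builds the command lines. This is only an illustration: cbranch_commands is a hypothetical name, it does not run bzr, and --files-from is the option proposed in this thread rather than something assumed to exist in a release.

```python
def cbranch_commands(source_branch, new_branch, files_from=None):
    """Return argv lists for the branch step and the checkout step.

    Hypothetical helper for illustration; it only builds command lines.
    """
    branch_cmd = ["bzr", "branch", source_branch, new_branch]
    checkout_cmd = ["bzr", "checkout", new_branch]
    if files_from is not None:
        # Proposed option: reuse files from an existing working tree.
        checkout_cmd += ["--files-from", files_from]
    return [branch_cmd, checkout_cmd]


for cmd in cbranch_commands("~/bzrrepo/others/bzr.dev", "~/bzrrepo/FOO",
                            files_from="bzr.dev"):
    print(" ".join(cmd))
```

Leaving files_from unset gives the current behaviour; passing it gives the variant I'd like to switch to.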
> If there is one, I'm ok with adding
> the option but otherwise, I'd prefer the local checkout benchmark to be
> a lower bound of the common remote checkout case. Hmm - I guess you
> could easily have a few remote checkouts in the one repo and genuinely
> apply --files-from then?
I'm not really following you here.
> Real shame we can't auto-detect a useful tree
> though like you must be doing for branch ...
We can, but for people who are using local checkouts the way I do, it
won't do anything.
Aaron