[RFC/PLEASETEST] Per-file graph heads detection during commit for pack repositories.
John Arbash Meinel
john at arbash-meinel.com
Wed Nov 14 17:09:53 GMT 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Robert Collins wrote:
> This is a call for testing.
>
...
>
> I think we can make it really fast; partly I want to fix the index
> layer, and graph.heads() still makes way to many calls into the index
> layer which means investment there has a high multiplier for
> performance.
>
> Another thing to try though, is to use the per-file graph again, this is
> hopefully less graph distance to process, at a tradeoff of reading the
> per-file graph data again (which could actually be less data overall in
> some situations).
>
> So the attached patch (not complete - theres a new public api without
> enough tests) - speeds up commit a bit in this case by going for the
> per-file graph approach.
>
In my quick test with a merge of bzr.dev into a feature branch, it made a huge
difference. I ran it under --lsprof and have attached the callgrinds. But
'time' said the difference (under --lsprof) was
bzr commit -m "test" 57.98s user 15.43s system 99% cpu 1:13.42 total
versus
bzr commit -m "test" 2.76s user 0.54s system 100% cpu 3.291 total
Yes, that is 73s versus 3.3s.
I also made sure to run the first commit 2 times, just to get some of the
indexes loaded into memory.
The callgrind seems to put 98% of the time in bzrlib.graph.heads(). I'm
actually thinking that part of the problem is that we don't cache any of the
graph information, so every call needs to go all the way down and bisect
through the index data. On that hunch, I went ahead and put together this patch:
=== modified file 'bzrlib/repofmt/pack_repo.py'
- --- bzrlib/repofmt/pack_repo.py 2007-11-13 21:39:37 +0000
+++ bzrlib/repofmt/pack_repo.py 2007-11-14 17:01:10 +0000
@@ -1510,6 +1510,7 @@
self._reconcile_does_inventory_gc = True
self._reconcile_fixes_text_parents = False
self._reconcile_backsup_inventory = False
+ self._revision_parents = {}
def _abort_write_group(self):
self._pack_collection._abort_write_group()
@@ -1571,17 +1572,23 @@
self._pack_collection.ensure_loaded()
index = self._pack_collection.revision_index.combined_index
search_keys = set()
+ found_parents = {_mod_revision.NULL_REVISION:[]}
for revision_id in revision_ids:
if revision_id != _mod_revision.NULL_REVISION:
- - search_keys.add((revision_id,))
- - found_parents = {_mod_revision.NULL_REVISION:[]}
- - for index, key, value, refs in index.iter_entries(search_keys):
- - parents = refs[0]
- - if not parents:
- - parents = (_mod_revision.NULL_REVISION,)
- - else:
- - parents = tuple(parent[0] for parent in parents)
- - found_parents[key[0]] = parents
+ if revision_id in self._revision_parents:
+ found_parents[revision_id] = \
+ self._revision_parents[revision_id]
+ else:
+ search_keys.add((revision_id,))
+ if search_keys:
+ for index, key, value, refs in index.iter_entries(search_keys):
+ parents = refs[0]
+ if not parents:
+ parents = (_mod_revision.NULL_REVISION,)
+ else:
+ parents = tuple(parent[0] for parent in parents)
+ found_parents[key[0]] = parents
+ self._revision_parents[key[0]] = parents
result = []
for revision_id in revision_ids:
try:
And for that, I ended up with:
bzr commit -m "test" 12.51s user 3.41s system 99% cpu 15.928 total
So the per-file graph is still better in this particular case. (3s versus 16s
versus 73s). But caching the parents makes a big difference. Which also
indicates to me that grabbing the revision parents is a bit too expensive in
general. At the very least, shouldn't the index layer be caching this sort of
thing? (at least caching the pages it has had to read so far)
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFHOyvgJdeBCYSNAAMRAkpaAJwNZrNmk2Fcke9o32ABDBWUHtE3rQCg1pNR
Cx44gtwVb80vK2rWycHhyJ4=
=C2+/
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: commit_merge_robert.callgrind.bz2
Type: application/octet-stream
Size: 31196 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071114/a96ca103/attachment-0003.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: commit_merge_dev.callgrind.bz2
Type: application/octet-stream
Size: 31251 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071114/a96ca103/attachment-0004.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: commit_merge_parent_cache.callgrind.bz2
Type: application/octet-stream
Size: 31295 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20071114/a96ca103/attachment-0005.obj
More information about the bazaar
mailing list