Fastest way to find repository heads?
John Arbash Meinel
john at arbash-meinel.com
Thu Jul 9 00:37:58 BST 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> John Arbash Meinel wrote:
...
>> I'll mention that KnownGraph is probably a lot better about this. So doing:
>
>> all_revs = repo.all_revision_ids()
>> parent_map = repo.get_parent_map(all_revs)
>> graph = graph.KnownGraph(parent_map)
>> heads = graph.heads(all_revs)
>
>> John
>> =:->
>
> KnownGraph should be O(N) while Graph is probably N^2 in places.
> However, for what you are doing I just realized:
>
> all_revs = repo.all_revision_ids()
> parent_map = repo.get_parent_map(all_revs)
> all_parents = set()
> map(all_parents.update, parent_map.itervalues())
> heads = set(all_revs) - all_parents
>
> That should be the fastest, and still give correct answers.
To give some numbers:
330ms all_revs = repo.all_revision_ids()
1700ms parent_map = repo.get_parent_map(all_revs)
g = graph.KnownGraph(parent_map)
140ms heads = g.heads(all_revs)
versus
g = repo.get_graph()
18400ms g.heads(all_revs)
versus
parent_map = repo.get_parent_map(all_revs)
all_parents = set()
map(all_parents.update, parent_map.itervalues())
30ms heads = set(all_revs) - all_parents
I'll note that in the 'fast' case, the slowest thing is
'get_parent_map()'. We could actually make a "get_all_parent_map()" that
would be as fast as all_revision_ids(), we just haven't had a need.
(all_revision_ids reads the whole btree and throws away the info,
because we don't do any caching in BTreeIndex.iter_all_entries().)
get_parent_map() is slower than all_revision_ids() because it does the
same work *plus* filtering the lists into subsets based on the btree
index, etc.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkpVLdYACgkQJdeBCYSNAANmywCgvmzva2dhoEx5ZEfx7tF3nHbV
FGEAnRIoKQ7KUXiVGbNNNdbNV9sjE89H
=oPgN
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list