Fastest way to find repository heads?
John Arbash Meinel
john at arbash-meinel.com
Thu Jul 9 00:06:06 BST 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
John Arbash Meinel wrote:
> Jelmer Vernooij wrote:
>> When talking to a git smart server and trying to figure out which
>> revisions to fetch, bzr-git has to provide the server with all of the
>> heads in the repository so that the client and the server can negotiate
>> which revisions have to be transferred.
>
>> What is the best way of doing this? At the moment I'm using this:
>
>> heads = repo.get_graph().heads(repo.all_revision_ids())
>
>> But this is taking quite some time for large repositories, is there
>> perhaps a more efficient way of doing this?
>
>> Alternatively, I guess the repository could store this information, but
>> that would obviously require a new repository format.
>
>> Cheers,
>
>> Jelmer
>
> I'll mention that KnownGraph is probably a lot better about this. So doing:
>
> all_revs = repo.all_revision_ids()
> parent_map = repo.get_parent_map(all_revs)
> graph = graph.KnownGraph(parent_map)
> heads = graph.heads(all_revs)
>
> John
> =:->
KnownGraph should be O(N) while Graph is probably N^2 in places.
However, for what you are doing I just realized:
all_revs = repo.all_revision_ids()
parent_map = repo.get_parent_map(all_revs)
all_parents = set()
map(all_parents.update, parent_map.itervalues())
heads = set(all_revs) - all_parents
That should be the fastest, and still give correct answers.
(A head being any revision that is not referenced as a parent of some
other revision.)
The Graph objects do more work because they are trying to count without
walking every node. But since you already *have* every node, there is no
benefit.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkpVJl4ACgkQJdeBCYSNAAN/9wCgvoDSbBG3ciMJZ7hUHDap9N0k
1uMAn1XgZPlO31/Lf7msGIctmlHOodCi
=LSYS
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list