[MERGE] Repository.get_revision_graph improvements
John Arbash Meinel
john at arbash-meinel.com
Mon Mar 10 15:42:50 GMT 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Aaron Bentley wrote:
| John Arbash Meinel wrote:
|> 2) Introduce Graph.iter_ancestry() which returns similar information.
|
| bb:resubmit
|
| I would be more comfortable with iter_ancestry if there was a
| distinction made between revisions with no parents and ghost revisions.
| The obvious idea would be for ghosts to have None as their parent list.
|
| It can be useful to detect ghost revisions, and your use case,
| get_revision_graph, already does this. I think using None would help
| clarify this, while avoiding special-casing NULL_REVISION.
|
| Another use case for ghosts would be looking for ghost revisions in a
| different repository, i.e. stacking.
|
Stacking is already handled underneath Graph objects. Specifically you give it a
_StackedParentsProvider. Repository.get_graph() already has that function.
As I don't really care between () or None, I'll just switch to None, though.
|
|> === modified file 'bzrlib/graph.py'
|> --- bzrlib/graph.py 2008-02-03 22:55:08 +0000
|> +++ bzrlib/graph.py 2008-02-25 22:41:36 +0000
|> @@ -424,6 +424,37 @@
|> raise errors.NoCommonAncestor(left_revision, right_revision)
|> revisions = lca
|
|> + def iter_ancestry(self, revision_ids):
|> + """Iterate the ancestry of this revision.
|> +
|> + The specific order is undefined, but children should be returned before
|> + parents.
|
| ^^^ I don't think this is necessarily true; what about history shortcuts?
|
I'll just take it out. Certainly we aren't guaranteeing topologically sorted output.
|> === modified file 'bzrlib/repofmt/pack_repo.py'
|> --- bzrlib/repofmt/pack_repo.py 2008-02-19 03:58:32 +0000
|> +++ bzrlib/repofmt/pack_repo.py 2008-02-26 00:03:15 +0000
|> @@ -1926,6 +1926,53 @@
|> found_parents[key[0]] = parents
|> return found_parents
|
|> + @needs_read_lock
|> + def get_revision_graph(self, revision_id=None):
|> + """Return a dictionary containing the revision graph.
|> +
|> + :param revision_id: The revision_id to get a graph from. If None, then
|> + the entire revision graph is returned. This is a deprecated mode of
|> + operation and will be removed in the future.
|> + :return: a dictionary of revision_id->revision_parents_list.
|> + """
|> + if 'evil' in debug.debug_flags:
|> + mutter_callsite(3,
|> + "get_revision_graph scales with size of history.")
|> + # special case NULL_REVISION
|> + if revision_id == _mod_revision.NULL_REVISION:
|> + return {}
|> + a_weave = self._get_revision_vf()
|
| Urgh, why are you calling this a weave?
Because it was generally called that in other places. I've switched to
revision_vf. Nothing is very accurate since it is technically a "knit adapter"
on top of a pack file. (You aren't getting a physical knit or pack object.)
|
| And since you're going to use a Graph in a minute, why not just use
| Graph.get_parents_map to find out if the revision is present? At least
| you have a chance of caching that way.
Actually, I needed the VF for the case where no revision_id was supplied. I
don't believe Graph exposes any way to get "all revisions" you always have to
have an entry point. And since I already had one...
Anyway, I've changed it as you suggested. To make grabbing the VF the special
case, only when revision_id is None, and then to use get_parent_map() to check
for existence.
|> + if revision_id is None:
|> + return a_weave.get_graph()
|> + if revision_id not in a_weave:
|> + raise errors.NoSuchRevision(self, revision_id)
|
| ^^^ Since it is expensive to determine whether revision_id is in the
| "weave", it's probably worth avoiding the test here. You can get this
| from the iter_ancestry output (if there are no results, revision_id was
| not present in this weave/graph.
I just switched it to a "get_parent_map()" call. It is slightly redundant, but
the answer will be cached for Pack repositories.
|
|> + else:
|> + g = self.get_graph()
|> + ancestry = {}
|> + children = {}
|> + ghosts = set()
|> + NULL_REVISION = _mod_revision.NULL_REVISION
|> + for rev_id, parent_ids in g.iter_ancestry([revision_id]):
|> + if len(parent_ids) == 0: # Ghost
|> + if rev_id not in children:
|> + continue
|> + ghosts.add(rev_id)
|> + for child in children[rev_id]:
|> + old_anc = ancestry[child]
|> + ancestry[child] = tuple(p for p in old_anc
|> + if p != rev_id)
|> + continue
|> + if len(parent_ids) == 1 and parent_ids[0] == NULL_REVISION:
|> + parent_ids = () # No parents
|
| ^^^ Do you need special handling of NULL_REVISION here? It looks like
| the ghost handling would do it.
|
| Aaron
I can consider NULL_REVISION to be a ghost, and then it will be automatically
filtered out. Mostly that was the "adapter" between how get_parent_map() returns
(NULL_REVISION,) instead of () for a node with 0 parents, and adapting it back
to () for what get_revision_graph() expected.
In person, you also mentioned that I could remove the double-loop by changing
how I handle ghosts. Specifically, right now I check as I'm inserting if a
parent is a ghost, and then when I find a revision is a ghost, I go through and
remove the children I have already seen that referenced it.
I did go ahead and rewrite it to work that way. In thinking about it, it should
actually perform a bit better. As then I'm not filtering at every step, but just
passing through the data, and then filtering at the end. So I should filter
approx O(ghosts), rather than O(ancestry).
I believe the attached patch addresses all of your statements.
I did find out that I had to start tracking down the deprecations of
"get_revision_graph_with_ghosts." Which exposed that MultipleRevisionSources was
using it. However, none of our code was actually using MultipleRevisionSources
anymore. It was used in the old common_ancestor() code as an adapter between
multiple repositories. But now all of that has been moved onto Graph, and
Repository.get_graph(other_repo=) handles repository stacking.
So because it wasn't used anymore, I went ahead and just deprecated
MultipleRevisionSources and some of the other 'revision.py' functions that
weren't used and updated all the tests to follow.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFH1Vb6JdeBCYSNAAMRApADAKCCB7S2euAllkWZYxEcMXYtPjhoHgCgqsnu
sgR2coTIL6ehuZ2lVCLLEGA=
=rNTy
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: revision_graph.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20080310/62110645/attachment-0001.diff
More information about the bazaar
mailing list