RFC get_revision_graph

John Arbash Meinel john at arbash-meinel.com
Thu Feb 21 18:55:10 GMT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Aaron Bentley wrote:
| John Arbash Meinel wrote:
|> The way it is implemented in packs (ATM) is that for each node, you
|> query the
|> index for the parents of the node. Then you query the index again for the
|> parents to see if they are present (so you can remove ghosts). Then you
|> step and
|> do the same thing again.
|
| Couldn't we just avoid the second query?  When we step, we'll quickly
| find out which of the parents are present anyhow.

Well, it means you need to delay returning the child until you have stepped to
the parents. And it is also somewhat how the api is defined. Specifically,
Repository is doing something like:

weave = self._get_revision_vf()
pending = set([version_id])
graph = {}
while pending:
~  next = pending.pop()
~  parent_ids = weave.get_parents(next)
~  graph[next] = parent_ids
~  pending.update(p for p in parent_ids if p not in graph)

And the problem is that get_parents() needs to know whether something is a ghost
or not.

Alternatively we would have to keep a backwards map. So that when we step to a
parent which ends up being a ghost, we can go back and remove it from the
revision graph dictionary.

Which is why I was saying we either need to move building the graph down a
level, or move filtering ghosts up a level.

Or switch to an api that doesn't care about ghosts.

|
|> Which means that you query the index at least 2x for every node (even
|> more for
|> nodes that have a lot of children since the parent nodes get queried
|> each time).
|> Also, the index is written to not cache the parent information, so it
|> requires
|> bisecting into the in-memory nodes again. We have a
|> CachingParentProvider, but
|> that works at a higher level.
|
|> I thought there might be 2 possible resolutions.
|
|> 1) Redefine (or create a new api) that gives the revision graph, only it
|> includes ghosts. That way you don't have to check if parents exist.
|
|> 2) Push the revision_graph down a level, or push the ghost filtering up
|> a level.
|
| gannotate uses the revision graph solely for generating revnos.  It
| might be better to implement tsort.merge_sort on top of Graph.  That
| would mean more direct use of the Graph API, so we wouldn't necessarily
| need caching, but it would also allow us to use caching if necessary.
|
| We hope to optimize revno generation so that it doesn't require
| whole-history operations, so having it already using Graph is a step in
| the right direction.
|
| Aaron

Well, I think we can put that into our existing
Branch.get_revision_id_to_revno_map() and not have gannotate need to know the
details.

And as we get to that point, we can make the map a lazy evaluator.

Still, we use get_revision_graph (or its equivalent) fairly often.

Also, I think gannotate uses the wrong numbering for lines. Specifically it
regenerates the numbers based on the current revision, rather than always using
the branch tip revision. Which means when you jump to a node your numbers will
change. (Going to the text for 1234.5.67 will change the numbers to 1257, which
will give different results from 'bzr log -r 1257')

The other benefit of using branch.get_revision_id_to_revno_map is that it only
has to be computed 1 time, regardless of how much ancestry you traverse into.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHvckOJdeBCYSNAAMRAlMqAKCgauj9GbgcXqkA0zsBtsSyELI7eACfXss2
9mAW3/J7ht87Xyldp1UR3VU=
=C0o3
-----END PGP SIGNATURE-----



More information about the bazaar mailing list