Rev 2206: Merge GraphDelta support. in file:///home/robertc/source/baz/hpss-remotegraph/

Mon Apr 9 23:19:47 BST 2007

At file:///home/robertc/source/baz/hpss-remotegraph/

------------------------------------------------------------
revno: 2206
revision-id: robertc at robertcollins.net-20070409221942-d0t2lemx16nyn2ro
parent: robertc at robertcollins.net-20070405093526-bfbpg82tlc7kldni
parent: robertc at robertcollins.net-20070403074302-fo3x80f961hh7e7f
committer: Robert Collins <robertc at robertcollins.net>
branch nick: hpss-remotegraph
timestamp: Tue 2007-04-10 08:19:42 +1000
message:
  Merge GraphDelta support.
added:
  bzrlib/tests/repository_implementations/test_revision_graph.py test_revision_graph.-20070331011036-7x4dtgs5sopm7kcq-1
  doc/design/                    design-20070401020531-rqa7d77i22idnj78-1
  doc/design/server.txt          server.txt-20070401020531-rqa7d77i22idnj78-2
modified:
  bzrlib/errors.py               errors.py-20050309040759-20512168c4e14fbd
  bzrlib/graph.py                graph.py-20050905070950-b47dce53236c5e48
  bzrlib/revision.py             revision.py-20050309040759-e77802c08f3999d5
  bzrlib/tests/repository_implementations/__init__.py __init__.py-20060131092037-9564957a7d4a841b
  bzrlib/tests/repository_implementations/test_repository.py test_repository.py-20060131092128-ad07f494f5c9d26c
  bzrlib/tests/test_errors.py    test_errors.py-20060210110251-41aba2deddf936a8
  bzrlib/tests/test_graph.py     testgraph.py-20050905070950-42e6c958106610fd
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.12
    merged: robertc at robertcollins.net-20070403074302-fo3x80f961hh7e7f
    parent: robertc at robertcollins.net-20070402072850-a7b90xg48rdujr63
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Tue 2007-04-03 17:43:02 +1000
    message:
      Make Graph.get_descendants be lazy, and return the internal reference to avoid copy-overhead. Also increase GraphDelta performance.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.11
    merged: robertc at robertcollins.net-20070402072850-a7b90xg48rdujr63
    parent: robertc at robertcollins.net-20070401113531-onqhpzb37lexq17a
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Mon 2007-04-02 17:28:50 +1000
    message:
      Teach GraphDelta to handle ghosts.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.10
    merged: robertc at robertcollins.net-20070401113531-onqhpzb37lexq17a
    parent: robertc at robertcollins.net-20070401100816-j0298tbd10xjsguv
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Sun 2007-04-01 21:35:31 +1000
    message:
      Implement a NthNodeSelector.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.9
    merged: robertc at robertcollins.net-20070401100816-j0298tbd10xjsguv
    parent: robertc at robertcollins.net-20070401073652-xhjw2gmfgjsv4tvi
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Sun 2007-04-01 20:08:16 +1000
    message:
      Finish unit tests for creating and applying GraphDelta's, fixing a bug with prefix-deltas, and update the walkthroughts in the server design documentation.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.8
    merged: robertc at robertcollins.net-20070401073652-xhjw2gmfgjsv4tvi
    parent: robertc at robertcollins.net-20070401073344-5seq59fommjsi374
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Sun 2007-04-01 17:36:52 +1000
    message:
      Test GraphDeltas to the end of a graph with multiple tails when the subgraph is partitioned.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.7
    merged: robertc at robertcollins.net-20070401073344-5seq59fommjsi374
    parent: robertc at robertcollins.net-20070401073118-vixkw83zk2plkkwz
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Sun 2007-04-01 17:33:44 +1000
    message:
      Test GraphDeltas to the end of a graph with multiple tails.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.6
    merged: robertc at robertcollins.net-20070401073118-vixkw83zk2plkkwz
    parent: robertc at robertcollins.net-20070401071322-2wi0ecpdtf0jrjm0
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Sun 2007-04-01 17:31:18 +1000
    message:
      Test GraphDeltas on multiple-step internal-subgraphs.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.5
    merged: robertc at robertcollins.net-20070401071322-2wi0ecpdtf0jrjm0
    parent: robertc at robertcollins.net-20070401063117-71od8scu0te38npy
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Sun 2007-04-01 17:13:22 +1000
    message:
      Basic graph delta support functional, using a new graph get_heads method.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.4
    merged: robertc at robertcollins.net-20070401063117-71od8scu0te38npy
    parent: robertc at robertcollins.net-20070401061938-xzhb1m77i7fzzkb6
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Sun 2007-04-01 16:31:17 +1000
    message:
      Tighten the algorithm requirements - tails are not required because the exclusion on misses will converge. Also add Nonempty to empty test.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.3
    merged: robertc at robertcollins.net-20070401061938-xzhb1m77i7fzzkb6
    parent: robertc at robertcollins.net-20070401021622-c63kkszulpt30wae
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Sun 2007-04-01 16:19:38 +1000
    message:
      Start work on GraphDelta support.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.2
    merged: robertc at robertcollins.net-20070401021622-c63kkszulpt30wae
    parent: robertc at robertcollins.net-20070331011121-inbkb9pi7lmgbfhs
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Sun 2007-04-01 12:16:22 +1000
    message:
      Sketch out finding missing revisions in client server mode.
    ------------------------------------------------------------
    revno: 2018.1.2.1.50.2.80.1.99.1.9.1.21.1.26.2.74.2.2.2.6.2.1
    merged: robertc at robertcollins.net-20070331011121-inbkb9pi7lmgbfhs
    parent: pqm at pqm.ubuntu.com-20070329191009-2b533e883212576c
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: netsim
    timestamp: Sat 2007-03-31 11:11:21 +1000
    message:
      Split get_revision_graph tests out of repository_implementations.test_repository.
=== added file 'bzrlib/tests/repository_implementations/test_revision_graph.py'

--- a/bzrlib/tests/repository_implementations/test_revision_graph.py	1970-01-01 00:00:00 +0000
+++ b/bzrlib/tests/repository_implementations/test_revision_graph.py	2007-03-31 01:11:21 +0000
@@ -0,0 +1,80 @@
+# Copyright (C) 2007 Canonical Ltd
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+"""Tests for revision graph access."""
+
+from bzrlib.errors import (NoSuchRevision,
+                           )
+from bzrlib.revision import NULL_REVISION
+from bzrlib.tests.repository_implementations.test_repository import (
+    TestCaseWithComplexRepository,
+    )
+
+
+class TestRevisionGraph(TestCaseWithComplexRepository):
+
+    def test_get_revision_graph(self):
+        # we can get a mapping of id->parents for the entire revision graph or bits thereof.
+        self.assertEqual({'rev1':[],
+                          'rev2':['rev1'],
+                          'rev3':['rev2'],
+                          'rev4':['rev3'],
+                          },
+                         self.bzrdir.open_repository().get_revision_graph(None))
+        self.assertEqual({'rev1':[]},
+                         self.bzrdir.open_repository().get_revision_graph('rev1'))
+        self.assertEqual({'rev1':[],
+                          'rev2':['rev1']},
+                         self.bzrdir.open_repository().get_revision_graph('rev2'))
+        self.assertRaises(NoSuchRevision,
+                          self.bzrdir.open_repository().get_revision_graph,
+                          'orphan')
+        # and ghosts are not mentioned
+        self.assertEqual({'rev1':[],
+                          'rev2':['rev1'],
+                          'rev3':['rev2'],
+                          },
+                         self.bzrdir.open_repository().get_revision_graph('rev3'))
+        # and we can ask for the NULLREVISION graph
+        self.assertEqual({},
+            self.bzrdir.open_repository().get_revision_graph(NULL_REVISION))
+
+    def test_get_revision_graph_with_ghosts(self):
+        # we can get a graph object with roots, ghosts, ancestors and
+        # descendants.
+        repo = self.bzrdir.open_repository()
+        graph = repo.get_revision_graph_with_ghosts([])
+        self.assertEqual(set(['rev1']), graph.roots)
+        self.assertEqual(set(['ghost1', 'ghost2']), graph.ghosts)
+        self.assertEqual({'rev1':[],
+                          'rev2':['rev1'],
+                          'rev3':['rev2', 'ghost1'],
+                          'rev4':['rev3', 'ghost1', 'ghost2'],
+                          },
+                          graph.get_ancestors())
+        self.assertEqual({'ghost1':{'rev3':1, 'rev4':1},
+                          'ghost2':{'rev4':1},
+                          'rev1':{'rev2':1},
+                          'rev2':{'rev3':1},
+                          'rev3':{'rev4':1},
+                          'rev4':{},
+                          },
+                          graph.get_descendants())
+        # and we can ask for the NULLREVISION graph
+        graph = repo.get_revision_graph_with_ghosts([NULL_REVISION])
+        self.assertEqual({}, graph.get_ancestors())
+        self.assertEqual({}, graph.get_descendants())
+

=== added directory 'doc/design'
=== added file 'doc/design/server.txt'
--- a/doc/design/server.txt	1970-01-01 00:00:00 +0000
+++ b/doc/design/server.txt	2007-04-01 10:08:16 +0000
@@ -0,0 +1,126 @@
+====================
+Bazaar Server Design
+====================
+
+VFS Layer
+=========
+
+High-performance server
+=======================
+
+Determining missing data between client and server.
+
+Basic idea:
+Perform a recursive graph (DAG) difference algorithm incrementally.
+
+Some tools are needed. The first is 'graph_delta_description'. This generates a
+minimal description of a graph relative to a supergraph. Sketching this out: we
+could use a number of different tools to describe this: nodes to start at; a
+count of paths to avoid (if paths are numbered repeatable), nodes to search
+from, nodes to finish at; nodes that should not be included; edges to skip, and
+more. If the description will be larger than the size of the node names in the
+graph, an error should be raised.
+The functional requirements of the delta description are:
+ * It can be applied by the generator to the supergraph to recreate the graph,
+   even if the supergraph has new nodes added; once added a node and the paths
+   from it are immutable.
+ * The description can be serialised.
+The initial implementation will simply list the nodes to start from, and nodes
+that should not be traversed to. This can represent any arbitrary subgraph of
+the supergraph and does not require complex cutting logic.
+
+The second tool is 'node_sample_selection'. This takes a graph and returns a
+selection of nodes from within it along with a short token. This selection must
+have the following properties:
+ * It is repeatable given the token and the graph.
+ * The token must be short(< the size of a node) and serialisable.
+
+Constraints:
+ * The client is permitted state.
+ * The server is not permitted to hold state about a particular client across requests.
+
+Client side:
+To generate a list of the server nodes which the client does not have, the
+client asks the server for a graph_delta_description and a
+node_sample_selection. In asking for this the client provides the most recent
+graph_delta_description, node_sample_selection token, a bitmap of presence
+flags in the node_sample_selection the server supplied. The first request
+clearly has nothing to supply. The client then loops around on this until the
+server supplies a 'completed' response with a graph_delta_description and no
+node_sample_selection: at this point the server has decided that the
+computation is complete.
+
+Server side:
+When a client asks for a graph_delta_description and node_sample_selection with
+no previous description or token & bitmap, the server should create a
+current_graph which equals the supergraph, and return that and a
+node_sample_selection.
+When the client asks with a previous graph_delta_description and
+node_sample_selection & bitmap, then the server infers as much as it can about
+the client data base from the presence bitmap. In doing this it colours its
+supergraph with several labels (each node can have only one label). Present -
+the client has the node. Missing - the client does not have the node. Unknown -
+the client may have the node. If the set of Unknown nodes is empty then the
+server sends a completed message along with a graph_delta_description for the
+Missing nodes. If the set of Unknown nodes is non-empty then the server sends a
+graph_delta_description for the Unknown subgraph(s) along with a
+node_sample_selection from that. The inference proceeds by:
+ 1. Mark the entire graph unknown.
+ 2. For every node reachable from the subgraph specified by the graph delta,
+    mark it Present (it was Present in the last iteration).
+ 2. For every node derived from the subgraph specified by the graph delta, mark
+    it Missing. (it was missing in the last iteration).
+ 3. For every present node the client supplied walk from it to its parents and
+    mark the transitive closure as Present.
+ 4. For every node the client did not mark as present, find its descendants and
+    mark them as Missing. An assertion may be made at this point that they were
+    not already marked 'Present' (this would indicate ghosts on the client).
+    Alternatively we may ignore a missing statement in the ancestry of a
+    present statement, or try to infer the size of the ghosted region.
+
+
+Worked example:
+
+Server Graph: {A:[]}
+Client Graph: {}
+
+Client message 1:
+get_missing()
+response:
+unknowns=(HEADS=[], STOPS=[])
+sample=(0, [A])
+Client message 2:
+get_missing((HEADS=[], STOPS=[]), (0, [False]))
+response:
+'complete', missing=(HEADS=[], STOPS=[])
+
+
+Server Graph: {A:[], B:[A], C:[A], D:[B, C]}
+Client Graph: {A:[], B:[A], E:[B]}
+
+Client message 1:
+get_missing
+response:
+unknown=(HEADS=[], STOPS=[])
+sample=(0, [A, D])
+client message 2:
+get_missing((HEADS=[], STOPS=[]), (0, [True, False]))
+response:
+unknown=(HEADS=[B, C], STOPS=[A])
+sample=(0, [B, C])
+client message 3:
+get_missing((HEADS=[B,C], STOPS=[A]), (0, [True, False]))
+response:
+'complete', missing=(HEADS=[], STOPS=[B, A])
+
+
+
+Discussion:
+ - could the client infer the cut points from the returned bitmap ? Might save a
+   little bw but will be problematic on the second roundtrip when the sent
+   nodes does not match the entire new cutlist.
+ - The algorithm can work symmetrically, so the client can provide its own
+   node_sample_selection in each roundtrip, halving the roundtrips required to
+   determine the missing revisions.
+ - Extremely well balanced graphs may lead to very wide tails lists for the
+   node_sample_selection and graph_delta_description routines.

=== modified file 'bzrlib/errors.py'
--- a/bzrlib/errors.py	2007-04-05 09:35:26 +0000
+++ b/bzrlib/errors.py	2007-04-09 22:19:42 +0000
@@ -266,6 +266,15 @@
         self.url = url
 
 
+class NotSubGraph(BzrError):
+
+    _fmt = "'%(to)s' is not a subgraph of '%(from_)s'."
+
+    def __init__(self, from_, to):
+        self.from_ = from_
+        self.to = to
+
+
 class WorkingTreeAlreadyPopulated(BzrError):
 
     _fmt = """Working tree already populated in %(base)s"""

=== modified file 'bzrlib/graph.py'
--- a/bzrlib/graph.py	2006-10-05 05:37:25 +0000
+++ b/bzrlib/graph.py	2007-04-03 07:43:02 +0000
@@ -15,6 +15,7 @@
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 
 
+from bzrlib import errors
 from bzrlib.tsort import topo_sort
 
 
@@ -116,27 +117,42 @@
         self.roots = set([])
         self.ghosts = set([])
         self._graph_ancestors = {}
-        self._graph_descendants = {}
+        self._graph_descendants = None
+
+    def __eq__(self, other):
+        if not isinstance(other, Graph):
+            return False
+        if self._graph_descendants and not other._graph_descendants:
+            self._generate_descendants()
+        return self.__dict__ == other.__dict__
 
     def add_ghost(self, node_id):
         """Add a ghost to the graph."""
         self.ghosts.add(node_id)
-        self._ensure_descendant(node_id)
+        if self._graph_descendants is not None:
+            self._graph_descendants.setdefault(node_id, {})
 
     def add_node(self, node_id, parent_ids):
         """Add node_id to the graph with parent_ids as its parents."""
         if parent_ids == []:
             self.roots.add(node_id)
         self._graph_ancestors[node_id] = list(parent_ids)
-        self._ensure_descendant(node_id)
-        for parent in parent_ids:
-            self._ensure_descendant(parent)
-            self._graph_descendants[parent][node_id] = 1
-        
-    def _ensure_descendant(self, node_id):
-        """Ensure that a descendant lookup for node_id will work."""
-        if not node_id in self._graph_descendants:
-            self._graph_descendants[node_id] = {}
+        if self._graph_descendants is not None:
+            self._graph_descendants.setdefault(node_id, {})
+            for parent in parent_ids:
+                self._graph_descendants.setdefault(parent, {})[node_id] = 1
+
+    def changes_from(self, graph_from):
+        """Create a GraphDelta from graph_from to self."""
+        return GraphDelta._changes_from(graph_from, self)
+
+    def _generate_descendants(self):
+        """Construct the _graph_descendants node if needed."""
+        self._graph_descendants = {}
+        for node_id in self._graph_ancestors:
+            self._graph_descendants.setdefault(node_id, {})
+            for parent in self._graph_ancestors[node_id]:
+                self._graph_descendants.setdefault(parent, {})[node_id] = 1
 
     def get_ancestors(self):
         """Return a dictionary of graph node:ancestor_list entries."""
@@ -159,4 +175,155 @@
 
     def get_descendants(self):
         """Return a dictionary of graph node:child_node:distance entries."""
-        return dict(self._graph_descendants.items())
+        if self._graph_descendants is None:
+            self._generate_descendants()
+        return self._graph_descendants
+
+    def get_heads(self):
+        """Determine the heads of the graph."""
+        return set([node for node, descendants in
+            self.get_descendants().items() if not descendants])
+
+    def __repr__(self):
+        return "Graph ancestors=%s, descendants=%r, ghosts=%r" % (
+            self._graph_ancestors, self.get_descendants(), self.ghosts)
+
+
+class GraphDelta(object):
+    """A delta between two DAG's. 
+
+    Currently this requires that the from DAG be a supergraph of the to DAG.
+    This restriction may be lifted in the future.
+
+    Public attributes: 
+    heads: A list of heads that the subgraph should start from. An empty list
+        means start from the heads of the from graph when applying the delta.
+    cut_from: A list of nodes to cut out of the output. An empty list means
+        cut nothing.
+    """
+
+    def __init__(self):
+        self.heads = set()
+        self.cut_from = set()
+
+    def __eq__(self, other):
+        return isinstance(other, GraphDelta) and self.__dict__ == other.__dict__
+
+    def apply_to(self, from_):
+        """Apply this delta to the graph from_.
+        
+        :return: A new graph derived from from_ and this delta.
+        """
+        result = Graph()
+        if self.heads:
+            pending = set(self.heads)
+        else:
+            pending = set(from_.get_heads())
+        cut_from = set(self.cut_from) # in case its a list
+        ancestors = from_.get_ancestors()
+        done = set()
+        while pending:
+            node = pending.pop()
+            done.add(node)
+            if node not in cut_from:
+                try:
+                    result.add_node(node, ancestors[node])
+                    pending.update(set(ancestors[node]).difference(done))
+                except KeyError:
+                    result.add_ghost(node)
+        return result
+
+    @classmethod
+    def _changes_from(cls, from_, to):
+        """Construct a delta from from_ to to."""
+        result = GraphDelta()
+        ancestors = from_.get_ancestors()
+        from_nodes = set(ancestors)
+        to_nodes = set(to.get_ancestors())
+        if to_nodes.difference(from_nodes):
+            raise errors.NotSubGraph(from_, to)
+        if to_nodes == from_nodes:
+            return result
+        to_heads = to.get_heads()
+        result.heads = to_heads
+        # if the heads list is equal to the from_ heads list, 
+        # omit it. XXX: this special casing may be a pointless
+        # optimisation.
+        if result.heads == from_.get_heads():
+            result.heads = set([])
+         
+        # start walking from all the nodes in to.
+        # when we hit a node not in to, add it to cut_from.
+        # when we hit a node in to, remove it from heads.
+        # discount ghosts in parent lists.
+        # special case, symmetrical with the apply_to use of the 
+        # source graph heads when the to graph is empty:
+        # we use the source graph heads as the cut list giving
+        # an instant empty graph.
+        if not to_nodes:
+            result.cut_from = from_.get_heads()
+            return result
+        pending = set(to_heads)
+        done = set()
+        while pending:
+            node = pending.pop()
+            done.add(node)
+            node_parents = ancestors[node]
+            for parent in node_parents:
+                if parent in from_.ghosts:
+                    continue
+                if parent not in to_nodes:
+                    result.cut_from.add(parent)
+                    continue
+                if parent in result.heads:
+                    result.heads.remove(parent)
+                if parent not in done:
+                    pending.add(parent)
+        # remove heads that 
+        return result
+
+    def __repr__(self):
+        return "GraphDelta heads=%r, cut_from=%r" % (self.heads, self.cut_from)
+
+
+class NthNodeSelector(object):
+    """A NodeSelector that selects event nth node chosen to meet a limit.
+    
+    This is almost certainly not a most-optimal node selector for smart server
+    use, but it is extremely simple to build and sufficient to test the concept.
+    """
+
+    def __init__(self, limit=1):
+        """Construct a NthNodeSelector.
+
+        :param limit: The total number of nodes that may be returned from
+            select on any graph. This count may exceed the total size of the
+            graph.
+        """
+        if limit < 1:
+            raise ValueError("limit (%r) must be >= 1" % (limit))
+        self.limit = limit
+
+    def select(self, graph):
+        """Select nodes from graph.
+        
+        The nodes chosen are those found by selecting every len(graph)/limit
+        node.
+        """
+        nodes = graph.get_ancestors().keys()
+        nodes.sort()
+        count = len(nodes)
+        interval = count/self.limit
+        if interval < 1:
+            # this performs badly when 2*limit >= count, as it selects only
+            # the first limit nodes: this can be addressed by using a 
+            # floating point interval, but its not done yet. TODO XXX
+            interval = 1
+        pos = int(interval * 0.5)
+        result = []
+        while pos < count:
+            if len(result) == self.limit:
+                return result
+            result.append(nodes[pos])
+            pos += interval
+        return result

=== modified file 'bzrlib/revision.py'
--- a/bzrlib/revision.py	2007-01-17 15:37:08 +0000
+++ b/bzrlib/revision.py	2007-04-03 07:43:02 +0000
@@ -253,7 +253,7 @@
                 [revision_a, revision_b])
             # convert to a NULL_REVISION based graph.
             ancestors = graph.get_ancestors()
-            descendants = graph.get_descendants()
+            descendants = dict(graph.get_descendants().items())
             common = set(graph.get_ancestry(revision_a)).intersection(
                      set(graph.get_ancestry(revision_b)))
             descendants[NULL_REVISION] = {}

=== modified file 'bzrlib/tests/repository_implementations/__init__.py'
--- a/bzrlib/tests/repository_implementations/__init__.py	2007-03-13 06:55:43 +0000
+++ b/bzrlib/tests/repository_implementations/__init__.py	2007-04-09 22:19:42 +0000
@@ -52,6 +52,7 @@
         'bzrlib.tests.repository_implementations.test_reconcile',
         'bzrlib.tests.repository_implementations.test_repository',
         'bzrlib.tests.repository_implementations.test_revision',
+        'bzrlib.tests.repository_implementations.test_revision_graph',
         'bzrlib.tests.repository_implementations.test_statistics',
         ]
 

=== modified file 'bzrlib/tests/repository_implementations/test_repository.py'
--- a/bzrlib/tests/repository_implementations/test_repository.py	2007-04-05 09:35:26 +0000
+++ b/bzrlib/tests/repository_implementations/test_repository.py	2007-04-09 22:19:42 +0000
@@ -506,6 +506,8 @@
     def setUp(self):
         super(TestCaseWithComplexRepository, self).setUp()
         tree_a = self.make_branch_and_tree('a')
+        # save the tree for later mutations
+        self.tree_a = tree_a
         self.bzrdir = tree_a.branch.bzrdir
         # add a corrupt inventory 'orphan'
         # this may need some generalising for knits.
@@ -524,6 +526,11 @@
         tree_a.add_parent_tree_id('ghost1')
         tree_a.add_parent_tree_id('ghost2')
         tree_a.commit('rev4', rev_id='rev4', allow_pointless=True)
+        # the resulting graph is:
+        # rev1: [], rev2:[rev1], rev3:[rev2, ghost1], rev4[rev3, ghost1, ghost2]
+
+
+class TestRevisionTrees(TestCaseWithComplexRepository):
 
     def test_revision_trees(self):
         revision_ids = ['rev1', 'rev2', 'rev3', 'rev4']
@@ -543,6 +550,9 @@
                    revisions]
         assert deltas1 == deltas2
 
+
+class TestAllRevisionIds(TestCaseWithComplexRepository):
+
     def test_all_revision_ids(self):
         # all_revision_ids -> all revisions
         self.assertEqual(['rev1', 'rev2', 'rev3', 'rev4'],
@@ -554,58 +564,6 @@
         self.assertRaises(errors.NoSuchRevision,
                           self.bzrdir.open_repository().get_ancestry, 'orphan')
 
-    def test_get_revision_graph(self):
-        # we can get a mapping of id->parents for the entire revision graph or bits thereof.
-        self.assertEqual({'rev1':[],
-                          'rev2':['rev1'],
-                          'rev3':['rev2'],
-                          'rev4':['rev3'],
-                          },
-                         self.bzrdir.open_repository().get_revision_graph(None))
-        self.assertEqual({'rev1':[]},
-                         self.bzrdir.open_repository().get_revision_graph('rev1'))
-        self.assertEqual({'rev1':[],
-                          'rev2':['rev1']},
-                         self.bzrdir.open_repository().get_revision_graph('rev2'))
-        self.assertRaises(NoSuchRevision,
-                          self.bzrdir.open_repository().get_revision_graph,
-                          'orphan')
-        # and ghosts are not mentioned
-        self.assertEqual({'rev1':[],
-                          'rev2':['rev1'],
-                          'rev3':['rev2'],
-                          },
-                         self.bzrdir.open_repository().get_revision_graph('rev3'))
-        # and we can ask for the NULLREVISION graph
-        self.assertEqual({},
-            self.bzrdir.open_repository().get_revision_graph(NULL_REVISION))
-
-    def test_get_revision_graph_with_ghosts(self):
-        # we can get a graph object with roots, ghosts, ancestors and
-        # descendants.
-        repo = self.bzrdir.open_repository()
-        graph = repo.get_revision_graph_with_ghosts([])
-        self.assertEqual(set(['rev1']), graph.roots)
-        self.assertEqual(set(['ghost1', 'ghost2']), graph.ghosts)
-        self.assertEqual({'rev1':[],
-                          'rev2':['rev1'],
-                          'rev3':['rev2', 'ghost1'],
-                          'rev4':['rev3', 'ghost1', 'ghost2'],
-                          },
-                          graph.get_ancestors())
-        self.assertEqual({'ghost1':{'rev3':1, 'rev4':1},
-                          'ghost2':{'rev4':1},
-                          'rev1':{'rev2':1},
-                          'rev2':{'rev3':1},
-                          'rev3':{'rev4':1},
-                          'rev4':{},
-                          },
-                          graph.get_descendants())
-        # and we can ask for the NULLREVISION graph
-        graph = repo.get_revision_graph_with_ghosts([NULL_REVISION])
-        self.assertEqual({}, graph.get_ancestors())
-        self.assertEqual({}, graph.get_descendants())
-
     def test_reserved_id(self):
         repo = self.make_repository('repository')
         self.assertRaises(errors.ReservedId, repo.add_inventory, 'reserved:',

=== modified file 'bzrlib/tests/test_errors.py'
--- a/bzrlib/tests/test_errors.py	2007-03-30 01:46:31 +0000
+++ b/bzrlib/tests/test_errors.py	2007-04-09 22:19:42 +0000
@@ -104,6 +104,10 @@
             str(error))
         self.assertIsInstance(error, errors.NoSuchRevision)
 
+    def test_not_sub_graph(self):
+        error = errors.NotSubGraph("from", "to")
+        self.assertEqualDiff("'to' is not a subgraph of 'from'.", str(error))
+
     def test_not_write_locked(self):
         error = errors.NotWriteLocked('a thing to repr')
         self.assertEqualDiff("'a thing to repr' is not write locked but needs "

=== modified file 'bzrlib/tests/test_graph.py'
--- a/bzrlib/tests/test_graph.py	2006-10-11 23:08:27 +0000
+++ b/bzrlib/tests/test_graph.py	2007-04-02 07:28:50 +0000
@@ -14,8 +14,14 @@
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 
+from bzrlib.errors import NotSubGraph
 from bzrlib.tests import TestCase
-from bzrlib.graph import node_distances, nodes_by_distance, Graph
+from bzrlib.graph import (node_distances,
+    nodes_by_distance,
+    Graph,
+    GraphDelta,
+    NthNodeSelector,
+    )
 
 
 class TestBase(TestCase):
@@ -78,3 +84,384 @@
         # check the result contains ghosts:
         self.assertEqual({'ghost': {'rev1': 1}, 'rev1': {}},
                          graph.get_descendants())
+
+    def test_get_heads(self):
+        # graph objects allow querying for their heads. 
+        # test empty graph, a partitioned graph, and a graph
+        # with a unmerged branch
+        # empty
+        graph = Graph()
+        self.assertEqual(set(), graph.get_heads())
+        # simple
+        graph.add_node('A', [])
+        self.assertEqual(set(['A']), graph.get_heads())
+        graph.add_node('B', ['A'])
+        self.assertEqual(set(['B']), graph.get_heads())
+        # simple, partitioned
+        graph.add_node('C', [])
+        self.assertEqual(set(['B', 'C']), graph.get_heads())
+        graph.add_node('D', ['C'])
+        self.assertEqual(set(['B', 'D']), graph.get_heads())
+        # joined up parititons
+        graph.add_node('E', ['D', 'B'])
+        self.assertEqual(set(['E']), graph.get_heads())
+        # unmerged branch
+        graph.add_node('F', ['B'])
+        self.assertEqual(set(['E', 'F']), graph.get_heads())
+        # and merge it
+        graph.add_node('G', ['E', 'F'])
+        self.assertEqual(set(['G']), graph.get_heads())
+
+
+class TestGraphDelta(TestCase):
+
+    def check_delta(self, expected_delta, graph_from, graph_to):
+        """Check the delta from graph_from to graph_to.
+
+        This will calculate a new delta, check that applying it to graph_from
+        results in graph_to, and that it is equal to expected_delta.
+        """
+        delta = graph_to.changes_from(graph_from)
+        self.assertIsInstance(delta, GraphDelta)
+        self.assertEqual(expected_delta, delta)
+        new_to = delta.apply_to(graph_from)
+        self.assertEqual(graph_to, new_to)
+
+    def testEmptyEmptyGraph(self):
+        graph_from = Graph()
+        graph_to = Graph()
+        self.check_delta(GraphDelta(), graph_from, graph_to)
+
+    def testTargetNotSubGraph(self):
+        """When the target is not a sub-graph of the source, get an exception."""
+        graph_from = Graph()
+        graph_to = Graph()
+        graph_to.add_node('A', [])
+        self.assertRaises(NotSubGraph, graph_to.changes_from, graph_from)
+
+    def testSmallEmptyGraph(self):
+        graph_from = Graph()
+        graph_from.add_node('A', [])
+        graph_to = Graph()
+        result = GraphDelta()
+        # keeping the implied heads lets the graph-to-build be maintained
+        # outside the core routine; this is cleaner I think for now.
+        result.cut_from = set(['A'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testSmallIdentityGraph(self):
+        graph_from = Graph()
+        graph_from.add_node('A', [])
+        graph_to = Graph()
+        graph_to.add_node('A', [])
+        result = GraphDelta()
+        self.check_delta(result, graph_from, graph_to)
+
+    def testMiddleSegment(self):
+        # grabbing a delta from A->B->C->D->E to B->C->D:
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_node('B', ['C'])
+        graph_from.add_node('C', ['D'])
+        graph_from.add_node('D', ['E'])
+        graph_from.add_node('E', [])
+        graph_to = Graph()
+        graph_to.add_node('B', ['C'])
+        graph_to.add_node('C', ['D'])
+        graph_to.add_node('D', ['E'])
+        result = GraphDelta()
+        result.heads = set(['B'])
+        result.cut_from = set(['E'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testMiddleSegmentWithMerges(self):
+        # grabbing a delta where the to graph is totally interior and contains embedded merges.
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_node('B', ['C', 'D'])
+        graph_from.add_node('C', ['E'])
+        graph_from.add_node('D', ['E'])
+        graph_from.add_node('E', ['F'])
+        graph_from.add_node('F', [])
+        graph_to = Graph()
+        graph_to.add_node('B', ['C', 'D'])
+        graph_to.add_node('C', ['E'])
+        graph_to.add_node('D', ['E'])
+        graph_to.add_node('E', ['F'])
+        result = GraphDelta()
+        result.heads = set(['B'])
+        result.cut_from = set(['F'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testMiddleSegmentWithMergeStart(self):
+        # grabbing a delta where the to graph is totally interior and contains the start
+        # of a branch in the graph
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_node('B', ['C', 'D'])
+        graph_from.add_node('C', ['E'])
+        graph_from.add_node('D', ['E'])
+        graph_from.add_node('E', ['F'])
+        graph_from.add_node('F', [])
+        graph_to = Graph()
+        graph_to.add_node('C', ['E'])
+        graph_to.add_node('D', ['E'])
+        graph_to.add_node('E', ['F'])
+        result = GraphDelta()
+        result.heads = set(['C', 'D'])
+        result.cut_from = set(['F'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testMiddleSegmentWithMergeEnd(self):
+        # grabbing a delta where the to graph is totally interior and contains the end
+        # of a branch in the graph
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_node('B', ['C', 'D'])
+        graph_from.add_node('C', ['E'])
+        graph_from.add_node('D', ['E'])
+        graph_from.add_node('E', ['F'])
+        graph_from.add_node('F', [])
+        graph_to = Graph()
+        graph_to.add_node('B', ['C', 'D'])
+        result = GraphDelta()
+        result.heads = set(['B'])
+        result.cut_from = set(['C', 'D'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testMiddleSegmentWithPartitionedMerge(self):
+        # grabbing a delta where the to graph is totally interior and contains both
+        # sides of a branch and merge, but not the start or end.
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_node('B', ['C', 'D'])
+        graph_from.add_node('C', ['E'])
+        graph_from.add_node('D', ['E'])
+        graph_from.add_node('E', ['F'])
+        graph_from.add_node('F', [])
+        graph_to = Graph()
+        graph_to.add_node('C', ['E'])
+        graph_to.add_node('D', ['E'])
+        result = GraphDelta()
+        result.heads = set(['C', 'D'])
+        result.cut_from = set(['E'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testMiddleSegmentWithLeftSideMerge(self):
+        # grabbing a delta where the to graph is totally interior and contains the left
+        # side of a branch and merge but not the right side.
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_node('B', ['C', 'D'])
+        graph_from.add_node('C', ['E'])
+        graph_from.add_node('D', ['E'])
+        graph_from.add_node('E', ['F'])
+        graph_from.add_node('F', [])
+        graph_to = Graph()
+        graph_to.add_node('C', ['E'])
+        graph_to.add_node('E', ['F'])
+        result = GraphDelta()
+        result.heads = set(['C'])
+        result.cut_from = set(['F'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testMiddleSegmentWithRightSideMerge(self):
+        # grabbing a delta where the to graph is totally interior and contains the right
+        # side of a branch and merge but not the left side.
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_node('B', ['C', 'D'])
+        graph_from.add_node('C', ['E'])
+        graph_from.add_node('D', ['E'])
+        graph_from.add_node('E', ['F'])
+        graph_from.add_node('F', [])
+        graph_to = Graph()
+        graph_to.add_node('D', ['E'])
+        graph_to.add_node('E', ['F'])
+        result = GraphDelta()
+        result.heads = set(['D'])
+        result.cut_from = set(['F'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testEndOfGraph(self):
+        # grabbing a delta from A->B->C->D->E,F to C->D->E,F:
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_node('B', ['C'])
+        graph_from.add_node('C', ['D'])
+        graph_from.add_node('D', ['E', 'F'])
+        graph_from.add_node('E', [])
+        graph_from.add_node('F', [])
+        graph_to = Graph()
+        graph_to.add_node('C', ['D'])
+        graph_to.add_node('D', ['E', 'F'])
+        graph_to.add_node('E', [])
+        graph_to.add_node('F', [])
+        result = GraphDelta()
+        result.heads = set(['C'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testPartitionedEndOfGraph(self):
+        # grabbing a delta from A->B->C->D,E D->F, E->G to D->F and E->G :
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_node('B', ['C'])
+        graph_from.add_node('C', ['D', 'E'])
+        graph_from.add_node('D', ['F'])
+        graph_from.add_node('E', ['G'])
+        graph_from.add_node('F', [])
+        graph_from.add_node('G', [])
+        graph_to = Graph()
+        graph_to.add_node('D', ['F'])
+        graph_to.add_node('E', ['G'])
+        graph_to.add_node('F', [])
+        graph_to.add_node('G', [])
+        result = GraphDelta()
+        result.heads = set(['D', 'E'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testStartOfGraph(self):
+        # grabbing a delta from A->B->C to A->B:
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_node('B', ['C'])
+        graph_from.add_node('C', [])
+        graph_to = Graph()
+        graph_to.add_node('A', ['B'])
+        graph_to.add_node('B', ['C'])
+        result = GraphDelta()
+        result.cut_from = set(['C'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testStartOfGraphMultipleHeads(self):
+        # grabbing a delta which grabs the top of a graph with multiple hedas.
+        graph_from = Graph()
+        graph_from.add_node('A1', ['B1'])
+        graph_from.add_node('A2', ['B2'])
+        graph_from.add_node('B1', ['C'])
+        graph_from.add_node('B2', ['C'])
+        graph_from.add_node('C', [])
+        graph_to = Graph()
+        graph_to.add_node('A1', ['B1'])
+        graph_to.add_node('A2', ['B2'])
+        graph_to.add_node('B1', ['C'])
+        graph_to.add_node('B2', ['C'])
+        result = GraphDelta()
+        result.cut_from = set(['C'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testStartOfPartitionedGraph(self):
+        # grabbing a delta which grabs the top of a partitioned graph.
+        graph_from = Graph()
+        graph_from.add_node('A1', ['B1'])
+        graph_from.add_node('A2', ['B2'])
+        graph_from.add_node('B1', ['C1'])
+        graph_from.add_node('B2', ['C2'])
+        graph_from.add_node('C1', [])
+        graph_from.add_node('C2', [])
+        graph_to = Graph()
+        graph_to.add_node('A1', ['B1'])
+        graph_to.add_node('A2', ['B2'])
+        graph_to.add_node('B1', ['C1'])
+        graph_to.add_node('B2', ['C2'])
+        result = GraphDelta()
+        result.cut_from = set(['C1', 'C2'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testOneSideOfPartitionedGraph(self):
+        # grabbing a delta which grabs one side of a partitioned graph.
+        graph_from = Graph()
+        graph_from.add_node('A1', ['B1'])
+        graph_from.add_node('A2', ['B2'])
+        graph_from.add_node('B1', ['C1'])
+        graph_from.add_node('B2', ['C2'])
+        graph_from.add_node('C1', [])
+        graph_from.add_node('C2', [])
+        graph_to = Graph()
+        graph_to.add_node('A1', ['B1'])
+        graph_to.add_node('B1', ['C1'])
+        result = GraphDelta()
+        result.heads = set(['A1'])
+        # we only need to cut at C1 because the heads list removes the other
+        # partition implicitly.
+        result.cut_from = set(['C1'])
+        self.check_delta(result, graph_from, graph_to)
+
+    def testApplyToGraphWithGhosts(self):
+        # applying a delta when the graph has ghosts should not error.
+        graph_from = Graph()
+        graph_from.add_node('A', ['B'])
+        graph_from.add_ghost('B')
+        graph_to = Graph()
+        graph_to.add_node('A', ['B'])
+        graph_to.add_ghost('B')
+        result = GraphDelta()
+        self.check_delta(result, graph_from, graph_to)
+
+
+class TestNodeSelector(TestCase):
+
+    def check_selection(self, expected_selection, graph, selector):
+        """Check the node selection from graph using selector.
+
+        This will generate a new selection and check its identical to the
+        expected selection.
+        """
+        selection = selector.select(graph)
+        self.assertIsInstance(selection, list)
+        self.assertEqual(expected_selection, selection)
+
+    def test_constructor_bounds(self):
+        """NthNodeSelector cannot be constructed with a limit of 0."""
+        self.assertRaises(ValueError, NthNodeSelector, 0)
+
+    def test_nth_empty(self):
+        # the nth selector on an empty graph -> []
+        graph = Graph()
+        selector = NthNodeSelector()
+        self.check_selection([], graph, NthNodeSelector(1))
+        self.check_selection([], graph, NthNodeSelector(2))
+
+    def test_nth_one_node(self):
+        # the nth selector on {A:[B]} -> [A]
+        graph = Graph()
+        graph.add_node('A', ['B'])
+        self.check_selection(['A'], graph, NthNodeSelector(1))
+        self.check_selection(['A'], graph, NthNodeSelector(2))
+        self.check_selection(['A'], graph, NthNodeSelector(200))
+
+    def test_nth_two_nodes(self):
+        # the nth selector on {A:[B], C:[B]}
+        graph = Graph()
+        graph.add_node('A', ['B'])
+        graph.add_node('C', ['B'])
+        selector = NthNodeSelector()
+        self.check_selection(['C'], graph, NthNodeSelector(1))
+        self.check_selection(['A', 'C'], graph, NthNodeSelector(2))
+        self.check_selection(['A', 'C'], graph, NthNodeSelector(200))
+
+    def test_nth_three_nodes(self):
+        # the nth selector on {A:[B], C:[B], B:[]}
+        graph = Graph()
+        graph.add_node('A', ['B'])
+        graph.add_node('B', [])
+        graph.add_node('C', ['B'])
+        selector = NthNodeSelector()
+        self.check_selection(['B'], graph, NthNodeSelector(1))
+        self.check_selection(['A', 'B'], graph, NthNodeSelector(2))
+        self.check_selection(['A', 'B', 'C'], graph, NthNodeSelector(3))
+        self.check_selection(['A', 'B', 'C'], graph, NthNodeSelector(200))
+
+    def test_nth_four_nodes(self):
+        # the nth selector on {A:[B], C:[B], B:[], E:[A]}
+        graph = Graph()
+        graph.add_node('A', ['B'])
+        graph.add_node('B', [])
+        graph.add_node('C', ['B'])
+        graph.add_node('E', ['A'])
+        selector = NthNodeSelector()
+        self.check_selection(['C'], graph, NthNodeSelector(1))
+        self.check_selection(['B', 'E'], graph, NthNodeSelector(2))
+        self.check_selection(['A', 'B', 'C'], graph, NthNodeSelector(3))
+        self.check_selection(['A', 'B', 'C', 'E'], graph, NthNodeSelector(4))
+        self.check_selection(['A', 'B', 'C', 'E'], graph, NthNodeSelector(200))