Rev 2389: Sketch out finding missing revisions in client server mode. in file:///home/robertc/source/baz/netsim/

Robert Collins robertc at robertcollins.net
Sun Apr 1 03:16:26 BST 2007


At file:///home/robertc/source/baz/netsim/

------------------------------------------------------------
revno: 2389
revision-id: robertc at robertcollins.net-20070401021622-c63kkszulpt30wae
parent: robertc at robertcollins.net-20070331011121-inbkb9pi7lmgbfhs
committer: Robert Collins <robertc at robertcollins.net>
branch nick: netsim
timestamp: Sun 2007-04-01 12:16:22 +1000
message:
  Sketch out finding missing revisions in client server mode.
added:
  doc/design/                    design-20070401020531-rqa7d77i22idnj78-1
  doc/design/server.txt          server.txt-20070401020531-rqa7d77i22idnj78-2
=== added directory 'doc/design'
=== added file 'doc/design/server.txt'
--- a/doc/design/server.txt	1970-01-01 00:00:00 +0000
+++ b/doc/design/server.txt	2007-04-01 02:16:22 +0000
@@ -0,0 +1,128 @@
+====================
+Bazaar Server Design
+====================
+
+VFS Layer
+=========
+
+High-performance server
+=======================
+
+Determining missing data between client and server.
+
+Basic idea:
+Perform a recursive graph (DAG) difference algorithm incrementally.
+
+Some tools are needed. The first is 'graph_delta_description'. This generates a
+minimal description of a graph relative to a supergraph. Sketching this out: we
+could use a number of different tools to describe this: nodes to start at; a
+count of paths to avoid (if paths are numbered repeatable), nodes to search
+from, nodes to finish at; nodes that should not be included; edges to skip, and
+more. If the description will be larger than the size of the node names in the
+graph, an error should be raised.
+The functional requirements of the delta description are:
+ * It can be applied by the generator to the supergraph to recreate the graph,
+   even if the supergraph has new nodes added; once added a node and the paths
+   from it are immutable.
+ * The description can be serialised.
+The initial implementation will simply list the nodes to start from, and nodes
+that should not be traversed to. This can represent any arbitrary subgraph of
+the supergraph and does not require complex cutting logic.
+
+The second tool is 'node_sample_selection'. This takes a graph and returns a
+selection of nodes from within it along with a short token. This selection must
+have the following properties:
+ * It is repeatable given the token and the graph.
+ * The token must be short(< the size of a node) and serialisable.
+To prevent repeating loops it must also:
+ * include all the terminal nodes of each disjoint subgraph.
+
+
+Constraints:
+ * The client is permitted state.
+ * The server is not permitted to hold state about a particular client across requests.
+
+Client side:
+To generate a list of the server nodes which the client does not have, the
+client asks the server for a graph_delta_description and a
+node_sample_selection. In asking for this the client provides the most recent
+graph_delta_description, node_sample_selection token, a bitmap of presence
+flags in the node_sample_selection the server supplied. The first request
+clearly has nothing to supply. The client then loops around on this until the
+server supplies a 'completed' response with a graph_delta_description and no
+node_sample_selection: at this point the server has decided that the
+computation is complete.
+
+Server side:
+When a client asks for a graph_delta_description and node_sample_selection with
+no previous description or token & bitmap, the server should create a
+current_graph which equals the supergraph, and return that and a
+node_sample_selection.
+When the client asks with a previous graph_delta_description and
+node_sample_selection & bitmap, then the server infers as much as it can about
+the client data base from the presence bitmap. In doing this it colours its
+supergraph with several labels (each node can have only one label). Present -
+the client has the node. Missing - the client does not have the node. Unknown -
+the client may have the node. If the set of Unknown nodes is empty then the
+server sends a completed message along with a graph_delta_description for the
+Missing nodes. If the set of Unknown nodes is non-empty then the server sends a
+graph_delta_description for the Unknown subgraph(s) along with a
+node_sample_selection from that. The inference proceeds by:
+ 1. Mark the entire graph unknown.
+ 2. For every node reachable from the subgraph specified by the graph delta,
+    mark it Present (it was Present in the last iteration).
+ 2. For every node derived from the subgraph specified by the graph delta, mark
+    it Missing. (it was missing in the last iteration).
+ 3. For every present node the client supplied walk from it to its parents and
+    mark the transitive closure as Present.
+ 4. For every node the client did not mark as present, find its descendants and
+    mark them as Missing. An assertion may be made at this point that they were
+    not already marked 'Present' (this would indicate ghosts on the client).
+    Alternatively we may ignore a missing statement in the ancestry of a
+    present statement, or try to infer the size of the ghosted region.
+
+
+Worked example:
+
+Server Graph: {A:[]}
+Client Graph: {}
+
+Client message 1:
+get_missing()
+response:
+delta=(HEADS=[A], STOPS=[])
+sample=(0, [A])
+Client message 2:
+get_missing((HEADS=[A], STOPS=[]), (0, [False]))
+response:
+'complete', delta=(HEADS=[A], STOPS=[])                                                                   
+
+Server Graph: {A:[], B:[A], C:[A], D:[B, C]}
+Client Graph: {A:[], B:[A], E:[B]}
+
+Client message 1:
+get_missing
+response:
+delta=(HEADS=[E], STOPS=[])
+sample=(0, [A, D])
+client message 2:
+get_missing((HEADS=[E], STOPS=[]), (0, [True, False]))
+response:
+delta=(HEADS=[B,C], STOPS=[A])
+sample=(0, [B, C])
+client message 3:
+get_missing((HEADS=[B,C], STOPS=[A]), (0, [True, False]))
+response:
+'complete', delta=(HEADS=[C], STOPS=[C])
+
+
+
+Discussion:
+ - could the client infer the cut points from the returned bitmap ? Might save a
+   little bw but will be problematic on the second roundtrip when the sent
+   nodes does not match the entire new cutlist.
+ - The algorithm can work symmetrically, so the client can provide its own
+   node_sample_selection in each roundtrip, halving the roundtrips required to
+   determine the missing revisions.
+ - Extremely well balanced graphs may lead to very wide tails lists for the
+   node_sample_selection and graph_delta_description routines.



More information about the bazaar-commits mailing list