[MERGE] Review feedback on hash trie inventories, and describe radix tree inventories, plus some details on hash trie implementation.

Vincent Ladeuil v.ladeuil+lp at free.fr
Wed Sep 24 07:55:02 BST 2008


>>>>> "robert" == Robert Collins <robertc at robertcollins.net> writes:

<snip/>

    robert> I'll see about working them in.

I think that's where we left it :)

Below is my proposal. I've mostly pasted your comments. The places
I put them may not be the most relevant. Feel free to take or throw
away whatever you find appropriate.

You can then land the result.

    >> P.S.: If it wasn't clear enough, I'd be more than happy to work
    >> on that subject in any way :D

That's still true :)

       Vincent

=== modified file 'doc/developers/inventory.txt'
--- doc/developers/inventory.txt	2008-08-27 00:42:39 +0000
+++ doc/developers/inventory.txt	2008-09-24 06:47:21 +0000
@@ -230,7 +230,9 @@
 Canonical form
 --------------
 
-There are three fragment types for the canonical form:
+There are three fragment types for the canonical form (CHK means Content
+Hash Key: a generic term for a key generated by hashing the content; the
+hash function itself is not specified):
 
 root_node: (Perhaps this should be inlined into the revision object).
 HASH_INVENTORY_SIGNATURE
@@ -263,6 +265,19 @@
 entry and inserting them into both the path map and the content map. The maps
 start with just a single leaf node with an empty prefix. 
 
+The full tree representation in a fragmented form is found by reading all the
+fragments, starting with the root and following pointers out from there.
+Compared to our existing plan-and-read this is more round trips, but only as
+many as the height of the tree (the number of index operations also
+increases, but they parallelise as we can query them for all nodes at a
+single level in parallel).
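As an illustrative sketch of that traversal (plain Python, not bzrlib code:
the dict-based store, the node tuples and all names here are invented for
the example), a level-at-a-time walk performs one batch of reads per tree
level, so the number of round trips equals the height of the tree:

```python
import hashlib

def chk(node):
    # Illustrative Content Hash Key: sha1 of the serialised node.
    # (The design deliberately leaves the hash function unspecified.)
    return hashlib.sha1(repr(node).encode()).hexdigest()

def add(store, node):
    key = chk(node)
    store[key] = node
    return key

# A tiny two-level tree: an internal root pointing at two leaf nodes.
store = {}
leaf_a = add(store, ('leaf', ['file-1', 'file-2']))
leaf_b = add(store, ('leaf', ['file-3']))
root = add(store, ('internal', {'0': leaf_a, '1': leaf_b}))

def read_full_tree(store, root_key):
    """Breadth-first walk: each loop iteration is one batched read,
    so all nodes at a level could be fetched in parallel."""
    level, leaves, round_trips = [root_key], [], 0
    while level:
        round_trips += 1
        nodes = [store[k] for k in level]  # one batch per level
        level = [child for kind, value in nodes if kind == 'internal'
                 for child in value.values()]
        leaves += [value for kind, value in nodes if kind == 'leaf']
    return leaves, round_trips
```

Here `read_full_tree(store, root)` returns both leaf entry lists in two
round trips: one for the root, one for all leaves together.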
+
+Writing less than the full inventory data is achievable by reusing nodes from
+an earlier version - which means the key for those nodes must be something we
+can reuse.
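A hypothetical sketch of that reuse (again not bzrlib code: the store, the
`chk` helper and the node layout are invented for illustration). Changing
one entry only requires writing the altered leaf and the nodes above it,
because the untouched sibling subtree keeps its CHK:

```python
import hashlib

def chk(node):
    # Illustrative CHK: sha1 of the serialised node (hash unspecified
    # in the design; any strong content hash would do).
    return hashlib.sha1(repr(node).encode()).hexdigest()

store = {}

def add(node):
    key = chk(node)
    store[key] = node   # write-once: same content => same key, same record
    return key

# Version 1 of the tree: two leaf nodes under one internal root.
leaf_a1 = add(('leaf', ('a/one', 'a/two')))
leaf_b = add(('leaf', ('b/one',)))
root_1 = add(('internal', (('a', leaf_a1), ('b', leaf_b))))
writes_v1 = len(store)

# Version 2: one entry under 'a' changes; the 'b' subtree is untouched.
leaf_a2 = add(('leaf', ('a/one', 'a/three')))
root_2 = add(('internal', (('a', leaf_a2), ('b', leaf_b))))

# Only the changed leaf and the root needed new records: the unchanged
# 'b' leaf is shared between both versions through its CHK.
writes_v2 = len(store) - writes_v1
```

Version 1 writes three records; version 2 writes only two new ones (the
changed leaf plus the root), reusing the unchanged leaf by key.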
+
+This representation allows us to avoid reconstructing a 'full' in-memory
+representation by loading only the parts that are different between the
+considered tree shapes.
 
 
 Apply
@@ -294,6 +309,59 @@
 nodes and do a set difference to get the altered ones, which we would then
 parse.
 
+We rely on the following CHK properties:
+
+- two different subtrees MUST have two different CHKs,
+- the probability of a CHK clash is so low that we neglect it.
+
+The following example illustrates the way it works:
+
+We start at the root node and deserialise it.  This gives us a list of child
+node CHKs, as pointers in a tree, e.g.:
+
+'0' : HASH1
+'1' : HASH2
+'2' : HASH3
+
+We then read the other root node, which might give us:
+'0' : HASH1
+'1' : HASH4
+'2' : HASH5
+
+We compare these nodes, and see that the pointers at '0' are equal, so we
+eliminate that subtree.
+
+Then on both sides we read the nodes for '1' and '2', using the CHKs HASH2,
+HASH3, HASH4 and HASH5 to read them.
+
+Rinse and repeat until we have leaf nodes, not internal nodes, everywhere.
+Do a final readv of the leaf nodes; when we deserialise those we have lists
+of inventory entries, one list per node, and each node is in one of the two
+trees; we can set-difference the two sides to eliminate common entries - the
+remainder is the difference.
+
+We then just need to map the file ids to paths to get an in-memory inventory
+delta such as apply_inventory_delta wants.
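The whole procedure can be sketched in Python (purely illustrative: the node
layout and helper names are invented, and the sketch assumes the two trees
have matching node kinds at each prefix, so it is not a complete
implementation):

```python
import hashlib

def chk(node):
    # Illustrative CHK; the design only requires that distinct content
    # gets distinct keys, with negligible clash probability.
    return hashlib.sha1(repr(node).encode()).hexdigest()

def add(store, node):
    key = chk(node)
    store[key] = node
    return key

def entries(store, key):
    """All inventory entries under a subtree (for one-sided prefixes)."""
    kind, value = store[key]
    if kind == 'leaf':
        return set(value)
    found = set()
    for child_key in value.values():
        found |= entries(store, child_key)
    return found

def diff(store, key_a, key_b):
    """Return (only_in_a, only_in_b), reading only the differing nodes."""
    if key_a == key_b:
        return set(), set()   # equal CHK: whole subtree eliminated unread
    kind_a, value_a = store[key_a]
    kind_b, value_b = store[key_b]
    if kind_a == 'leaf' and kind_b == 'leaf':
        return set(value_a) - set(value_b), set(value_b) - set(value_a)
    only_a, only_b = set(), set()
    for prefix in set(value_a) | set(value_b):
        if prefix not in value_b:
            only_a |= entries(store, value_a[prefix])
        elif prefix not in value_a:
            only_b |= entries(store, value_b[prefix])
        else:
            in_a, in_b = diff(store, value_a[prefix], value_b[prefix])
            only_a |= in_a
            only_b |= in_b
    return only_a, only_b

# Mirror the example above: '0' is shared, '1' and '2' differ.
store = {}
shared = add(store, ('leaf', ('entry-0',)))
old_1 = add(store, ('leaf', ('entry-1',)))
new_1 = add(store, ('leaf', ('entry-1b',)))
old_2 = add(store, ('leaf', ('entry-2',)))
new_2 = add(store, ('leaf', ('entry-2', 'entry-2b')))
root_a = add(store, ('internal', {'0': shared, '1': old_1, '2': old_2}))
root_b = add(store, ('internal', {'0': shared, '1': new_1, '2': new_2}))
only_a, only_b = diff(store, root_a, root_b)
```

The shared '0' subtree is skipped on the CHK comparison alone, without ever
reading its node; only the nodes under '1' and '2' are fetched and
set-differenced.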
+
+This design found its inspiration in 'Ideal Hash Trees' by Phil Bagwell.  The
+data structure described there can be considered an ancestor of this
+proposal, with some key differences:
+
+ - we need to guarantee one and only one representation for any content
+   of the tree
+ - we have write-once storage and arbitrary value sizes, so the disk
+   layout material in that paper is largely irrelevant - we always have
+   to write up the tree, so we have a slightly different case to
+   optimise for.
+
+The canonical form we are searching for may need some adjustments regarding
+the Hash Partition (HP) described in the paper: intuitively, to cap the
+amount of data to examine on rebalance we need hierarchical splitting, not
+linear splitting.  A single HP clearly gives us the best direct access to a
+given bucket, but at the cost of rewriting the HP on every commit: we would
+be doing IO proportional to its size on every operation.  A smaller HP is
+better then, and the easiest way to achieve that is to permit a tree rather
+than a single HP.
+
+
 Radix tree based inventories
 ============================
 




