[MERGE] Review feedback on hash trie inventories, and describe radix tree inventories, plus some details on hash trie implementation.

Vincent Ladeuil v.ladeuil+lp at free.fr
Wed Aug 27 09:27:59 BST 2008


>>>>> "robert" == Robert Collins <robertc at robertcollins.net> writes:

    robert> Thanks Vila, the reward for a review, is another review.

Perfect. Last night brought more concerns on the subject;
I'll try to articulate them as best I can.

Disclaimer:

I have ideas and experience (at least 2 significant
ones) in implementing hash-trees (for lack of a better name) as a
way to quickly and efficiently obtain differences between
trees.

Yet, I still lack the knowledge of bzr internals needed to say:
"this is how we should implement it". So I appreciate any
explanation while I catch up on that knowledge.

There is currently a gap between what I understand from your
description and my mental model on how hash-trees work.

I think one way to fill that gap is to review. That may mean that
I should vote 'comment' instead of 'approve', but since we are
talking about design here this may be counter-productive for both
of us.

End of disclaimer.

    robert> I've not done an incremental delta - a full reread
    robert> may catch other quirks, and more review is good for
    robert> this sort of thing..

No problem with that, I read bazaar-commits at lists.ubuntu.com for
incremental diffs :)

<snip/>


    robert> --- doc/developers/inventory.txt	2008-01-03 01:40:16 +0000
    robert> +++ doc/developers/inventory.txt	2008-08-27 00:42:39 +0000

<snip/>
    robert>  In memory inventories
    robert>  =====================
    robert> @@ -39,9 +41,411 @@
    robert>  All the xml serialized forms write to and read from a single byte string, whose
    robert>  hash is then the inventory validator for the commit object.
 
    robert> -journalled
    robert> -----------
    robert> -
    robert> -The in development journalled inventory serializer generates a single byte
    robert> -string during serialization, but may require many byte strings to deserialize,
    robert> -and these are discovered recursively.
    robert> +
    robert> +Serialization scaling and future designs
    robert> +========================================
    robert> +
    robert> +Overall efficiency and scaling is constrained by the bottom level structure
    robert> +that an inventory is stored as. We have a number of goals we want to achieve:
    robert> +
    robert> + 1. Allow commit to write less than the full tree's data into the repository
    robert> +    in the general case. 

I'm not sure we can respect that AND get an efficient
(speed-wise) implementation of tree differences.

IME some representation of the full tree is needed when it comes
to calculating the differences. There may be solutions that work
with partial representations (or build/load them on demand), but
I can't imagine them without filling the gap I mentioned earlier.

To put it otherwise: I'm sure a huge performance gain can be
achieved; do we really need to address the space gain too? On the
other hand, working with less data is often a good way to achieve
performance gains...

    robert> + 2. Allow the data that is written to be calculated without examining every
    robert> +    versioned path in the tree.
    robert> + 3. Generate the exact same representation for a given inventory regardless of
    robert> +    the amount of history available.

No problems with these two.

Well, as long as the canonical representation is adequate :)

    robert> + 4. Allow in memory deltas to be generated directly from the serialised form
    robert> +    without upcasting to a full in-memory representation or examining every
    robert> +    path in the tree. Ideally the work performed will be proportional to the
    robert> +    amount of changes between the trees being compared.

Ok for the 'without examining every path in the tree' and 'the
work performed will be proportional to the amount of changes
between the trees being compared'.

I have some doubts about the 'without upcasting to a full
in-memory representation' part. But I can imagine on-demand
loading of the full representation (which also means no loading
for the common parts).
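
For the record, here is roughly how I picture that on-demand
loading (a toy sketch only, all names are mine; I represent the
store as a plain dict mapping a CHK to an already deserialised
node):

    def lookup(store, root_chk, key, key_hash):
        # Toy node shapes: an internal node is {'type': 'internal',
        # 'children': {prefix: child_chk}}, a leaf node is
        # {'type': 'leaf', 'entries': {key: value}}.
        node_chk = root_chk
        while True:
            node = store[node_chk]   # fetched on demand
            if node['type'] == 'leaf':
                return node['entries'].get(key)
            # Follow the single child covering the key's hash;
            # sibling subtrees are never loaded.
            for prefix, child_chk in node['children'].items():
                if key_hash.startswith(prefix):
                    node_chk = child_chk
                    break
            else:
                return None

A lookup then touches O(depth) nodes, never the whole tree.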

    robert> + 5. Allow fetch to determine the file texts that need to be pulled to ensure
    robert> +    that the entire tree can be reconstructed without having to probe every
    robert> +    path in the tree.
    robert> + 6. Allow bzr map paths

Is it 'bzr to map'?

    robert>      to file ids without reading the entire serialised
    robert> +    form. This is something that is used by commands such as merge PATH and
    robert> +    diff -r X PATH.
    robert> + 7. Let bzr map file ids to paths without reading the entire serialised form.
    robert> +    This is used by commands that are presenting output to the user such as
    robert> +    loggerhead, bzr-search, log FILENAME.

Lost in the gap.

    robert> + 8. We want a strong validator for inventories which is cheap to generate.
    robert> +    Specifically we should be able to create the generator for a new commit
    robert> +    without processing all the data of the basis commit.
    robert> + 9. Testaments generation is currently size(tree), we would like to create a
    robert> +    new testament standard which requires less work so that signed commits
    robert> +    are not significantly slower than regular commits.

Half-lost in the gap but doesn't sound too hard to get.

    robert> +
    robert> +
    robert> +We have current performance and memory bugs in log -v, merge, commit, diff -r,
    robert> +loggerhead and status -r which can be addresessed by an inventory system

s/addresessed/addressed/ or s/addresessed/assessed/

    robert> +meeting these goals.
    robert> +

<snip/>
    robert> +Hash bucket based inventories
    robert> +=============================
    robert> +
    robert> +Overview
    robert> +--------
    robert> +
    robert> +We store two maps - fileid:inventory_entry and path:fileid, in a stable
    robert> +hash trie, stored in densely packed fragments.. We pack keys into the map

s/.././
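
(For my own understanding, I picture the two logical maps as
something like:

    # Illustrative only: the entry value here is made up.
    path_map = {'doc/index.txt': 'file-id-123'}
    content_map = {'file-id-123': ('file', 'index.txt', 'sha1...')}

each of them then being stored as its own hash trie keyed by a
hash of the key.)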


<snip/>
    robert> +Canonical form
    robert> +--------------
    robert> +
    robert> +There are three fragment types for the canonical form:
    robert> +
    robert> +root_node: (Perhaps this should be inlined into the revision object).
    robert> +HASH_INVENTORY_SIGNATURE
    robert> +path_map: CHK to root of path to id map

You don't explain what CHK means. Is it just CHecK, as in a hash
addressing the path_map?

    robert> +content_map: CHK to root of id to entry map
    robert> +
    robert> +map_node: INTERNAL_NODE or LEAF_NODE

You later define leaf_node, not LEAF_NODE. I'm used to lowercase
identifiers for non-terminals and uppercase identifiers for
terminals.

Also, indenting the continuation lines in the right-hand side
(RHS) of the rules will make them easier to read.
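
Something like:

    map_node: internal_node
            | leaf_node

for example.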

    robert> +INTERNAL_NODE:
    robert> +INTERNAL_NODE_SIGNATURE
    robert> +hash_prefix: PREFIX
    robert> +prefix_width: INT
    robert> +PREFIX CHK TYPE SIZE
    robert> +PREFIX CHK TYPE SIZE ...

Err, I'm lost here. hash_prefix and prefix_width are not
referenced and hash_prefix is defined twice.

    robert> +
    robert> +(Where TYPE is I for internal or L for leaf).
    robert> +
    robert> +leaf_node:
    robert> +LEAF_NODE_SIGNATURE
    robert> +hash_prefix: PREFIX
    robert> +HASH\x00KEY\x00 VALUE

\x00 is an implementation detail; can we get rid of it for now?


<snip/>
    robert> +Delta
    robert> +-----
    robert> +
    robert> +To generate a delta between two inventories, we first generate a list of
    robert> +altered fileids, and then recursively look up their parents to generate their
    robert> +old and new file paths.
    robert> +
    robert> +To generate the list of altered file ids, we do an entry by entry comparison of
    robert> +the full contents of every leaf node that the two inventories do not have in
    robert> +common. To do this, we start at the root node, and follow every CHK pointer
    robert> +that is only in one tree. We can then bring in all the values from the leaf
    robert> +nodes and do a set difference to get the altered ones, which we would then
    robert> +parse.

This sounds a lot like what I'm after, but I don't yet have the
understanding needed to validate it.
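
To check my mental model, here is how I picture that walk (same
toy node representation as in my earlier sketch, all names mine):

    def altered_entries(store, root_a, root_b):
        # Leaf entries present in exactly one of the two tries.
        entries_a, entries_b = set(), set()
        frontier_a, frontier_b = set([root_a]), set([root_b])
        while frontier_a or frontier_b:
            # A CHK present on both sides addresses an identical
            # subtree: don't even fetch it.
            common = frontier_a & frontier_b
            frontier_a, new = expand(store, frontier_a - common)
            entries_a |= new
            frontier_b, new = expand(store, frontier_b - common)
            entries_b |= new
        # The set difference: entries appearing on one side only.
        return entries_a ^ entries_b

    def expand(store, chks):
        # Fetch the given nodes; return the child CHKs of internal
        # nodes and the entries of leaf nodes.
        children, entries = set(), set()
        for node_chk in chks:
            node = store[node_chk]
            if node['type'] == 'leaf':
                entries.update(node['entries'].items())
            else:
                children.update(node['children'].values())
        return children, entries

If that matches what you have in mind, the work done is indeed
proportional to the number of differing nodes, which is exactly
what I'm after.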


    robert> +
    robert> +Radix tree based inventories
    robert> +============================
    robert> +

<snip/>

As summarized below, I lack the knowledge to validate the
differences between hash tries and radix trees; nothing sounds
wrong while reading, though.

    robert> +Hash Trie details
    robert> +=================
    robert> +
    robert> +The canonical form for a hash trie is a tree of internal nodes leading down to
    robert> +leaf nodes, with no node exceeding some threshold size, and every node
    robert> +containing as much content as it can, but no leaf node containing less than
    robert> +its lower size threshold. (In the event that an imbalance in the hash function
    robert> +causes a tree where an internal node is needed, but any prefix generates a
    robert> +child with less than the lower threshold, the smallest prefix should be taken).
    robert> +An internal node holds some number of key prefixs, all with the same bit-width.

s/prefixs/prefixes/ ?
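
Restating the size invariants as I understand them (a sketch;
'serialised_size' is a stand-in for whatever measures a node):

    def obeys_thresholds(node, serialised_size, min_size, max_size):
        # No node may exceed max_size; only leaves have the lower
        # bound, and the paragraph above allows waiving it in the
        # hash-imbalance corner case.
        if serialised_size(node) > max_size:
            return False
        if node['type'] == 'leaf':
            return serialised_size(node) >= min_size
        return True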


<snip/>
    robert> +Insertion
    robert> +---------
    robert> +
    robert> +1. Hash the entry, and insert the entry in the leaf node with a matching
    robert> +   prefix, creating that node and linking it from the internal node containing
    robert> +   that prefix if there is no appropriate leaf node.
    robert> +1. Starting at the highest node altered, for all altered nodes, check if it has
    robert> +   transitioned across either size boundary - 0 < min_size < max_size. If it
    robert> +   has not, proceed to update the CHK pointers.
    robert> +1. If it increased above min_size, check the node above to see if it can be
    robert> +   more densely packed. To be below the min_size the node's parent must
    robert> +   have hit the max size constraint and been forced to split even though this
    robert> +   child did not have enough content to support a min_size node - so the prefix
    robert> +   chosen in the parent may be been shorter than desirable and we may now be able

s/be been/be/

    robert> +   to more densely pack the parent by splitting the child nodes more. So if the
    robert> +   parent node can support a deeper prefix without hitting max_size, and the
    robert> +   count of under min_size nodes cannot be reduced, the parent should be given
    robert> +   a deeper prefix.
    robert> +1. If it increased above max_size, shrink the prefix width used to split out
    robert> +   new nodes until the the node is below max_size (unless the prefix

s/the the/the/

    robert> +   width is already 1 - the minimum).
    robert> +   To shrink the prefix of an internal node, create new internal nodes for each
    robert> +   new prefix, and populate them with the content of the nodes which were
    robert> +   formerly linked. (This will normally bubble down due to keeping densely
    robert> +   packed nodes).
    robert> +   To shrink the prefix of a leaf node, create an internal node with the same
    robert> +   prefix, then choose a width for the internal node such that the contents 
    robert> +   of the leaf all fit into new leaves obeying the min_size and max_size rules.
    robert> +   The largest prefix possible should be chosen, to obey the
    robert> +   higher-nodes-are-denser rule. That rule also gives room in leaf nodes for 
    robert> +   growth without affecting the parent node packing.
    robert> +1. Update the CHK pointers - serialise every altered node to generate a CHK,
    robert> +   and update the CHK placeholder in the nodes parent; then reserialise the
    robert> +   parent. CHK pointer propagation can be done lazily when many updates are
    robert> +   expected.
    robert> +

Hmmm. Mixed feelings as a reviewer here. On the one hand, you
provide a lot of details to guide the implementation, which is
good. On the other, I am unable to validate such details without
at least a prototype implementation (I confess I have never
implemented such a scheme, even if most of the concepts ring
bells).
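
For instance, here is how I picture the leaf-splitting step just
above (sketch only; 'node_size' stands for whatever estimates a
serialised node's size, and I assume 40 hex chars of hash):

    def split_leaf(entries, parent_width, min_size, max_size,
                   node_size):
        # entries maps a full hash key to its value.  Look for the
        # deepest prefix width whose buckets all fit within
        # [min_size, max_size]; fall back to the smallest width
        # obeying max_size in the hash-imbalance corner case.
        chosen = None
        for width in range(parent_width + 1, 41):
            buckets = {}
            for hash_key, value in entries.items():
                buckets.setdefault(hash_key[:width],
                                   {})[hash_key] = value
            sizes = [node_size(b) for b in buckets.values()]
            if max(sizes) > max_size:
                continue      # some bucket still too big, go deeper
            if min(sizes) < min_size and chosen is not None:
                break         # too deep now, keep the previous width
            chosen = buckets  # prefix -> contents of each new leaf
        return chosen

Does that match the intent? If so, the 'largest prefix possible'
rule is what the final 'chosen' captures.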

    robert> +Multiple versions of nodes for the same PREFIX and internal prefix width should
    robert> +compress well for the same tree.

Conclusion
----------

At this point, I'm not able to make an informed judgement, and I
don't know if it's because I lack knowledge about our current
implementation or because I can't predict whether your proposed
implementation will do better ('or' is inclusive here, obviously).

I vote BB:tweak again since some points need to be made clearer,
but the whole explanation is better.

    Vincent


