[MERGE] Review feedback on hash trie inventories, and describe radix tree inventories, plus some details on hash trie implementation.

Robert Collins robertc at robertcollins.net
Tue Sep 2 06:04:21 BST 2008


On Thu, 2008-08-28 at 16:03 +0200, Vincent Ladeuil wrote:
> >>>>> "robert" == Robert Collins <robertc at robertcollins.net> writes:
> 
> <snip/>
> 
>     robert> Canonical form isn't delta based at all;
> 
> May be I mispoked then, what I meant is that the full
> representation is stored as fragments on disk somehow. Accessing
> all the fragments may have a cost. This cost can be reduced by
> putting all the fragments at the same place, eventually
> *duplicating* the same information. As long as the representation
> is canonical this should not be a problem.

This is what the CHK index is about- by storing fragments under their
CHK, we only store one copy for a given fragment. We *could* inside the
storage layer, do text deltas between similar fragments (and probably
will need to) - but there needs to be a very hard, and very clear line
between the layers. The inventory layer must-not-store-deltas, and most
not depend-on-deltas-for-efficient-operations.

>     robert> I think the penny hasn't quite dropped all the way
>     robert> into the slot :). Also note that we're not using a
>     robert> hash partition, there are no delta representations at
>     robert> all. Its history independent (which is one of the
>     robert> goals) - anything that ever writes deltas isn't
>     robert> history independent.
> 
> Hmm. Just to be clear, in the example you gave:
> 
>     robert> Ok, so - we start at the root node, and deserialise
>     robert> it. This gives us a list of child node CHK's, as
>     robert> pointers in a tree. e.g.
> 
>     robert> '0' : HASH1
>     robert> '1' : HASH2
>     robert> '2' : HASH3
> 
>     robert> We then read the other root node, which might give us:
>     robert> '0' : HASH1
>     robert> '1' : HASH4
>     robert> '2' : HASH5
> 
> Is the tree associated with HASH0 stored once and shared between
> the two trees or stored for each tree ?

There isn't a HASH0 in my example. There is a subtree named
'0' (contains keys with the prefix '0'), with a pointer of HASH1.

Its logically stored once for each tree. If they are in the same
repository there will be a single physical copy of it.

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080902/849a07ae/attachment.pgp 


More information about the bazaar mailing list