indices and composite keys?

Robert Collins robertc at robertcollins.net
Sat Jul 14 05:25:02 BST 2007


I'm wondering what is a more tasteful approach here.

The scenario: I want to combine the index data from all the file texts
into a single index. It will be a GraphIndex (because I don't want to
change knit access at this point). To discriminate between 'revision X,
fileid Y' and 'revision X, fileid Z' it seems to me we are really using
a multi column index.

This could done by munging the key: 'revision/fileid'. 
Alternatively, it could be done by having a clear multi-column key:
(revision, fileid).

Whats better? I'm leaning strongly towards a clear multi-column key
because it seems to offer simpler opportunities for optimisation.

For instance, if I have 2K file ids, 10K files texts I'm going to have
roughly 5 revision ids per file id, and thats duplication. In a regular
DB one might factor out the two separate fields. So here I can imagine
doing the following in the index:

key-0-values (bisectable perhaps?):
value
value
value

key-1-values (bisectable perhaps?):
value
value
value

entries:
keyreference keyreference NODEREFS VALUE

This would give one and only one copy of every revisionid in the index;
and one and only one copy of every fileid in the index. We could even
use this to influence the sort order if we chose.

Doing this with this index-level awareness that there are multiple keys
would involve either heuristics :( or more complex dictionary
compression schemes where partial references to keys can be made.

Thoughts?
-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070714/83379566/attachment-0001.pgp 


More information about the bazaar mailing list