indices and composite keys?
Robert Collins
robertc at robertcollins.net
Sat Jul 14 05:25:02 BST 2007
I'm wondering what is a more tasteful approach here.
The scenario: I want to combine the index data from all the file texts
into a single index. It will be a GraphIndex (because I don't want to
change knit access at this point). To discriminate between 'revision X,
fileid Y' and 'revision X, fileid Z' it seems to me we are really using
a multi column index.
This could done by munging the key: 'revision/fileid'.
Alternatively, it could be done by having a clear multi-column key:
(revision, fileid).
Whats better? I'm leaning strongly towards a clear multi-column key
because it seems to offer simpler opportunities for optimisation.
For instance, if I have 2K file ids, 10K files texts I'm going to have
roughly 5 revision ids per file id, and thats duplication. In a regular
DB one might factor out the two separate fields. So here I can imagine
doing the following in the index:
key-0-values (bisectable perhaps?):
value
value
value
key-1-values (bisectable perhaps?):
value
value
value
entries:
keyreference keyreference NODEREFS VALUE
This would give one and only one copy of every revisionid in the index;
and one and only one copy of every fileid in the index. We could even
use this to influence the sort order if we chose.
Doing this with this index-level awareness that there are multiple keys
would involve either heuristics :( or more complex dictionary
compression schemes where partial references to keys can be made.
Thoughts?
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070714/83379566/attachment-0001.pgp
More information about the bazaar
mailing list