[RFC] New generic graph storage format

Fri Sep 7 15:49:50 BST 2007

Oh, and a further thought I had a while back but forgot again :(...

Another possible replacement data structure for the TKI is a perfect
hash table. This would allow, once the preamble is read, 2 integer reads
per node read, with validation occuring by reading the node and seeing
if it is the key we wanted,

On disk this would look something like
PREAMBLE (defines f1 and f2 and N) (note that N is where to find the
record data for our usage, so N is the number of bytes of index data
after the hash table + whatever fuzz we want to allow for hash
generation efficiency).
nlog(n,2) or nlog(n, 10) bytes of flattened graph values (N entries,
each of which has to represent a value from 0-N.)
node-data...

and node-data will be
DELIM
key
DELIM
...

accessing a node is:
take the key
calculate f1(key) and f2(key)
readv the values array for those two values, sum them and take that mod
N to get the node location.
Read a reasonable amount of data starting at the end of the hash table +
the node location.

We could shrink things further by using a sparse index and specifying
node pages.

querying for missing entries will result in the read reading a node that
does not start with a delim and the key, or if we address by page by
reading a page without the key.

http://www.amk.ca/files/python/perfect_hash.txt is a sample perfect hash
generator for python.

quick size upper bound: we know 7x10e5 entries can fit in 10e9 bytes
today using dictionary encoding for references within the same file -
and we don't need to stop doing that in this sort of scheme.
so if we take N as 10e9 and work in binary we need 27bits per value. For
simplicity as this is an upper bound - use 32bits - 4 bytes. There are
at most 1.7 10e6 values (2 per key) to record, so most of the values
array is going to be 0 (as the value array would be 4x10e9 bytes long).
So thats not a good fit.

But anyhow, is it viable? If we shrunk N by using pages of (say) 4K,
we'd be looking at 10e6 bytes of value array, which is much more
acceptable (at the trade off of having to search within a page for a
node). Alternatively we could look at sparse array encoding schemes, but
my intuition says page based addressing is going to be simpler and more
understandable as well as tending to grab the final data for many
related nodes at a time, if the nodes are group topologically - a common
theme in this design dialoge.

-Rob

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070908/b7112545/attachment.pgp