pack index performance

Wed Jun 11 02:58:05 BST 2008

Adding to the theories for why index performance is improved with less
indices I think it is the linear access pattern.

Specifically, say we have N indices: [0, 1, ... N-1]

If we look up a key k we do the following IO:
bisect_in(indices[0])
...
bisect_in(indice_with_k)

Looking up K keys we do the same IO pattern, reducing K by found keys at
each step until K is empty.

If we assume K == N and each k in K is present in one index only, that
means we look for:
K keys in indices[0]
K-1 keys in indices[1]
...
1 key in indices[K-1]

looking for more keys in a single index corresponds to larger readv's
and more processing in the index layer.

If we altered our search, to instead look in all indices in parallel, we
would have less locality of reference, but likely perform all IO in
parallel. That is to find a key k:
while k not found:
  bisect_step(indices[0])
  ...
  bisect_step(indices[N-1])

This would trim the bisection required in all indices as soon as any k
in K is found.

(I can be more rigourous about this, but this is a quick note to prevent
the idea being lost). I'll note that I had this in mind at the start of
the index layer's work, its just slightly hidden from view at the
moment.

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080611/e81c7b55/attachment.pgp