index paging and caching

Robert Collins robertc at robertcollins.net
Thu Jul 26 04:22:53 BST 2007


I plan to test the performance again with John's 'difference_update
sucks' patch. Early next week I hope to put in place a caching layer for
my indices. This will operate on a page basis, with 4K pages on
LocalTransports and 64K or 128K pages on remote transports.
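
To make the page-basis idea concrete, here is a minimal sketch of the
arithmetic that maps a byte range in an index file to the pages that
would need fetching. The function and constant names are hypothetical
(not bzrlib API), and 64K is picked arbitrarily from the 64/128K range
for the remote case.

```python
# Hypothetical page sizes taken from the figures in the mail.
LOCAL_PAGE_SIZE = 4 * 1024      # 4K pages for local transports
REMOTE_PAGE_SIZE = 64 * 1024    # 64K chosen here; 128K is the other option


def pages_for_read(offset, length, page_size):
    """Return the page numbers covering bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return list(range(first, last + 1))


def page_byte_range(page, page_size):
    """Return (start_offset, size) of the read needed to fetch one page."""
    return page * page_size, page_size
```

A 200-byte read at offset 4000 straddles two 4K pages locally, but
fits in a single 64K page remotely, which is the point of using larger
pages where round trips are expensive.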

There are some open questions in my mind:
 - how big should the page cache be?
 - when should it be discarded?
 - when should it be invalidated?

These indices are write-once, so I don't think it needs invalidation or
dirty marking at all.

I'm inclined to discard the cache when the repository becomes unlocked,
though if we were not worried about memory we could keep it until the
repository object is discarded.
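
A rough sketch of what "no invalidation, discard on unlock" could look
like. The class and method names are hypothetical, and `read_page` is
an assumed callable that fetches one page from the transport. Pages
are never marked dirty or refreshed, on the assumption stated above
that the indices are write-once.

```python
class PagedIndexReader:
    """Caches index pages only while a (hypothetical) read lock is held."""

    def __init__(self, read_page):
        self._read_page = read_page   # assumed callable: page number -> bytes
        self._pages = {}              # page number -> bytes, never invalidated
        self._lock_count = 0

    def lock_read(self):
        self._lock_count += 1

    def unlock(self):
        if self._lock_count == 0:
            raise RuntimeError("not locked")
        self._lock_count -= 1
        if self._lock_count == 0:
            # Last lock released: drop the whole cache to free memory.
            self._pages.clear()

    def get_page(self, page):
        if self._lock_count == 0:
            raise RuntimeError("must be locked to read")
        try:
            return self._pages[page]
        except KeyError:
            data = self._pages[page] = self._read_page(page)
            return data
```

Keeping the cache until the object itself is discarded would amount to
deleting the `clear()` call in `unlock`, at the cost of holding the
pages in memory for longer.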

In terms of size, we have several different indices:
signatures (texts, no deltas, no graph)
inventories (texts, deltas, no graph)
file texts (texts, deltas, graph)
revisions (texts, no deltas, graph)

We will see a massively larger file text index than the rest combined -
consider Mozilla with 15K file ids, that's up to a 15K:1 ratio to the
inventory index.

So I'm inclined to have separate caches for each type of index, but
share the cache amongst the different component indices of each type,
and finally size it by a heuristic based on the number of different
index files being used. For example, if there are 100 index files, then
allow 100 pages to be used.
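
One way this could be sketched: one cache per index type, keyed by
(index file, page number) so that all component indices of that type
share it, and capped at one page per index file. Everything here is
an assumption for illustration: the names are hypothetical, and
least-recently-used eviction is a choice the proposal above does not
specify.

```python
from collections import OrderedDict


class SharedPageCache:
    """One cache per index type, shared by all index files of that type."""

    def __init__(self, index_file_count):
        # Heuristic from the mail: N index files -> allow N cached pages.
        self.max_pages = max(1, index_file_count)
        self._pages = OrderedDict()   # (index_name, page) -> bytes

    def get(self, index_name, page, read_page):
        """Return a page, calling read_page(index_name, page) on a miss."""
        key = (index_name, page)
        if key in self._pages:
            self._pages.move_to_end(key)      # mark as most recently used
            return self._pages[key]
        data = read_page(index_name, page)
        self._pages[key] = data
        if len(self._pages) > self.max_pages:
            self._pages.popitem(last=False)   # evict least recently used
        return data

    def __len__(self):
        return len(self._pages)


def make_caches(index_files_by_type):
    """Build one SharedPageCache per index type (signatures, texts, ...)."""
    return {
        index_type: SharedPageCache(len(files))
        for index_type, files in index_files_by_type.items()
    }
```

With separate caches, heavy traffic on the file text index cannot evict
pages belonging to the much smaller revision or inventory indices.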

Thoughts?

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.

