RFC: caching in CHKInventory

Robert Collins robertc at robertcollins.net
Fri Oct 3 00:42:07 BST 2008


On Thu, 2008-10-02 at 18:06 -0500, John Arbash Meinel wrote:
> I'm not 100% sure I understand the implications, but I'll try to give
> some feedback.

Thanks ;)

> Robert Collins wrote:
> > I'm building up a CHKInventory class, which is a demand-loadable
> > inventory representing a serialised inventory only. (So it's not used, or
> > intended to be used, during e.g. commit.)
> 
> So this is an in-memory structure, similar to Inventory? So that
> RevisionTree.inventory = CHKInventory(some-filesystem-info)?
> 
> I guess I'm trying to understand what you will be caching. Are you
> caching inventory pages? In serialized or unserialized form? What key
> are you caching them under? Are you thinking of caching *between*
> CHKInventory objects, or are you caching within an instance?

I'm considering caching InventoryEntry objects - dirs, files, symlinks,
tree references.

> I guess we cache in an in-memory dictionary as you "add" items to a BTree,
> and then we flush them out to disk in sorted order. (With a bit extra to
> spill early nodes to disk when we get memory-constrained, and then
> "merge sort" the indexes back together at finish() time.)

> Or are you thinking about the "read" phase, where we do cache the
> deserialized pages.

Both.
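For the write phase John describes (buffer in a dict, spill sorted runs when memory gets tight, merge them at finish() time), a minimal sketch might look like the following. The class name, the max_in_memory threshold and keeping spilled runs as in-memory lists are all assumptions for illustration; a real index builder would write the runs to temporary files.

```python
import heapq
from typing import Dict, Iterator, List, Tuple


class SpillingSortedBuffer:
    """Hypothetical sketch: buffer (key, value) pairs in a dict, spill a
    sorted run whenever the buffer reaches max_in_memory entries, and
    merge-sort all runs together at finish() time."""

    def __init__(self, max_in_memory: int = 1000) -> None:
        self.max_in_memory = max_in_memory  # hypothetical spill threshold
        self._buffer: Dict[str, str] = {}
        # Sorted runs; kept in memory here, on disk in a real builder.
        self._runs: List[List[Tuple[str, str]]] = []

    def add(self, key: str, value: str) -> None:
        self._buffer[key] = value
        if len(self._buffer) >= self.max_in_memory:
            self._spill()

    def _spill(self) -> None:
        # Sort the current buffer and set it aside as one sorted run.
        self._runs.append(sorted(self._buffer.items()))
        self._buffer = {}

    def finish(self) -> Iterator[Tuple[str, str]]:
        # Flush whatever is left, then lazily merge the sorted runs.
        if self._buffer:
            self._spill()
        return heapq.merge(*self._runs)


buf = SpillingSortedBuffer(max_in_memory=2)
for k in ["c", "a", "d", "b"]:
    buf.add(k, k.upper())
print(list(buf.finish()))  # [('a', 'A'), ('b', 'B'), ('c', 'C'), ('d', 'D')]
```

heapq.merge only needs each run to be sorted, so the final pass reads every run once without re-sorting the whole key set.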

> Is CHKMap the best name for what you are thinking of? (mapping paths =>
> file-ids.) It's just hard for me to understand directly what the
> class does. I suppose it is similar to a BTreeIndex, only where the page
> keys are CHKs?

The CHKMap class is a map from string to string, that's all. So it's a
dict where all keys and values are strings. CHKDict would be another
possible name, but it's not as mutable as a regular dict, and I didn't
want users to think it was closer to a dict than it is.
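To make the "dict of strings, but not as mutable" idea concrete, here is a hedged sketch, not the actual CHKMap API: read access behaves like a dict, item assignment is simply not supported, and changes produce a new map. StringMap and with_changes are hypothetical names.

```python
from collections.abc import Mapping
from typing import Dict, Iterator, Optional


class StringMap(Mapping):
    """Hypothetical str -> str map: dict-like reads, no in-place mutation."""

    def __init__(self, items: Optional[Dict[str, str]] = None) -> None:
        items = dict(items or {})
        for k, v in items.items():
            if not isinstance(k, str) or not isinstance(v, str):
                raise TypeError("keys and values must both be strings")
        self._items = items

    def __getitem__(self, key: str) -> str:
        return self._items[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)

    def with_changes(self, changes: Dict[str, Optional[str]]) -> "StringMap":
        """Return a new map; a value of None deletes that key."""
        new = dict(self._items)
        for k, v in changes.items():
            if v is None:
                new.pop(k, None)
            else:
                new[k] = v
        return StringMap(new)


m = StringMap({"a": "1"})
m2 = m.with_changes({"b": "2", "a": None})
print(m["a"], dict(m2))  # 1 {'b': '2'}
```

Because Mapping provides no __setitem__, `m["x"] = "y"` raises TypeError, which keeps callers from treating the map as a plain dict.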

> Note that I think the final key is going to be a path or a file id,
> which *isn't* a CHK. Which is probably my confusion. CHK seems like an
> intermediate and internal-only detail.

As the map requires a CHK-capable versioned files instance to store and
retrieve data, it's not at all hidden from users.
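A rough illustration of why the store is visible to callers: the map's contents live in a content-hash-keyed store, and the map is identified by the key of its serialised page. InMemoryCHKStore, save_map, load_map, the "sha1:" key prefix and the one-page, NUL-separated line format are all hypothetical stand-ins, not the real versioned-files interface.

```python
import hashlib
from typing import Dict


class InMemoryCHKStore:
    """Hypothetical stand-in for a CHK-capable store: each byte string
    is keyed by the SHA-1 of its own content."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def add(self, content: bytes) -> str:
        key = "sha1:" + hashlib.sha1(content).hexdigest()
        self._blobs[key] = content  # same content -> same key, so re-adding is harmless
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_map(store: InMemoryCHKStore, items: Dict[str, str]) -> str:
    """Serialise a str -> str map as one sorted page; return its key.
    Assumes keys and values contain no NUL or newline characters."""
    lines = ["%s\x00%s\n" % (k, v) for k, v in sorted(items.items())]
    return store.add("".join(lines).encode("utf-8"))


def load_map(store: InMemoryCHKStore, key: str) -> Dict[str, str]:
    """Fetch a page by key and parse it back into a dict."""
    text = store.get(key).decode("utf-8")
    result = {}
    for line in text.split("\n")[:-1]:  # drop the empty string after the final newline
        k, v = line.split("\x00", 1)
        result[k] = v
    return result


store = InMemoryCHKStore()
root = save_map(store, {"b": "2", "a": "1"})
print(load_map(store, root))  # {'a': '1', 'b': '2'}
```

Sorting before serialising means equal maps always hash to the same key, which is what lets a content-hash store deduplicate them.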

> > The apply-delta operation at the inventory level will generate a new
> > CHKInventory rather than mutating the existing one; which is why I am
> > pondering caching at all at this level: accessing individual entries is
> > done by a lookup into a CHKMap, and then a parse operation.
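The paragraph above describes two properties: apply-delta returns a new inventory instead of mutating the old one, and each entry access is a map lookup followed by a parse. A minimal sketch under those assumptions follows; Entry, ImmutableInventory, the NUL-separated entry format and the simplified (file_id, entry_or_None) delta are hypothetical and much simpler than real Bazaar inventory deltas.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    # Hypothetical minimal entry; kind is e.g. "directory", "file",
    # "symlink" or "tree-reference". Fields must not contain NUL.
    file_id: str
    kind: str
    name: str
    parent_id: Optional[str]


def serialise_entry(e: Entry) -> str:
    return "\x00".join([e.kind, e.name, e.parent_id or ""])


def parse_entry(file_id: str, data: str) -> Entry:
    kind, name, parent = data.split("\x00")
    return Entry(file_id, kind, name, parent or None)


class ImmutableInventory:
    """Entries are stored as serialised strings; every access is a
    lookup followed by a parse."""

    def __init__(self, id_to_entry: Dict[str, str]) -> None:
        self._id_to_entry = dict(id_to_entry)  # file_id -> serialised entry

    def __contains__(self, file_id: str) -> bool:
        return file_id in self._id_to_entry

    def get_entry(self, file_id: str) -> Entry:
        return parse_entry(file_id, self._id_to_entry[file_id])

    def apply_delta(self, delta: Iterable[Tuple[str, Optional[Entry]]]) -> "ImmutableInventory":
        """Return a NEW inventory; self is left untouched."""
        new_map = dict(self._id_to_entry)
        for file_id, entry in delta:
            if entry is None:
                del new_map[file_id]
            else:
                new_map[file_id] = serialise_entry(entry)
        return ImmutableInventory(new_map)


root = Entry("root-id", "directory", "", None)
inv1 = ImmutableInventory({"root-id": serialise_entry(root)})
inv2 = inv1.apply_delta([("f-id", Entry("f-id", "file", "README", "root-id"))])
print("f-id" in inv1, inv2.get_entry("f-id").name)  # False README
```

Since get_entry parses on every call, repeated access to the same entry repeats the parse, which is the cost that motivates the caching question.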

> So I guess the question is... should we be caching the internal pages of
> the CHKInventory (from Root on down to the leaves which have the actual
> inventory contents).

The pages will be cached by the CHKMap; I'm convinced that that is
needed. It's the individual entries I'm concerned about.
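One way to picture the two layers being discussed: an LRU cache of deserialised pages in the map, plus a per-instance cache of parsed entries in the inventory. This is only a sketch under assumptions; LRUPageCache, EntryCachingInventory, max_pages and parse_count are hypothetical names. Because an inventory instance is never mutated (apply-delta builds a new one), its entry cache would never need invalidating.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict


class LRUPageCache:
    """Hypothetical map-level cache: page key -> deserialised page,
    evicting the least recently used page beyond max_pages."""

    def __init__(self, max_pages: int = 100) -> None:
        self.max_pages = max_pages
        self._pages: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str, load: Callable[[str], Any]) -> Any:
        if key in self._pages:
            self._pages.move_to_end(key)  # mark as most recently used
            return self._pages[key]
        page = load(key)
        self._pages[key] = page
        if len(self._pages) > self.max_pages:
            self._pages.popitem(last=False)  # evict the oldest page
        return page


class EntryCachingInventory:
    """Hypothetical inventory-level cache of parsed entries; no
    invalidation, since the instance's data never changes."""

    def __init__(self, id_to_entry: Dict[str, str],
                 parse: Callable[[str, str], Any]) -> None:
        self._id_to_entry = id_to_entry
        self._parse = parse
        self._entries: Dict[str, Any] = {}
        self.parse_count = 0  # counts real parses, for illustration

    def get_entry(self, file_id: str) -> Any:
        entry = self._entries.get(file_id)
        if entry is None:
            self.parse_count += 1
            entry = self._parse(file_id, self._id_to_entry[file_id])
            self._entries[file_id] = entry
        return entry


inv = EntryCachingInventory({"f": "file\x00README"},
                            lambda fid, data: (fid,) + tuple(data.split("\x00")))
inv.get_entry("f")
inv.get_entry("f")
print(inv.parse_count)  # 1
```

The trade-off is memory: cached entries live as long as the inventory object, so a bounded cache like the page cache could be used for entries too if that turns out to matter.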

-Rob
-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.


More information about the bazaar mailing list