[RFC] Allow KnitVersionedFile to track a cached text

Fri Jan 19 22:19:01 GMT 2007

The attached patch changes KnitVersionedFile so that it can optionally
cache the last text inserted. The idea is that when doing a bulk
conversion you can set this flag and it will keep the last inserted text
in memory, rather than having to extract it again by applying a bunch of
deltas.

This has the largest effect for inventory.knit, since that file receives
a new entry for every revision, and at present we have to *extract* the
parent's contents by reading the knit each time (even if we keep a lock
open).

The patch only caches when writing a new entry, though in principle you
could cache at other times as well (the last text read, for example).
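
To illustrate the idea, here is a toy sketch. It is not the attached
patch and not bzrlib's API: ToyDeltaStore, its cache_last_text flag, and
the delta_applications counter are all hypothetical names, and the
"deltas" are plain difflib opcodes rather than knit line-deltas. It only
shows why remembering the last inserted text avoids re-extracting the
parent on every insertion.

```python
import difflib


class ToyDeltaStore:
    """Toy delta-compressed text store (hypothetical, not bzrlib's API)."""

    def __init__(self, cache_last_text=False):
        # version_id -> (parent_id, payload); payload is a fulltext (list of
        # lines) when parent_id is None, otherwise a list of delta opcodes.
        self._records = {}
        self._cache_last_text = cache_last_text
        self._last_text = None  # (version_id, lines) of the latest insertion
        self.delta_applications = 0  # how many deltas we had to apply

    def _extract(self, version_id):
        """Rebuild a text by walking the delta chain back to a fulltext."""
        parent_id, payload = self._records[version_id]
        if parent_id is None:
            return list(payload)
        parent_lines = self.get_lines(parent_id)
        self.delta_applications += 1
        result = []
        for tag, i1, i2, new_lines in payload:
            if tag == 'equal':
                result.extend(parent_lines[i1:i2])
            else:
                # 'replace' and 'insert' carry new lines; 'delete' carries none.
                result.extend(new_lines)
        return result

    def get_lines(self, version_id):
        if self._last_text is not None and self._last_text[0] == version_id:
            return list(self._last_text[1])  # cache hit, nothing to apply
        return self._extract(version_id)

    def add_lines(self, version_id, parent_id, lines):
        lines = list(lines)
        if parent_id is None:
            self._records[version_id] = (None, lines)
        else:
            # Computing the delta needs the parent text; this is the step
            # the cache makes cheap during a linear bulk conversion.
            parent_lines = self.get_lines(parent_id)
            matcher = difflib.SequenceMatcher(None, parent_lines, lines)
            delta = [(tag, i1, i2, lines[j1:j2])
                     for tag, i1, i2, j1, j2 in matcher.get_opcodes()]
            self._records[version_id] = (parent_id, delta)
        if self._cache_last_text:
            self._last_text = (version_id, lines)


def count_delta_applications(cache_last_text, versions=5):
    """Insert a linear history and report how many deltas were applied."""
    store = ToyDeltaStore(cache_last_text=cache_last_text)
    parent = None
    for n in range(versions):
        version_id = "v%d" % n
        store.add_lines(version_id, parent,
                        ["line %d\n" % i for i in range(n + 3)])
        parent = version_id
    return store.delta_applications
```

With five linear versions and no intermediate fulltexts, the uncached
toy store applies 0 + 1 + 2 + 3 = 6 deltas while inserting, and the
cached one applies none. A real knit inserts fulltexts periodically, so
the chains are bounded, but the shape of the saving is the same.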

You could modify this to use an LRU cache or something similar, so that
the amount of cached text stays bounded.
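
A minimal sketch of what such an LRU cache might look like, assuming
Python 3 and texts keyed by a (file_id, version_id) pair so that one
cache can serve several files. SharedTextCache and its method names are
made up for illustration and are not part of bzrlib.

```python
from collections import OrderedDict


class SharedTextCache:
    """Bounded LRU cache of texts (hypothetical helper, not in bzrlib)."""

    def __init__(self, max_texts=100):
        if max_texts < 1:
            raise ValueError("max_texts must be at least 1")
        self._max_texts = max_texts
        self._texts = OrderedDict()  # (file_id, version_id) -> lines

    def add(self, file_id, version_id, lines):
        key = (file_id, version_id)
        self._texts[key] = list(lines)
        self._texts.move_to_end(key)  # mark as most recently used
        while len(self._texts) > self._max_texts:
            self._texts.popitem(last=False)  # evict least recently used

    def get(self, file_id, version_id):
        """Return the cached lines, or None on a miss."""
        key = (file_id, version_id)
        try:
            lines = self._texts[key]
        except KeyError:
            return None
        self._texts.move_to_end(key)  # a hit also counts as a use
        return lines

    def __len__(self):
        return len(self._texts)
```

The bound here is a count of texts; bounding by total bytes would be
another reasonable choice when text sizes vary a lot.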

The way I've set it up, you can tell a specific KnitVersionedFile to
cache the last added text, or you can change the default so that every
KnitVersionedFile caches. This has a measurable effect on conversion
speed. Again with the same 800 revision project with ~150 files:

72s bzr.dev
73s new code with caching turned off
70s cache only inventory.knit
65s cache the last text for all files

There are a few points to consider with this test.

1) Because the project is fairly small, it doesn't really suffer from
not having a cache. We decide dynamically when to store a fulltext,
based on the size of the deltas, so in a small project we probably never
have to apply many deltas. Contrast this with something like the Mozilla
project, where I would expect that 90% of the time we apply the full 200
deltas between fulltexts.

2) I'm a little concerned about memory consumption on a big project like
Mozilla if we cache the last text for every knit. Caching can certainly
improve performance, since it saves us from applying a lot of
'line-deltas' during conversion. But this is a place where I would
consider adding an LRU cache, shared across knits. Then we could say
"keep the last 100-1000 texts" in memory.

3) This only keeps 1 text for each KnitVersionedFile. That means that if
you have a lot of branches with concurrent development, and cvsps claims
that you need to jump back and forth between them, you start seeing
diminishing benefits, because the cache only helps if the previous
insertion was the parent of the new one.
An 'optimal' solution would be to cache 1 inventory text per branch,
namely the one for the current tip of that branch.
I thought about doing that in cvsps. The problem is getting the
information from all the way at the top, where "tree.commit()" is
called, all the way down to where KnitVersionedFile.add_lines() needs it.
We already have the ability to pass a "parent_texts" dictionary in to
Knits. The problem is that this happens at a much lower level than where
we would need to supply it.
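
As a rough sketch of the per-branch idea in point 3 (entirely
hypothetical: BranchTipCache, count_hits, and the commit tuples are
invented for illustration, and nothing here touches cvsps or bzrlib),
one text is kept per branch name and compared against the single
"last inserted" slot on an interleaved history:

```python
class BranchTipCache:
    """Keep one text per branch: its current tip (hypothetical helper)."""

    def __init__(self):
        self._tips = {}  # branch_name -> (revision_id, lines)

    def lookup(self, parent_id):
        """Return the cached text if parent_id is some branch's tip."""
        for revision_id, lines in self._tips.values():
            if revision_id == parent_id:
                return lines
        return None

    def update(self, branch_name, revision_id, lines):
        """Record revision_id as the new tip of branch_name."""
        self._tips[branch_name] = (revision_id, list(lines))


def count_hits(commits):
    """Compare per-branch tip caching with a single last-text slot.

    commits is a list of (branch_name, revision_id, parent_id) tuples in
    conversion order. Returns (tip_cache_hits, last_text_hits).
    """
    tips = BranchTipCache()
    last_inserted = None
    tip_hits = 0
    last_hits = 0
    for branch_name, revision_id, parent_id in commits:
        if parent_id is not None:
            if tips.lookup(parent_id) is not None:
                tip_hits += 1
            if last_inserted == parent_id:
                last_hits += 1
        tips.update(branch_name, revision_id,
                    ["inventory for %s\n" % revision_id])
        last_inserted = revision_id
    return tip_hits, last_hits


# Commits alternating between trunk and one branch forked from t1.
INTERLEAVED = [
    ("trunk", "t1", None),
    ("trunk", "t2", "t1"),
    ("branch", "b1", "t1"),
    ("trunk", "t3", "t2"),
    ("branch", "b2", "b1"),
    ("trunk", "t4", "t3"),
]
```

On INTERLEAVED the per-branch cache finds the parent for 4 of the 5
non-root commits, while the single slot only helps once. The only miss
is the first commit on the branch, whose parent (t1) is no longer the
trunk tip. This says nothing about how to get the branch name down to
add_lines(), which is the hard part described above.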

John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cache_inv_text.patch
Type: text/x-patch
Size: 2872 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20070119/7578d456/attachment.bin