Loggerhead directions

Fri Apr 16 20:09:25 BST 2010

...

>> The main problem is that we instantiate a new Branch instance *per
>> request*, which means that any caching I assume happens on or under
>> Branch won't persist between HTTP requests.
>>
>> So far, I've just gone via the bzrlib APIs you added recently
>> (dotted_revno_to_revision_id, iter_merge_sorted_revisions, etc.). I
>> haven't quite worked out whether they are enough. But the *big* issue is
>> that you have zero caching between requests.
> 
> In the historycache plugin, my solution was to serialise the cache to
> disk. The next bzr command that required it would load it - it didn't
> need to merge-sort the graph and assign revnos again.

That is also what bzr-history-db does. My point is that I was trying to
get rid of Loggerhead's own cache. If we do so, performance will *tank*
if there isn't some other cache underneath. Even if we just cached the
Branch objects, *they* would carry their own 'merge_sort' cache.
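A minimal sketch of keeping branch objects alive across requests, assuming a long-running loggerhead process. `BranchCache` and the `open_branch` callable are hypothetical names; in practice `open_branch` might wrap bzrlib's `Branch.open`. The sketch does not notice when a branch's tip changes, so `invalidate` has to be called explicitly.

```python
import threading
from typing import Any, Callable, Dict


class BranchCache:
    """Keep opened branch objects between HTTP requests (hypothetical helper).

    Whatever `open_branch` returns is stored per URL, so any caches hanging
    off that object (such as a merge_sort result) survive across requests.
    """

    def __init__(self, open_branch: Callable[[str], Any]) -> None:
        self._open_branch = open_branch
        self._branches: Dict[str, Any] = {}
        self._lock = threading.Lock()  # requests may arrive on several threads

    def get(self, url: str) -> Any:
        with self._lock:
            branch = self._branches.get(url)
            if branch is None:
                # Only the first request for this URL pays the cost of opening.
                branch = self._open_branch(url)
                self._branches[url] = branch
            return branch

    def invalidate(self, url: str) -> None:
        # Drop a stale entry, e.g. after new revisions were pushed.
        with self._lock:
            self._branches.pop(url, None)
```

Every request for the same URL then shares one object, and therefore one set of in-memory caches, until the entry is invalidated.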

> 
>> loggerhead is in a bit of a pickle, trying to stay stateless and yet
>> handle cache state... I don't have a great answer here.
> 
> It sounds like the sort of thing memcached was designed to solve. Is it
> worth considering?

I'm considering having a disk structure, outside of the standard bzr data,
act as the cache. bzr-history-db is one such structure, historycache would
be another, etc. memcached is a different deal entirely.
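A minimal sketch of such an on-disk cache, assuming the revision-id to revno map is keyed by the branch tip so that any new commit invalidates it. `load_or_build_revnos` and `build` are hypothetical names, JSON is an arbitrary format choice, and this is not how bzr-history-db or historycache actually store their data.

```python
import json
import os
from typing import Callable, Dict, Tuple

RevnoMap = Dict[str, Tuple[int, ...]]  # revision id -> dotted revno, e.g. (1, 2, 3)


def load_or_build_revnos(cache_path: str, tip_revid: str,
                         build: Callable[[], RevnoMap]) -> RevnoMap:
    """Return the revno map for `tip_revid`, reusing the on-disk copy if valid.

    `build` stands in for the expensive merge-sort step (hypothetical).
    """
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            data = json.load(f)
        if data.get("tip") == tip_revid:
            # Cache hit: no need to merge-sort the graph again.
            return {revid: tuple(revno) for revid, revno in data["revnos"].items()}
    revnos = build()
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"tip": tip_revid,
                   "revnos": {revid: list(revno) for revid, revno in revnos.items()}},
                  f)
    # Rename into place so readers never see a half-written file (atomic on POSIX).
    os.replace(tmp_path, cache_path)
    return revnos
```

Because the file is stamped with the tip it was built for, the next process or request can trust it without re-checking the graph.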
...

> 
> One nice thing about the revision-id-to-revno map is that the data
> *ought* to be highly stable. It should only change when new revisions
> are added and, provided the mainline is only appended to, the old data
> should remain a correct subset.
> 
> I wonder if we can use those facts to our advantage when setting up
> codebrowse caching for Launchpad? For example, I suspect 90% of feature
> branches are simply a few revisions over and above a revision of the
> mainline of a series branch, i.e. there are no additional dotted-revnos
> for those branches after their creation point. And if there were, users
> would rarely visit pages displaying them? And if they did, we could
> still calculate those each time, rather than cache data that may never
> be needed?

Again, this is exactly how bzr-history-db is designed. It has
"_IncrementalMergeSort" as a class to generate the new merge sort graph
+ dotted-revnos based on new revisions.
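The mainline half of that idea can be sketched as follows. This is a simplified, mainline-only illustration, not bzr-history-db's `_IncrementalMergeSort`, which also assigns dotted revnos to merged revisions. `extend_mainline` is a hypothetical helper, and a plain dict stands in for the revision graph. It walks left-hand parents from the new tip until it meets a revision already in the cached mainline; everything at or below that point keeps its old revno.

```python
from typing import Dict, List, Optional


def extend_mainline(parents: Dict[str, List[str]], cached: List[str],
                    new_tip: str) -> List[str]:
    """Return the mainline for `new_tip` (revno 1 first), reusing `cached`.

    `parents` maps a revision id to its parents, left-hand parent first.
    `cached` is the mainline of an earlier tip: cached[i] has revno i + 1.
    """
    position = {revid: i for i, revid in enumerate(cached)}
    new_part = []
    revid: Optional[str] = new_tip
    # Walk down the left-hand chain until we reach already-numbered history.
    while revid is not None and revid not in position:
        new_part.append(revid)
        lhs = parents.get(revid, [])
        revid = lhs[0] if lhs else None
    # The shared revision and everything below it keep their old revnos;
    # anything above it in `cached` is no longer on this mainline.
    keep = cached[:position[revid] + 1] if revid is not None else []
    return keep + new_part[::-1]
```

When the mainline is only appended to, `keep` is the whole cached list and only the new revisions are numbered.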

As for 'no new entries', I doubt it. When the feature branch
merges trunk, you'll get *lots* of new revnos.
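To see where those new revnos come from: in bzr, every revision in a branch's ancestry that is not on its left-hand (mainline) chain receives a dotted revno. The sketch below, using a plain dict as a hypothetical stand-in for the revision graph, computes which revisions those are; it does not compute the dotted revno values themselves.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # revision id -> parent ids, left-hand parent first


def mainline(parents: Graph, tip: str) -> List[str]:
    """Follow left-hand parents from `tip`; these revisions get plain revnos."""
    chain = []
    revid = tip
    while True:
        chain.append(revid)
        lhs = parents.get(revid, [])
        if not lhs:
            return chain
        revid = lhs[0]


def needs_dotted_revno(parents: Graph, tip: str) -> Set[str]:
    """Ancestors of `tip` that are not on its mainline, i.e. merged revisions."""
    ancestry: Set[str] = set()
    stack = [tip]
    while stack:
        revid = stack.pop()
        if revid in ancestry:
            continue
        ancestry.add(revid)
        stack.extend(parents.get(revid, []))
    return ancestry - set(mainline(parents, tip))
```

For a feature branch created at trunk revision t2 with two commits of its own, the set is empty. Once it merges a trunk tip at t5, t3, t4 and t5 all join the set, and the count grows with every trunk revision merged.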

You're right that we could decide whether or not to write that data to
disk. I don't know off-hand how much data it would be. If you go back to
my numbers:
  8.8MB bzr.dev only
  50MB all 4.7k branches in bzr.dev
= 9.2kB per branch

For Launchpad this seems absolutely not worth worrying about.

Note that for MySQL the numbers are actually *better*: 'trunk' alone was
~20MB, and all 17k branches were 50MB => ~1.8kB per branch.
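The per-branch figures are the marginal cost: the all-branches total minus trunk alone, divided by the number of branches. A small helper makes the arithmetic explicit; `marginal_kb_per_branch` is a hypothetical name, and the `bytes_per_mb` parameter exists because the figures don't say whether MB means 10^6 or 2^20 bytes. With 2^20, the bzr result matches the quoted 9.2kB.

```python
def marginal_kb_per_branch(total_mb: float, trunk_mb: float, n_branches: int,
                           bytes_per_mb: int = 1_000_000) -> float:
    """Extra cache size per branch, beyond what the trunk alone needs, in kB."""
    extra_bytes = (total_mb - trunk_mb) * bytes_per_mb
    return extra_bytes / n_branches / 1000


print(f"{marginal_kb_per_branch(50, 8.8, 4700, bytes_per_mb=2**20):.1f}")  # → 9.2
print(f"{marginal_kb_per_branch(50, 20, 17000):.1f}")                      # → 1.8
```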

> 
> In other words, I'm wondering whether the right caching deployment might
> be something like:
> 
> 1. If a branch has fewer than X (1k?) revisions, don't bother with a cache.
> 2. If the branch is the only one for a project or the branch is
>    assigned a series (like trunk or 2.1), then cache revnos for
>    revision-ids in that branch.
> 
> Not sure. In any case, I think we should:
> 
> 1. Assume that caches for all branches will be overkill.

Perhaps, but they're not expensive to maintain.
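For comparison, Ian's proposed deployment policy (skip small branches; cache only a project's sole branch or a branch assigned to a series) could be written as a predicate like the one below. `BranchInfo`, its fields and `should_cache_revnos` are hypothetical, and the 1000-revision threshold is Ian's tentative "X (1k?)", not a measured value.

```python
from dataclasses import dataclass
from typing import Optional

# Ian's tentative "X (1k?)"; a guess, not a measured threshold.
MIN_REVISIONS_TO_CACHE = 1000


@dataclass
class BranchInfo:
    revision_count: int
    branches_in_project: int
    series: Optional[str]  # e.g. "trunk" or "2.1"; None if not tied to a series


def should_cache_revnos(info: BranchInfo) -> bool:
    # Rule 1: small branches are cheap enough to merge-sort on every request.
    if info.revision_count < MIN_REVISIONS_TO_CACHE:
        return False
    # Rule 2: cache only a project's sole branch or a series branch.
    return info.branches_in_project == 1 or info.series is not None
```

An ordinary feature branch in a busy project would then fall through to on-demand computation.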

> 
> 2. Look at ways of using the cache of a parent branch for
>    stacked branches.
> 
> Ian C.
> 

Again, one cache per project would manage this just fine.
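One way a single per-project cache can serve stacked branches is to key entries by tip revision id rather than by branch: the revno map is determined by the tip and its ancestry, so a stacked or freshly created branch sitting at the same tip as trunk can reuse trunk's entry. A minimal sketch, assuming a hypothetical `compute` callable that merge-sorts the ancestry of a tip; eviction is left out.

```python
from typing import Callable, Dict, Tuple

RevnoMap = Dict[str, Tuple[int, ...]]  # revision id -> dotted revno


class ProjectRevnoCache:
    """One revno cache shared by every branch of a project (hypothetical)."""

    def __init__(self, compute: Callable[[str], RevnoMap]) -> None:
        self._compute = compute
        self._by_tip: Dict[str, RevnoMap] = {}  # grows without bound in this sketch

    def revnos_for(self, tip_revid: str) -> RevnoMap:
        # Any branch whose tip is already cached, stacked or not, shares the entry.
        if tip_revid not in self._by_tip:
            self._by_tip[tip_revid] = self._compute(tip_revid)
        return self._by_tip[tip_revid]
```

Trunk and a new stacked branch created from it then trigger a single merge sort between them; only once the stacked branch gains its own commits does it need an entry of its own.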

John
=:->