Loggerhead directions
John Arbash Meinel
john at arbash-meinel.com
Fri Apr 16 20:09:25 BST 2010
...
>> The main problem is that we instantiate a new Branch instance *per
>> request*. Which means that any caching I assume to happen on or under
>> Branch won't persist between HTTP requests.
>>
>> So far, I've just gone via the bzrlib apis you added recently
>> (dotted_revno_to_revision_id, iter_merge_sorted_revisions, etc.) I
>> haven't quite worked out if they are enough. But the *big* issue is that
>> you have 0 caching between requests.
>
> In the historycache plugin, my solution was to serialise the cache to
> disk. The next bzr command that required it would load it - it didn't
> need to merge-sort the graph and assign revnos again.
That is also what bzr-history-db does. My point is that I was trying to
get rid of Loggerhead's own cache. If we do so, performance will *tank*
if there isn't some other cache underneath. If we even just cached the
Branch objects, then *they* have their own 'merge_sort' cache.
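A minimal sketch of the "just cache the Branch objects" idea, assuming a long-lived server process. `BranchCache` and `open_branch` are hypothetical names for illustration and are not part of Loggerhead or bzrlib; `open_branch` stands in for whatever call opens a branch at a location. The sketch ignores invalidation when new revisions land, which a real deployment would have to handle.

```python
import threading


class BranchCache:
    """Process-wide cache of opened branch objects, keyed by location.

    Hypothetical helper: it keeps one branch object alive per location so
    that whatever the object caches internally survives between requests.
    """

    def __init__(self, open_branch):
        # open_branch is an assumed callable: location -> branch object.
        self._open_branch = open_branch
        self._branches = {}
        self._lock = threading.Lock()

    def get(self, location):
        # Open the branch on first use, then hand back the same object.
        with self._lock:
            branch = self._branches.get(location)
            if branch is None:
                branch = self._open_branch(location)
                self._branches[location] = branch
            return branch
```

Each request handler would call `cache.get(location)` instead of opening the branch itself, so repeated requests for the same branch reuse one object.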
>
>> loggerhead is in a bit of a pickle, trying to stay stateless and yet
>> handle cache state... I don't have a great answer here.
>
> It sounds like the sort of thing memcached was designed to solve. Is it
> worth considering?
I'm considering having a disk structure outside of standard bzr data as a
cache. bzr-history-db is one, historycache would be one, etc. memcached
is a different deal entirely.
...
>
> One nice thing about the revision-id-to-revno map is that the data
> *ought* to be highly stable. It should only change when new revisions
> are added and, provided the mainline is only appended to, the old data
> should remain a correct subset.
>
> I wonder if we can use those facts to our advantage when setting up
> codebrowse caching for Launchpad? For example, I suspect 90% of feature
> branches are simply a few revisions over and above a revision of the
> mainline of a series branch, i.e. there are no additional dotted-revnos
> for those branches after their creation point. And if there were, users
> would rarely visit pages displaying them? And if they did, we could
> still calculate those each time, rather than cache data that may never
> be needed?
Again, this is exactly how bzr-history-db is designed. It has
"_IncrementalMergeSort" as a class to generate the new merge sort graph
+ dotted-revnos based on new revisions.
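To illustrate the incremental idea in its simplest form, here is a sketch that extends a cached revision-id-to-revno map along the mainline only. It is not the bzr-history-db `_IncrementalMergeSort` implementation and it does not compute dotted revnos for merged revisions. The function name and the `left_parent` callable are assumptions made for the example.

```python
def extend_mainline_revnos(revnos, tip, left_parent):
    """Add mainline revnos for revisions newer than those already cached.

    revnos: dict mapping revision-id -> mainline revno, mutated in place.
    tip: revision-id of the current branch tip.
    left_parent: assumed callable, revision-id -> left-hand parent id,
        or None for the first revision.
    Returns the number of revisions that were newly numbered.
    """
    # Walk back from the tip until we reach a revision we already know.
    pending = []
    rev = tip
    while rev is not None and rev not in revnos:
        pending.append(rev)
        rev = left_parent(rev)

    # Continue numbering from the cached revision, or from 1 if none.
    next_revno = revnos[rev] + 1 if rev is not None else 1
    for rev_id in reversed(pending):
        revnos[rev_id] = next_revno
        next_revno += 1
    return len(pending)
```

Provided the mainline is only appended to, the old entries are never rewritten, so the work done per update is proportional to the number of new mainline revisions.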
As for 'no new entries', I would doubt it. When the feature branch
merges trunk, you'll get *lots* of new revnos.
You're right that we could decide whether or not to write that data to
disk. I don't know off-hand how much data it would be. If you go back to
my numbers of:
8.8MB bzr.dev only
50MB all 4.7k branches in bzr.dev
= 9.2kB per branch
For Launchpad this seems absolutely not worth worrying about.
Note that for MySQL the numbers are actually *better*. Just 'trunk' was
~20MB, all 17k branches was 50MB => 1.8kB per Branch.
>
> In other words, I'm wondering whether the right caching deployment might
> be something like:
>
> 1. If a branch has less than X (1k?) revisions, don't bother with a cache.
> 2. If the branch is the only one for a project or the branch is
> assigned a series (like trunk or 2.1), then cache revnos for
> revision-ids in that branch.
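The two rules above could be sketched as a small predicate. The function and parameter names are hypothetical, the 1000-revision default is the tentative "1k?" from rule 1, and returning False for branches that match neither rule is an assumption, since the proposal does not say what happens to them.

```python
def should_cache_revnos(revision_count, is_only_branch, has_series,
                        min_revisions=1000):
    """Decide whether to keep a revno cache for a branch.

    Rule 1: small branches are not cached at all.
    Rule 2: cache the sole branch of a project, or a series branch.
    Anything else is not cached (an assumption of this sketch).
    """
    if revision_count < min_revisions:
        return False
    return is_only_branch or has_series
```

For example, a 200-revision series branch would be skipped under rule 1, while a 5000-revision trunk would be cached under rule 2.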
>
> Not sure. In any case, I think we should:
>
> 1. Assume that caches for all branches will be overkill.
Perhaps, but not expensive to maintain.
>
> 2. Look at ways of using the cache of a parent branch for
> stacked branches.
>
> Ian C.
>
Again, one cache per project would manage this just fine.
John
=:->