Loggerhead directions

Wed Apr 14 21:10:05 BST 2010

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...

> 
> It's really not clear to me what value adopting a framework brings to
> loggerhead, which is pretty close to barebones WSGI, for the goals of
> making it fast and packaged. As for speed, the ideal scenario is
> figuring out how to make the bzr web browsing experience speedy for
> large repositories and effectively O(1) with respect to history.
> 

...

> Examining the views one by one and figuring out what they need access
> to, and to what extent they need to utilize the cache is probably a good
> first step in this direction. John Meinel's work on the branch is a
> great step forward in that it updates the cache generation to modern bzr
> apis and minimizes the time to generate the whole history cache. But
> even loading the cache is relatively expensive, and it seems fairly
> common for views to calculate data from it without using the full
> result. John mentioned he had some ideas on a new cache structure that
> might be more efficient which would be great to see.

Now that I've got 'bzr-history-db' mostly complete, I figured I'd give
some attention to loggerhead, focusing mostly on the History class.

I've gotten to the point where I've nuked loggerhead's disk-cache, and I
can get an emacs page load in around 400ms.

There will still be an issue for the 'very first run' that we have to
sort out. Some numbers:

13355ms  loggerhead trunk on emacs, fresh start w/ no caches
         (no disk cache)
  679ms      same, with in-memory cache
 1144ms  loggerhead trunk, fresh start w/ on-disk cache
  672ms      same, with in-memory cache

  440ms  proposed loggerhead w/ no memory cache and only the
         bzr-history-db cache

30161ms  time to populate a fresh bzr-history-db cache of emacs trunk
  188ms  time to populate a second emacs branch w/ 256 revs

So, the bad: the very first run would be slower than running without
populating the cache (30161ms vs 13355ms). The cache file is also larger
(revinfo.sql is 5.9MB, bzr-history-db.sql is 31MB for emacs).

The better: the cache can be shared between all branches. So switching
from emacs/trunk to emacs/feature1 to emacs/feature2 avoids the
'fresh-start w/ on disk cache' performance and, even better, always
avoids the 'fresh start no-disk cache' performance. So once you've done
the 30s import, your worst-case time when a new feature branch is
browsed becomes about 630ms (440+188) rather than 13000ms.
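
To put the trade-off in numbers, here is a small sketch of the arithmetic.
The constants are the measurements above; the function names are made up
for illustration, and it assumes that every further branch imports about
as quickly as the second one did (188ms) and that the current code pays
the full no-cache cost the first time each branch is browsed.

```python
# Timings in milliseconds, copied from the measurements above.
TRUNK_NO_CACHE_MS = 13355       # current trunk, fresh start, no caches
PROPOSED_PAGE_MS = 440          # proposed code, bzr-history-db cache only
IMPORT_FIRST_MS = 30161         # populate a fresh bzr-history-db cache
IMPORT_NEXT_MS = 188            # import a second branch (256 new revs)


def current_total_ms(n_branches):
    # Assumption: every branch's first view pays the full no-cache cost.
    return n_branches * TRUNK_NO_CACHE_MS


def proposed_total_ms(n_branches):
    # One big import for the first branch, then a small import for each
    # further branch, plus one page load per branch.
    if n_branches == 0:
        return 0
    imports = IMPORT_FIRST_MS + (n_branches - 1) * IMPORT_NEXT_MS
    return imports + n_branches * PROPOSED_PAGE_MS


def break_even_branches():
    # Smallest number of branches at which the shared cache is cheaper.
    n = 1
    while proposed_total_ms(n) >= current_total_ms(n):
        n += 1
    return n
```

Under those assumptions the shared cache loses on the first branch
(30601ms vs 13355ms) and is ahead by the third (31857ms vs 40065ms).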

Other stuff...

I've currently gutted the loggerhead caches, and switched the code to
just use bzrlib apis, which are a lot faster when bzr-history-db is
enabled. I thought it would be ok, because Branch caches a lot of stuff
like 'revision_id_to_revno_map'. However, it turns out that we create a
new Branch object for *every* request. And while we could put Branch
objects in an LRUCache, they aren't particularly thread safe.

*sigh*.
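
One possible way around the thread-safety problem, sketched here purely as
an illustration, is to give each worker thread its own small LRU so that a
cached object is never shared across threads. Nothing below is loggerhead
or bzrlib code: PerThreadLRU and the factory callable are hypothetical
names, and the factory stands in for whatever opens a Branch. It is
written for Python 3 (OrderedDict.move_to_end).

```python
import threading
from collections import OrderedDict


class PerThreadLRU:
    """Hypothetical LRU cache that keeps a separate store per thread."""

    def __init__(self, max_size=8):
        self._max_size = max_size
        self._local = threading.local()

    def _items(self):
        # Each thread lazily gets its own OrderedDict.
        try:
            return self._local.items
        except AttributeError:
            self._local.items = OrderedDict()
            return self._local.items

    def get(self, key, factory):
        items = self._items()
        if key in items:
            items.move_to_end(key)      # mark as most recently used
            return items[key]
        value = factory(key)            # e.g. open the branch at this path
        items[key] = value
        if len(items) > self._max_size:
            items.popitem(last=False)   # evict the least recently used
        return value
```

The cost of this design is that each thread pays for its own copy of the
cached objects, so it trades memory and some repeated work for not having
to make the objects themselves thread safe.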

So now I'm trying to figure out if I should go back to at least an
in-memory cache, and populate it from the data I query via bzrlib apis,
or whether I should just mandate that bzr-history-db is available.
If I do the latter, then I may as well change the Loggerhead code to
query the database directly, and only use stuff like
Branch.last_revision()...
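
For the first option, a minimal sketch of what such an in-memory cache
could look like: the derived data is kept only while the branch tip is
unchanged, so one cheap call (something like Branch.last_revision()) per
request decides whether it can be reused. TipKeyedCache and compute are
hypothetical names; compute stands in for whatever bzrlib queries would
fill the cache, and keying on the tip is an assumption of this sketch.

```python
_UNSET = object()   # sentinel so that any real tip value counts as new


class TipKeyedCache:
    """Hypothetical cache of derived data, valid for one branch tip."""

    def __init__(self, compute):
        self._compute = compute     # callable: tip revision id -> data
        self._tip = _UNSET
        self._value = None

    def get(self, tip):
        # Recompute only when the tip has moved since the last request.
        if tip != self._tip:
            self._value = self._compute(tip)
            self._tip = tip
        return self._value
```

A request handler would then ask the branch for its current tip and pass
it to get(); repeated requests against an unchanged branch would reuse the
stored value.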

Some minor guidance would be appreciated.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvGIR0ACgkQJdeBCYSNAAOs0gCeN6mTeNi+nJWOwYxgDUaLjPJ/
IB8AniPEQRDY9qOKyzIdeS8CmvOlBs7H
=Bzij
-----END PGP SIGNATURE-----