Loggerhead directions
John Arbash Meinel
john at arbash-meinel.com
Wed Apr 14 21:10:05 BST 2010
...
>
> It's really not clear to me what value adopting a framework brings to
> loggerhead, which is pretty close to barebones WSGI, for the goals of
> making it fast and packaged. As for speed, the ideal scenario is
> figuring out how to make the bzr web browsing experience speedy for
> large repositories and effectively O(1) with respect to history.
>
...
> Examining the views one by one and figuring out what they need access
> to, and to what extent they need to utilize the cache, is probably a good
> first step in this direction. John Meinel's work on the branch is a
> great step forward in that it updates the cache generation to modern bzr
> APIs and minimizes the time to generate the whole history cache. But
> even loading the cache is relatively expensive, and it seems like it's
> fairly common for views to calculate data from it without using the full
> result. John mentioned he had some ideas on a new cache structure that
> might be more efficient, which would be great to see.
Now that I've got 'bzr-history-db' mostly complete, I figured I'd give
some attention to loggerhead, focusing mostly on the History class.
I've gotten to the point where I've nuked loggerhead's disk-cache, and I
can get an emacs page load in around 400ms.
There will still be an issue for the 'very first run' that we have to
sort out. Some numbers:
  13355ms  loggerhead trunk on emacs, no caches (fresh start, no disk cache)
    679ms    same, with in-memory cache
   1144ms  loggerhead trunk, fresh start with on-disk cache
    672ms    same, with in-memory cache
    440ms  proposed loggerhead, no in-memory cache, only bzr-history-db cache
  30161ms  time to populate a fresh bzr-history-db cache of emacs trunk
    188ms  time to populate a second emacs branch with 256 revs
So, the bad: the very first run ever would be slower than not populating
the cache at all (about 30s vs. 13s for emacs). The cache file is also
larger (revinfo.sql is 5.9MB; bzr-history-db.sql is 31MB for emacs).
The good: the cache can be shared between all branches. So switching
from emacs/trunk to emacs/feature1 to emacs/feature2 will avoid the
'fresh start w/ on-disk cache' performance and, even better, always
avoids the 'fresh start, no disk cache' performance. So once you've done
the 30s import, your worst-case time when a new feature branch is browsed
becomes closer to 700ms (440 + 188 = 628ms) rather than 13000ms.
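To make the sharing concrete: if the cache is keyed by revision rather than by branch, importing a new branch only has to walk back until it reaches revisions that some other branch already brought in, which would explain why a second emacs branch costs milliseconds rather than another 30s import. The sketch below illustrates that idea under stated assumptions: the two-table SQLite schema and the `get_parents` callback are hypothetical, and this is not bzr-history-db's actual schema or import code.

```python
import sqlite3

# Hypothetical schema for illustration only (NOT bzr-history-db's real one):
# one row per revision, plus one row per (child, parent) edge. Nothing here
# is branch-specific, so every branch of a project can share the same file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS revision (revision_id TEXT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS parent (
    child TEXT NOT NULL,
    parent TEXT NOT NULL,
    parent_idx INTEGER NOT NULL,
    PRIMARY KEY (child, parent_idx)
);
"""


def is_known(conn, revision_id):
    """Return True if the revision is already in the shared cache."""
    row = conn.execute(
        "SELECT 1 FROM revision WHERE revision_id = ?", (revision_id,)
    ).fetchone()
    return row is not None


def import_branch(conn, tip, get_parents):
    """Import the ancestry of `tip`, skipping anything already cached.

    `get_parents(revision_id)` is a hypothetical callback returning the
    parent ids of a revision. Returns the number of newly added revisions.
    """
    todo = [tip]
    imported = 0
    while todo:
        rev = todo.pop()
        if is_known(conn, rev):
            # Imported earlier (possibly via another branch): stop walking,
            # its ancestry is already cached or queued in this walk.
            continue
        parents = get_parents(rev)
        conn.execute("INSERT INTO revision VALUES (?)", (rev,))
        conn.executemany(
            "INSERT INTO parent VALUES (?, ?, ?)",
            [(rev, p, i) for i, p in enumerate(parents)],
        )
        imported += 1
        todo.extend(parents)
    conn.commit()
    return imported


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    graph = {"r1": [], "r2": ["r1"], "r3": ["r2"], "f1": ["r3"], "f2": ["f1"]}
    print(import_branch(conn, "r3", graph.__getitem__))  # trunk: 3 new revisions
    print(import_branch(conn, "f2", graph.__getitem__))  # feature: only f1, f2
```

The cost of the second import is proportional to the revisions unique to that branch, not to the whole history, which is the property the numbers above rely on.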
Other stuff...
I've currently gutted the loggerhead caches and switched the code to
just use bzrlib APIs, which are a lot faster when bzr-history-db is
enabled. I thought that would be OK, because Branch caches a lot of stuff
like 'revision_id_to_revno_map'. However, it turns out that we create a
new Branch object for *every* request. And while we could put Branch
objects in an LRUCache, they aren't particularly thread-safe.
*sigh*.
So now I'm trying to figure out whether I should go back to at least an
in-memory cache populated from the data I query via bzrlib APIs, or
whether I should just mandate that bzr-history-db is available.
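For the first option, one way to avoid sharing non-thread-safe Branch objects is to cache only plain data derived from a branch (for example a revision_id -> revno mapping), keyed by branch location plus tip revision id so that a new commit simply misses the cache. A minimal sketch, assuming a hypothetical `LockedLRUCache` helper (this is not bzrlib's or loggerhead's LRUCache):

```python
import threading
from collections import OrderedDict


class LockedLRUCache:
    """Hypothetical LRU cache guarded by one lock, safe to share between
    request threads as long as the cached values are treated as read-only.

    Intended for plain derived data (dicts, tuples), never for Branch
    objects themselves.
    """

    def __init__(self, max_size=10):
        self._max_size = max_size
        self._data = OrderedDict()
        self._lock = threading.Lock()

    def get_or_compute(self, key, compute):
        with self._lock:
            if key in self._data:
                self._data.move_to_end(key)  # mark as most recently used
                return self._data[key]
        # Compute outside the lock so a slow load does not block other
        # requests; two threads may occasionally compute the same value,
        # in which case the last one stored wins.
        value = compute()
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            while len(self._data) > self._max_size:
                self._data.popitem(last=False)  # evict least recently used
        return value
```

In loggerhead the `compute` callback might open the branch for the current request and return something like `dict(branch.get_revision_id_to_revno_map())`; that wiring is an assumption for illustration, not existing code. Keying on the tip revision means no explicit invalidation is needed when the branch moves.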
If I do the latter, then I may as well change the Loggerhead code to
query the database directly, and only use stuff like
Branch.last_revision()...
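If bzr-history-db were mandatory, the Branch object would mostly be needed for its tip revision, and history questions could become queries. A minimal sketch of computing mainline revnos that way, assuming a hypothetical `parent` table in which `parent_idx = 0` marks the left-hand (mainline) parent; this is not bzr-history-db's real schema, and dotted revnos for merged revisions are ignored:

```python
import sqlite3

# Hypothetical edge table (NOT bzr-history-db's real schema):
# parent_idx 0 is the left-hand, i.e. mainline, parent.
SCHEMA = """
CREATE TABLE parent (
    child TEXT NOT NULL,
    parent TEXT NOT NULL,
    parent_idx INTEGER NOT NULL,
    PRIMARY KEY (child, parent_idx)
);
"""

# Walk left-hand parents from the tip in a single recursive query,
# instead of one round trip per revision.
MAINLINE_SQL = """
WITH RECURSIVE mainline(revision_id, depth) AS (
    SELECT ?, 0
    UNION ALL
    SELECT p.parent, m.depth + 1
    FROM mainline AS m
    JOIN parent AS p ON p.child = m.revision_id AND p.parent_idx = 0
)
SELECT revision_id, depth FROM mainline
"""


def mainline_revnos(conn, tip):
    """Map each mainline revision id to its integer revno.

    `tip` would come from something like Branch.last_revision(); all
    other history is read from the database, not the Branch object.
    """
    rows = conn.execute(MAINLINE_SQL, (tip,)).fetchall()
    # The tip has depth 0 and the root the largest depth, so
    # revno = (number of mainline revisions) - depth.
    total = len(rows)
    return {rev: total - depth for rev, depth in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO parent VALUES (?, ?, ?)",
        [("r2", "r1", 0), ("r3", "r2", 0), ("r3", "m1", 1), ("m1", "r1", 0)],
    )
    print(mainline_revnos(conn, "r3"))  # merged revision m1 is not mainline
```

The design choice here is that only the tip is branch-specific; the shared database answers everything else, which also sidesteps the per-request Branch object problem.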
Some minor guidance would be appreciated.
John
=:->
More information about the bazaar mailing list