Plans for Loggerhead

Wed Jan 7 01:58:28 GMT 2009

Hi,

I've been giving a lot of thought lately to where to go with
Loggerhead, so I thought I'd share my plans with everyone and gather
some feedback.
In the past 6 months, we've probably re-written 70% of the code, which
has been a pretty big improvement in various places.
The remaining 30% is plumbing and other bits, which we (well, I) have
carefully avoided re-writing. Now that we can finally start putting
focus on new features like ajax, tarball exports or serving branches
through Loggerhead, the way some things are engineered makes it really
hard and painful.
I've also come across quite a few use cases where people wanted to
integrate some web-viewing functionality into other applications,
and the best answers we have for that are "use the xmloutput plugin,
and parse" or "re-implement what we currently have in LH".
And, finally, it's been very hard to scale. For general usage, it's
fine, but once we push it into crazy amounts of load like in
Launchpad, it starts to choke a little more than we'd want.
Some of these issues may be orthogonal to changing Loggerhead, but I
think we can address most of it by changing the way it works
internally.
So, my proposal is to develop a loggerheadlib, something completely
new and fully tested which, at its highest level, returns json objects.
On top of that, we can have a simple plain html interface for the
javascript-impaired, rendered by parsing these json objects
internally, and a nice ajax-based interface with all the coolness and
server-side processing savings that it gives us.
I haven't thought out all the details, but this is more or less what
I've come up with (some of them by bouncing ideas off of mwhudson):

- Generate one json file per revision, which contains all the
information about it, excluding the diffs. Doing this will let us
generate it once, and serve it statically from then on. It also
potentially lets us cache it client-side, so we don't have to fetch
information we already have
- Generate a json file per file per revision
- Generate jsons for inventory
- Possibly do the same for annotate. Just cache metadata per line and
extract the full text from bzrlib. Maybe cache the fulltext as well, I
don't know how expensive it is ATM
- Cache the full revision graph in sqlite, so we can access it
partially without having to store all of it in memory. Ideally, we'd
end up modifying bzrlib to allow partial access, so I'd like to
structure this in a way that's easy to drop the sqlite backend later
on
- Clearly separate the bits of code that talk to bzrlib, so it's
easier to identify faster/more efficient ways of accessing
information, and keep it up to date with the latest and greatest
recommended goodness
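
To make the first and fifth points a bit more concrete, here is a
rough sketch of what I have in mind. Treat it as an illustration
only: write_revision_json, GraphCache and the "parents" table layout
are made-up names, the revision dicts are stand-ins for data that
would really come from bzrlib, and none of this exists in Loggerhead
today.

```python
import json
import os
import sqlite3
import urllib.parse


def write_revision_json(cache_dir, revision):
    """Write one json file per revision; later calls reuse the file.

    `revision` is a plain dict with at least a "revid" key (an assumed
    shape, not a bzrlib object). Revisions are immutable, so the file
    is generated once and can be served statically afterwards.
    """
    # Quote the revid so it is always safe to use as a file name.
    name = urllib.parse.quote(revision["revid"], safe="") + ".json"
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        with open(path, "w") as f:
            json.dump(revision, f, sort_keys=True)
    return path


class GraphCache:
    """Hypothetical sqlite-backed store of revision -> parent edges."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS parents ("
            " child TEXT NOT NULL,"
            " parent TEXT NOT NULL,"
            " position INTEGER NOT NULL,"
            " PRIMARY KEY (child, position))"
        )

    def add(self, revid, parent_ids):
        # Position 0 is the leftmost (mainline) parent.
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO parents VALUES (?, ?, ?)",
                [(revid, p, i) for i, p in enumerate(parent_ids)],
            )

    def parents(self, revid):
        rows = self.conn.execute(
            "SELECT parent FROM parents WHERE child = ? ORDER BY position",
            (revid,),
        )
        return [row[0] for row in rows]

    def mainline(self, tip, limit):
        """Walk leftmost parents from `tip`, touching at most `limit` rows."""
        result = []
        current = tip
        while current is not None and len(result) < limit:
            result.append(current)
            row = self.conn.execute(
                "SELECT parent FROM parents"
                " WHERE child = ? AND position = 0",
                (current,),
            ).fetchone()
            current = row[0] if row else None
        return result
```

The point of mainline() is that a page showing 20 revisions only
reads 20 rows instead of loading the whole graph into memory. Keeping
everything behind a small class like this is also what should make it
easy to drop the sqlite backend later on.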

My plan is to focus the next release of Loggerhead on making the
serve-branches script use a config file (using bzr's config methods
and format, possibly using locations.conf), so we can completely
replace/remove start/stop-loggerhead, which has been unloved since
Michael brought in the serve-branches goodness.
Once that's out of the way, I plan to dive into restructuring the
internals, and slowly move from what we have to the new code.

I am honestly open to other suggestions as to what approach to take,
so any radical (yet implementable) ideas are more than welcome, as
are changes to the json approach.

-- 
Martin