Plans for Loggerhead

Martin Pool mbp at canonical.com
Wed Jan 7 02:34:58 GMT 2009


On  6 Jan 2009, Martin Albisetti <argentina at gmail.com> wrote:
> Hi,
> 
> I've been giving a lot of thought lately on where to go with
> Loggerhead, so I thought I'd share with everyone, and gather some
> thoughts.
> In the past 6 months, we've probably re-written 70% of the code, which
> has been a pretty big improvement in various places.
> The remaining 30% is plumbing and other bits, which we (I) have
> carefully avoided re-writing. Now that we can finally start putting
> focus on new features like ajax, tarball exports or serving branches
> through Loggerhead, the way some things are engineered makes it really
> hard and painful.
> I've also come across quite a few use cases where people wanted to
> integrate some web-viewing functionalities into other applications,
> and the best answer we have for that is "use the xmloutput plugin, and
> parse", or, re-implement what we currently have in LH.
> And, finally, it's been very hard to scale. For general usage, it's
> fine, but once we push it into crazy amounts of load like in
> Launchpad, it starts to choke a little more than we'd want.
> Some of these issues may be orthogonal to changing Loggerhead, but I
> think we can address most of it by changing the way it works
> internally.
> So, my proposal is to develop a loggerheadlib, something completely
> new, fully tested, which, at its highest level, returns json objects.
> On top of that, we can have a simple plain html interface which is
> rendered by parsing these jsons internally for the
> javascript-impaired, and a nice ajax-based interface with all the
> coolness and server-side processing savings that it gives us.

I do think having a plain interface is important, not only for people
who have non-js browsers (which may be a small percentage, but not
zero), but also so that it can be handled by simple screen scraper tools
and seen by search engines.  Getting google searches to find Loggerhead
pages is useful for the people doing the searching and builds a network
effect on bzr.

> I haven't thought out all the details, but this is more or less what
> I've come up with (some of them by bouncing ideas off of mwhudson):
> 
> - Generate one json file per revision, which contains all the
> information about it, excluding the diffs. Doing this will let us
> generate it once, and serve it statically from then on. It also
> potentially lets us cache it client-side, so we don't have to fetch
> information we already have.
> - Generate a json file per file per revision
> - Generate jsons for inventory
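A rough sketch of the "generate once, serve statically" idea for per-revision json, assuming revisions are immutable so a file that exists never needs regenerating. The function names, the cache layout and the hashed filenames are illustrative assumptions, not existing Loggerhead code.

```python
import hashlib
import json
from pathlib import Path


def revision_json_path(cache_dir: Path, revision_id: str) -> Path:
    # Revision ids can contain characters that are awkward in filenames,
    # so the (hypothetical) cache keys each file on a hash of the id.
    digest = hashlib.sha1(revision_id.encode("utf-8")).hexdigest()
    return cache_dir / "revisions" / f"{digest}.json"


def write_revision_json(cache_dir: Path, revision: dict) -> Path:
    """Write one revision's metadata (no diffs) to disk, at most once."""
    path = revision_json_path(cache_dir, revision["revision_id"])
    if path.exists():
        # Revisions never change, so the existing file is still valid.
        return path
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    # Write to a temporary file and rename it, so a web server serving the
    # directory statically never sees a half-written file.
    tmp.write_text(json.dumps(revision, sort_keys=True), encoding="utf-8")
    tmp.replace(path)
    return path
```

Because the files are static, any front-end web server (or a browser cache) can hand them out without touching bzrlib again.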

Would it be better to have a layer at which these are e.g. python
dictionaries of strings, and then one layer up they are translated into
json syntax?  That might be more congenial for Python code that wants to
use them, and you could feed them into a template library.
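One way to picture the layering suggested above: the lower layer returns plain Python dicts, and json is just one of several renderings built on top of them. The field names and the use of `string.Template` for the plain html view are assumptions chosen for illustration.

```python
import html
import json
from string import Template


def revision_summary(revision_id, committer, message, parent_ids):
    """Lower layer: a plain dict of strings (hypothetical field names)."""
    return {
        "revision_id": revision_id,
        "committer": committer,
        "message": message,
        "parents": list(parent_ids),
    }


def to_json(summary):
    """Upper layer for an ajax interface: serialise the same dict."""
    return json.dumps(summary, sort_keys=True)


ROW = Template("<tr><td>$revision_id</td><td>$committer</td><td>$message</td></tr>")


def to_html_row(summary):
    """Upper layer for the plain interface: feed the dict to a template."""
    # Escape only the string fields the template uses; lists are skipped.
    escaped = {k: html.escape(v) for k, v in summary.items() if isinstance(v, str)}
    return ROW.substitute(escaped)
```

Python callers can use `revision_summary` directly, and neither the html nor the json path has to parse the other's output.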

> - Possibly do the same for annotate. Just cache metadata per line and
> extract the full text from bzrlib. Maybe cache the fulltext as well; I
> don't know how expensive it is ATM.
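A minimal sketch of caching only the per-line annotate metadata while fetching the text on demand. The two callables are hypothetical stand-ins for whatever bzrlib calls produce annotations and fulltexts; they are not real bzrlib signatures.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signatures standing in for bzrlib calls (assumptions).
AnnotationFn = Callable[[str, str], Sequence[str]]  # origin revision id per line
LinesFn = Callable[[str, str], Sequence[str]]       # fulltext as a list of lines


class AnnotateCache:
    def __init__(self, get_annotation: AnnotationFn, get_lines: LinesFn):
        self._get_annotation = get_annotation
        self._get_lines = get_lines
        # (file_id, revision_id) -> origin revision id for each line
        self._origins: Dict[Tuple[str, str], List[str]] = {}

    def annotate(self, file_id: str, revision_id: str) -> List[Tuple[str, str]]:
        key = (file_id, revision_id)
        if key not in self._origins:
            # The expensive step: compute once, then keep only the metadata.
            self._origins[key] = list(self._get_annotation(file_id, revision_id))
        # The text itself is re-read each time rather than cached.
        lines = self._get_lines(file_id, revision_id)
        return list(zip(self._origins[key], lines))
```

If fetching the fulltext turns out to be the slow part, a second dict keyed the same way could hold the lines too.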


> - Cache the full revision graph in sqlite, so we can access it
> partially without having to store all of it in memory. Ideally, we'd
> end up modifying bzrlib to allow partial access, so I'd like to
> structure this in a way that's easy to drop the sqlite backend later
> on

I'd rather look to either change the underlying bzr code to perform as
you need, or do the cache within bzrlib as it will be useful elsewhere.
It may also eventually be faster to do this cache using facilities like
indexes that we already have available.
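Whether the graph cache ends up in sqlite or inside bzrlib, keeping Loggerhead behind a small interface makes the backend easy to swap. A sketch under that assumption, with hypothetical class and table names; a later bzrlib-backed class would implement the same two methods.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterable, List, Tuple


class RevisionGraph(ABC):
    """Backend-neutral view of the revision graph (hypothetical interface)."""

    @abstractmethod
    def parents(self, revision_id: str) -> List[str]: ...

    @abstractmethod
    def children(self, revision_id: str) -> List[str]: ...


class SqliteRevisionGraph(RevisionGraph):
    """Keeps edges on disk and answers partial queries without loading all."""

    def __init__(self, path: str = ":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS edge ("
            " child TEXT NOT NULL, parent TEXT NOT NULL, idx INTEGER NOT NULL,"
            " PRIMARY KEY (child, idx))"
        )
        # Index on parent so child lookups don't scan the whole table.
        self._db.execute("CREATE INDEX IF NOT EXISTS edge_parent ON edge(parent)")

    def add(self, edges: Iterable[Tuple[str, Iterable[str]]]) -> None:
        # idx preserves parent order (the first parent is the mainline).
        rows = [(child, parent, i)
                for child, parents in edges
                for i, parent in enumerate(parents)]
        with self._db:  # commits the transaction on success
            self._db.executemany("INSERT OR REPLACE INTO edge VALUES (?, ?, ?)", rows)

    def parents(self, revision_id: str) -> List[str]:
        cur = self._db.execute(
            "SELECT parent FROM edge WHERE child = ? ORDER BY idx", (revision_id,))
        return [row[0] for row in cur]

    def children(self, revision_id: str) -> List[str]:
        cur = self._db.execute(
            "SELECT child FROM edge WHERE parent = ? ORDER BY child", (revision_id,))
        return [row[0] for row in cur]
```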

It would also be good to, say, pass last-modified data all the way up
through the stack from bzr to http.  Then we can cache the output; we
can revalidate without regenerating the whole thing; and as you mention
we can put squid in front of it.
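Roughly what revalidation could look like once a last-modified time reaches the http layer: answer `304 Not Modified` to a matching `If-Modified-Since` without rendering the page. How the timestamp is obtained from bzr (e.g. the newest revision the page depends on) is an assumption here; the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Callable, List, Optional, Tuple


def conditional_response(
    last_modified: datetime,
    if_modified_since: Optional[str],
    render: Callable[[], bytes],
) -> Tuple[str, List[Tuple[str, str]], bytes]:
    """Return (status, headers, body), calling render() only when needed."""
    # HTTP dates have one-second resolution and are always in GMT.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = [("Last-Modified", format_datetime(last_modified, usegmt=True))]
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # malformed header: fall through to a full response
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return "304 Not Modified", headers, b""
    return "200 OK", headers, render()
```

Since the same `Last-Modified` header is sent on every response, squid or any other caching proxy in front can do the revalidation itself.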

> - Clearly separate the bits of code that talk to bzrlib, so it's
> easier to identify faster/more efficient ways of accessing
> information, and keep it up to date with the latest and greatest
> recommended goodness
> 
> My plan is to focus the next release of Loggerhead on making the
> serve-branches script use a config file (using bzr's config methods
> and format, possibly using locations.conf), so we can completely
> replace/remove start/stop-loggerhead, which has been unloved since
> Michael brought in the serve-branches goodness.
> Once that's out of the way, I plan to dive into restructuring the
> internals, and slowly move from what we have to the new code.
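To illustrate how a locations.conf-style file could drive serve-branches: sections are named by path, and the most specific section covering a branch wins. bzr's own config code is not used here; this sketch uses the standard library's `configparser` only to show the shape, and every `loggerhead.*` option name is invented.

```python
import configparser
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical example; the option names are not real Loggerhead settings.
SAMPLE = """
[/srv/bzr]
loggerhead.served = true
loggerhead.cache_dir = /var/cache/loggerhead

[/srv/bzr/private]
loggerhead.served = false
"""


def options_for(config_text: str, branch_path: str) -> Dict[str, str]:
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    target = PurePosixPath(branch_path)
    # Keep sections whose path is the branch itself or one of its parents.
    matching = [s for s in parser.sections()
                if PurePosixPath(s) == target or PurePosixPath(s) in target.parents]
    merged: Dict[str, str] = {}
    # Apply from least to most specific so deeper sections override.
    for section in sorted(matching, key=lambda s: len(PurePosixPath(s).parts)):
        merged.update(parser[section])
    return merged
```

For example, `options_for(SAMPLE, "/srv/bzr/private/secret")` keeps the shared cache directory but turns serving off.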
> 
> I am honestly open to other suggestions as to what approach to take,
> so any radical (yet implementable) ideas are more than welcome, as
> are changes to the json approach.

I know you now have this large and exciting global design
responsibility, and so somewhat less time for loggerhead.  Maybe we
should be looking for someone else who could spend more work time on it.

-- 
Martin      <http://launchpad.net/~mbp>


