Zookeeper to MongoDB transition

Tue Jun 12 19:40:32 UTC 2012

On Tue, Jun 12, 2012 at 2:36 PM, Aram Hăvărneanu
<aram.havarneanu at canonical.com> wrote:
> The internals of the state package are modeled after Zookeeper, for
> example the whole topology node business. This is costing us some
> complexity. If we do things slightly differently, for example if state
> uses MongoDB directly, we might reduce that complexity and we would not
> need another complex layer doing translation. Zookeeper forces us to

Agreed, and to be clear, it was always the intention to benefit from
the MongoDB properties and query language more widely, rather than
just using it as a zk layer. The plan of having a zk-layer-on-mongo
was to make the transition simpler. If doing the whole step at once
looks simpler in practice, we can work out a plan to make that happen.

> manage the topology manually, but if we choose the data types right, we
> could make MongoDB do all this bookkeeping for free. No need to maintain
> the topology when MongoDB could construct it on demand by simple queries.

That sounds great. I suggest not even having a topology in the first
place. That double-layer in the state package is an artifact of
mapping a filesystem-based model onto an object model, which is less
of an issue with MongoDB.
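
To make that concrete, here is a rough sketch of relationships being
derived on demand instead of being maintained in a topology node. The
Unit type, its fields, and the helper names are made up for this
example, and an in-memory slice stands in for a MongoDB collection;
with MongoDB each helper would become a plain query on one field,
such as {service: "wordpress"}.

```go
package main

import (
	"fmt"
	"sort"
)

// Unit is a hypothetical document shape for a service unit. Each unit
// records its own relationships, so no separate topology node is needed.
type Unit struct {
	Name    string // e.g. "wordpress/0"
	Service string // name of the owning service
	Machine int    // id of the machine the unit is assigned to
}

// unitsWhere stands in for a collection query: it returns the names of
// all units matching the predicate, sorted for stable output.
func unitsWhere(units []Unit, match func(Unit) bool) []string {
	var names []string
	for _, u := range units {
		if match(u) {
			names = append(names, u.Name)
		}
	}
	sort.Strings(names)
	return names
}

// unitsOfService answers "which units belong to this service?".
func unitsOfService(units []Unit, service string) []string {
	return unitsWhere(units, func(u Unit) bool { return u.Service == service })
}

// unitsOnMachine answers "which units are assigned to this machine?".
func unitsOnMachine(units []Unit, machine int) []string {
	return unitsWhere(units, func(u Unit) bool { return u.Machine == machine })
}

// sampleUnits returns illustrative data only.
func sampleUnits() []Unit {
	return []Unit{
		{Name: "wordpress/0", Service: "wordpress", Machine: 1},
		{Name: "wordpress/1", Service: "wordpress", Machine: 2},
		{Name: "mysql/0", Service: "mysql", Machine: 1},
	}
}

func main() {
	units := sampleUnits()
	fmt.Println(unitsOfService(units, "wordpress")) // [wordpress/0 wordpress/1]
	fmt.Println(unitsOnMachine(units, 1))           // [mysql/0 wordpress/0]
}
```

Both views come from the same documents, so there is nothing to keep
in sync when a unit is added, removed, or reassigned.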

> I have studied the state package today and I believe a transition to
> the MongoDB API would make it significantly less complex. The functions
> exported by the package map well to MongoDB operations, whereas now the
> high level exported functions dive into internal stuff that does redundant
> namespace manipulation that would need to be undone by mgokeeper.
>
> I have not yet come up with MongoDB schemas for the state data, mainly
> because it's foreign code to me. I need to write some prototype code and
> see how it feels before I can come up with something sensible, but I am
> open to suggestions on how to organize the types.

I suggest the following: let's add a new experimental mstate package
in the project, and start porting code over from the state package
onto it. The documentation, the public API, and the tests for the
public API should remain unchanged between both packages. We have
internal tests (for topology, mainly) that will be dropped, though.

This allows the experiment to move forward in an integrated way
without disturbing the main code path, and in a way that everybody has
visibility and can comment on the process and progress.
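
One way to picture "same public API, same tests" is a check written
only against the exported behaviour, so it can run unchanged against
either backend. This is just a sketch under assumptions: MachineStore,
memStore, and checkMachineAPI are hypothetical names invented for the
example, not the actual state API, and the toy in-memory store stands
in for whichever implementation is under test.

```go
package main

import (
	"errors"
	"fmt"
)

// MachineStore is a hypothetical slice of the public API that both
// packages would export unchanged.
type MachineStore interface {
	AddMachine() (id int, err error)
	HasMachine(id int) bool
	RemoveMachine(id int) error
}

// memStore is a toy backend standing in for either implementation.
type memStore struct {
	next     int
	machines map[int]bool
}

func newMemStore() *memStore {
	return &memStore{machines: make(map[int]bool)}
}

func (s *memStore) AddMachine() (int, error) {
	id := s.next
	s.next++
	s.machines[id] = true
	return id, nil
}

func (s *memStore) HasMachine(id int) bool { return s.machines[id] }

func (s *memStore) RemoveMachine(id int) error {
	if !s.machines[id] {
		return errors.New("machine not found")
	}
	delete(s.machines, id)
	return nil
}

// checkMachineAPI only touches the interface, so the same function can
// validate any implementation that satisfies it.
func checkMachineAPI(s MachineStore) error {
	id, err := s.AddMachine()
	if err != nil {
		return fmt.Errorf("AddMachine: %v", err)
	}
	if !s.HasMachine(id) {
		return fmt.Errorf("machine %d missing after add", id)
	}
	if err := s.RemoveMachine(id); err != nil {
		return fmt.Errorf("RemoveMachine: %v", err)
	}
	if s.HasMachine(id) {
		return fmt.Errorf("machine %d still present after remove", id)
	}
	if err := s.RemoveMachine(id); err == nil {
		return errors.New("removing a missing machine should fail")
	}
	return nil
}

func main() {
	if err := checkMachineAPI(newMemStore()); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

Tests that reach into internals, like the topology ones, have no
equivalent in this style, which is why they would simply be dropped.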

How does that sound to everybody?

gustavo @ http://niemeyer.net