Zookeeper to MongoDB transition

Tue Jun 12 23:15:30 UTC 2012

Excerpts from Aram Hăvărneanu's message of 2012-06-12 10:36:13 -0700:
> Hello team,
> 
> One of our goals is to replace Zookeeper. We want to use MongoDB to
> store charms, so the idea of using MongoDB for the things we'd normally
> use Zookeeper for follows naturally. We would not introduce an alternative
> dependency and storing all state inside a single software container might
> provide some potentially useful tighter coupling between pieces of data.

This comes as a *shock* to me, and I think this is the first time anyone
has spoken of getting rid of Zookeeper publicly. I had heard rumblings
and rumors, but never an actual "Our goal is to replace Zookeeper". I
guess the "Our" was a group that didn't talk much in public forums.

Was there any public discussion about the reasons to get rid of Zookeeper,
and/or the reasons to choose MongoDB?

> 
> The initial idea was to implement a drop-in package that implemented
> the Zookeeper API on top of MongoDB, switch everything to it, and after
> things have stabilized start trimming unneeded features and Zookeeper
> idiosyncrasies, which are a lot. We have not abandoned this plan, but
> we are exploring alternatives.
> 

Count me as a -1 for this plan. You go from having a complex but well
known API (ZK) used to manage a complex data set, to having a skeleton
implementation of said API (pretend-ZK) to manage a complex database
(MongoDB) to manage a complex data set.

The whole point would be to *REMOVE* complexity, would it not? Oh wait, I
think you're about to make that point. :)

> The problem with the above approach is that we introduce a new, relatively
> complex layer for dubious benefit. Yes, we remove a dependency, but our
> higher goal is to reduce overall complexity. First of all, we don't even
> use Zookeeper that much:
> 
>   white:juju$ find . -name '*.go' | grep -v test |
>   xargs -n1 9 grep '((zk)|(zookeeper)).*\(' | grep -v func |
>   9 sed -e 's/^.*=//g' -e 's/zkConn/zk/g' -e 's/\.conn\./.zk./g'
>   -e 's/zookeeper/zk/g' | 9 grep '[^a-zA-Z]zk\.[A-Z]' |
>   9 sed -e 's/^.*zk\.([A-Z].*)\(.*$/\1/g' -e 's/\(.*$//g' |
>   sort | uniq -c | sort -nr
>        18 IsError
>        10 Create
>         6 Get
>         4 RetryChange
>         4 Delete
>         2 WorldACL
>         2 ExistsW
>         2 Exists
>         2 Dial
>         1 GetW
>         1 Close
>         1 ChildrenW
>         1 Children
> 
> The internals of the state package are modeled after Zookeeper, for
> example the whole topology node business. This is costing us some
> complexity. If we do things slightly differently, for example if state
> uses MongoDB directly, we might reduce that complexity and we would not
> need another complex layer doing translation. Zookeeper forces us to
> manage the topology manually, but if we chose the data types right, we
> could make MongoDB do all this bookkeeping for free. No need to maintain
> the topology when MongoDB could construct it on demand by simple queries.
> 

Anything done "on demand" can't possibly be "free", but I like the way
you're going with that. I always assumed ZK was capable of doing this
until I heard about the big single topology node.
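
To illustrate the "on demand" idea, here is a minimal Go sketch. It is
not juju's state package and does not talk to MongoDB; the Unit type, its
fields, and both functions are hypothetical stand-ins for per-unit
documents and the simple queries that would be run over them. The point
is only that the topology is derived from the records each time instead
of being kept in one shared node.

```go
package main

import (
	"fmt"
	"sort"
)

// Unit is a hypothetical per-unit record, standing in for one document
// in a database collection. The field names are illustrative only.
type Unit struct {
	Name    string // for example "wordpress/0"
	Service string // service the unit belongs to
	Machine int    // machine the unit is assigned to
}

// Topology maps each service name to the names of its units.
// It is computed on request and never stored.
type Topology map[string][]string

// BuildTopology groups the unit records by service, which is what a
// simple query with grouping would do. The cost is paid per call,
// so it is "on demand" rather than "free".
func BuildTopology(units []Unit) Topology {
	topo := make(Topology)
	for _, u := range units {
		topo[u.Service] = append(topo[u.Service], u.Name)
	}
	for _, names := range topo {
		sort.Strings(names) // stable output regardless of input order
	}
	return topo
}

// UnitsOnMachine answers a second question from the same records,
// without any extra bookkeeping structure.
func UnitsOnMachine(units []Unit, machine int) []string {
	var names []string
	for _, u := range units {
		if u.Machine == machine {
			names = append(names, u.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	units := []Unit{
		{Name: "wordpress/1", Service: "wordpress", Machine: 2},
		{Name: "wordpress/0", Service: "wordpress", Machine: 1},
		{Name: "mysql/0", Service: "mysql", Machine: 1},
	}
	fmt.Println(BuildTopology(units))
	fmt.Println(UnitsOnMachine(units, 1))
}
```

Adding or removing a unit only touches that unit's record; there is no
separate topology structure to update or to get out of sync.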

As long as we're abandoning ZK, does it make sense to let go of having
a central store that has *EVERYTHING* in it?

I'm quite concerned about how centralized things are. Mongo will just
reinforce that.

I've often wondered why we don't let nodes own their relation data, and
just have them communicate back and forth using 0mq rather than through an
intermediate central db. Then only presence has to be centralized. This
has the nice benefit of not having a single datastore with every single
password that is needed to compromise every single service in your
environment.

> I have studied the state package today and I believe a transition to
> the MongoDB API would make it significantly less complex. The functions
> exported by the package map well to MongoDB operations, whereas now the
> high level exported functions dive into internal stuff that does redundant
> namespace manipulation that would need to be undone by mgokeeper.
> 
> I have not yet come up with MongoDB schemas for the state data, mainly
> because it's foreign code to me. I need to code some prototype code and
> see how it feels before I can come up with something sensible, but I am
> open to suggestions on how to organize the types.
> 

Can this be simplified into a generic Juju REST API? MongoDB seems like
as much an implementation detail as ZK. Giving so much power to all
of the components of the system means that Mongo will just get more
and more locked in as juju grows. Basically I'm asking whether or not
juju should implement a service oriented architecture rather than just
throwing a bunch of direct MongoDB clients out there and then being
stuck with only being able to innovate where MongoDB lets us innovate.
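
As a rough sketch of the kind of separation being asked about, the Go
code below puts a narrow interface between components and the store. It
is an assumption-laden illustration, not an existing juju API: StateAPI,
its three methods, memBackend, and the charm URL string are all made up
for the example. A REST front end or a MongoDB-backed implementation
could sit behind the same interface without callers changing.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotFound is returned when the named service does not exist.
var ErrNotFound = errors.New("service not found")

// ErrExists is returned when adding a service that is already present.
var ErrExists = errors.New("service already exists")

// StateAPI is a hypothetical narrow interface that components would
// code against. It says nothing about which store sits underneath.
type StateAPI interface {
	AddService(name, charmURL string) error
	ServiceCharm(name string) (string, error)
	RemoveService(name string) error
}

// memBackend is an in-memory implementation used only for this sketch.
// A ZK-, MongoDB-, or HTTP-backed type could satisfy StateAPI instead.
type memBackend struct {
	mu     sync.Mutex
	charms map[string]string // service name -> charm URL
}

// NewMemBackend returns the in-memory backend as a StateAPI, so callers
// never see the concrete type.
func NewMemBackend() StateAPI {
	return &memBackend{charms: make(map[string]string)}
}

func (b *memBackend) AddService(name, charmURL string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.charms[name]; ok {
		return ErrExists
	}
	b.charms[name] = charmURL
	return nil
}

func (b *memBackend) ServiceCharm(name string) (string, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	url, ok := b.charms[name]
	if !ok {
		return "", ErrNotFound
	}
	return url, nil
}

func (b *memBackend) RemoveService(name string) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.charms[name]; !ok {
		return ErrNotFound
	}
	delete(b.charms, name)
	return nil
}

func main() {
	var api StateAPI = NewMemBackend()
	// The charm URL below is an illustrative placeholder.
	if err := api.AddService("wordpress", "cs:precise/wordpress-1"); err != nil {
		fmt.Println("add failed:", err)
		return
	}
	url, err := api.ServiceCharm("wordpress")
	fmt.Println(url, err)
}
```

With this shape, swapping the datastore means writing one new type
that satisfies StateAPI, rather than touching every client.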