Evaluating Juju for use in a large system
Torin Sandall
torinsandall at gmail.com
Sat Sep 1 22:06:29 UTC 2012
Clint,
Thanks for the prompt reply.
One other question I have is, does Juju provide a way to set service
unit specific configuration when the units are being added? I see two
ways this could work:
1) you have some global configuration which has areas for specific
service units. Of course these areas would need to be keyed by the
service unit name, so there would need to be a mechanism for
specifying the service unit name when you deploy or add-unit.
2) you have the ability to set service unit specific configuration
when you deploy or add-unit.
I don't think either of these approaches is supported in Juju today,
but maybe I'm missing something and there's another way to handle it.
The use case which brought this to mind was deploying Cassandra with
Juju, e.g., you want to manually assign the token a given Cassandra
service instance bootstraps with.
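
To make the Cassandra case concrete, here is a minimal sketch of the tokens
one would want to hand to each unit. It assumes the cluster uses
RandomPartitioner (token space 0 to 2**127 - 1) and that the unit number can
be used as the ring index; both function names are hypothetical, and nothing
here is an existing Juju or charm API.

```python
# Sketch only: evenly spaced initial tokens for a Cassandra ring.
# Assumption: RandomPartitioner, whose token space is 0 .. 2**127 - 1.
TOKEN_SPACE = 2 ** 127


def initial_tokens(node_count):
    """Return one evenly spaced token per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


def token_for_unit(unit_name, node_count):
    """Hypothetical helper: map a unit name such as 'cassandra/2' to a token.

    Assumes the unit number doubles as the ring index, which stops being
    true once units have been removed and new ones added.
    """
    index = int(unit_name.rsplit("/", 1)[1])
    if not 0 <= index < node_count:
        raise ValueError("unit number is outside the planned ring size")
    return initial_tokens(node_count)[index]
```

The missing piece is exactly the one asked about above: a way to deliver the
computed value to one specific unit at deploy or add-unit time.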
-Torin
On Sat, Sep 1, 2012 at 7:13 AM, Clint Byrum <clint at ubuntu.com> wrote:
> Excerpts from Torin Sandall's message of 2012-08-31 19:48:57 -0700:
>> Hello,
>>
>> I'm working on a project which requires robust and simple-to-use
>> service deployment, configuration, and coordination functionality.
>> This appears to be an area where Juju will excel (and already does to
>> some extent.)
>>
>
> First, welcome!
>
> We think Juju will excel in such systems as well. :)
>
>> I am trying to decide whether I should recommend that the project move
>> forward and be developed on top of Juju. Note, this decision has to be
>> made quite soon, so I really need to be able to gauge the state of
>> Juju at this point in time. If I can find answers to the following
>> questions then it will make my decision process much easier.
>>
>> 1) The project I'm working on is going to be largely Python based, so
>> the fact that Juju is also written in Python is a big win. Could
>> someone elaborate on the rationale for switching to Go? I have seen
>> the presentation from Gustavo at Google I/O where he briefly mentions
>> error handling and pitfalls of Twisted's callback model, however, some
>> more details on the subject would be appreciated.
>>
>
> The reasons Gustavo gave are the ones for the language choice. I'd like
> to think that users of Juju won't need to hack on it very much... but
> if you do, Go is a pretty straightforward language, I think.
>
>> 2) Following on the last question, when is the Go version going to be
>> out of development and considered ready for use and/or production?
>> Will there be a production release of the Python version?
>>
>
> You can follow the development of the Go version here:
>
> https://launchpad.net/juju-core
>
> Version 2.0 is due in early October and should achieve feature parity
> with the Python version (on EC2 only, though).
>
> The Python version is receiving maintenance and important bug fixes.
> It's used in production in some places, though I always counsel people
> to check these two lists and have workarounds for these issues in mind
> when deciding to deploy it:
>
> https://bugs.launchpad.net/juju/+bugs?field.tag=security
> https://bugs.launchpad.net/juju/+bugs?field.tag=production
>
> Many of these bugs are being fixed, though I can't make any promises
> that they will *all* be fixed.
>
>> When I refer to "service instance" below I mean the actual service
>> (E.g., wordpress, mysql) running inside the service unit.
>>
>> When testing Juju I came across a couple behaviours I didn't
>> understand and would like to know if there is a way around them or if
>> there's a plan to fix them. If there's a plan to fix them then an
>> approximate timeline would be much appreciated.
>>
>> 3) I performed some tests with deploying a RabbitMQ service with
>> multiple units (using the charm from `bzr branch
>> lp:charms/rabbitmq-server`.) One of the tests I executed involved
>> running deploy followed by two `juju add-unit` commands back to back.
>> The first two units spawned fine however the last received an empty
>> value for the Erlang cookie value and as such the RabbitMQ service
>> instance was unable to start. I'm wondering if there are any open
>> issues around race conditions with deploy/add-unit.
>>
>
> There are issues with peer relations. They are fine for building a list of
> the members of the cluster, but it's impossible to predict the order, and
> there is no "leader election" support, so the way the cookie generation
> happens is not really reliable when adding units in parallel.
>
> It might work better if you waited for the first unit before adding more
> units. There's a command in the 'juju-jitsu' package (only in the latest
> quantal or the Juju PPA) called 'watch' that will help with this kind
> of logic. Note that juju-jitsu lags development and is very experimental.
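
The "wait for the first unit" idea can also be sketched without jitsu by
polling `juju status`. This is a rough sketch under assumptions: that
`juju status --format json` is available in the installed version, and that
units appear under services/<service>/units with an `agent-state` key (as in
the status output quoted further down). The function names are hypothetical.

```python
import json
import subprocess
import time


def unit_ready(status, service, unit):
    """True if the unit's agent is started and it reports no relation errors."""
    units = status.get("services", {}).get(service, {}).get("units", {})
    info = units.get(unit, {})
    return info.get("agent-state") == "started" and "relation-errors" not in info


def fetch_status():
    # Assumption: this version of juju supports JSON status output.
    raw = subprocess.check_output(["juju", "status", "--format", "json"])
    return json.loads(raw.decode("utf-8"))


def wait_for_unit(service, unit, timeout=600, interval=10, fetch=fetch_status):
    """Poll until the unit is ready; return False if the timeout expires."""
    deadline = time.time() + timeout
    while True:
        if unit_ready(fetch(), service, unit):
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)
```

A deployment script could call wait_for_unit() after `juju deploy` and only
then run the `juju add-unit` commands.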
>
> Either way, we're just now adding explicit test support to the official
> charm store. It sounds like a 3-node cluster would be a good test to
> run and get passing.
>
>> Now that I look closer at the rabbitmq-server charm, I'm surprised this
>> was even possible, since it checks to see if the cookie value is empty.
>>
>> This was the output of `juju status` after the failure happened (I
>> tried running `juju resolved` and `juju resolved --retry` on it
>> without any luck.):
>>
>>   rabbitmq/2:
>>     agent-state: started
>>     machine: 0
>>     public-address: 192.168.122.25
>>     relation-errors:
>>       cluster:
>>       - rabbitmq
>>
>> 4) Is there a way to ensure that service instances will be able to
>> perform their clustering operations with their peers before the
>> dependent relations are notified about their presence? If not, is this
>> something which is planned to be supported? I did stumble across this
>> post (https://lists.ubuntu.com/archives/juju/2012-February/001258.html)
>> which seems to touch on the subject, but I'm not sure what the outcome
>> was. I ran into this when I was trying to test deploying and scaling a
>> RabbitMQ cluster. I found that when I had another application which
>> depended on RabbitMQ, the other application would be notified as soon
>> as a new RabbitMQ service unit was added, even before RabbitMQ had a
>> chance to cluster.
>>
>
> This is definitely more complicated than it needs to be right now. One
> method we could use is to have the cluster relation loop through the
> amqp relations that have been established and do something like this:
>
> relation-set ... clustered=1
>
> Allowing apps to require clustering. But it's hard to get this right in
> a generic way... need to think more about this one.
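
The loop described above might look roughly like the following inside a
cluster relation hook. This is a sketch, not what the rabbitmq-server charm
does: it assumes the `relation-ids` hook command and the `-r` option of
`relation-set` are available in the Juju version in use, that `relation-ids`
prints whitespace-separated ids, and the function names are hypothetical.

```python
import subprocess


def relation_ids(relation_name, run=subprocess.check_output):
    """Return the ids of all established relations with the given name."""
    out = run(["relation-ids", relation_name])
    if isinstance(out, bytes):
        out = out.decode("utf-8")
    # Assumption: ids are printed separated by whitespace (spaces or newlines).
    return out.split()


def announce_clustered(relation_name="amqp", run=subprocess.check_output):
    """Set clustered=1 on every established relation of the given name."""
    ids = relation_ids(relation_name, run)
    for rid in ids:
        run(["relation-set", "-r", rid, "clustered=1"])
    return ids
```

A consuming charm would then wait for `clustered` to appear in its
relation-changed hook before using the broker. The `run` parameter exists only
so the command invocations can be replaced in a test.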
>
>> There were some other things which I came across during testing which
>> I thought would be useful features. I'm wondering if any of these are
>> on the roadmap. If they are, will they be included in the Python code
>> base?
>>
>
> The bug list is long, and there are a lot of features to consider for
> the future. However, it's unlikely features will be implemented in the
> Python version as a priority.
>
>> 5) It would be nice if there was a graceful shutdown mechanism so that
>> the service instance could be notified and tidy up before the unit is
>> destroyed.
>>
>
> https://bugs.launchpad.net/juju/+bug/872264
> https://bugs.launchpad.net/juju/+bug/932269
>
>> 6) Many services can benefit from having a "locked" state whereby they
>> will allow pending request processing to finish but not accept any
>> more requests. Is there any plan to expose this sort of mechanism to
>> the services? Note, it would mainly be used to "lock" and "unlock"
>> individual service instances so I imagine the command would need to be
>> targeted at service units.
>>
>
> This could be implemented today in service configs. For unit-specific
> things I usually recommend just using 'juju ssh'. In theory, the 'stop'
> hook should handle this need when juju needs to stop the service, with
> the start hook reversing its effects.
>
>> 7) When it comes to destructive operations like remove-unit,
>> destroy-service, etc. it might be nice if there was an acknowledgment
>> system which would allow service instances to nack if the operation
>> was considered invalid/dangerous, e.g., removing units such that there
>> would no longer be enough to satisfy a replication factor. Of course,
>> there would still be a --force option available. Is this type of
>> feature on the roadmap?
>>
>
> Right now, only 'terminate-machine' and 'destroy-environment'
> will actually remove machines from your environment, and thus remove
> data. remove-unit and destroy-service simply remove their definition from
> juju, and cause other sides to have their broken/departed hooks called.
> So, the consuming charms of a service need to make sure they don't cancel
> any important operations when they get departed/broken, but that should
> be sufficient.
>
> This bug is kind of about simplification of that:
>
> https://bugs.launchpad.net/juju/+bug/862422
>
> It's marked as "Medium", so there are no plans for the immediate future.
>
>> Again, Juju looks like a great project, and I hope I can use it as a
>> key building block in a large scale system. If I can find answers to
>> these questions it would be an excellent first step!
>>
>
> It's great to have your feedback, Torin. Please let us know if there is
> anything else we can do to make your decision easier. :)