Sprint Feedback

Tom Haddon tom.haddon at canonical.com
Tue Jun 7 09:00:26 UTC 2011

On Thu, 2011-06-02 at 10:15 -0700, Clint Byrum wrote:
> Tom, clearly you've got our attention. Thanks so much for all the
> feedback. Comments in-line.
> Excerpts from Tom Haddon's message of Thu Jun 02 05:02:11 -0700 2011:
> > Dear Ensemble Team,
> > 
> > == Things we like about Puppet ==
> > 
> > - Declarative state. This makes it easier to manage services over the
> > longer term, because you can be assured that systems are configured the
> > way you've told them to be configured.
> I'm interested in people taking a shot at writing a formula with Puppet
> for this very reason. It may simplify some services in that one won't
> need to keep track of what has been done, since Puppet is already good
> at that.

Do you mean writing a formula with Puppet, or writing a Puppet formula?
I may be misunderstanding the terminology here, but the former sounds
like using Puppet to write formulas rather than writing a formula that
deploys Puppet.

> > 
> > - Clean syntax and very simple to deploy services.
> > - Powerful concepts that hold the promise of allowing easy scalability
> > of services.
> > 
> > == Things we don't like about Ensemble ==
> > 
> > - Ensemble seems to currently require a cloud infrastructure (EC2/S3
> > specifically) to run. Are there plans in the future to allow Ensemble to
> > run on bare metal? Our usage of EC2 has been limited for a number of
> > reasons, including cost and performance. If the plan was to only ever
> > have Ensemble work on EC2, that'd make it hard to adopt it for our
> > services.
> I'm curious if you discussed the possibility of OpenStack based deployment.
> It seems a bit daft to run OpenStack on a box for it to only provide one
> "machine" in the form of an LXC container. 
> However, decoupling "the machine" from "the service" is actually one of
> my favorite concepts of Ensemble, though it is quite different from the
> traditional paradigm.
> Anyway, running against OpenStack should work now (though I believe it's
> untested), and would allow running Ensemble managed services without
> EC2 or virtualization. Of course, LXC is broken in lucid at the moment,
> so it also pretty much means using Natty. Doh.
> > - Can't preview changes before they happen to determine if they will do
> > what you want them to do. Can't test out new versions of different
> > formulas with different "environments".
> $ ensemble deploy
> usage: ensemble deploy [-h] --repository REPOSITORY
>                        [--environment ENVIRONMENT]
>                        formula [service_name]
> Each environment is specified in ~/.ensemble/environments.yaml, and can
> set options that control the machine provider, including which hostname
> to contact via the EC2/AWS API (allowing segregation by "cloud"). One
> can also segregate further by changing the AWS secret key information.
> > 
> > == Some other comments based on the example formulas ==
> > 
> > - The "utility instance" seems to be a single point of failure. If this
> > goes down do we lose access to everything?
> > - Once you've hooked items together, it's confusing to me that the
> > "mysql" service is saying its relation is "db: wordpress" - wordpress
> > isn't a DB, so shouldn't this be saying "app: wordpress" or "db for:
> > wordpress"?
> I actually think this makes perfect sense. The example really shouldn't
> call the service "wordpress", it should call it "myblog". Then db: myblog
> makes a lot more sense. It's *providing* the db for myblog.
> > - When you add-unit to the wordpress instance, I don't see how this
> > actually provides any scalability. Presumably you'd need to be using
> > round robin DNS, or have a load balancer in front of all these
> > instances, or something like that?
> Check out the mediawiki demo we have in principia for how to integrate
> with haproxy.
> Set up your AWS credentials and install ensemble the same way you did
> for the examples. Then:
> bzr branch lp:principia-tools
> cd principia-tools
> scripts/getall
> tests/mediawiki.sh
> This will spawn quite a few nodes: 1 mysql db, 2 memcached, 2 mediawiki,
> and 1 haproxy, plus the bootstrap node. Once they're all running the
> haproxy node's public IP *should* present you with a mediawiki instance.
> You can push its scalability by doing 'ensemble add-unit demo-wiki'. If
> you start to push the query capability of the master db server, you can
> add a slave with
> ensemble deploy --repository=formulas mysql slave-db
> ensemble add-relation slave-db:slave wiki-db:master
> There's a bit of a disconnect here, as you have to
> wait for this relation to be fully up before you can relate it to
> mediawiki. I'm still working out if there's a way to do that without
> manually waiting.
> ensemble add-relation demo-wiki:slave slave-db:db
> You can also deploy a munin node to monitor with
> ensemble deploy --repository=formulas munin munin-wiki
> Then
> ensemble add-relation wiki-db munin-wiki
> ensemble add-relation slave-db munin-wiki
> ensemble add-relation wiki-balancer munin-wiki
> ensemble add-relation demo-wiki munin-wiki
> ensemble add-relation wiki-cache munin-wiki
> All of them should eventually show up at the munin machine's public
> IP at /munin. Note that there's a bug in the txzookeeper library that
> seems to affect this munin formula when load gets high (t1.micro isn't
> actually powerful enough to run munin for all these nodes).
> You can see where this is tedious, and Kapil's previously mentioned "policy"
> concept is sorely needed, as ideally you'd just be able to set a policy to
> just install munin-node on all machines and relate them to the munin machine.
> This is something that is easy in config management tools, because they
> are built to model *machines*, but hasn't been correctly modeled in
> ensemble yet. It
> should actually be trivial once we figure out how it should work.

Ok, interesting, thx.
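
The manual wait mentioned above (letting the slave-db:slave relation come
up before relating it to mediawiki) could probably be scripted with a small
polling loop. This is only a sketch: the names wait_until and relation_is_up
are made up for illustration, and the layout of the parsed 'ensemble status'
data (services -> units -> relations -> state) is an assumption that should
be checked against real output.

```python
import time


def relation_is_up(status, service, relation):
    """Return True if every unit of `service` reports `relation` as "up".

    `status` is assumed to be `ensemble status` output already parsed
    from YAML into nested dicts. The key names used here are guesses.
    """
    units = status.get("services", {}).get(service, {}).get("units", {})
    return bool(units) and all(
        unit.get("relations", {}).get(relation, {}).get("state") == "up"
        for unit in units.values()
    )


def wait_until(check, timeout=300.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call `check` repeatedly until it returns a true value.

    Returns True as soon as `check` succeeds, or False once `timeout`
    seconds have passed. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In use, check would be a function that re-runs 'ensemble status', parses
the YAML, and passes the result to relation_is_up; the second add-relation
would only be issued once wait_until returns True.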

> > - Can you use your own AMI? Different instance sizes?
> > - How do you apply security updates to running instances, etc.?
> I think this is something that the agents will eventually handle (I hope),
> something like mcollective's agents, where you can ask a particular class
> of machines to run the "apply all updates" agent.
> For now ssh is the only way to do this. It's pretty easy to get a list of
> machines from 'ensemble status', which is basically just yaml.
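
As a rough illustration of the ssh approach, the sketch below pulls host
names out of already-parsed status data and builds one ssh command per
machine. The structure of SAMPLE_STATUS (a "machines" mapping with
"dns-name" entries) is an assumption about what 'ensemble status' prints,
the host names are placeholders, and the "ubuntu" login and apt-get
invocation are just plausible defaults. Nothing is executed here.

```python
# Assumed shape of parsed `ensemble status` output; the real schema may differ.
SAMPLE_STATUS = {
    "machines": {
        0: {"dns-name": "ec2-host-0.example.com", "instance-id": "i-00000000"},
        1: {"dns-name": "ec2-host-1.example.com", "instance-id": "i-00000001"},
    },
}


def machine_hosts(status):
    """Return the DNS names of all machines, ordered by machine id."""
    machines = status.get("machines") or {}
    return [
        info["dns-name"]
        for _, info in sorted(machines.items())
        if info and info.get("dns-name")  # skip machines with no address yet
    ]


def update_commands(status, user="ubuntu"):
    """Build (but do not run) one ssh command line per machine."""
    remote = "sudo apt-get update && sudo apt-get -y upgrade"
    return [["ssh", f"{user}@{host}", remote] for host in machine_hosts(status)]
```

Each returned list could be handed to subprocess.run once the host list
has been reviewed.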
> > - Shouldn't the formulas include author info in the yaml? I'd be loath
> > to create my own formulas based on those someone else has provided
> > unless I know who I can go to if I have problems with the formula. Also,
> > is there any promise of version compatibility, or is it possible that if
> > you create formulas that import other formulas that your own formula
> > will no longer work?
> Formulas are going to be quite tied to revision control. Right now you
> can tell who the authors of the principia formulas are by running 'bzr log'
> on their branches. I do see that having a responsible party listed in the
> YAML would be helpful though.
> > - Can it use elastic IPs (DNS and for interacting with "static"
> > services)? Can it interact with services that are not part of Ensemble
> > (i.e. DB servers that are in a DC rather than in EC2, or servers that
> > you don't want to run with Ensemble for some other reason)?
> We did put on the road map the concept of a "virtual service" which is
> outside of ensemble and just exposes the config details for sending
> externally. At the cost of an EC2 machine, you can of course write a
> formula which simply relates to the service you're interested in, and
> then communicates those details to the external systems.
> > - What security is there in terms of if one server in an ensemble
> > cluster is compromised? How much information is shared between the
> > instances with zookeeper and what's to prevent one server from querying
> > all information on other servers?
> > - What is the Ensemble approach to firewalls? Is it expected that this
> > is a formula issue?
> I did at one point add firewalling to the memcached formula, since
> memcached wasn't configured for SASL so it would be exposed to all
> machines in the same Amazon security group. But I took it out as it made
> the formula more complex. I do think eventually it will be in ensemble,
> as will all the ip sharing that I've had to manually implement in formulas
> by parsing ifconfig.
> > - It's not entirely clear to me if you could use Ensemble to replace our
> > current deployment scripts - they are used to push out incremental code
> > updates to specific services, and work by copying code into a directory
> > that includes a unique identifying string (usually the bzr revision of
> > the code in question), bringing the service down, checking it's down,
> > switching the symlink for the code directory we're expecting to find the
> > active code in to the directory we've previously pushed to, and then
> > restarting services, and then checking the service is up. This can be
> > done in parallel or serial, or a combination of both (groups of servers
> > serially, each group in parallel). We can also add in custom hooks to do
> > things like "set read-only mode" for a given service fairly trivially.
> This would fit nicely in the upgrade-formula hook. We talked at one time
> about having arguments for upgrade-formula which were like '--rolling' or
> '--parallel' to control whether nodes were all done at once or in serial.
> I'm not sure how it's done now though.
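
For reference, the deployment flow described above (push to a
revision-named directory, stop, verify down, switch the symlink, start,
verify up, with groups run serially and servers within a group in parallel)
can be sketched roughly as follows. This is not our actual script and not
an Ensemble hook: the "releases"/"current" directory layout and the
stop/is_up/start callables are hypothetical stand-ins.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def switch_symlink(link_path, target_dir):
    """Repoint link_path at target_dir via a temporary link and a rename."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target_dir, tmp_link)
    # Renaming over the old link swaps it in one step on POSIX systems.
    os.replace(tmp_link, link_path)


def deploy(base_dir, revision, stop, is_up, start):
    """Activate a previously pushed revision on one server."""
    release = os.path.join(base_dir, "releases", revision)
    current = os.path.join(base_dir, "current")
    if not os.path.isdir(release):
        raise RuntimeError(f"revision {revision} has not been pushed")
    stop()
    if is_up():
        raise RuntimeError("service did not stop")
    switch_symlink(current, release)
    start()
    if not is_up():
        raise RuntimeError("service did not come back up")


def rollout(groups, deploy_one):
    """Run groups one after another; servers within a group in parallel."""
    for group in groups:
        with ThreadPoolExecutor(max_workers=max(1, len(group))) as pool:
            # list() drains the results so a failure in this group is
            # raised before the next group starts.
            list(pool.map(deploy_one, group))
```

A "set read-only mode" step would slot in as an extra callable before
stop(); whether upgrade-formula can express the serial/parallel grouping is
the open question.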
> > 
> > == What's next? ==
> > 
> > Our plans from here are to continue testing Ensemble so that we can try
> > to realistically get an idea of what works for us and what doesn't over
> > the long term. Initially this involves testing how it deals with a bunch
> > of error states, but then we'd also like to begin writing some formulas
> > (I guess participating in https://launchpad.net/principia would be the
> > best thing here).
> Yes please!
> > 
> > I think the overall takeaway as far as we can see is that Ensemble seems
> > suited for deploying services, but not necessarily managing services. Is
> > the idea that you would need to deploy your own management layer through
> > Ensemble, or outside of Ensemble, or is the idea that in the future
> > Ensemble will be able to manage services for you?
> As Gustavo said, its whole purpose is managing services. Getting deployment
> right means that implementing long term management *should* be simple.
> Also I want to call attention to how small all of the hook scripts are.
> The most complicated ones are < 100 lines of Python, and even then a
> lot of that is inline templates. I wrote a few pieces in PHP to show how
> simple it can be (and to make it simpler to integrate with a PHP web app).
> I know that Puppet modules aren't towers of code complexity. But learning
> Puppet's DSL to the degree where you can be comfortable with exported
> configs and resources, or learning enough Ruby to do these types of
> deployments in Chef, usually means getting out of your comfort zone for
> a while. Ensemble is trying to address this friction by saying "we will
> run this command, at this point, and respond to these commands in this
> way". Any language or methodology is appropriate in this model, which
> should make it really easy to share and enhance formulas around their
> respective services in a way that encourages collaboration.
