Sprint Feedback

Kapil Thangavelu kapil.thangavelu at canonical.com
Thu Jun 2 15:07:40 UTC 2011


Excerpts from Tom Haddon's message of Thu Jun 02 08:02:11 -0400 2011:
> Dear Ensemble Team,
> 
> I've just come back from a Sprint for a subset of the sysadmins at
> Canonical who are responsible for deploying and managing some of the
> services that Canonical runs (Landscape, Launchpad, Ubuntu One, etc.)
> during which time we were focused on implementing puppet for as many
> services as we can. However, we're also very interested in Ensemble,
> although we know that it's not quite ready for production usage yet. I
> thought it might be useful to give you an idea of what things we like
> and don't like about Puppet, and what things (from our investigations so
> far) we like and don't like about Ensemble.
> 
> So first of all, a few comments about our general needs to give you some
> context:
> - We need to be able to install and configure new instances of existing
> services so that we can scale services up, or to replace services that
> are running on older hardware.
> - We need to be able to deploy new versions of code to these services on
> a frequent basis in a consistent way.
> - We need to be able to relatively gracefully recover from deployments
> of code that cause regressions.
> - We need to be able to deploy distinct services in a consistent way. In
> other words, deployments of Landscape from our perspective should look
> as similar as possible to deployments of Launchpad.
> - We need to be able to monitor and debug applications that have been
> deployed over the lifetime of the application.
> - We need to be able to easily understand the state of the servers that
> our services run on.
> 
> == Things we like about Puppet ==
> 
> - Declarative state. This makes it easier to manage services over the
> longer term, because you can be assured that systems are configured the
> way you've told them to be configured.
> - No-op mode allows you to test what changes would be applied by a given
> update to puppet.
> - Can run with different environments - this allows you to try things
> out on some servers before applying to all servers.
> 
> == Things we don't like about Puppet ==
> 
> - Hard to do deployments from within Puppet (it's a configuration
> management tool, not a deployment tool - currently we plan to keep using
> our own deployment tools).
> - Hard to clean up after itself if we alter the configuration we want.
> 
> == Things we like about Ensemble ==
> 
> - Clean syntax and very simple to deploy services.
> - Powerful concepts that hold the promise of allowing easy scalability
> of services.
> 
> == Things we don't like about Ensemble ==
> 
> - Ensemble seems to currently require a cloud infrastructure (EC2/S3
> specifically) to run. Are there plans in the future to allow Ensemble to
> run on bare metal? Our usage of EC2 has been limited for a number of
> reasons, including cost and performance. If the plan was to only ever
> have Ensemble work on EC2, that'd make it hard to adopt it for our
> services.

This is on the roadmap, but it will probably be oneiric+1 before we get to
bare metal usage. We can prioritize this if need be.

> - Doesn't seem to be a way to maintain state across the servers that
> Ensemble is managing.

Could you elaborate on what you mean by that?

> - Can't preview changes before they happen to determine if they will do
> what you want them to do. Can't test out new versions of different
> formulas with different "environments".
> 

Right, it's not declarative, so dry-run options listing what it would do are not
readily possible. Ensemble runs an executable, so visibility into the logic there
is limited.

It is possible to run multiple versions of the same formula, for different services,
and upgrade the services individually.

> == Some other comments based on the example formulas ==
> 
> - The "utility instance" seems to be a single point of failure. If this
> goes down do we lose access to everything?

The goal is that this 'utility instance' should be scaled like any other
ensemble service, with the only distinction being that it's an internal service.

> - Once you've hooked items together, it's confusing to me that the
> "mysql" service is saying its relation is "db: wordpress" - wordpress
> isn't a DB, so shouldn't this be saying "app: wordpress" or "db for:
> wordpress"?

Yeah, this is probably poor naming in the example; the mysql name for the
relation could definitely be app.

> - When you add-unit to the wordpress instance, I don't see how this
> actually provides any scalability. Presumably you'd need to be using
> round robin DNS, or have a load balancer in front of all these
> instances, or something like that?

If you have a load balancer (a haproxy service related to wordpress), adding
a unit of wordpress will automatically configure the instance into the haproxy
load balancer config. If the wordpress unit disappears for whatever reason 
(network split, machine down, removed explicitly) the load balancer will
again reconfigure to remove the unit from the rotation.
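
To make the load balancer behaviour concrete, here is a minimal sketch in
Python of the idea: the proxy's backend section is simply re-rendered from the
current set of related units whenever one joins or departs. The function names
and the in-memory `units` table are assumptions made for illustration, not
Ensemble's actual hook API; only the haproxy `backend`/`server` syntax is real.

```python
# Hypothetical stand-in for the relation state a proxy unit would see.
units = {}


def render_backend(current_units):
    """Render an haproxy backend section from a {unit_name: (host, port)} map."""
    lines = ["backend wordpress", "    balance roundrobin"]
    for name in sorted(current_units):
        host, port = current_units[name]
        # haproxy server names cannot contain '/', so wordpress/0 -> wordpress-0
        server = name.replace("/", "-")
        lines.append("    server %s %s:%d check" % (server, host, port))
    return "\n".join(lines) + "\n"


def on_unit_joined(name, host, port=80):
    """A unit appeared in the relation: add it and re-render the config."""
    units[name] = (host, port)
    return render_backend(units)


def on_unit_departed(name):
    """A unit went away (split, machine down, removed): drop it and re-render."""
    units.pop(name, None)
    return render_backend(units)
```

Calling `on_unit_joined("wordpress/1", "10.0.0.2")` adds one `server` line, and
`on_unit_departed("wordpress/1")` removes it again; a real formula would then
write the result out and reload haproxy.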

> - Can you use your own AMI? Different instance sizes?

Not at the moment, but we're quite a bit closer to this capability now with
the ensemble ppa. A large part of the reason why we have ppas at all was
to decrease the provisioning time, as installing some of the components
on machines like the bootstrap node can take several minutes (e.g. java).

> - How do you apply security updates to running instances, etc.?

We've had some discussion of having some sort of 'policy' formulas which
always run at the machine level to allow them to tweak the machine configuration,
set up logging, landscape-client, etc. This would allow propagation of updates
in whatever manner people see fit (auto, landscape). This broader topic
is definitely an area of active discussion.

> - Shouldn't the formulas include author info in the yaml? I'd be loath
> to create my own formulas based on those someone else has provided
> unless I know who I can go to if I have problems with the formula. Also,
> is there any promise of version compatibility, or is it possible that if
> you create formulas that import other formulas that your own formula
> will no longer work?

The formula author info could definitely be added to the yaml. It's a little
unclear in this context whether 'based on' means forked in the puppet/chef sense
or reused as a dependency.

Version compatibility for formulas isn't enforced by ensemble; it's more
of a governance and policy question, I think. Ensemble does support the
notion of formula upgrades, which are propagated to a service's units, each
of which executes a formula upgrade hook.

> - Can it use elastic IPs (DNS and for interacting with "static"
> services)? Can it interact with services that are not part of Ensemble
> (i.e. DB servers that are in a DC rather than in EC2, or servers that
> you don't want to run with Ensemble for some other reason)?

We don't have this capability at the moment for either permanent addressing
via elastic ip or external service integration managed by ensemble. The 
relations and services managed by ensemble are a closed set. 

> - What security is there in terms of if one server in an ensemble
> cluster is compromised? How much information is shared between the
> instances with zookeeper and what's to prevent one server from querying
> all information on other servers?

I'm working on a security overview document atm. We're relying on zookeeper
ACLs and per connection principals. A compromised service unit won't be able
to read or write any information beyond that which the service unit requires
for its normal operation: specifically, reading from related service units
and its service's config, and writing to its own relation settings nodes.
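
As a rough illustration of that access model, the sketch below builds the
permission table for one unit's principal: read on its service's config, read
on related units' settings, and read/write only on its own settings node. It
is a toy model in plain Python, not zookeeper client code, and the node paths
are made up for the example; the real layout may differ.

```python
def unit_permissions(unit, service, related_units):
    """Return {node_path: set_of_permissions} for one unit's principal.

    All node paths here are hypothetical, chosen only to illustrate the model.
    """
    perms = {
        "/services/%s/config" % service: {"read"},
        "/relations/%s/settings" % unit: {"read", "write"},
    }
    for other in related_units:
        # Related units' settings are readable but never writable.
        perms["/relations/%s/settings" % other] = {"read"}
    return perms


def allowed(perms, path, op):
    """True if the principal may perform `op` on `path`; default is deny."""
    return op in perms.get(path, set())
```

Anything not listed is denied, so a compromised wordpress unit in this model
could not read the settings of a service it has no relation with.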

> - What is the Ensemble approach to firewalls? Is it expected that this
> is a formula issue?

Ensemble currently allows open access and does nothing with respect to
firewalls. There's work ongoing at the moment to restrict external access to
ports explicitly set by the formula (via some new cli apis available to hooks,
a la open-port/close-port). This feature is currently being implemented using
ec2 security groups, but the goal is for the enforcement point to move to the
machine agent and use the local machine firewall to increase provider
portability.

I know clint's done some formulas that directly manipulate the local firewall,
but it's unclear atm how well that functionality can be maintained in the future
from within an lxc container.

> - It's not entirely clear to me if you could use Ensemble to replace our
> current deployment scripts - they are used to push out incremental code
> updates to specific services, and work by copying code into a directory
> that includes a unique identifying string (usually the bzr revision of
> the code in question), bringing the service down, checking it's down,
> switching the symlink for the code directory we're expecting to find the
> active code in to the directory we've previously pushed to, and then
> restarting services, and then checking the service is up. This can be
> done in parallel or serial, or a combination of both (groups of servers
> serially, each group in parallel). We can also add in custom hooks to do
> things like "set read-only mode" for a given service fairly trivially.

The read-only mode could be encapsulated in a service config setting; this
is ongoing work, with the final pieces landing early next week. It allows
the user/admin to set values on a service, which are then propagated to its
units, resulting in the execution of a config-changed hook.

It sounds like the overall functionality of this deploy script could be
covered by the available ensemble hooks (upgrade/install/config-changed)
and encapsulated in a formula. I.e. pull the specific code revision from the
service config and deploy it into a local directory; on install, start the
service; on upgrade or config change, bring the service down, check it's
down, and restart the service.
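
A minimal sketch of that hook flow in Python, assuming a revision-named
release directory and a `current` symlink as in Tom's description. The
`run_hook` dispatcher, the directory layout and the `service` object (anything
with `stop()`, `start()` and `is_running()`) are all hypothetical, not part of
Ensemble; fetching the code itself is left out.

```python
import os


def stage_release(base_dir, revision):
    """Create (if needed) a release directory named after the code revision."""
    release = os.path.join(base_dir, "releases", str(revision))
    os.makedirs(release, exist_ok=True)
    return release


def switch_current(base_dir, release):
    """Repoint the `current` symlink at the given release in one rename."""
    current = os.path.join(base_dir, "current")
    tmp = current + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(release, tmp)
    os.replace(tmp, current)  # on POSIX the rename replaces the old link


def run_hook(hook, base_dir, revision, service):
    """Dispatch one hook for the revision named in the service config."""
    release = stage_release(base_dir, revision)
    if hook == "install":
        switch_current(base_dir, release)
        service.start()
    elif hook in ("upgrade", "config-changed"):
        service.stop()
        if service.is_running():
            raise RuntimeError("service did not stop")
        switch_current(base_dir, release)
        service.start()
    else:
        raise ValueError("unknown hook: %s" % hook)
    if not service.is_running():
        raise RuntimeError("service did not come back up")
```

Because old release directories are left in place, recovering from a bad
deploy in this sketch is just another `run_hook("config-changed", ...)` with
the previous revision.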

In terms of addressable parallelization or serialization of this operation,
atm the unit of parallelization is at the service level, ie. an upgrade
or config change will apply to all the service units in parallel. There's
some discussion of more advanced upgrade scenarios (rolling upgrade, parallel
service with data copy) in the upgrade documentation, however only formula upgrades
are implemented atm.

https://ensemble.ubuntu.com/docs/upgrades.html
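
For reference, the "groups serially, each group in parallel" pattern Tom
describes can be sketched in a few lines of Python. This is not something
ensemble implements today; `rolling_apply` and the `action` callable are
hypothetical names used only to illustrate the scheduling.

```python
from concurrent.futures import ThreadPoolExecutor


def rolling_apply(groups, action):
    """Run `action(unit)` group by group, units within a group in parallel.

    A failure in any unit raises after its group finishes, so later groups
    are never started.
    """
    done = []
    for group in groups:
        with ThreadPoolExecutor(max_workers=max(1, len(group))) as pool:
            # list() forces completion and re-raises the first failure.
            results = list(pool.map(action, group))
        done.extend(zip(group, results))
    return done
```

With `groups = [["app/0", "app/1"], ["app/2"]]`, the first two units are
handled concurrently and `app/2` is only touched once both have succeeded.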

<snip>

> I think the overall takeaway as far as we can see is that Ensemble seems
> suited for deploying services, but not necessarily managing services. Is
> the idea that you would need to deploy your own management layer through
> Ensemble, or outside of Ensemble, or is the idea that in the future
> Ensemble will be able to manage services for you?
> 

Well, one of the goals for ensemble is that service management should be no
more difficult than the initial deploy. We're still pretty early in the project's
attempts to fulfill that goal from the perspective of embracing the full lifecycle
of every service. Understanding what admins are doing today, and what their
needs are, is key to ensemble implementing it well.

> Thanks for reading!
> 
> Tom
> 

Thanks for the great questions, I hope these answers clarify things.

cheers,

Kapil



