Juju induction sprint summary

Michael Foord michael.foord at canonical.com
Tue Jul 15 16:49:03 UTC 2014


On 14/07/14 09:43, Ian Booth wrote:
> Hi all
>
> So last week we had a Juju induction sprint for the Tanzanite and Moonstone
> teams to welcome Eric and Katherine to the Juju fold. Following is a summary of
> some key outcomes from the sprint that are relevant to others working on Juju
> (we also did other work not generally applicable to this email). Some items
> will interest some folks more than others, so scan the topics to see what you
> find interesting.
>
> * Architectural overview - and a cool new tool
>
> The sprint started with an architectural overview of the Juju moving parts and
> how they interact to deploy and maintain a Juju environment. Katherine noted
> that our in-tree documentation has lots of text and no diagrams. She pointed out
> a great tool for easily putting together UML diagrams using a simple text-based
> syntax: PlantUML (http://plantuml.sourceforge.net). Check it out, it's pretty
> cool. We'll be adding a diagram or two to the in-tree docs to show how the
> pieces fit together.
>
> * Code review (replacement for GitHub's native code review)
>
> We are going to use Review Board. When we first looked at it before the sprint,
> a major showstopper was the lack of an auth plugin that worked with GitHub. Eric
> has stepped up and written the necessary plugin. We'll have something deployed
> this week or early next week, once some remaining tooling for the GitHub
> integration is finished. The key features:
> - "Login with GitHub" button on the main login screen
> - pull requests automatically imported into Review Board and added to the
>   review queue
> - diffs can be uploaded to Review Board as WIP and submitted to GitHub when
>   finalised
>
> * Fixing the Juju state.State mess
>
> The state package is a mess of layering violations and intermingled concerns.
> The result is slow and fragile unit tests, scalability issues, code that is
> hard to understand, and code that is difficult to extend and refactor (to name
> a few issues).
>
> The correct layering should be something like:
> - remote service interface (aka apiserver)
> - Juju services for managing machines, services, units etc.
> - Juju domain model
> - model persistence (aka state)
>
> The persistence layer above is all that should be in the state package. The
> plan is to incrementally extract Juju service business logic out of state and
> pull it up into a services layer. The first piece to be done is the machine
> placement and deployment logic. Wayne has a WIP branch for this. The benefit of
> this work can't be overstated, and the sprint allowed both teams to work
> together to understand its direction and intent.
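
A minimal Go sketch of that layering, to make the dependency direction concrete. Every name here (MachineDoc, MachinePersistence, MachineService) is hypothetical, not juju-core's actual API, and an in-memory map stands in for mongo:

```go
package main

import (
	"errors"
	"fmt"
)

// MachineDoc is a hypothetical persistence record: plain data, no business rules.
type MachineDoc struct {
	ID     string
	Series string
}

// MachinePersistence is the kind of narrow interface the state package would
// be left with: storing and loading documents, nothing more.
type MachinePersistence interface {
	AddMachine(doc MachineDoc) error
	Machine(id string) (MachineDoc, error)
}

// memPersistence is an in-memory stand-in for a mongo-backed implementation.
type memPersistence struct {
	machines map[string]MachineDoc
}

func newMemPersistence() *memPersistence {
	return &memPersistence{machines: make(map[string]MachineDoc)}
}

func (p *memPersistence) AddMachine(doc MachineDoc) error {
	if _, exists := p.machines[doc.ID]; exists {
		return fmt.Errorf("machine %q already exists", doc.ID)
	}
	p.machines[doc.ID] = doc
	return nil
}

func (p *memPersistence) Machine(id string) (MachineDoc, error) {
	doc, ok := p.machines[id]
	if !ok {
		return MachineDoc{}, fmt.Errorf("machine %q not found", id)
	}
	return doc, nil
}

var errNoSeries = errors.New("series must be specified")

// MachineService is the services layer: it owns the business rules (here just
// validation and id allocation) and reaches storage only through
// MachinePersistence, so it can be unit tested without a database.
type MachineService struct {
	store  MachinePersistence
	nextID int
}

func (s *MachineService) AddMachine(series string) (string, error) {
	if series == "" {
		return "", errNoSeries
	}
	id := fmt.Sprint(s.nextID)
	if err := s.store.AddMachine(MachineDoc{ID: id, Series: series}); err != nil {
		return "", err
	}
	s.nextID++
	return id, nil
}

func main() {
	svc := &MachineService{store: newMemPersistence()}
	id, err := svc.AddMachine("trusty")
	fmt.Println(id, err) // 0 <nil>
}
```

The service only sees the MachinePersistence interface, which is what would let its tests run fast and in isolation.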
>
> * Mongo 2.6 support
>
> The work to port Juju to Mongo 2.6 is pretty much complete. The newer Mongo
> version offers a number of bug fixes and improvements over the 2.4 series, and
> we need to be able to run with an up-to-date version.
>
> * Providers don't need to have a storage implementation (almost)
>
> A significant chunk of old code that existed to support agents connecting
> directly to mongo was removed (along with the necessary refactoring). This
> allowed the Environ interface to drop the StateInfo() method in favour of a
> method that returns the state server instances (not committed yet, but close).
> The next step is to remove the Storage() method from Environ and make storage
> an internal implementation detail that is not mandatory, so long as providers
> have another way to find their state servers (for example, using tagging).
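
A sketch of what a storage-free provider could look like, assuming state servers are identified by an instance tag. The interface, the StateServerInstances name and the tag key are all hypothetical, and listInstances stands in for a real cloud API call:

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// InstanceID is a hypothetical stand-in for a provider instance identifier.
type InstanceID string

// Environ is a cut-down, hypothetical provider interface: no Storage() and no
// StateInfo(), just a way to find the state server instances.
type Environ interface {
	StateServerInstances() ([]InstanceID, error)
}

// cloudInstance models what a provider API might return: an id plus tags.
type cloudInstance struct {
	ID   InstanceID
	Tags map[string]string
}

// stateServerTag is a hypothetical tag key set on state servers at bootstrap.
const stateServerTag = "juju-state-server"

var errNotBootstrapped = errors.New("environment is not bootstrapped")

type taggingEnviron struct {
	listInstances func() []cloudInstance // stands in for a cloud API call
}

// Compile-time check that taggingEnviron satisfies Environ.
var _ Environ = taggingEnviron{}

// StateServerInstances finds state servers purely from instance tags, so no
// provider storage is needed to record them.
func (e taggingEnviron) StateServerInstances() ([]InstanceID, error) {
	var ids []InstanceID
	for _, inst := range e.listInstances() {
		if inst.Tags[stateServerTag] == "true" { // nil Tags reads as ""
			ids = append(ids, inst.ID)
		}
	}
	if len(ids) == 0 {
		return nil, errNotBootstrapped
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids, nil
}

func main() {
	env := taggingEnviron{listInstances: func() []cloudInstance {
		return []cloudInstance{
			{ID: "i-b", Tags: map[string]string{stateServerTag: "true"}},
			{ID: "i-c"},
			{ID: "i-a", Tags: map[string]string{stateServerTag: "true"}},
		}
	}}
	fmt.Println(env.StateServerInstances()) // [i-a i-b] <nil>
}
```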
>
> * Juju 1.20.1 release (aka juju/mongo issues)
>
> A number of issues with how Juju and mongo interact became apparent when
> replica sets were used for HA. Unfortunately Juju 1.20 shipped with these
> issues unfixed. Part of the sprint was spent on urgent fixes for a 1.20.1
> bug-fix release. There's still an outstanding mongo session issue that needs
> to be fixed this week for a 1.20.2 release; Michael is working on it. The
> tl;dr is that we hold onto sessions without refreshing them, so the
> underlying socket can time out and Juju loses its connection to mongo.

The specific bug here concerns i/o timeout errors:

     https://bugs.launchpad.net/juju-core/+bug/1307434

It looks very likely that these are caused by session timeout / connection
problems resulting from our use of a single global session for all
communication with mongo.

This global session is used everywhere. Everything that has an
mgo.Collection holds an indirect reference to it and uses it. This
includes watchers, state.State and the transaction runner we use for
executing transactions.

mgo has socket pooling built in, but to benefit from it we need to copy
(and close) sessions, rather than executing queries off long-lived mgo
collections or using the global session directly.

Unpicking this is a fair amount of work. The current problem is that
when I copy the session I immediately get auth errors, caused (I assume)
by us changing the connection credentials after we create the master
session. So whenever we change credentials we also need to update the
master session. Then *everywhere* that uses mongo needs to be changed to
use a new session; anything left using the original session is at risk
of timing out.
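
A sketch of one way to keep the master session in step with credential changes: a single owner that everything copies from, so replacing the master is enough for later copies. The types are hypothetical stand-ins for mgo's, and closing the old master is deliberately left out:

```go
package main

import (
	"fmt"
	"sync"
)

// Credentials and session are hypothetical stand-ins; the point is that a
// copied session inherits whatever credentials its master had at copy time.
type Credentials struct {
	User, Password string
}

type session struct {
	creds Credentials
}

func (s *session) Copy() *session { return &session{creds: s.creds} }

// sessionSource is the single owner of the master session. All mongo users
// call Copy on it instead of holding a session themselves, so replacing the
// master when credentials change is enough for every later copy.
type sessionSource struct {
	mu     sync.RWMutex
	master *session
}

func (src *sessionSource) SetCredentials(c Credentials) {
	src.mu.Lock()
	defer src.mu.Unlock()
	// Real code would also have to close the old master here.
	src.master = &session{creds: c}
}

func (src *sessionSource) Copy() *session {
	src.mu.RLock()
	defer src.mu.RUnlock()
	return src.master.Copy()
}

func main() {
	src := &sessionSource{master: &session{creds: Credentials{User: "admin", Password: "initial"}}}
	stale := src.Copy()
	src.SetCredentials(Credentials{User: "admin", Password: "rotated"})
	fresh := src.Copy()
	fmt.Println(stale.creds.Password, fresh.creds.Password) // initial rotated
}
```

Note that `stale` keeps the old credentials, which mirrors the point above: anything still holding a session created before the change has to be migrated as well.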

We may *also* need to build in some resilience to connection failures 
(and we don't want to just mask real problems with retries either).
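
One shape that resilience could take without masking real problems: a bounded retry that only fires for errors classed as transient and returns everything else immediately. errTransient is a hypothetical sentinel; deciding which mgo errors really are transient is the hard part and is not addressed here:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient marks connection-level failures such as an i/o timeout. It is a
// hypothetical sentinel; real code would classify actual driver errors.
var errTransient = errors.New("transient connection error")

func isTransient(err error) bool { return errors.Is(err, errTransient) }

// retryTransient calls op up to attempts times. Only transient errors are
// retried; any other error is returned at once, and if every attempt fails
// transiently the last error is returned wrapped, so failures stay visible.
func retryTransient(attempts int, delay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !isTransient(err) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryTransient(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```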

How this should all look needs some thought, particularly in light of
the work already underway to properly separate the persistence layer
from the domain objects.

All the best,

Michael


>
> * Add support for Juju on Amazon in China (almost)
>
> The supported regions for the EC2 provider are hard-coded, so the new regions
> in China were not supported. The Chinese regions also use a newer signing
> algorithm. There should be a fix in place this week. Since all the changes are
> in the goamz library, the change to juju-core is merely a dependency update, so
> this feature should be available in the 1.20.2 release.
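
To illustrate why a hard-coded region list blocks new regions, here is a hypothetical, cut-down region table. It is not goamz's actual type; cn-north-1 (Beijing) is used as the example, flagged as needing the newer Signature Version 4 signing:

```go
package main

import "fmt"

// Region is a hypothetical, cut-down region description, not goamz's type.
type Region struct {
	Name          string
	EC2Endpoint   string
	RequiresSigV4 bool // only accepts the newer (Signature Version 4) signing
}

// regions is a hard-coded table: a region missing from it cannot be used at
// all, which is why new regions need a library change and a dependency bump.
var regions = map[string]Region{
	"us-east-1": {
		Name:        "us-east-1",
		EC2Endpoint: "https://ec2.us-east-1.amazonaws.com",
	},
	"cn-north-1": {
		Name:          "cn-north-1",
		EC2Endpoint:   "https://ec2.cn-north-1.amazonaws.com.cn",
		RequiresSigV4: true,
	},
}

// lookupRegion rejects any name not in the table.
func lookupRegion(name string) (Region, error) {
	r, ok := regions[name]
	if !ok {
		return Region{}, fmt.Errorf("unknown EC2 region %q", name)
	}
	return r, nil
}

func main() {
	for _, name := range []string{"cn-north-1", "xx-nowhere-1"} {
		r, err := lookupRegion(name)
		fmt.Println(r.Name, r.RequiresSigV4, err)
	}
}
```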
>
> All up, a productive sprint with some great collaboration between the two teams.




More information about the Juju-dev mailing list