Juju on MAAS agent tools upgrade mechanism

Mon Sep 14 01:10:36 UTC 2015

On 12/09/15 23:49, Peter Grandi wrote:
> Apologies for the late reply, I spent most the time in between
> reverse engineering some issues with other ("hipsterish")
> clusterized services.
> 
> In the meantime I have written and just now uploaded my own
> draft overview of how Juju is structured, at a very high level:
> 
>   https://wiki.ubuntu.com/ServerTeam/JujuConcepts
> 
> It needs to be updated a bit with some of the information below.

The wiki article looks great in general, especially since I think you've done it
by observing how Juju runs. There are a few conceptual missteps in some of the
information. To help clarify, may I also recommend this as a good overview of
Juju:
http://blog.labix.org/2013/06/25/the-heart-of-juju

> 
>> Each machine in a Juju environment runs a jujud binary. The
>> binary is packaged in the so-called tools tarball.
> 
> I seem to have noticed 'jujud' is per-unit rather than
> per-machine, but then I think that there is a very noticeable
> bias of Juju development towards one-unit-per-node on dynamic
> public "cloud" providers... :-)
> 

Correct. Right now, a machine has several jujud services - one for the machine
and one for each unit deployed to that machine. We're hoping to get the time to
consolidate this so that each node has a single jujud agent to manage all of the
workloads on that machine.

The recommended deployment model is indeed one unit per node, but bear in mind a
node may be a container. So to achieve density, a host machine may run multiple
units, each hosted inside an LXC container for example.

>> The bootstrap process needs to download the tools from
>> somewhere to the initial Juju Server.
> 

Either that or the tools can be provided to the bootstrap command from a local
directory; this will upload the tools to the Juju Server. The --metadata-source
argument to bootstrap is the thing to use.

> That would be I guess the Juju "controller" machine, which is
> not necessarily any of the MongoDB repset.
> 

The Juju Servers (what you call the controller above) do correspond to the
MongoDB replicaset machines. A Juju deployment may use only one Juju Server
(also hosting MongoDB) but this is not HA. In an HA scenario, extra Juju Server
machines are added, each running a MongoDB replicaset instance. Any Juju Server
may receive API requests from a Juju node; the MongoDB primary runs on one of
the Servers.

>> For deployments with internet access, the tools come from
>> https://streams.canonical.com/juju/tools/. This is the
>> simplest case and doesn't require any agent-metadata-url or
>> sync-tools usage. I may have missed it in your emails, but I'm
>> assuming your environment does have internet access?
> 
> The Juju controller, the Juju state machines and the Juju nodes
> I am dealing with all have Internet access. They are on various
> private subnets, but the Juju controller also has a public
> address, and the others are NAT'ed.
> 

In that case, no sync-tools or any other setup is needed. A simple juju
bootstrap will pull down the tools and cache them in the Juju Server's
blobstore. When new nodes are added, the tools come from the Juju Server. The
only time tools are fetched again from the internet is when an upgrade is done.

>> [ ... ]  we now store charms and tools in the environment
>> blobstore.
> 
> Thanks for the details!
> 
>> So the above is for bootstrap.
> 
> So far so good, and it looks like bootstrap worked around May
> this year.
> 
>> For upgrades, if the machines in your environment have
>> internet access, then juju upgrade-juju --version=1.24.5
>> should just work.
> 
> That's a bit vague. though. I would run 'juju upgrade-juju' on
> the control node, which has got 1.24.5 and then "somehow" the
> ~70 units with 'jujud' 1.23.3 deployed on the local 12 nodes
> would then download the '.tgz' for their architecture of version
> 1.24.5, but that does not happen and I got instead the error
> message "ERROR no matching tools available" which seems to be
> coming from the 'juju' command running on the control node
> itself.
> 

You typically run juju upgrade-juju on a client machine. I recommend always
using the --version argument to avoid surprises. The algorithm is essentially:
- upgrade command figures out what version to upgrade to [1]
- upgrade command writes an environment setting with the requested version
- Juju machine agents notice the new version request and download the tools to
their nodes
- each agent on the nodes restarts in order to run with the jujud afforded by
the new tools
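
As a rough illustration of the steps above, here is a minimal Python sketch. It
is a toy model, not Juju's actual code: the Environment and MachineAgent
classes, the "agent-version" dictionary key, and the function names are all
assumptions made up for this example, and the tools download is a placeholder.

```python
class Environment:
    """Toy stand-in for the shared environment settings (hypothetical)."""

    def __init__(self, agent_version):
        # "agent-version" is an assumed key name used only for this sketch.
        self.settings = {"agent-version": agent_version}


class MachineAgent:
    """Toy stand-in for a machine agent running on one node (hypothetical)."""

    def __init__(self, name, version):
        self.name = name
        self.version = version
        self.restarts = 0

    def download_tools(self, version):
        # Placeholder: a real agent would fetch the tools tarball here.
        return "tools for %s" % version

    def poll(self, env):
        """Notice a new requested version, fetch tools, and restart."""
        wanted = env.settings["agent-version"]
        if wanted == self.version:
            return False  # nothing to do
        self.download_tools(wanted)
        self.version = wanted  # "restart" running the new jujud
        self.restarts += 1
        return True


def upgrade_juju(env, version):
    """Client side: record the requested version in the environment."""
    env.settings["agent-version"] = version


env = Environment("1.23.3")
agents = [MachineAgent("machine-%d" % i, "1.23.3") for i in range(3)]

upgrade_juju(env, "1.24.5")
for agent in agents:
    agent.poll(env)

for agent in agents:
    print(agent.name, agent.version, agent.restarts)
```

The point of the sketch is that the client never talks to the agents directly:
it only records the requested version, and each agent acts on it the next time
it looks at the environment settings.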

[1] the algorithm used to figure out the version of tools to upgrade to is
essentially X+1, but it depends on the client version and the version currently
running in the environment. You can sometimes see a "no tools available"
message; there needs to be much better UX in this area to explain why the tools
version could not be automatically determined etc. There have been bugs raised
and fixed, e.g. http://pad.lv/1459093, but it's an ongoing area of improvement.
It's best just to be explicit (insert comments about testing explicit version
compatibility in your own environment and then upgrading to the known, tested
version etc).

> The reason I am looking for details is to know a bit in advance
> as to what to 'strace' or where to 'tcpdump' to see what is
> actually broken.
> 
> This quote tells me that at least in some cases it is the unit's
> 'jujud' that fetches its own replacement:
> 
>   >> http://askubuntu.com/questions/555281/ubuntu-maas-juju-bootstrap-stuck-on-fetching-tools
>   >> Fetching tools: curl -sSfw 'tools from %{url_effective} downloaded: HTTP %{http_code}; time %{time_total}s; size %{size_download} bytes; speed %{speed_download} bytes/s ' --retry 10 -o $bin/tools.tar.gz 'https://streams.canonical.com/juju/tools/releases/juju-1.20.11-trusty-amd64.tgz'
> 
> and in that example is that it is fetched direct from the
> "simplestream" at the Canonical site.
> 

The information above is out of date.
The jujud agent that manages the node (also known as the machine agent) is
responsible for ensuring tools are available on the machine during an upgrade.
It does this by asking the Juju Server for the requested tools. The Juju Server
looks in its cache and only goes to streams.canonical.com if the tools aren't
cached locally. Once the machine agent has fetched tools for that machine, the
unit agents do not also fetch tools; they use what the machine agent has just
obtained.
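
To make the lookup order concrete, here is a small Python sketch of the
behaviour described above. It is only a model under stated assumptions: the
JujuServer and Machine classes, the fake_upstream function, and the in-memory
dictionary standing in for the blobstore are all hypothetical and are not taken
from Juju's code. The tarball name simply follows the pattern seen in the
streams URL quoted earlier.

```python
def fake_upstream(version, series, arch):
    """Hypothetical stand-in for a download from streams.canonical.com."""
    return "juju-%s-%s-%s.tgz" % (version, series, arch)


class JujuServer:
    """Toy model of a Juju Server with a tools cache (hypothetical)."""

    def __init__(self, fetch_upstream):
        self.blobstore = {}  # stands in for the environment blobstore
        self.fetch_upstream = fetch_upstream
        self.upstream_fetches = 0

    def get_tools(self, version, series, arch):
        key = (version, series, arch)
        if key not in self.blobstore:
            # Cache miss: only now go out to the upstream source.
            self.upstream_fetches += 1
            self.blobstore[key] = self.fetch_upstream(version, series, arch)
        return self.blobstore[key]


class Machine:
    """Toy model of a node with one machine agent and several unit agents."""

    def __init__(self, server, series, arch, units):
        self.server = server
        self.series = series
        self.arch = arch
        self.units = units
        self.tools = None

    def upgrade(self, version):
        # The machine agent asks the Juju Server once for this machine.
        self.tools = self.server.get_tools(version, self.series, self.arch)
        # Unit agents reuse what the machine agent obtained; no extra fetch.
        return {unit: self.tools for unit in self.units}


server = JujuServer(fake_upstream)
m1 = Machine(server, "trusty", "amd64", ["mysql/0", "wordpress/0"])
m2 = Machine(server, "trusty", "amd64", ["mysql/1"])

units1 = m1.upgrade("1.24.5")
units2 = m2.upgrade("1.24.5")
print(server.upstream_fetches, units1, units2)
```

In this model two machines with three units between them cause a single
upstream fetch; everything else is served from the server's cache.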

>> $ juju sync-tools --version 1.24
> 
> Ahhh spotted that the ".5" is not used here. Indeed I add it
> indeed does not work. Unfortunately right now the Juju setup is
> in an unfavourable state because of a MAAS upgrade issue.
> 
>> Once the above is run, the upgrade command should be able to
>> find the latest 1.24 tools in the Juju Server blobstore.
> 
> I'll have a look later, I have a present problem with MAAS that
> is blocking me, about to send another email about that.
> 

sync-tools is totally unnecessary because you said your environment has internet
access.
You only need sync-tools for isolated environments where it is necessary to load
the tools manually into the Juju Server's tools cache.