Updating state on agent upgrade

Thu Sep 26 08:30:43 UTC 2013

On Thu, Sep 26, 2013 at 4:46 AM, Andrew Wilkins <
andrew.wilkins at canonical.com> wrote:

> On Thu, Sep 26, 2013 at 3:20 AM, William Reade <
> william.reade at canonical.com> wrote:
>
>>>> I think that conceptually, "capability" makes sense for some things
>>>> more than job/role. In particular, "has the ability to manage firewalls"
>>>> seems better expressed as a capability than as a job. However, I don't
>>>> think it's really worthwhile changing code to match. A capability can be
>>>> expressed as a job, even if it's *slightly* awkward. The fact that we're
>>>> giving a machine-agent the job "ManageFirewall" implies that it has that
>>>> capability.
>>>>
>>>
>> It's not so much how we express the fact as *where* we express it. If the
>> only way we store important *environment* properties is by tacking them
>> onto a (particular, special) *machine* I think we're setting ourselves up
>> for trouble.
>>
>
> I think I misunderstood you before. Are you advocating something like this?
>  - When bootstrapping, store all the environment's capabilities into the
> environ doc in state.
>  - AddMachine/InjectMachine is called with a role (say, StateServer, or
> MachineAgent; is it just a boolean flag?), rather than jobs.
>  - The state package will load capabilities from mongo, and translate role
> & capabilities to machine-specific jobs.
>

Yes, I think that covers it.
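
A rough sketch of that translation step in Go, just to pin down the shape of it. Everything here is an assumption made for illustration: the Role, Capability and Job types, their values, and the JobsFor function are hypothetical and are not the state package's actual API.

```go
package main

import "fmt"

// Role, Capability and Job are hypothetical types for this sketch only.
type Role string
type Capability string
type Job string

const (
	RoleStateServer  Role = "state-server"
	RoleMachineAgent Role = "machine-agent"

	CapManageFirewall Capability = "manage-firewall"

	JobHostUnits      Job = "host-units"
	JobManageEnviron  Job = "manage-environ"
	JobManageFirewall Job = "manage-firewall"
)

// JobsFor translates a machine's role plus the environment's capabilities
// (as loaded from the environ doc) into machine-specific jobs.
func JobsFor(role Role, caps []Capability) []Job {
	var jobs []Job
	switch role {
	case RoleStateServer:
		jobs = append(jobs, JobManageEnviron)
		// Environment-wide capabilities only turn into jobs on the
		// state server in this sketch.
		for _, c := range caps {
			if c == CapManageFirewall {
				jobs = append(jobs, JobManageFirewall)
			}
		}
	case RoleMachineAgent:
		jobs = append(jobs, JobHostUnits)
	}
	return jobs
}

func main() {
	caps := []Capability{CapManageFirewall}
	fmt.Println(JobsFor(RoleStateServer, caps))
	fmt.Println(JobsFor(RoleMachineAgent, caps))
}
```

The point of the shape is that AddMachine/InjectMachine callers only ever name a role; the capability list lives with the environment, not with any particular machine.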

> Yes, sorry if I wasn't clear about that; I was only suggesting that the
> jobs be added to agent.conf for bootstrapping. Apart from that, jobs will
> come through the API as usual.
> If I'm reading your suggestions correctly, though, it sounds like jobs
> won't go through the bootstrap agent.conf at all, but environment
> capabilities would.
>

Perfect; and, yes, I think the right language at bootstrap time is
capabilities.

> Backwards-incompatible schema upgrades don't necessarily have to be handled
> straight away; if we get the rest of it in place, then state-lockout can be
> guaranteed when everything is behind the API. It might be feasible to do
> all the above list, but in (3) just make the jobs changes in machine docs.
> That should be doable without colliding with other DB connections. Even if
> we went the "jiggery-pokery" route, the jobs need to be updated. One
> problem I see is that someone doing "juju add-machine" with older tools
> could still put a machine into state with an older/incomplete set of jobs
> for the new world. I'll have to think about it some more.
>

OK, I guess it's worth accelerating it a bit; please take a look into it
after tidying up the loose ends on the null provider work. We'll need to
stay careful going forward -- basically we always need to be prepared for
the possibility of pre-upgrade tools hitting state, so the actual DB
changes might not be sensible -- but given that we *know* we need this
structure in place before the weight of the jiggery-pokery becomes
overwhelming it'll be valuable to get it into place in parallel.

One possibility is to *not* insert the jobs at state level, and in the API
server report either jobs+tweaks if only jobs are present, and the result
of roles+envcaps if roles are present. Keeps the dirtiness behind the API,
and leaves us a bit more freedom for a light touch on state, but we still
ofc need to be careful (we'll need to make sure envcaps are in place before
a post-upgrade AddMachine, but that seems pretty easy to deal with...
anything else spring to mind?).
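
To illustrate the API-server option, here is a minimal Go sketch. The machineDoc struct, the job and role strings, and the particular "tweak" rule (a legacy environ manager picks up the firewall job when the environment has that capability) are all assumptions invented for this example, not existing code.

```go
package main

import "fmt"

// machineDoc is a hypothetical stand-in for a machine document in state.
type machineDoc struct {
	Jobs []string // explicit jobs, as written by pre-upgrade tools
	Role string   // role, as written by post-upgrade tools ("" if absent)
}

// reportedJobs is what the API server would report to an agent:
// roles+envcaps when a role is present, otherwise jobs+tweaks.
func reportedJobs(m machineDoc, envCaps map[string]bool) []string {
	if m.Role != "" {
		return jobsFromRole(m.Role, envCaps)
	}
	return tweak(m.Jobs, envCaps)
}

// jobsFromRole derives jobs from a role and the environment capabilities.
func jobsFromRole(role string, envCaps map[string]bool) []string {
	if role != "state-server" {
		return []string{"host-units"}
	}
	jobs := []string{"manage-environ"}
	if envCaps["manage-firewall"] {
		jobs = append(jobs, "manage-firewall")
	}
	return jobs
}

// tweak patches up a legacy job list without touching the stored document.
func tweak(jobs []string, envCaps map[string]bool) []string {
	out := append([]string(nil), jobs...)
	managesEnviron, hasFirewall := false, false
	for _, j := range jobs {
		switch j {
		case "manage-environ":
			managesEnviron = true
		case "manage-firewall":
			hasFirewall = true
		}
	}
	if managesEnviron && !hasFirewall && envCaps["manage-firewall"] {
		out = append(out, "manage-firewall")
	}
	return out
}

func main() {
	caps := map[string]bool{"manage-firewall": true}
	fmt.Println(reportedJobs(machineDoc{Jobs: []string{"manage-environ"}}, caps))
	fmt.Println(reportedJobs(machineDoc{Role: "machine-agent"}, caps))
}
```

Nothing in state is rewritten here; a machine added by pre-upgrade tools keeps its stored jobs and only the reported view changes.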

Or, alternatively, we could add a Roles field and infer sensible Roles from
those machines that still have explicit Jobs stored -- basically the same
thing but at state level (without db changes). I *think* that old code
would be relatively unbothered by seeing a machine without any jobs, but
that also needs careful thought. The point is really that some degree of
jiggery-pokery remains, I think, sadly inescapable; but that we have a few
possible routes to explore for how to do it as sanely as possible.
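
The state-level variant might look something like the following Go sketch. The machine struct, the EffectiveRole method, and the rule that "manage-environ" implies a state server are hypothetical, chosen only to show the inference direction.

```go
package main

import "fmt"

// machine is a hypothetical in-memory view of a machine document.
type machine struct {
	Role string   // set by post-upgrade code; "" on older documents
	Jobs []string // explicit jobs still stored on older documents
}

// EffectiveRole returns the stored role if there is one, and otherwise
// infers a sensible role from the legacy jobs. The stored document is
// never modified, so no db changes are needed.
func (m machine) EffectiveRole() string {
	if m.Role != "" {
		return m.Role
	}
	for _, j := range m.Jobs {
		if j == "manage-environ" {
			return "state-server"
		}
	}
	return "machine-agent"
}

func main() {
	old := machine{Jobs: []string{"manage-environ", "host-units"}}
	fresh := machine{Role: "machine-agent"}
	fmt.Println(old.EffectiveRole(), fresh.EffectiveRole())
}
```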

>> Strong -1 on getting the environ involved directly in this case. Juju
>> should itself be able to discover the capabilities of the environment, and
>> react to them, rather than having the environment control the contents of
>> state. The idea of an EnvironUpgrader *is* itself worthwhile, but it should
>> be used to fix things that are part of the environment, rather than part of
>> state, and I don't think the need is so pressing right now.
>>
>
> How is a capability discovered? I would think at bootstrap time an
> environment would need to register its capabilities into state. So then, as
> new capabilities are made available (e.g. "ability to manage firewall"),
> how does an environment register them on upgrade?
>

I'm agnostic about the precise mechanism, but I think the point is that
some environ client needs to be able to find them out: at bootstrap time
the client does so and writes them into cloud-init, and during subsequent
upgrades we could just *always* finish by running a func that instantiates
the environment, interrogates it for capabilities, and sets them in state.
Then, if we're delaying the role+capability calculation until we hit the
API server, I think that when we restarted everything we'd just
automatically get the right jobs for every machine as all the agents came
back up with the new code.
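
As a sketch of that "always run it at the end of an upgrade" func, in Go: the Environ and CapabilityStore interfaces, their methods, and the in-memory fakes are assumptions made up for this example and do not correspond to existing juju interfaces.

```go
package main

import (
	"fmt"
	"sort"
)

// Environ is a hypothetical view of an instantiated environment that can
// be interrogated for its capabilities.
type Environ interface {
	Capabilities() []string
}

// CapabilityStore is a hypothetical view of the part of state that holds
// the environment's capabilities.
type CapabilityStore interface {
	SetCapabilities(caps []string) error
}

// refreshCapabilities interrogates the environment and records the result.
// It overwrites rather than merges, so running it after every upgrade is
// idempotent.
func refreshCapabilities(env Environ, store CapabilityStore) error {
	caps := append([]string(nil), env.Capabilities()...)
	sort.Strings(caps) // stable order so repeated runs store identical data
	return store.SetCapabilities(caps)
}

// fakeEnviron and memStore are in-memory fakes for the example.
type fakeEnviron struct{ caps []string }

func (e fakeEnviron) Capabilities() []string { return e.caps }

type memStore struct{ caps []string }

func (s *memStore) SetCapabilities(caps []string) error {
	s.caps = append([]string(nil), caps...)
	return nil
}

func main() {
	store := &memStore{}
	env := fakeEnviron{caps: []string{"manage-firewall"}}
	if err := refreshCapabilities(env, store); err != nil {
		fmt.Println("refresh failed:", err)
		return
	}
	fmt.Println(store.caps)
}
```

With the role+capability calculation deferred to the API server, this would be the only write the upgrade needs to make for newly available capabilities.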

Cheers
William