Machine agents uninstall themselves upon worker.ErrTerminateAgent.
William Reade
william.reade at canonical.com
Mon May 9 06:28:36 UTC 2016
On Mon, May 9, 2016 at 3:28 AM, Andrew Wilkins <andrew.wilkins at canonical.com> wrote:
> On Sat, May 7, 2016 at 1:37 AM, William Reade <william.reade at canonical.com> wrote:
>
>> On Fri, May 6, 2016 at 5:50 PM, Eric Snow <eric.snow at canonical.com> wrote:
>>
>>> See https://bugs.launchpad.net/juju-core/+bug/1514874.
>>
>>
> So I think this issue is fixed in 2.0, but it looks like the changes never
> got backported to 1.25. From your options, we do have (the opposite of) a
> DO_NOT_UNINSTALL file (it's actually called
> "/var/lib/juju/uninstall-agent"; only if it exists do we uninstall).
>
> (And now that I think of it, we're only writing uninstall-agent for the
> manual provider's bootstrap machine, and not other manual machines, so
> we're currently leaving Juju bits behind on manual machines added to an
> environment.)
>
Except we're *also* writing it on every machine, for Very Bad Reasons,
right? So we *are* still cleaning up all machines, but there's a latent
manual provider bug that'll need addressing.

> The reason it's done at the last moment is to avoid having dangling
> database entries. If we uninstall the agent (i.e. delete /var/lib/juju and
> remove the systemd/upstart service) and the agent then fails before it gets
> to EnsureDead, the entity will never be removed from state.
>
The *only* thing that should happen after setting dead is the uninstall --
anything else that's required to happen before cleanup *must* happen before
setting dead, which *means* "all my responsibilities are 100% fulfilled".
The *only* justification for the post-death logic in the manual case is
that there's no responsible provisioner component to hand over to -- and
frankly I wish we'd just written that to SSH in and clean up, instead of
taking on this ongoing hassle.

> As an alternative, we could (should) only ever write the
> /var/lib/juju/uninstall-agent file from worker/machiner, first checking
> there's no assigned units, and no storage attached.
>
Why would we *ever* want to write it at runtime? We know whether it's a manual
machine at provisioning time, so we can write the File Of Death OAOO. All
the other mucking about with it is the source of these (serious!) bugs.

>> Andrew, I think you had more detail last time we discussed this: is there
>> anything else in uninstall (besides loop-device stuff) that needs to run
>> *anywhere* except a manual machine? and, what will we actually need to sync
>> with in the machiner? (or, do you have alternative ideas?)
>>
>
> No, I don't think there is anything else to be done in uninstall, apart
> from loop detach and manual machine cleanup. I'm not sure about moving the
> uninstall logic to the machiner, for reasons described above. We could
> improve the current state of affairs, though, by only writing the
> uninstall-agent file from the machiner.
>
Strong -1 on moving uninstall logic: if it has to happen (which it does, in
*rare* cases that are *always* detectable pre-provisioning), uninstall is
where it should happen, post-machine-death; and also strong -1 on writing
uninstall-agent in *any* circumstances except manual machine provisioning:
we have had *way* too many problems with this "clever" feature being
invoked when it shouldn't be.

> FWIW, the loop stuff can be dropped when the LXC container support is
> removed. Nobody ever added support for loop in the LXD provider, and I
> think we should implement support for it differently to how it was done for
> LXC anyway (losetup on host, expose to container; as opposed to exposing all
> loop devices to all LXD containers and running losetup in the container).
>
+1000 to that. So... can't we just (1) fix the manual provisioning to write
the file; (2) drop all other use of uninstall-agent; (3) drop the
lxc-specific logic in uninstall -- and then we're done?
Cheers
William
More information about the Juju-dev mailing list