Current handling of failed upgrades is screwy

Wed Jul 16 10:32:20 UTC 2014

On 16 July 2014 18:55, William Reade <william.reade at canonical.com> wrote:

> On Wed, Jul 16, 2014 at 3:46 AM, Menno Smits <menno.smits at canonical.com>
> wrote:
>
>> OK - points taken.
>>
>> So taking your ideas and extending them a little, I'm thinking:
>>
>>    - retry upgrade steps on failure (with inter-attempt delay)
>>    - indicate when there are upgrade problems by setting the machine agent
>>    status
>>    - if the upgrade still won't complete despite the retries, report this in
>>    status and keep the agent running, but with the restricted API in place and
>>    most workers not started (i.e. as if the upgrade is still running). This
>>    allows "juju status" and "juju ssh" to work unless a significant upgrade
>>    step that they depend on hasn't run.
>>
>> Does that sound reasonable?
>>
>
> I think that's reasonable, yeah; the only caveat is that if we *are* in a
> position to roll back the whole upgrade we should do so. For these
> measures, I'm thinking more about situations where the state server has
> successfully upgraded but some satellite agents have failed.
>

Absolutely. The above only applies when there is no backup available to
roll back to, i.e. when the version being upgraded from didn't support
storing backups server side. If there is a backup, the machine agent will
roll back to the previous version if the upgrade process fails.
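As a rough illustration of the policy agreed above, here is a minimal Python sketch. Everything in it (AgentState, handle_upgrade, max_attempts, delay_seconds) is hypothetical and not Juju's actual upgrade code. It assumes that upgrade steps are safe to re-run from the start, that with a server-side backup a failure triggers an immediate rollback (no retries), and that without one the steps are retried with a delay, status is updated after each failure, and, once retries run out, the agent keeps running with the restricted API and without most workers.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class AgentState:
    """Hypothetical stand-in for the parts of a machine agent the policy touches."""
    version: str
    status: str = "upgrading"
    restricted_api: bool = True    # only a limited API is served while upgrading
    workers_started: bool = False  # most workers wait until the upgrade completes
    history: List[str] = field(default_factory=list)

    def set_status(self, message: str) -> None:
        # Record every status change so upgrade problems stay visible.
        self.status = message
        self.history.append(message)


def _run_steps(steps: Sequence[Callable[[], None]]) -> None:
    for step in steps:
        step()  # any exception aborts the current attempt


def _finish(agent: AgentState, version: str, message: str) -> None:
    # Leave upgrade mode: full API and normal workers.
    agent.version = version
    agent.restricted_api = False
    agent.workers_started = True
    agent.set_status(message)


def handle_upgrade(agent: AgentState, steps: Sequence[Callable[[], None]],
                   target_version: str, backup_available: bool,
                   max_attempts: int = 3, delay_seconds: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    previous_version = agent.version

    if backup_available:
        # A server-side backup exists: roll the whole upgrade back on failure.
        try:
            _run_steps(steps)
        except Exception as err:
            # Assumption: after restoring the backup the agent runs normally
            # on the previous version.
            _finish(agent, previous_version,
                    f"upgrade failed ({err}); rolled back to {previous_version}")
            return "rolled back"
        _finish(agent, target_version, "started")
        return "upgraded"

    # No backup: retry the steps, pausing between attempts.
    for attempt in range(1, max_attempts + 1):
        try:
            _run_steps(steps)
        except Exception as err:
            agent.set_status(f"upgrade attempt {attempt}/{max_attempts} failed: {err}")
            if attempt < max_attempts:
                sleep(delay_seconds)
            continue
        _finish(agent, target_version, "started")
        return "upgraded"

    # Retries exhausted: keep the agent running but stay in upgrade mode
    # (restricted API, most workers not started) and report the failure.
    agent.set_status(f"upgrade to {target_version} failed after {max_attempts} attempts")
    return "stuck"
```

Passing a fake sleep function (for example a list's append method) lets the retry timing be checked without actually waiting. Keeping the agent alive in restricted mode, rather than exiting, is what leaves "juju status" and "juju ssh" usable for diagnosing a stuck upgrade.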