Fwd: How to make juju aware of IP address changes?
roger.peppe at canonical.com
Fri Dec 6 16:04:14 UTC 2013
On 5 December 2013 18:52, Mark Shuttleworth <mark at ubuntu.com> wrote:
> On 04/12/13 17:34, Peter Waller wrote:
>> This situation is now resolved with thanks to Roger, Gustavo and
>> others in real time. There is no way we could have resolved it
>> ourselves since there was corruption of the juju database caused by
>> running out of disk space, which was unfortunate. We as a team were
>> not aware that it is necessary to keep a backup of the juju database.
> Thanks for letting us dive in on it together, Peter.
> Would it help if Juju could maintain an awareness of the disk situation
> and gracefully avoid making the problem worse (and avoid corruption) by
> going read-only when disk is low?
That's an interesting idea. It would need careful thought though - how
would we make the decision when the database is actually distributed
over several machines? I believe that the corruption was caused
by the fact that we were not making sure that mongo journal writes
are synced to disk before returning from a database operation.
If we can avoid corruption by enabling that safety mode, I think
that would probably be preferable.
The main problem in this case was that one problem caused a cascade
of sub-problems (the above corruption occurring quite late in the chain).
The principal issue was the fact that log files expanded incredibly rapidly.
I think that there are a few things that could help here,
most important points first:
- We should limit agent restarting in some way (exponential backoff or
retry limits or both)
- We should rotate log files and compress old ones.
- We should have some kind of policy for expiring and deleting old log files.
- We should have some way of garbage collecting the transaction log.
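
As an illustration of the first point, here is a minimal sketch of restart limiting that combines exponential backoff with a retry limit. It is not Juju's code: the names `backoff_delays` and `run_with_restarts`, the `start_agent` callable, and all the default values are hypothetical, chosen only to show the shape of the idea.

```python
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0, max_retries=5):
    """Yield the wait (in seconds) before each retry.

    The wait grows by `factor` each time, is never longer than `cap`,
    and the generator stops after `max_retries` values (the retry limit).
    All defaults are illustrative assumptions.
    """
    delay = base
    for _ in range(max_retries):
        yield min(delay, cap)
        delay *= factor


def run_with_restarts(start_agent, sleep=time.sleep, **backoff):
    """Call start_agent() until it succeeds or the retries run out.

    Returns the number of attempts made. Once the retry limit is
    reached, the last error is re-raised instead of restarting again.
    `start_agent` is a hypothetical stand-in for whatever starts an agent.
    """
    attempts = 0
    delays = backoff_delays(**backoff)
    while True:
        attempts += 1
        try:
            start_agent()
            return attempts
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # retry limit reached: give up rather than loop
            sleep(delay)
```

With a limit like this, an agent that fails immediately on every start makes a bounded number of attempts, and so writes a bounded amount of log output, rather than restarting in a tight loop.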
We *could* consider disabling logging when the disk is tending
towards full, but I suspect that could make a bad problem worse
by losing any possibility of seeing what has actually been going on.
An awareness of the disk situation could help towards deciding
when some of the above actions might be triggered.
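
A rough sketch of what such a disk check might look like, using the standard library's `shutil.disk_usage`. The function name `disk_is_low` and the 10% threshold are assumptions made for illustration, not anything Juju provides.

```python
import shutil


def disk_is_low(path="/", min_free_fraction=0.10, usage=shutil.disk_usage):
    """Return True when the filesystem holding `path` is nearly full.

    `min_free_fraction` is a hypothetical threshold (10% free by default).
    `usage` can be replaced in tests; by default it is shutil.disk_usage,
    which returns a (total, used, free) tuple in bytes.
    """
    total, _used, free = usage(path)
    if total == 0:
        # Some pseudo-filesystems report zero size; treat as "not low".
        return False
    return free / total < min_free_fraction
```

A check like this could be polled periodically and used to trigger log expiry or compression early, before the disk actually fills up.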