Better handling of MongoDB disconnects due to new replicaset members

Reed O'Brien reed.obrien at canonical.com
Tue Jul 26 02:42:46 UTC 2016


On Mon, Jul 25, 2016 at 5:38 PM, Menno Smits <menno.smits at canonical.com>
wrote:

> Regarding https://bugs.launchpad.net/juju-core/+bug/1597601 ...
>
> When "juju enable-ha" is used, new controller machines are started, each
> running a mongod instance which is connected to Juju's replicaset. As each
> new node joins the replicaset a MongoDB leader election is triggered which
> causes all mongod instances in the replicaset to drop their connections
> (this is by design). The workers in Juju's machine agents handle this
> correctly by aborting and restarting with fresh connections to MongoDB.
>
> The problem is that if an API request comes in at just the right time, it
> will be actioned just as the MongoDB connection goes down, resulting in the
> i/o timeout error being reported back to the client.
>
> This isn't a new problem but it's one that Juju's users regularly run
> into. A workaround is to wait for the new controller machines to come up
> after enable-ha is issued before doing anything else.
>
> IMHO it would be best if Juju could hide all this from the client as much
> as possible but I'm really not sure if that's feasible or what the best
> approach should be.
>
> The challenge is that unless we do some major rearchitecting, the API
> server needs to be restarted when the MongoDB connections drop. There's no
> way that the client's connection can stay up, making it difficult to
> hide this detail from the client.
>

It seems that mgo could handle this as a failover. Or that we could see
that the replica set is starting and wait until it reports being up, then
refresh the mgo session. I don't understand why the API server itself has
to restart, though I am sure there are good reasons.
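To make the "wait until it reports being up, then refresh" idea concrete,
here is a rough sketch. It assumes the session exposes Ping and Refresh
methods, as mgo's *Session does; the session interface, waitAndRefresh and
fakeSession are hypothetical names for illustration only and are not
existing Juju code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// session is the hypothetical subset of a MongoDB session used here.
// The assumption is that mgo's *Session could satisfy it via its
// Ping and Refresh methods.
type session interface {
	Ping() error
	Refresh()
}

// waitAndRefresh drops the session's stale sockets and pings until the
// replica set answers again, or gives up after the given number of attempts.
func waitAndRefresh(s session, attempts int, delay time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		s.Refresh() // discard connections that the election may have closed
		if err = s.Ping(); err == nil {
			return nil
		}
		time.Sleep(delay)
	}
	return fmt.Errorf("replica set not reachable after %d attempts: %v", attempts, err)
}

// fakeSession is a stand-in for trying the loop without a real MongoDB.
// Its Ping fails a fixed number of times before succeeding.
type fakeSession struct {
	failures  int
	refreshes int
}

func (f *fakeSession) Refresh() { f.refreshes++ }

func (f *fakeSession) Ping() error {
	if f.failures > 0 {
		f.failures--
		return errors.New("no reachable servers")
	}
	return nil
}
```

This only covers reconnecting; it says nothing about whether the API server
could keep in-flight requests alive while the loop runs.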


>
> The most practical solution I can think of is that we introduce a new
> error type over the API which means "please retry the request". Errors such
> as an i/o timeout from the MongoDB layer could be converted into this
> error. Clients would obviously have to handle this error specially.
>

Barring handling it via the mgo session, this seems obvious and practical.
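
A minimal sketch of how the retry error could work on both sides, under
stated assumptions: ErrRetry, toAPIError, callWithRetry, isRetry and
timeoutError are all hypothetical names invented here, not existing Juju
API, and treating every network timeout as retryable is only one possible
classification.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// ErrRetry is a hypothetical sentinel standing in for the proposed
// "please retry the request" API error.
var ErrRetry = errors.New("transient failure: please retry the request")

// toAPIError is the server-side half: it converts timeout errors from the
// storage layer into ErrRetry and passes every other error through.
func toAPIError(err error) error {
	var ne net.Error
	if errors.As(err, &ne) && ne.Timeout() {
		return fmt.Errorf("%w (cause: %v)", ErrRetry, err)
	}
	return err
}

// isRetry reports whether err asks the client to retry.
func isRetry(err error) bool { return errors.Is(err, ErrRetry) }

// callWithRetry is the client-side half: it repeats call while the server
// answers with ErrRetry, up to the given number of attempts (at least one).
func callWithRetry(attempts int, delay time.Duration, call func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = call()
		if !isRetry(err) {
			return err // success, or an error that retrying will not fix
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return err
}

// timeoutError simulates the i/o timeout seen when MongoDB drops connections.
type timeoutError struct{}

func (timeoutError) Error() string   { return "i/o timeout" }
func (timeoutError) Timeout() bool   { return true }
func (timeoutError) Temporary() bool { return true }
```

One caveat with this approach: a blind client-side retry is only safe for
requests that are idempotent, or where the server can tell that the first
attempt did not take effect.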


~ro

-- 
Reed O'Brien
✉ reed.obrien at canonical.com
✆ 415-562-6797