Joyent networking issues

Menno Smits menno.smits at canonical.com
Sun Dec 14 20:51:33 UTC 2014


On 13 December 2014 at 06:34, Curtis Hovey-Canonical <curtis at canonical.com>
wrote:
>
> Thank you Menno.
>
> On Fri, Dec 12, 2014 at 12:01 AM, Menno Smits <menno.smits at canonical.com>
> wrote:
> > For the last day and a half I've been looking at this bug:
> > https://bugs.launchpad.net/juju-core/+bug/1401130
> >
> > There's a lot of detail attached to the ticket but the short story is that
> > the Joyent cloud often allocates different internal networks to instances,
> > meaning that they can't communicate. From what I can tell from relevant LP
> > tickets, this has been a problem for a long time (perhaps always). It's very
> > hit and miss - sometimes you get allocated 10 machines in a row that all end
> > up with the same internal network, but more often than not it only takes 2
> > or 3 machine additions before running into one that can't talk to the
> > others.
>
> Your analysis explains a lot about the intermittent failures we
> have observed in Juju CI for months.
> ...
>
> > Given that this is looking like a problem/feature at Joyent's end that needs
> > clarification from them, may I suggest that this issue is no longer allowed
> > to block CI?
>
> Speaking for users, there is a regression.
>
> ...


>
> We do see intermittent failures using 1.20 in the Joyent cloud health
> check, so we know that, statistically, the problem exists for every
> Juju release, but we are seeing 100% failure for master tip. The success
> rates were better for master last week, and the rates for 1.20 and 1.21 are
> great for all weeks.
>

Based on what we're seeing in CI, I'm thinking there are 3 things at play
here:

1. The new networker wasn't playing well with the way the network
configuration files are set up in Joyent images. Dimiter has disabled the
networker on Joyent for now, increasing the chance of success for 1.21 and
master.

2. As discussed throughout this thread, instances can end up on different
internal networks. This is a Joyent issue which can affect any Juju
release. It's just up to chance whether the tests will pass on Joyent in CI
- if one of the instances that is assigned isn't on the same internal network
as the others the test run will fail. Adding a static route for 10.0.0.0/8
should fix this.

3. Some other issue, yet to be determined, is preventing the Joyent tests
from passing on master only. I will start investigating this once the
static route is being added automatically.
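
To illustrate point 2, here is a minimal sketch of why two instances on
different internal networks can't reach each other directly, and why a
route for 10.0.0.0/8 would cover both. The addresses and the /21 prefix
length are made-up assumptions for illustration, not values taken from
the bug report, and the function names are hypothetical.

```python
import ipaddress


def same_network(local_addr, peer_addr, prefix_len):
    """Return True if peer_addr falls inside the subnet that
    local_addr/prefix_len belongs to (i.e. it is directly reachable
    on that interface without an extra route)."""
    local_net = ipaddress.ip_interface(f"{local_addr}/{prefix_len}").network
    return ipaddress.ip_address(peer_addr) in local_net


def covered_by_route(addr, route="10.0.0.0/8"):
    """Return True if addr would match the given static route."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(route)


# Hypothetical internal addresses for three instances (assumed values).
machine_0 = "10.112.1.5"
machine_1 = "10.112.1.77"   # same assumed /21 as machine_0
machine_2 = "10.224.9.20"   # different internal network

if __name__ == "__main__":
    for peer in (machine_1, machine_2):
        print(peer,
              "same subnet:", same_network(machine_0, peer, 21),
              "covered by 10.0.0.0/8:", covered_by_route(peer))
```

In this sketch the third machine is outside the first machine's subnet
but still matches 10.0.0.0/8, which is the case the static route is
meant to handle. Whether traffic actually flows also depends on
Joyent's network fabric, which this check can't tell us.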



>
>
> --
> Curtis Hovey
> Canonical Cloud Development and Operations
> http://launchpad.net/~sinzui
>