Deployment Oversight

James Beedy jamesbeedy at gmail.com
Mon Nov 28 18:44:22 UTC 2016


Merlin,

Thanks for your insight here, and I totally agree with you, "running
everything in LXD containers is a very good starting point" - simply
because we can guarantee that everything works as tested/expected, right?

As far as hacking on lxd/lxc networking goes, I think a generic
openvswitch L3 networking charm is in order here, although my understanding
is that Juju may soon be extended to use the Fan. Could someone verify
whether Fan support has been targeted to any milestone yet?

~James

On Mon, Nov 28, 2016 at 9:21 AM, Merlijn Sebrechts <
merlijn.sebrechts at gmail.com> wrote:

> I wouldn't want to be in your shoes in a pre-snappy world... I'm amazed
> that Ubuntu still works so well in the ocean.
>
> We found a way to mitigate most of the issues: run everything exclusively
> in LXC containers. This gave us the standard cloud image that all these
> Charms are being tested on. This approach had two issues:
>
> 1. Network: lxc containers can't connect to containers on other hosts and
> can't resolve each other's hostnames. DNS might be a bigger issue than you
> think. Not a single big data framework can handle unresolvable hostnames
> (a quick sketch of the check these frameworks effectively do is below,
> after the list).
> 2. Reliability:
>
> - We experienced many crashed state servers on 1.x manual environments.
> - Random failure of the lxc template download[1] (This problem reappeared
> 1 week after closing the issue. We didn't reopen it because we started
> moving to MAAS).
>
> - Random failure of installing lxc packages on the host. At first I
> thought this was due to outdated host images, but this problem was
> intermittent, which doesn't make much sense.
>
> - Fixes for 1. were hard to create and didn't work reliably.
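>
> To make 1. concrete: this is the kind of forward-and-reverse lookup that
> Hadoop-style frameworks do on every node. A rough, untested Python sketch,
> with a made-up container hostname:
>
>     import socket
>
>     hostname = 'juju-machine-3-lxc-0'   # hypothetical container name
>     try:
>         addr = socket.gethostbyname(hostname)    # forward lookup
>         name, _, _ = socket.gethostbyaddr(addr)  # reverse lookup
>         print('ok: %s -> %s -> %s' % (hostname, addr, name))
>     except socket.error as err:
>         print('unresolvable: %s (%s)' % (hostname, err))
>
> Both lookups have to succeed, and the reverse one is exactly what breaks
> across hosts.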
>
> Getting to the point where lxc was successfully installed and working was
> hard and unreliable; about half of our deploys failed. However, once an
> environment got there, it was very pleasant to work with.
>
> What I suggest is that you stop trying to make Juju work in 'the ocean'
> and focus the manual environment efforts on one thing: a multi-machine LXD
> provider. *Fix the LXD networking and DNS issues and tell everyone to
> only use LXD containers in a manual environment.* For many people, the
> manual provider is their starting point into the Juju world, and running
> everything in LXD containers is a very good starting point.
>
> [1] https://bugs.launchpad.net/juju-core/1.25/+bug/1610880
>
>
> 2016-11-28 16:55 GMT+01:00 Mark Shuttleworth <mark at ubuntu.com>:
>
>>
>> It's super difficult to document 'the ocean'; there will always be fraying
>> at the edges where what worked on clouds fails in the manual case.
>>
>> Mark
>>
>>
>> On 28/11/16 15:49, Rick Harding wrote:
>>
>> That's very true on the items that are different. I wonder if we could
>> work with the CPC team and note the things that are assumed promises when
>> using cloud images, so that it'd be easy to build a "patch" for manually
>> provisioned machines. If we know which specific packages and configuration
>> are present on our images, it should be doable to put together some sort of
>> "manual-init" script that could try to bring things in line.
>>
>> Merlijn, do you have any notes on the changes that you were suffering
>> through? Was there anything that didn't fit into the "using your own Ubuntu
>> install vs a CPC-certified image" category?
>>
>> On Sun, Nov 27, 2016 at 1:26 AM John Meinel <john at arbash-meinel.com>
>> wrote:
>>
>>> From what I can tell, there are a number of places where these manual
>>> machines differ from our "standard" install. I think the charms can be
>>> written defensively around this, but it's why you're running into more
>>> issues than you normally would.
>>>
>>>    1. 'noexec' for /tmp. I've heard of this, but if layer-ruby wants to
>>>    build something, where *should* it build it? Maybe we could do
>>>    something in /var, but it does seem like the intermediate files are all
>>>    temporary (which is presumably why someone picked /tmp). I don't have
>>>    any details on layer-ruby.
>>>    2. python-yaml not installed. Most of the places where we run juju
>>>    use 'cloud-init' to set up the machine for the first time, and I'm
>>>    pretty sure cloud-init has a dependency on python-yaml (because that's
>>>    how some of the cloud-init config is written). Again, charms can just
>>>    include python-yaml as a dependency; I'm guessing they just didn't
>>>    notice because everywhere else they tested it was already there.
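>>>
>>> A rough sketch of the defensive approach for 2. (not what any charm
>>> actually does today, and python3-yaml is just the usual Ubuntu package
>>> name) would be to fall back to installing it at import time:
>>>
>>>     import subprocess
>>>
>>>     try:
>>>         import yaml
>>>     except ImportError:
>>>         # assume a Debian/Ubuntu host where apt-get is available
>>>         subprocess.check_call(['apt-get', 'install', '-y', 'python3-yaml'])
>>>         import yaml
>>>
>>> Declaring it as a proper charm dependency would obviously be cleaner.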
>>>
>>> John
>>> =:->
>>>
>>>
>>> On Sun, Nov 27, 2016 at 4:45 AM, Merlijn Sebrechts <
>>> merlijn.sebrechts at gmail.com> wrote:
>>>
>>> I feel you, James
>>>
>>> We've been battling with weird issues / compatibility problems with the
>>> manual provider on private infra for the past year. Just finding out where
>>> the problem is requires diving deep into the internals of Juju and the
>>> Charms. In the end, we patched our own servers heavily and had to patch
>>> ~30% of the Charms we tried. This slowed us down so much that we just gave
>>> up and moved to MAAS. We're having far fewer problems now.
>>>
>>>
>>>
>>> 2016-11-27 0:03 GMT+01:00 James Beedy <jamesbeedy at gmail.com>:
>>>
>>> I was a bit flustered earlier when I sent off this email. I've since
>>> looked a bit closer at each of the individual problems and thought I would
>>> report back with my findings.
>>>
>>> 1. Job for systemd-sysctl.service failed because the control process
>>> exited
>>>     - This is an error I'm seeing when installing juju (not sure whether
>>> it contributes to any of the other issues); I didn't look into it much,
>>> but filed a bug here -> https://bugs.launchpad.net/juju/+bug/1645025
>>>
>>> 2. ERROR juju.state database.go:231 using unknown collection
>>> "remoteApplications"
>>>     - This only seems to occur in 2.0.1, installed from the juju/stable
>>> PPA; when I reverted back to 2.0.0, it went away.
>>>
>>> Charm/Layer Issues
>>>
>>> 3. Problem with Ruby: ["env: './configure': Permission denied"]
>>>     - Both of my charms were utilizing layer-ruby. When deployed to lxd
>>> and EC2 I don't seem to get this error, but this private/dedicated infra
>>> doesn't seem to like Python running `./configure` (it could also be
>>> permissions on /tmp, but I tried moving the unpacking and configuring to
>>> another dir and still got this error; a rough sketch of what I tried is
>>> included after 4. below).
>>>     - Filed bug here ->
>>> https://github.com/battlemidget/juju-layer-ruby/issues/12
>>>     - Removing layer-ruby was my fix here; this allowed my charms to
>>> deploy without error.
>>>
>>> 4. Elasticsearch
>>>     - It seems the es charm can't find the yaml module (possibly a
>>> python3.5 thing?).
>>>     - Filed bug here ->
>>> https://bugs.launchpad.net/charms/+source/elasticsearch/+bug/1645043
>>>     - My workaround here, just to get the app deployed, was to deploy
>>> elasticsearch to an lxd container on one of my hosts. Of course this isn't
>>> an answer for anything more than a POC, but it worked to let me
>>> deploy/troubleshoot the rest of my bundle.
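>>>
>>> For reference on 3., "moving the unpacking and configuring to another dir"
>>> was roughly the following (paths and the Ruby version are just examples),
>>> and it still hit the permission error:
>>>
>>>     import os
>>>     import subprocess
>>>     import tempfile
>>>
>>>     # build somewhere that isn't mounted noexec instead of /tmp
>>>     build_dir = tempfile.mkdtemp(prefix='ruby-build-', dir='/var/tmp')
>>>     os.environ['TMPDIR'] = build_dir   # steer tools that honour TMPDIR
>>>     subprocess.check_call(
>>>         ['tar', 'xf', '/var/tmp/ruby.tar.gz', '-C', build_dir])
>>>     src = os.path.join(build_dir, 'ruby-2.3.1')   # hypothetical version
>>>     subprocess.check_call(['./configure', '--prefix=/usr/local'], cwd=src)
>>>
>>> So the /tmp noexec theory may not be the whole story.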
>>>
>>>
>>> Aside from the remaining elasticsearch issue, I was able to get my stack
>>> deployed -> http://paste.ubuntu.com/23540146/
>>>
>>> My earlier baffled and confused cry for help now seems to revolve just
>>> around getting es to deploy.
>>>
>>> My apologies for reaching out in such a way earlier, before diving into
>>> what was going on. Hopefully we can work out what's going on between my
>>> infra and ES.
>>>
>>> Thanks
>>>