LXD polish for xenial
Martin Packman
martin.packman at canonical.com
Mon Apr 18 18:17:29 UTC 2016
With the LXD 2.0 release at the start of last week and the prospect of
some stability, I spent a good chunk of the week testing Juju with
LXD.
What CI has been doing so far this cycle has been running our standard
deployment tests with the lxd provider on a pre-prepared machine with
a known-working package set. My two goals beyond this were:
- Validating the first install experience of the LXD provider on xenial
- Using lxd in place of lxc in containerised workloads across clouds
The conclusion is that, at present, we don't have an experience that
works for either of these.
For the lxd provider, I understand we're resigned to the user having
to manually configure a bridge for lxd before bootstrap can work.
Currently the documentation is confused as to what exactly the steps
are. The release notes refer to these two links:
<https://linuxcontainers.org/lxd/getting-started-cli/>
<http://insights.ubuntu.com/2016/04/07/lxd-networking-lxdbr0-explained/>
But I think the latest advice is as committed with this change to our
documentation:
<https://github.com/juju/docs/pull/998/files>
Note that just running dpkg-reconfigure is not enough: you have to
poke a service or run `lxc` afterwards, or you get this error with
beta4:
    $ juju bootstrap --config default-series=xenial lxd-test lxd
    ERROR cannot find network interface "lxcbr0": route ip+net: no such network interface
    ERROR invalid config: route ip+net: no such network interface
That's probably the cause of the other confusion in the updated docs -
now we *do* want the bridge named lxdbr0, not lxcbr0. If the user
already has lxc set up in some fashion, juju gives a different error
telling them to run the dpkg-reconfigure command. That message should
presumably appear whenever there's no usable bridge.
This also presents a challenge for automated testing of the lxd
provider in a clean environment. dpkg-reconfigure isn't the nicest
thing to use non-interactively, and I can't find a clear reference to
exactly which pieces the juju provider requires.
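
For automated runs, a cheap preflight check before bootstrap would at
least turn the failure above into something obvious. A rough sketch,
assuming the bridge shows up under /sys/class/net with a `bridge`
subdirectory (standard Linux sysfs behaviour); the function name is
hypothetical, and this says nothing about whether lxd itself is
configured to use the bridge:

```python
import os


def is_bridge(name, sysfs_root="/sys/class/net"):
    """Return True if `name` is a network interface that is a bridge.

    Linux exposes each interface as /sys/class/net/<name>, and bridge
    devices additionally have a `bridge` subdirectory there.
    `sysfs_root` is a parameter only so the check can be exercised
    against a fake directory tree.
    """
    return os.path.isdir(os.path.join(sysfs_root, name, "bridge"))
```

Calling is_bridge("lxdbr0") ahead of `juju bootstrap` only confirms
that the device exists, so it is a sanity check rather than a
replacement for knowing the required configuration.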
As part of the juju 2.0 packaging for Ubuntu, we need an
autopkgtest that will run on a fresh xenial machine, so this
script is what I added to do the lxd configuration:
<http://bazaar.launchpad.net/~juju-qa/ubuntu/xenial/juju/xenial-2.0-beta4/view/head:/debian/tests/setup-lxd.sh>
With the additional step afterwards to call `lxc finger`, that works
(with caveats) for me. In the autopkgtest.ubuntu.com infrastructure,
however, it does not, and it has also failed in two different ways for
Steve Langasek and Martin Pitt:
"autopkgtest lxd provider tests fail for 2.0"
<https://bugs.launchpad.net/ubuntu/+source/juju-core/+bug/1571082>
So, at present we don't have confidence that the LXD provider will
work, even with the manual configuration step, for users installing
Xenial for the first time.
When it comes to using lxd in clouds, as I understand it we've settled
on retaining the 'lxc' and 'lxd' name distinction in 2.0 - which does
mean bundles have to be manually changed at present to start using
lxd. Most of the CI bundle testing is using real bundles out of the
store, which all still say 'lxc' and therefore don't exercise the lxd
container code at all.
We do have the container networking test which uses 'juju add-machine
... lxd:0' - and fails due to the networking setup:
"container networking lxd 'Missing parent for bridged type nic'"
<https://bugs.launchpad.net/juju-core/+bug/1571053>
That is probably less interesting than the default behaviour without
the feature flag.
As a separate test, I updated one of our simple bundles just to say
'lxd' in two places where it had 'lxc' for a service before. The
deployment timed out after 24 minutes, where the normal test with lxc
takes 12 minutes.
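
The bundle change itself is mechanical. A rough sketch of the swap,
assuming placement directives are plain strings of the form
`lxc:<target>`, as in the `lxd:0` argument to add-machine above; the
function name is hypothetical, and where the directives sit inside a
given bundle file is not covered here:

```python
def use_lxd(placements):
    """Return placement directives with container type 'lxc' swapped for 'lxd'.

    Only the part before the first ':' is examined, so directives for
    other container types or bare machine numbers pass through unchanged.
    """
    swapped = []
    for directive in placements:
        kind, sep, target = directive.partition(":")
        if kind == "lxc":
            directive = "lxd" + sep + target
        swapped.append(directive)
    return swapped
```

For example, use_lxd(["lxc:0", "1"]) gives ["lxd:0", "1"].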
The reason for that turns out to be pretty simple. Looking back at the
lxd provider test, it hung for over 20 minutes just updating packages
when setting up the first container:
From /var/log/apt/history.log inside the container:
Start-Date: 2016-04-15 22:11:16
...
End-Date: 2016-04-15 22:33:03
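
To put a number on that, the two timestamps can be subtracted
directly. A small sketch, assuming the log uses the
`Start-Date:`/`End-Date:` lines shown above with a
`YYYY-MM-DD HH:MM:SS` value; the function name is hypothetical:

```python
from datetime import datetime


def apt_run_seconds(history_text):
    """Return seconds from the first Start-Date to the last End-Date."""
    start = end = None
    for line in history_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key = key.strip()
        if key not in ("Start-Date", "End-Date"):
            continue
        # Collapse runs of whitespace between the date and the time.
        stamp = datetime.strptime(" ".join(value.split()), "%Y-%m-%d %H:%M:%S")
        if key == "Start-Date" and start is None:
            start = stamp
        elif key == "End-Date":
            end = stamp
    if start is None or end is None:
        raise ValueError("no complete Start-Date/End-Date pair found")
    return (end - start).total_seconds()
```

For the excerpt above this comes to 1307 seconds, a little under 22
minutes spent on package updates alone.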
Unlike other providers, lxd exposes no way to use the daily images
instead of release images, so at present any machine using lxd
containers with juju for the first time will get the xenial beta2
image then upgrade basically every package. This issue goes away next
week, but gets in the way of testing before then.
On a related note, the lxc container handling in juju manages images
on the state server, but from what I see of the lxd code, each
deployed machine will fetch images from cloud-images.ubuntu.com and
keep a separate set of images. That makes the above problem much worse
for any bundle with multiple machines that use containers.
Finally, we'll need to update the log gathering code in CI to know how
to look inside lxd containers. At the moment, only the machine log
seems to be linked into the /var/log/lxd/ directory, so the cloud-init
logs and other pieces are currently missing. It does seem we can peek
inside using paths like:
/var/lib/lxd/containers/juju-d9c2c426-f268-47d9-8b96-4468b3f60b51-machine-0/rootfs/var/log/apt/term.log
But I'm not sure if that's behaviour we can rely on with all lxd configurations.
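
If we do go that route, the log gathering code would need something
like the following to map a path inside the container to a path on the
host. This is a sketch only: the /var/lib/lxd/containers/<name>/rootfs
layout is just what I observed in the path above and is an assumption
for any other lxd configuration, and the function name is hypothetical:

```python
import posixpath

# Layout observed on one machine; treat as an assumption elsewhere.
LXD_CONTAINERS_DIR = "/var/lib/lxd/containers"


def container_log_path(container, log_path, containers_dir=LXD_CONTAINERS_DIR):
    """Map an absolute path inside a container to its path on the host."""
    # Strip the leading "/" so join() does not discard the host prefix.
    return posixpath.join(containers_dir, container, "rootfs", log_path.lstrip("/"))
```

Keeping the containers directory as a parameter means only one value
has to change if the layout turns out to differ.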
Martin