[Maas-devel] Replacing Apache
Christian Robottom Reis
kiko at canonical.com
Thu Nov 27 14:12:51 UTC 2014
Hello there,
One of the projects we'd like to tackle for MAAS 1.8 is replacing
Apache on the region and cluster. One key driver is simplification:
Apache (with mod_wsgi) is a large moving part, and apart from being
overkill for what we need, it is also a complex dependency for us to
set up and manage, particularly in environments where Apache is
already being used for another purpose. This email provides some
background on the current design and raises some questions for the
new design.
The region:
AIUI, our use of Apache on the region is to
a) run Django via mod_wsgi; this is currently configured to run
2 single-thread processes by default [1]
b) serve static content, like JS and CSS
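To give a feel for how small that job is, here is a minimal stdlib-only sketch of a single WSGI callable that does both (a) and (b): it serves files under a static prefix from disk and hands everything else to the Django app. The `make_region_app` name and the `/MAAS/static/` prefix are assumptions for illustration, not MAAS's actual layout; a Twisted-based replacement would more likely mount `twisted.web.wsgi.WSGIResource` and `twisted.web.static.File` side by side.

```python
import mimetypes
import os


def make_region_app(django_app, static_root, static_prefix="/MAAS/static/"):
    """Return a WSGI app serving files under static_prefix from static_root
    and delegating every other request to django_app (hypothetical helper)."""
    static_root = os.path.realpath(static_root)

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        if not path.startswith(static_prefix):
            # Anything that is not static content goes to Django.
            return django_app(environ, start_response)
        relative = path[len(static_prefix):]
        full = os.path.realpath(os.path.join(static_root, relative))
        # Reject paths that escape static_root, e.g. "../../etc/passwd".
        if not full.startswith(static_root + os.sep) or not os.path.isfile(full):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not Found"]
        ctype = mimetypes.guess_type(full)[0] or "application/octet-stream"
        with open(full, "rb") as f:
            body = f.read()
        start_response("200 OK", [("Content-Type", ctype),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app
```

It reads each static file fully into memory, which is fine for JS and CSS but would not be appropriate for boot images.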
Each Apache-spawned Django process also runs its own Twisted reactor.
The reactor hosts a number of services, including an RPC service
that listens on a port (a random port, in particular; see
https://bugs.launchpad.net/maas/+bug/1352923). By design, any
service running here can call out to any cluster, so each process
holds a connection open to every cluster.
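The "random port" behaviour boils down to binding port 0 and asking the kernel which port it picked; that number then has to be advertised to clusters, which is the job of the RPC view described below. A minimal sketch with the stdlib `socket` module, using a hypothetical `listen_on_random_port` helper (in Twisted, `reactor.listenTCP(0, factory).getHost().port` yields the same information):

```python
import socket


def listen_on_random_port(host="127.0.0.1"):
    """Open a listening TCP socket on a kernel-chosen port.

    Returns (sock, port); the caller must advertise `port` to clients,
    since nobody can guess it in advance.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0: let the kernel pick a free ephemeral port
    sock.listen()
    return sock, sock.getsockname()[1]
```

Every region process that does this gets a different port, which is why the number of listening ports grows with the number of processes.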
Boot images are stored in the DB and served up via Django [2].
The cluster:
On the cluster, we run a "clusterd" upstart job (which currently
spawns a process that AFAIK is still called twistd; maybe bug [3]?).
This is the process that calls into the region via RPC. The job also
listens on port 69 for TFTP using python-txtftp, which incidentally
requests the PXE config from Django running on the region [4].
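For reference, this is roughly what the cluster receives on port 69 before it asks the region for a PXE config: a TFTP read request as defined by RFC 1350, optionally followed by RFC 2347 option pairs such as `blksize`. The sketch below parses one with the stdlib only; `parse_rrq` is a hypothetical name, and the sketch says nothing about how python-txtftp structures this internally.

```python
import struct

TFTP_RRQ = 1  # RFC 1350 opcode for a read request


def parse_rrq(packet):
    """Parse a TFTP read request into (filename, mode, options)."""
    if len(packet) < 4:
        raise ValueError("packet too short")
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != TFTP_RRQ:
        raise ValueError("not a read request: opcode %d" % opcode)
    # Body is NUL-terminated strings: filename, mode, then option pairs.
    fields = packet[2:].split(b"\0")
    # A well-formed request ends with a NUL, so the last field is empty.
    if len(fields) < 3 or fields[-1] != b"":
        raise ValueError("malformed read request")
    filename = fields[0].decode("ascii")
    mode = fields[1].decode("ascii").lower()
    extra = fields[2:-1]  # RFC 2347 options come as name/value pairs
    if len(extra) % 2:
        raise ValueError("unpaired TFTP option")
    options = {extra[i].decode("ascii").lower(): extra[i + 1].decode("ascii")
               for i in range(0, len(extra), 2)}
    return filename, mode, options
```

The returned filename (for PXELINUX, something under `pxelinux.cfg/`) is the piece of information the cluster currently has to pass on to the region.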
In order to find out which RPC services are available from a region,
the cluster controller requests an RPC view (at /MAAS/rpc), which
returns a JSON dump mapping region services to their IP addresses
and ports.
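A sketch of the cluster-side handling of that dump, assuming (hypothetically; the real /MAAS/rpc output should be checked) a document of the form `{"eventloops": {"<name>": [["<ip>", <port>], ...]}}`. The `region_endpoints` helper flattens the map into the list of endpoints the cluster needs to connect to, dropping duplicates:

```python
import json


def region_endpoints(rpc_view_json):
    """Flatten an RPC-view document into a sorted, de-duplicated list of
    (eventloop, host, port) tuples. The document shape is assumed."""
    doc = json.loads(rpc_view_json)
    endpoints = set()
    for eventloop, addresses in (doc.get("eventloops") or {}).items():
        for host, port in addresses or ():
            endpoints.add((eventloop, host, int(port)))
    return sorted(endpoints)
```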
Once the cluster gets the RPC services JSON dump, it establishes a
connection with each of the region RPC services. This is necessary
so that any region can send messages to any cluster.
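Since every cluster connects to every region RPC service, and each region process runs one such service, the number of connections grows multiplicatively with clusters and region processes. A back-of-the-envelope helper (the `rpc_connections` name is hypothetical):

```python
def rpc_connections(clusters, region_hosts, processes_per_host):
    """Count cluster-to-region RPC connections when every cluster connects
    to every region process (one RPC endpoint per process)."""
    region_processes = region_hosts * processes_per_host
    return {
        "total": clusters * region_processes,
        "per_region_process": clusters,   # one inbound from each cluster
        "per_cluster": region_processes,  # one outbound to each region process
    }
```

With 10 clusters and one region host at the default 2 processes this gives 20 connections in total; raising the process count to 8 gives 80.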
We use Apache to serve boot resources to the nodes for curtin
installs. AFAIK debian-installer is purely package-based and doesn't
pull any resources in any special way.
Replacing Apache:
- On the region, replacing Apache doesn't appear to have a major
performance impact. We are already sending large files to the
cluster using Django (i.e. Python), so the only throughput benefit
Apache gives us is serving up resources like JS and CSS. We will
probably need to increase the number of running processes to avoid
lagging or blocking, which will increase the number of
cluster-to-region connections -- but I don't think that should be a
problem.
One tricky aspect: if the user has other Apache services running
on the region, then port 80 will be tied up and we won't be able
to use it. We have some options:
* having the region and cluster connect on another port (yuck)
* continue using mod_wsgi for that situation (yuck yuck)
* proxying over to Twisted using mod_proxy, which is included in
the stock Apache install and is what Gavin has in his
lp:~allenap/maas/regiond branch. However, as with the mod_wsgi
solution, this would require Twisted to be running on another port.
* disallow vhosting altogether on the region, which would be a
visible change in behaviour
- On the cluster, though the architecture is simpler, we'd need to
be careful as the performance of boot resource delivery to the
nodes may be an issue.
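Whichever option we pick, regiond (or its packaging) would first need to detect the port 80 conflict. A minimal sketch of such a check, using a hypothetical `port_in_use` helper: try to bind the port and treat `EADDRINUSE` as "someone else has it". Binding port 80 also requires root (or `CAP_NET_BIND_SERVICE`), so the helper lets that permission error propagate rather than guessing.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if another socket already holds host:port for TCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EACCES when probing port 80 without privileges
    finally:
        sock.close()
    return False
```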
Additional thoughts:
- Currently nodes have to access the region controller because they
request information that is only made available via Django. Does
anyone disagree that we should have these requests handled by the
cluster, which would then use the API to contact the region?
- To address performance and concurrency when sending resources from
Twisted, we could look at using python-sendfile (a universe
package), which provides zero-copy file sending. Twisted actually
has a sendfile implementation proposed as a patch that has been
stuck for a few years, but we could nudge that forward as well.
- There is a bug filed by IS asking us to reduce the number of ports
the region listens on. Could we bind all region processes to the
same port using SO_REUSEPORT to avoid this issue?
http://lwn.net/Articles/542629/
We'd need to study the semantics, but this seems easier than
inverting the connection direction so that the region connects to
the cluster.
- We want to use SSL for communication; Twisted has native SSL
support so that shouldn't be an issue.
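On the sendfile point: on Python 3.3+ the stdlib already exposes the syscall as `os.sendfile()`, much as python-sendfile does for Python 2. A minimal blocking sketch that streams a whole file over a connected socket without copying it through Python buffers (`send_file` is a hypothetical helper name):

```python
import os


def send_file(sock, path, chunk=1 << 20):
    """Send a whole file over a connected socket using os.sendfile(),
    letting the kernel move the data without a userspace copy."""
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent_total < size:
            # Explicit offset: the file position itself is left untouched.
            sent = os.sendfile(sock.fileno(), f.fileno(), sent_total,
                               min(chunk, size - sent_total))
            if sent == 0:  # file shrank underneath us
                break
            sent_total += sent
    return sent_total
```

Inside a Twisted reactor the socket would be non-blocking, so a real integration would also need to handle `EAGAIN` (raised as `BlockingIOError`) by waiting for the socket to become writable before continuing.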
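On the SO_REUSEPORT point, the mechanism itself is small: every process sets the option before binding, and Linux (3.9 and later, per the LWN article) then accepts the repeated binds and spreads incoming connections across the listening sockets. A minimal sketch with a hypothetical `bind_shared` helper:

```python
import socket


def bind_shared(port, host="127.0.0.1"):
    """Listen on host:port with SO_REUSEPORT set (Linux 3.9+).

    Every socket sharing the port must set the option before bind() and,
    on Linux, belong to the same effective user; the kernel then spreads
    incoming connections across all of the listening sockets.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```

This also illustrates the semantic question: the kernel, not the client, picks which listener accepts each connection (by hashing the connection's addresses and ports), so a cluster could no longer choose to reach a specific region process, which the current "connect to every RPC service" scheme relies on.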
Your comments and ideas are welcome -- some of the above still needs
to be validated on a running MAAS, but mine is being upgraded to try
the isolation changes out, so I'll follow up if I left anything
important out.
[1] Why 2 processes? Why only 1 thread? What drove this decision?
[2] Interestingly, because we only have 2 processes configured by
default and image downloads are long-running, 2 or more clusters
downloading images at the same time may make the region unavailable
until the downloads finish. Gavin and Blake should probably discuss
this.
[3] We could change the process name with something like
python-setproctitle, a C extension that does some voodoo that ordinary
Python code cannot. That specific one is in universe.
[4] The cluster also runs tgtd to serve up the ephemeral rootfs, but it's
a separate process, not handled in clusterd itself.
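Regarding [3]: a smaller piece of that is available without any extension. On Linux, the prctl(2) `PR_SET_NAME` operation sets the short "comm" name that `top` and `pgrep` use by default; it does not change the full command line shown by `ps aux`, which is what setproctitle's argv rewriting is for. A Linux-only ctypes sketch with hypothetical helper names:

```python
import ctypes
import os

PR_SET_NAME = 15  # values from <linux/prctl.h>
PR_GET_NAME = 16

# dlopen(NULL): symbols of the running interpreter, including libc's prctl.
_libc = ctypes.CDLL(None, use_errno=True)


def set_thread_name(name):
    """Set the calling thread's kernel "comm" name (Linux only).

    Called from the main thread this renames the process as seen by top
    and pgrep; the kernel keeps at most 15 bytes of the name.
    """
    if _libc.prctl(PR_SET_NAME, name.encode()[:15], 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_thread_name():
    """Return the calling thread's current "comm" name."""
    buf = ctypes.create_string_buffer(16)
    if _libc.prctl(PR_GET_NAME, buf, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.value.decode(errors="replace")
```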
--
Christian Robottom Reis | [+1] 612 888 4935 | http://launchpad.net/~kiko
Canonical VP Hyperscale | [+55 16] 9 9112 6430 | http://async.com.br/~kiko