[Maas-devel] Replacing Apache

Thu Nov 27 15:56:17 UTC 2014

On 27 November 2014 at 14:12, Christian Robottom Reis wrote:
> Hello there,
>
>     One of the projects we'd like to tackle for MAAS 1.8 is replacing
> Apache on the region and cluster. One key driver is simplification, as
> Apache (and mod_wsgi) are a large moving part, but apart from being
> overkill for what we need, it's also a complex dependency for us to set
> up and manage, particularly in environments where Apache is already
> being used for another purpose. This email provides some background on
> the current design and some questions for the new design.
>
> The region:
>
>     AIUI, our use of Apache on the region is to
>
>         a) run Django via mod_wsgi; this is currently configured to run
>            2 single-thread processes by default [1]
>
>         b) serve static content, like JS and CSS
>
>     The Apache-spawned Django processes also run a Twisted reactor each.
>     The reactor hosts a number of services, including an RPC service
>     which is set up to listen on a port (a random port, in particular,
>     see https://bugs.launchpad.net/maas/+bug/1352923). By design, any
>     service running here can call out to any cluster, so each process
>     holds a connection open to the cluster.
>
>     Boot images are stored in the DB and served up via Django [2].
>
> The cluster:
>
>     On the cluster, we run a "clusterd" upstart job (which currently
>     spawns a process which AFAIK is still called twistd, maybe bug [3]?)
>     which is what calls via RPC into the region. The job also listens
>     with python-txtftp on port 69 for TFTP, which incidentally requests
>     the PXE config from Django running on the region [4].
>
>     In order to find out which RPC services are available from a region,
>     the cluster controller requests an RPC view (at /MAAS/rpc) which
>     hands out a JSON dump with a map of region services, IP addresses
>     and ports.
>
>     Once the cluster gets the RPC services JSON dump, it establishes a
>     connection with each of the region RPC services. This is necessary
>     so that any region can send messages to any cluster.
>
>     We use Apache to serve boot resources to the nodes for curtin
>     installs. AFAIK debian-installer is purely package-based and doesn't
>     pull any resources in any special way.
>
> Replacing Apache:
>
>     - On the region, replacing Apache doesn't appear to have major
>       performance impact.  We are already sending large files to the
>       cluster using Django (i.e. Python), so the only throughput benefit
>       Apache gives us is serving up resources like JS and CSS. We will
>       probably need to up the number of running processes to avoid
>       lagging or blocking, which will increase the number of cluster to
>       region connections -- but I don't think that should be a problem.
>
>       One tricky aspect: if the user has other Apache services running
>       on the region, then port 80 will be tied up and we won't be able
>       to use it. We have some options:
>
>         * having the region and cluster connect on another port (yuck)
>         * continue using mod_wsgi for that situation (yuck yuck)
>         * proxying over to Twisted using mod_proxy, which is included in
>           the stock Apache install and is what Gavin
>           has in his lp:~allenap/maas/regiond branch. However, as with
>           the mod_wsgi solution this would require Twisted to be running
>           on another port.
>         * disallow vhosting altogether on the region, which would be
>           a visible change in behaviour

There's another tricky aspect: the user may have configured Apache to
use TLS. We'd need to migrate that over to Twisted.

That's why leaving Apache in, but switching it to a very simple
static-files-and-proxy-for-the-rest configuration, might be an okay
compromise. We wouldn't need to ask people to restart Apache, and we'd
rarely (if ever) have to ask users to debug with Apache's logs, but we'd
still have native handling for static files, and we wouldn't rock the
TLS boat.

>
>     - On the cluster, though the architecture is simpler, we'd need to
>       be careful as the performance of boot resource delivery to the
>       nodes may be an issue.
>
> Additional thoughts:
>
>     - Currently nodes have to access the region controller because they
>       request information that is only made available via Django. Does
>       anyone not agree that we should have the requests handled by the
>       cluster which would then use the API to contact the region?
>
>     - To address performance and concurrency when sending resources from
>       Twisted, we could look at using python-sendfile (a universe
>       package) which provides zero-copy file sending. Twisted actually
>       has a sendfile implementation proposed as a patch that has been
>       stuck for a few years, but we could nudge that forward as well.
>
>     - There is a bug filed by IS asking us to reduce the number of ports
>       the region listens on. Could we bind all region processes to the
>       same socket using SO_REUSEPORT to avoid this issue?
>
>         http://lwn.net/Articles/542629/
>
>       We'd need to study the semantics, but this seems easier than
>       inverting it so the region connects to the cluster.
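
For reference, binding several listeners to one port looks roughly like
this in Python. This is only a minimal sketch, assuming a platform that
exposes SO_REUSEPORT (Linux 3.9 or later, per the LWN article above);
bind_shared_listeners is a made-up name, not anything in MAAS, and real
regiond processes would each bind their own socket rather than share a
list in one process.

```python
import socket


def bind_shared_listeners(count, host="127.0.0.1", port=0):
    """Bind `count` TCP listeners to a single port using SO_REUSEPORT.

    Hypothetical helper for illustration only. With port=0 the first
    bind picks an ephemeral port and the remaining sockets reuse it.
    """
    if not hasattr(socket, "SO_REUSEPORT"):
        raise OSError("SO_REUSEPORT is not available on this platform")
    listeners = []
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # The option has to be set on every socket before bind().
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sock.bind((host, port))
        sock.listen(16)
        # Remember the port actually bound so later sockets share it.
        port = sock.getsockname()[1]
        listeners.append(sock)
    return listeners
```

The kernel then decides which listener receives each incoming
connection; the caller has no say in it.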

Each clusterd needs to connect to each regiond. Having multiple regionds
listening (for RPC) on the same port makes it hit-and-miss for a
clusterd to make all of the connections it needs. It could just keep
trying until it has them all, but that's a bit sucky.

"Hello! Is that regiond-B?"

"No, this is regiond-A, and I'm already talking to you on line 1."

<click>

"Hello! Is that regiond-B?"

"No, this is regiond-A, and I just spoke to you."

<click>

"Hello? ..."

We could turn this around and make each regiond initiate connections to
each clusterd, and that would address the problem... as long as we only
have one clusterd on each cluster controller.

Maybe the "Hello!"-<click>-"Hello!" approach would be okay. I think it
would be easy enough to implement first and try it out.
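
To get a feel for the "Hello!"-<click>-"Hello!" approach, here is a
minimal simulation of the retry loop. It makes no real connections: the
random choice is a stand-in for the kernel picking a listener (assumed
to be uniform, which may not hold in practice), and connect_to_all and
its parameters are hypothetical names for illustration.

```python
import random


def connect_to_all(region_ids, max_attempts=1000, rng=None):
    """Simulate a clusterd dialling one shared port until it holds a
    connection to every regiond. Returns (connections, attempts)."""
    rng = rng or random.Random()
    wanted = set(region_ids)
    connected = {}
    attempts = 0
    while set(connected) != wanted:
        if attempts >= max_attempts:
            raise RuntimeError("gave up before reaching every regiond")
        attempts += 1
        # Stand-in for the kernel choosing which regiond answers.
        answered_by = rng.choice(sorted(wanted))
        if answered_by in connected:
            # "I'm already talking to you." <click>
            continue
        connected[answered_by] = "connection-%d" % attempts
    return connected, attempts


if __name__ == "__main__":
    conns, tries = connect_to_all(["regiond-A", "regiond-B"])
    print("connected to %s after %d attempts" % (sorted(conns), tries))
```

Under the uniform assumption this is the coupon collector problem: with
n regionds the expected number of attempts is n * (1 + 1/2 + ... + 1/n),
so 3 on average for two regionds, growing faster than n as processes
are added.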

>
>     - We want to use SSL for communication; Twisted has native SSL
>       support so that shouldn't be an issue.
>
> Your comments and ideas welcome -- some of the above needs to be
> actually validated on a running MAAS, but mine's being upgraded to try
> the isolation changes out, so I'll note later if I left anything
> important out.
>
> [1] Why 2 processes? Why only 1 thread? What drove this decision?
>
> [2] Interestingly, as we only have 2 processes configured by default,
> the fact that image downloads are long-running may allow 2 or more
> clusters to make the region unavailable while images are being
> downloaded. Gavin and Blake should probably discuss this.

I'd like to change this to use Twisted's IO loop.

>
> [3] We could change the process name with something like
> python-setproctitle, a C extension that does some voodoo that ordinary
> Python code cannot. That specific one is in universe.
>
> [4] The cluster also runs tgtd to serve up the ephemeral rootfs, but it's
> a separate process, not handled in clusterd itself.

The iSCSI mounts are configured by something in clusterd.

This is an approach we /could/ take with Apache too (or another web
server like nginx): don't meddle with the configuration in /etc, write
out a configuration from clusterd and spawn it from there. For the
cluster at least it doesn't matter on which port it runs because we
control where nodes look for boot resources.

The sendfile thing may make this unnecessary though.
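
For the curious, zero-copy sending looks something like the sketch
below. It uses os.sendfile from the Python 3 standard library rather
than the python-sendfile package mentioned above, and it assumes a
blocking socket on Linux; wiring it into Twisted's reactor (handling
partial writes on a non-blocking socket) is not shown. send_file is a
hypothetical helper name.

```python
import os
import socket


def send_file(sock, path, chunk=1 << 20):
    """Send the file at `path` over a connected, blocking socket using
    os.sendfile(), so the data never passes through Python buffers.
    Returns the number of bytes sent."""
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent_total < size:
            # os.sendfile(out_fd, in_fd, offset, count) may send fewer
            # bytes than requested, so loop until the file is done.
            sent = os.sendfile(sock.fileno(), f.fileno(), sent_total,
                               min(chunk, size - sent_total))
            if sent == 0:
                break  # file shrank underneath us
            sent_total += sent
    return sent_total
```

Python 3.5 and later also offer socket.sendfile(), which wraps the same
system call and falls back to plain send() where it is unavailable.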

Gavin.