[Maas-devel] Replacing Apache

Christian Robottom Reis kiko at canonical.com
Thu Nov 27 14:12:51 UTC 2014


Hello there,

    One of the projects we'd like to tackle for MAAS 1.8 is replacing
Apache on the region and cluster. One key driver is simplification:
Apache (with mod_wsgi) is a large moving part and, apart from being
overkill for what we need, it's also a complex dependency for us to set
up and manage, particularly in environments where Apache is already
being used for another purpose. This email provides some background on
the current design and raises some questions for the new design.

The region:

    AIUI, our use of Apache on the region is to

        a) run Django via mod_wsgi; this is currently configured to run
           2 single-thread processes by default [1]

        b) serve static content, like JS and CSS

    The Apache-spawned Django processes also each run a Twisted reactor.
    The reactor hosts a number of services, including an RPC service
    which is set up to listen on a port (a random port, in particular,
    see https://bugs.launchpad.net/maas/+bug/1352923). By design, any
    service running here can call out to any cluster, so each process
    holds a connection open to the cluster.

    Boot images are stored in the DB and served up via Django [2].

The cluster:

    On the cluster, we run a "clusterd" upstart job (which currently
    spawns a process which AFAIK is still called twistd, maybe bug [3]?)
    which is what calls via RPC into the region. The job also listens
    with python-txtftp on port 69 for TFTP, which incidentally requests
    the PXE config from Django running on the region [4].

    In order to find out which RPC services are available from a region,
    the cluster controller requests an RPC view (at /MAAS/rpc) which
    hands out a JSON dump with a map of region services, IP addresses
    and ports.

    Once the cluster gets the RPC services JSON dump, it establishes a
    connection with each of the region RPC services. This is necessary
    so that any region can send messages to any cluster.
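
    As an illustration of the discovery step above, here is a minimal
    Python sketch of how a cluster might turn that JSON dump into a list
    of endpoints to connect to. The payload shape, the "services" key,
    the sample addresses and the function name are all assumptions made
    for this example; the actual /MAAS/rpc format is not reproduced here.

```python
import json

# Hypothetical payload: the view is only described as "a map of region
# services, IP addresses and ports", so these key names are assumed.
SAMPLE_RPC_INFO = """
{
  "services": {
    "region-1:pid=1001": [["10.0.0.2", 43210]],
    "region-1:pid=1002": [["10.0.0.2", 43877]]
  }
}
"""


def parse_rpc_endpoints(text):
    """Return a sorted list of (service_name, host, port) tuples."""
    info = json.loads(text)
    endpoints = []
    for name, addresses in info["services"].items():
        for host, port in addresses:
            endpoints.append((name, host, int(port)))
    return sorted(endpoints)


endpoints = parse_rpc_endpoints(SAMPLE_RPC_INFO)
for name, host, port in endpoints:
    # A real cluster would open one RPC connection per entry here.
    print("would connect to %s at %s:%d" % (name, host, port))
```

    With two region processes, as in the default configuration, the
    cluster ends up holding two connections, one per process.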

    We use Apache to serve boot resources to the nodes for curtin
    installs. AFAIK debian-installer is purely package-based and doesn't
    pull any resources in any special way.

Replacing Apache:

    - On the region, replacing Apache doesn't appear to have a major
      performance impact.  We are already sending large files to the
      cluster using Django (i.e. python), so the only throughput benefit
      Apache gives us is serving up resources like JS and CSS. We will
      probably need to up the number of running processes to avoid
      lagging or blocking, which will increase the number of
      cluster-to-region connections -- but I don't think that should be
      a problem.

      One tricky aspect: if the user has other Apache services running
      on the region, then port 80 will be tied up and we won't be able
      to use it. We have some options:

        * having the region and cluster connect on another port (yuck)
        * continuing to use mod_wsgi for that situation (yuck yuck)
        * proxying over to Twisted using mod_proxy, which is included in
          the stock Apache install and is what Gavin
          has in his lp:~allenap/maas/regiond branch. However, as with
          the mod_wsgi solution this would require Twisted to be running
          on another port.
        * disallowing vhosting altogether on the region, which would be
          a visible change in behaviour

    - On the cluster, though the architecture is simpler, we'd need to
      be careful as the performance of boot resource delivery to the
      nodes may be an issue.
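
      On the boot resource delivery concern, one way to keep Python out
      of the data path is zero-copy sending. What follows is a minimal
      sketch, assuming Python 3.5 or later, where the standard library
      offers socket.sendfile(). It is neither the python-sendfile
      package nor the pending Twisted patch, and the helper names and
      payload are made up for the example.

```python
import os
import socket
import tempfile

# Kept small so it fits in the socket buffer without a concurrent reader.
PAYLOAD = b"boot-resource" * 100


def send_file_zero_copy(sock, path):
    """Send the file at `path` over `sock`; return the bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() where the platform
        # supports it and falls back to plain send() otherwise.
        return sock.sendfile(f)


def demo():
    """Push a temporary file through a socket pair and read it back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(PAYLOAD)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_zero_copy(sender, path)
        sender.close()  # signal end-of-stream to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return sent, b"".join(chunks)
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)


sent, received = demo()
print(sent, len(received))
```

      Whether this helps in practice depends on how the serving code
      hands file descriptors to the transport, so it would need
      measuring on a real cluster.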

Additional thoughts:

    - Currently nodes have to access the region controller because they
      request information that is only made available via Django. Does
      anyone not agree that we should have the requests handled by the
      cluster which would then use the API to contact the region?

    - To address performance and concurrency when sending resources from
      Twisted, we could look at using python-sendfile (a universe
      package) which provides zero-copy file sending. Twisted actually
      has a sendfile implementation proposed as a patch that has been
      stuck for a few years, but we could nudge that forward as well.

    - There is a bug filed by IS asking us to reduce the number of ports
      the region listens on. Could we bind all region processes to the
      same socket using SO_REUSEPORT to avoid this issue?

        http://lwn.net/Articles/542629/

      We'd need to study the semantics, but this seems easier than
      inverting it so the region connects to the cluster.

    - We want to use SSL for communication; Twisted has native SSL
      support so that shouldn't be an issue.
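
    To make the SO_REUSEPORT idea above concrete, here is a minimal
    Python sketch in which two listening sockets share one TCP port. It
    assumes a platform that exposes socket.SO_REUSEPORT (Linux 3.9 or
    later, for instance); the function name is made up, and both sockets
    live in a single process purely to keep the example short.

```python
import socket


def reuseport_listener(host, port):
    """Create a TCP listening socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Every socket sharing the port must set the option before binding.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock


# Let the kernel pick a free port, then bind a second listener to it.
first = reuseport_listener("127.0.0.1", 0)
port = first.getsockname()[1]
second = reuseport_listener("127.0.0.1", port)

print("both listeners share port", port)
```

    One of the semantics to study: the kernel decides which listener
    accepts each incoming connection, so a cluster connecting to the
    shared port would not get to choose a particular region process.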

Your comments and ideas welcome -- some of the above needs to be
actually validated on a running MAAS, but mine's being upgraded to try
the isolation changes out, so I'll note later if I left anything
important out.

[1] Why 2 processes? Why only 1 thread? What drove this decision?

[2] Interestingly, as we only have 2 processes configured by default,
the fact that image downloads are long-running may allow 2 or more
clusters to make the region unavailable while images are being
downloaded. Gavin and Blake should probably discuss this.

[3] We could change the process name with something like
python-setproctitle, a C extension that does some voodoo that ordinary
Python code cannot. That specific one is in universe.

[4] The cluster also runs tgtd to serve up the ephemeral rootfs, but it's
a separate process, not handled in clusterd itself.
-- 
Christian Robottom Reis   | [+1] 612 888 4935    | http://launchpad.net/~kiko
Canonical VP Hyperscale   | [+55 16] 9 9112 6430 | http://async.com.br/~kiko



