cgroup stanza - a proposal
Stéphane Graber
stgraber at ubuntu.com
Wed Nov 20 19:23:59 UTC 2013
This morning at vUDS we discussed adding support for cgroups in Upstart.
Before I go into details about the proposed stanza and overall
behaviour, I'd like to begin by saying that, contrary to some other init
systems, our intent is solely resource control, which is the main goal
of cgroups. Process grouping and tracking will remain unaffected by the
addition of cgroup support.
Cgroup support will be implemented by adding a new "cgroup" stanza which
will control the application of cgroup based restrictions to the job.
The limits will be applied to any of the scripts
(pre-start/post-start/job/pre-stop/post-stop), similar to what's done
with the setuid/setgid/apparmor stanzas.
Now my recommended format for the stanza, which I believe should be
flexible enough, is:
cgroup <controller> <cgroup-name|$auto> [<key> <value>]
Details on the fields:
== controller ==
Name of one of the cgroup controllers.
Currently the valid values are (but they won't be hardcoded into upstart):
- blkio
- cpu
- cpuacct
- cpuset
- devices
- freezer
- hugetlb
- memory
- perf_event
== cgroup-name|$auto ==
Name of the cgroup to use (and create if non-existing)
The name may contain a / (e.g. "db/pgsql" or "db/$auto") indicating that
it's requesting a sub-cgroup.
"$auto" is the recommended name and will have upstart generate a name
based on the job instance name.
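
As an illustration of the "$auto" substitution, here is a minimal Python sketch. The helper name expand_cgroup_name is hypothetical, and using the job instance name verbatim (with any "/" replaced) is an assumption: the proposal only says the name is based on the job instance name.

```python
def expand_cgroup_name(name: str, job_instance: str) -> str:
    """Replace the "$auto" placeholder with a name derived from the job instance.

    Assumption: the derived name is the job instance name itself, with any
    "/" replaced so it cannot accidentally introduce another sub-cgroup level.
    """
    auto_name = job_instance.replace("/", "_")
    return "/".join(
        auto_name if part == "$auto" else part
        for part in name.split("/")
    )
```

With this sketch, "db/$auto" for a job instance called "pgsql" would become "db/pgsql", while a plain name such as "db" is left untouched.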
The main use of that field is for cases where a set of jobs should share
limits. In such cases the main job should declare the various values and
the others just refer to the cgroup by name without defining any values.
The name may be different for the various controllers but may not differ
within the same controller. Example:
 valid   => cgroup memory group1 limit_in_bytes 52428800
            cgroup cpuset group2 cpus 0-1
 invalid => cgroup memory group1 limit_in_bytes 52428800
            cgroup memory group2 soft_limit_in_bytes 1024
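
The naming rule can be sketched as a small check over the stanzas of a single job. This is only an illustration in Python; the function name check_names and the (controller, name) pair representation are assumptions, not part of the proposal.

```python
def check_names(stanzas):
    """Return the controllers that are given more than one cgroup name.

    `stanzas` is a list of (controller, cgroup_name) pairs taken from a
    single job definition. An empty result means the job is consistent.
    """
    seen = {}
    conflicts = []
    for controller, name in stanzas:
        # Remember the first name used for each controller.
        first = seen.setdefault(controller, name)
        if first != name and controller not in conflicts:
            conflicts.append(controller)
    return conflicts
```

A job using "group1" for memory and "group2" for cpuset passes the check, while a job using both "group1" and "group2" for memory is reported as conflicting on "memory".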
== key ==
The cgroup control file minus the controller name, so for example
memory.soft_limit_in_bytes will become soft_limit_in_bytes.
== value ==
Any value valid for the given control file; upstart itself won't perform
any validation.
If the value contains spaces, it should be put between double quotes, e.g.:
cgroup devices $auto allow "c 1:2 rwm"
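
To make the field layout concrete, here is a rough Python sketch of how such a line could be split into its fields. It is not how upstart would parse it; CgroupStanza and parse_cgroup_stanza are hypothetical names, and the use of shell-like quoting rules (via Python's shlex) is an assumption about how double-quoted values are handled.

```python
import shlex
from typing import NamedTuple, Optional


class CgroupStanza(NamedTuple):
    controller: str
    name: str
    key: Optional[str]
    value: Optional[str]

    @property
    def control_file(self) -> Optional[str]:
        # The key is the control file name minus the controller prefix.
        return f"{self.controller}.{self.key}" if self.key else None


def parse_cgroup_stanza(line: str) -> CgroupStanza:
    # shlex.split honours double quotes, so "c 1:2 rwm" stays one token.
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "cgroup":
        raise ValueError("not a cgroup stanza")
    args = tokens[1:]
    if len(args) == 2:
        # Name only: the job joins a cgroup without defining values.
        return CgroupStanza(args[0], args[1], None, None)
    if len(args) == 4:
        return CgroupStanza(args[0], args[1], args[2], args[3])
    raise ValueError("expected: cgroup <controller> <name> [<key> <value>]")
```

For the devices line with the quoted value "c 1:2 rwm", this sketch yields the controller "devices", the key "allow", the value as a single string, and "devices.allow" as the control file.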
Upstart won't have any controller-aware logic in its code. Instead,
it'll simply talk over dbus (using a private dbus socket) to the cgroup
manager, which will take care of applying the various limits.
That cgroup manager will be started very early in the boot sequence. Any
job containing a cgroup stanza will be held until the manager is
started.
The cgroup will be destroyed when a job is stopped and the cgroup isn't
shared with another job (task count is 0 and it has no child cgroup).
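
The destruction rule can be sketched as a simple check. This assumes the cgroup v1 filesystem layout, where each cgroup is a directory containing a "tasks" file and child cgroups appear as subdirectories. In the proposal the cgroup manager would be the one doing this; cgroup_is_removable is a hypothetical helper name.

```python
import os


def cgroup_is_removable(path: str) -> bool:
    """True if the cgroup directory has no tasks and no child cgroups."""
    # Each non-empty line of the "tasks" file is the id of an attached task.
    with open(os.path.join(path, "tasks")) as f:
        has_tasks = any(line.strip() for line in f)
    # Child cgroups show up as subdirectories of the cgroup directory.
    with os.scandir(path) as entries:
        has_children = any(entry.is_dir() for entry in entries)
    return not has_tasks and not has_children
```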
It'll be possible to disable cgroup support entirely by either building
upstart without it (needed for non-Linux systems) or by passing
--no-cgroup as a parameter to upstart. In that case, the cgroup stanza
will simply be ignored and the jobs will start without limitations.
All of the above is also meant to apply to user sessions. The cgroup
manager will allow unprivileged cgroup configuration, so as long as the
user has write access to a sub-section of a controller, it'll be allowed
to write entries there. Similarly to other restriction stanzas, failure
to apply a cgroup limit in a user session won't be fatal.
Now a few examples to try and illustrate the thoughts behind that proposal:
== Single job simple example ==
=== Job ===
cgroup memory $auto limit_in_bytes 52428800
=== Result ===
The job will only start once the manager is up and running and will have a
50MB memory limit. If the system has less than 50MB of RAM, the job will
fail to start.
== Single job complex example ==
=== Job ===
cgroup memory $auto limit_in_bytes 52428800
cgroup cpuset $auto cpus 0-1
cgroup blkio slowio throttle.write_bps_device "8:16 1048576"
=== Result ===
The job will only start once the manager is up and running and will have a
50MB memory limit, be restricted to CPU ids 0 and 1 and have a 1MB/s
write limit to the block device 8:16.
The job will fail to start if the system has less than 50MB of RAM or
less than 2 CPUs.
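
The numbers in this example can be checked with a short Python sketch. The helper names parse_cpu_list and host_can_satisfy are hypothetical, and the start condition is only a model of the behaviour described above, not the manager's actual logic.

```python
def parse_cpu_list(spec: str) -> set:
    """Parse a cpuset list such as "0-1" or "0,2-3" into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def host_can_satisfy(cpus_spec: str, limit_bytes: int,
                     host_cpus: int, host_ram_bytes: int) -> bool:
    """Model the start conditions described above for a single job."""
    # CPU ids start at 0, so the highest requested id + 1 CPUs are needed.
    needed_cpus = max(parse_cpu_list(cpus_spec)) + 1
    return host_cpus >= needed_cpus and host_ram_bytes >= limit_bytes


MIB = 1024 * 1024
assert 52428800 == 50 * MIB   # the 50MB memory limit
assert 1048576 == 1 * MIB     # the 1MB/s write limit
```

Under this model, a host with 2 CPUs and 64MB of RAM satisfies "cpus 0-1" with a 50MB limit, while a single-CPU host or one with 32MB of RAM does not.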
== Multiple jobs complex example ==
=== Job 1 ===
cgroup cpuset db cpus 0-1
cgroup memory db limit_in_bytes 104857600
cgroup blkio db throttle.write_bps_device "8:16 1048576"
=== Job 2 ===
cgroup cpuset db/$auto cpus 1
cgroup memory db/$auto limit_in_bytes 52428800
cgroup blkio db/$auto throttle.write_bps_device "8:17 1048576"
=== Job 3 ===
cgroup cpuset db
cgroup memory db
=== Job 4 ===
cgroup cpuset db/$auto cpus 2
=== Result ===
This is rather complex, so let's go job by job:
- Job 1 will start bound to CPU 0 and 1 with a 100MB memory limit and
1MB/s write limit to the 8:16 block device. It'll fail to start if
the system has less than 2 CPUs or less than 100MB of RAM.
- Job 2 will start bound to CPU 1 and with a 50MB memory limit. It'll
inherit the 1MB/s write limit to 8:16 and on top of that rate limit
writes to 8:17, also at 1MB/s.
The job will fail to start if the system has less than 50MB of RAM or
less than 2 CPUs.
- Job 3 will start in the "db" cpuset and memory cgroups. If it starts
before Job 1, no limit will be applied at startup time. As soon as Job 1
starts, however, Job 3 will be limited to 2 CPUs and 100MB of memory.
As it doesn't have a blkio statement, its I/O won't be rate limited.
- Job 4 if started after Job 1 will fail to start as it's requesting a
CPU that the parent cgroup doesn't have access to. If started before
Job 1 however, it won't have a parent value set so will inherit the
default and so will start so long as the system has at least 3 CPUs.
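
The Job 4 case comes down to a subset check between the child and parent cpusets. Here is a minimal Python sketch of that reasoning; child_cpuset_allowed is a hypothetical name, and treating an unset parent as "every CPU on the host" is an assumption drawn from the description of the default above.

```python
def child_cpuset_allowed(child_cpus, parent_cpus, host_cpu_count):
    """Check whether a child cpuset request fits inside its parent.

    parent_cpus is None when the parent cgroup has no explicit value yet
    (Job 1 not started); the sketch then assumes the parent default is
    every CPU on the host.
    """
    if parent_cpus is None:
        parent_cpus = set(range(host_cpu_count))
    return set(child_cpus) <= parent_cpus
```

With Job 1 running, the parent holds CPUs {0, 1}, so Job 2's request for {1} fits and Job 4's request for {2} does not. Without Job 1, Job 4's request fits as long as the host has at least 3 CPUs.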
I think this pretty much covers all I've got in mind at this point, and
I believe the above is flexible enough to work with all existing
controllers.
Questions, comments and suggestions are very welcome!
--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com