cgroup stanza a proposal
Stéphane Graber
stgraber at ubuntu.com
Wed Nov 20 19:34:55 UTC 2013
On Wed, Nov 20, 2013 at 02:23:59PM -0500, Stéphane Graber wrote:
> This morning at vUDS we discussed adding support for cgroups in Upstart.
>
> Before I go into details about the proposed stanza and overall
> behaviour, I'd begin by saying that contrary to some other init systems,
> our intent is solely related to resource controls which is the main goal
> of cgroups. Process grouping and tracking will remain unaffected by the
> addition of cgroup support.
>
> Cgroup support will be implemented by adding a new "cgroup" stanza which
> will control the application of cgroup based restrictions to the job.
> The limits will be applied to any of the scripts
> (pre-start/post-start/job/pre-stop/post-stop) similar to what's done
> with setuid/setgid/apparmor stanzas.
>
> Now my recommended format for the stanza, which I believe should be
> flexible enough is:
> cgroup <controller> <cgroup-name|$auto> [<key> <value>]
>
>
> Detail on the fields:
> == controller ==
> Name of one of the cgroup controllers
>
> Currently the valid values are (but won't be hardcoded into upstart):
> - blkio
> - cpu
> - cpuacct
> - cpuset
> - devices
> - freezer
> - hugetlb
> - memory
> - perf_event
>
> == cgroup-name|$auto ==
> Name of the cgroup to use (and create if non-existing)
>
> The name may contain a / (e.g. "db/pgsql" or "db/$auto") indicating that
> it's requesting a sub-cgroup.
>
> "$auto" is the recommended name and will have upstart generate a name
> based on the job instance name.
>
> The main use of that field is for cases where a set of jobs should share
> limits. In that case, the main job should declare the various values and
> the others should just refer to the cgroup by name without defining any
> values.
>
> The name may be different for the various controllers but may not differ
> within the same controller. Example:
> valid   => cgroup memory group1 limit_in_bytes 52428800
>            cgroup cpuset group2 cpus 0-1
>
> invalid => cgroup memory group1 limit_in_bytes 52428800
>            cgroup memory group1 soft_limit_in_bytes 1024
The invalid entry above is actually valid... What I meant was:
invalid => cgroup memory group1 limit_in_bytes 52428800
           cgroup memory group2 soft_limit_in_bytes 1024
Thanks to Serge Hallyn for noticing!
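
To make the rule concrete, here is a rough Python sketch of the check
(purely illustrative: the function name and the (controller, name) pair
representation are made up for this example, and nothing here is meant to
describe how upstart would actually implement it):

```python
def check_cgroup_names(stanzas):
    """Return the controllers that are given more than one cgroup name.

    `stanzas` is a list of (controller, name) pairs, one per cgroup
    stanza in a job. This representation is hypothetical.
    """
    seen = {}       # controller -> first cgroup name used for it
    conflicts = []  # controllers with conflicting names, in order found
    for controller, name in stanzas:
        first = seen.setdefault(controller, name)
        if first != name and controller not in conflicts:
            conflicts.append(controller)
    return conflicts
```

An empty result means the job is fine: repeating the same name for a
controller is allowed, as is using different names for different
controllers. Only two different names under one controller are reported.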
>
> == key ==
> The cgroup control file minus the controller name, so for example
> memory.soft_limit_in_bytes will become soft_limit_in_bytes.
>
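
As an illustration of the key mapping, the control file can be rebuilt by
putting the controller name back in front of the key. The sketch below
assumes the usual per-controller mount layout under /sys/fs/cgroup; that
path, like the function name, is an assumption for the example and not
part of the proposal (upstart would hand this to the cgroup manager
rather than touch the files itself).

```python
import posixpath

def control_file(controller, cgroup, key, root="/sys/fs/cgroup"):
    """Build the control file path for a (controller, cgroup, key) triple.

    The default `root` is an assumed mount point, not something the
    proposal specifies.
    """
    # The stanza key omits the controller prefix, so add it back.
    filename = f"{controller}.{key}"
    return posixpath.join(root, controller, cgroup, filename)
```

For example, controller "memory", cgroup "group1" and key
"soft_limit_in_bytes" map to
/sys/fs/cgroup/memory/group1/memory.soft_limit_in_bytes.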
> == value ==
> Any value valid for the given control file, upstart itself won't perform
> any validation.
>
> If the value contains spaces, it should be put between double-quotes, e.g.:
> cgroup devices $auto allow "c 1:2 rwm"
>
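
With all four fields described, a stanza line can be split roughly as
follows. This is only a sketch: it leans on Python's shlex for the
double-quote handling, which also accepts single quotes and backslash
escapes that upstart's own parser may well treat differently, and the
CgroupStanza type and function name are invented for the example.

```python
import shlex
from typing import NamedTuple, Optional

class CgroupStanza(NamedTuple):
    controller: str
    name: str
    key: Optional[str] = None    # absent when only joining a cgroup
    value: Optional[str] = None

def parse_cgroup_stanza(line):
    """Parse 'cgroup <controller> <name> [<key> <value>]'."""
    tokens = shlex.split(line)   # keeps "c 1:2 rwm" as a single token
    if not tokens or tokens[0] != "cgroup":
        raise ValueError("not a cgroup stanza: %r" % line)
    args = tokens[1:]
    if len(args) == 2:
        return CgroupStanza(args[0], args[1])
    if len(args) == 4:
        return CgroupStanza(*args)
    raise ValueError("expected 2 or 4 arguments, got %d" % len(args))
```

No validation of the controller, key or value is done here, in keeping
with the idea that upstart passes them through untouched.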
>
> Upstart won't have any controller aware logic in its code, instead,
> it'll simply talk over dbus (using a private dbus socket) to the cgroup
> manager which will take care of applying the various limits.
> That cgroup manager will be started very early in the boot sequence. Any
> job containing a cgroup stanza will be held until the manager is
> started.
>
> The cgroup will be destroyed when a job is stopped and the cgroup isn't
> shared with another job (task count is 0 and it has no child cgroup).
>
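
The "task count is 0 and it has no child cgroup" condition could be
checked along these lines. The sketch assumes a cgroup directory that
exposes a "tasks" file listing one PID per line and that shows child
cgroups as subdirectories; the function name is hypothetical, and in the
proposed design it would be the cgroup manager, not upstart, doing this.

```python
import os

def cgroup_is_unused(path):
    """True if the cgroup at `path` has no tasks and no child cgroups."""
    # Any non-empty line in "tasks" is a PID still in the cgroup.
    with open(os.path.join(path, "tasks")) as f:
        has_tasks = any(line.strip() for line in f)
    # Child cgroups appear as subdirectories of the cgroup directory.
    with os.scandir(path) as entries:
        has_children = any(entry.is_dir() for entry in entries)
    return not has_tasks and not has_children
```

A cgroup shared with another running job still has that job's PIDs in
"tasks", so it would not be considered unused.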
> It'll be possible to disable cgroup support entirely by either building
> upstart without it (needed for non-Linux systems) or by passing
> --no-cgroup as a parameter to upstart. In that case, the cgroup stanza
> will simply be ignored and the jobs will start without limitations.
>
>
> All of the above is also meant to apply to user sessions. The cgroup
> manager will allow unprivileged cgroup configuration, so as long as the
> user has write access to a sub-section of a controller, it'll be allowed
> to write entries there. Similarly to other restriction stanzas, failure
> to apply a cgroup limit in a user session won't be fatal.
>
>
> Now a few examples to try and illustrate the thoughts behind that proposal:
>
> == Single job simple example ==
> === Job ===
> cgroup memory $auto limit_in_bytes 52428800
>
> === Result ===
> The job will only start once the manager is up and running and will have a
> 50MB memory limit. If the system has less than 50MB, the job will fail
> to start.
>
> == Single job complex example ==
> === Job ===
> cgroup memory $auto limit_in_bytes 52428800
> cgroup cpuset $auto cpus 0-1
> cgroup blkio slowio throttle.write_bps_device "8:16 1048576"
>
> === Result ===
> The job will only start once the manager is up and running and will have a
> 50MB memory limit, be restricted to CPU ids 0 and 1 and have a 1MB/s
> write limit to the block device 8:16.
> The job will fail to start if the system has less than 50MB of RAM or
> less than 2 CPUs.
>
>
> == Multiple jobs complex example ==
> === Job 1 ===
> cgroup cpuset db cpus 0-1
> cgroup memory db limit_in_bytes 104857600
> cgroup blkio db throttle.write_bps_device "8:16 1048576"
>
> === Job 2 ===
> cgroup cpuset db/$auto cpus 1
> cgroup memory db/$auto limit_in_bytes 52428800
> cgroup blkio db/$auto throttle.write_bps_device "8:17 1048576"
>
> === Job 3 ===
> cgroup cpuset db
> cgroup memory db
>
> === Job 4 ===
> cgroup cpuset db/$auto cpus 2
>
> === Result ===
> This is rather complex, so let's go job by job:
> - Job 1 will start bound to CPU 0 and 1 with a 100MB memory limit and
> 1MB/s write limit to the 8:16 block device. It'll fail to start if
> the system has less than 2 CPUs or less than 100MB of RAM.
>
> - Job 2 will start bound to CPU 1 and with a 50MB memory limit. It'll
> inherit the 1MB/s write limit to 8:16 and on top of that rate limit
> writes to 8:17, also at 1MB/s.
> The job will fail to start if the system has less than 50MB of RAM or
> less than 2 CPUs.
>
> - Job 3 will start in the "db" cpuset and memory cgroups. If it starts
> before Job 1, no limit will be applied at startup time. As soon as Job 1
> starts however Job 3 will be limited to 2 CPUs and 100MB of memory.
> As it doesn't have a blkio statement, it won't have rate limited I/Os.
>
> - Job 4, if started after Job 1, will fail to start as it's requesting a
> CPU that the parent cgroup doesn't have access to. If started before
> Job 1, however, it won't have a parent value set, so it will inherit the
> default and start as long as the system has at least 3 CPUs.
>
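
The Job 4 case boils down to a subset check between the child's CPU list
and the parent's. A rough sketch, following the behaviour described above
where an unset parent means "all CPUs on the system" (the function names
and the system_cpus parameter are invented for the example):

```python
def parse_cpu_list(spec):
    """Turn a list such as '0-1' or '0,2-3' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def child_cpus_allowed(child_spec, parent_spec, system_cpus):
    """True if every CPU the child asks for is available from the parent.

    `parent_spec` is None when the parent cgroup has no cpus value set,
    in which case the default of all `system_cpus` CPUs is assumed.
    """
    if parent_spec:
        parent = parse_cpu_list(parent_spec)
    else:
        parent = set(range(system_cpus))
    return parse_cpu_list(child_spec) <= parent
```

With Job 1 running, the parent is "0-1", so Job 4's request for CPU 2 is
rejected while Job 2's request for CPU 1 is accepted. Without Job 1, the
request for CPU 2 only needs the system to have at least 3 CPUs.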
>
>
> I think this pretty much covers all I've got in mind at this point, and
> I believe the above is flexible enough to work with all existing
> controllers.
>
> Questions, comments and suggestions are very welcome!
>
> --
> Stéphane Graber
> Ubuntu developer
> http://www.ubuntu.com
--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com