cgroup stanza a proposal

Stéphane Graber stgraber at ubuntu.com
Fri Nov 29 18:06:46 UTC 2013


Hello everyone,

I have now published the Cgroup specification on the Upstart wiki:
http://upstart.ubuntu.com/wiki/Cgroup

This is based on my original proposal with the changes suggested on the
mailing list.

On Wed, Nov 20, 2013 at 02:23:59PM -0500, Stéphane Graber wrote:
> This morning at vUDS we discussed adding support for cgroups in Upstart.
> 
> Before I go into details about the proposed stanza and overall
> behaviour, I'd begin by saying that contrary to some other init systems,
> our intent is solely related to resource controls which is the main goal
> of cgroups. Process grouping and tracking will remain unaffected by the
> addition of cgroup support.
> 
> Cgroup support will be implemented by adding a new "cgroup" stanza which
> will control the application of cgroup based restrictions to the job.
> The limits will be applied to any of the scripts
> (pre-start/post-start/job/pre-stop/post-stop) similar to what's done
> with setuid/setgid/apparmor stanzas.
> 
> Now my recommended format for the stanza, which I believe should be
> flexible enough is:
>  cgroup <controller> <cgroup-name|$auto> [<key> <value>]
> 
> 
> Detail on the fields:
> == controller ==
> Name of one of the cgroup controllers
> 
> Currently the valid values are (but won't be hardcoded into upstart):
>  - blkio
>  - cpu
>  - cpuacct
>  - cpuset
>  - devices
>  - freezer
>  - hugetlb
>  - memory
>  - perf_event
> 
> == cgroup-name|$auto ==
> Name of the cgroup to use (and create if non-existing)
> 
> The name may contain a / (e.g. "db/pgsql" or "db/$auto") indicating that
> it's requesting a sub-cgroup.
> 
> "$auto" is the recommended name and will have upstart generate a name
> based on the job instance name.
> 
> The main use of that field is for cases where a set of jobs should share
> limits. In such a case the main job should declare the various values and
> the others should just refer to the cgroup by name without defining values.
> 
> The name may be different for the various controllers but may not differ
> within the same controller. Example:
> valid =>    cgroup memory group1 limit_in_bytes 52428800
>             cgroup cpuset group2 cpus 0-1
> 
> invalid =>  cgroup memory group1 limit_in_bytes 52428800
>             cgroup memory group2 soft_limit_in_bytes 1024
> 
> == key ==
> The cgroup control file minus the controller name, so for example
> memory.limit_in_bytes will become limit_in_bytes.
> 
> == value ==
> Any value valid for the given control file, upstart itself won't perform
> any validation.
> 
> If the value contains spaces, it should be put between double-quotes, e.g.:
> cgroup devices $auto allow "c 1:2 rwm"
> 
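
To make the stanza format above concrete, here is a minimal Python sketch
of how such a line could be split into its fields. This is not Upstart
code: parse_cgroup_stanza, control_file and CgroupStanza are hypothetical
names used only for illustration. It relies on shlex to keep a
double-quoted value as one token, and rebuilds the control file name by
putting the controller name back in front of the key.

```python
import shlex
from typing import NamedTuple, Optional


class CgroupStanza(NamedTuple):
    controller: str
    name: str
    key: Optional[str]
    value: Optional[str]


def parse_cgroup_stanza(line: str) -> CgroupStanza:
    """Split 'cgroup <controller> <name> [<key> <value>]' into fields."""
    # shlex honours double quotes, so "c 1:2 rwm" stays a single token.
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "cgroup":
        raise ValueError("not a cgroup stanza")
    if len(tokens) == 3:
        # Name only: the job joins an existing cgroup without setting values.
        return CgroupStanza(tokens[1], tokens[2], None, None)
    if len(tokens) == 5:
        return CgroupStanza(*tokens[1:5])
    raise ValueError("expected: cgroup <controller> <name> [<key> <value>]")


def control_file(stanza: CgroupStanza) -> Optional[str]:
    """Rebuild the control file name, e.g. 'memory.limit_in_bytes'."""
    if stanza.key is None:
        return None
    return f"{stanza.controller}.{stanza.key}"
```

As in the proposal, the value is passed through untouched; no validation
is attempted here.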
> 
> Upstart won't have any controller aware logic in its code, instead,
> it'll simply talk over dbus (using a private dbus socket) to the cgroup
> manager which will take care of applying the various limits.
> That cgroup manager will be started very early in the boot sequence. Any
> job containing a cgroup stanza will be held until the manager is
> started.
> 
> The cgroup will be destroyed when a job is stopped and the cgroup isn't
> shared with another job (task count is 0 and it has no child cgroup).
> 
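
The destruction rule above (task count is 0 and no child cgroup) could be
checked roughly as follows. This is only a sketch under assumptions: it
supposes the manager looks at a cgroup v1 style directory, where a "tasks"
file lists the member tasks and child cgroups appear as subdirectories.
The function name cgroup_is_removable is hypothetical, and the real
manager may well do this differently.

```python
import os


def cgroup_is_removable(path: str) -> bool:
    """Return True when the cgroup has no tasks and no child cgroups."""
    # Assumption: 'tasks' holds one task id per line (cgroup v1 layout).
    with open(os.path.join(path, "tasks")) as f:
        has_tasks = any(line.strip() for line in f)
    # Assumption: every subdirectory is a child cgroup.
    has_children = any(entry.is_dir() for entry in os.scandir(path))
    return not has_tasks and not has_children
```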
> It'll be possible to disable cgroup support entirely by either building
> upstart without it (needed for non-Linux systems) or by passing
> --no-cgroup as a parameter to upstart. In that case, the cgroup stanza
> will simply be ignored and the jobs will start without limitations.
> 
> 
> All of the above is also meant to apply to user sessions. The cgroup
> manager will allow unprivileged cgroup configuration, so as long as the
> user has write access to a sub-section of a controller, it'll be allowed
> to write entries there. Similarly to other restriction stanzas, failure
> to apply a cgroup limit in a user session won't be fatal.
> 
> 
> Now a few examples to try and illustrate the thoughts behind that proposal:
> 
> == Single job simple example ==
> === Job ===
> cgroup memory $auto limit_in_bytes 52428800
> 
> === Result ===
> The job will only start once the manager is up and running and will have a
> 50MB memory limit. If the system has less than 50MB, the job will fail
> to start.
> 
> == Single job complex example ==
> === Job ===
> cgroup memory $auto limit_in_bytes 52428800
> cgroup cpuset $auto cpus 0-1
> cgroup blkio slowio throttle.write_bps_device "8:16 1048576"
> 
> === Result ===
> The job will only start once the manager is up and running and will have a
> 50MB memory limit, be restricted to CPU ids 0 and 1 and have a 1MB/s
> write limit to the block device 8:16.
> The job will fail to start if the system has less than 50MB of RAM or
> less than 2 CPUs.
> 
> 
> == Multiple jobs complex example ==
> === Job 1 ===
> cgroup cpuset db cpus 0-1
> cgroup memory db limit_in_bytes 104857600
> cgroup blkio db throttle.write_bps_device "8:16 1048576"
> 
> === Job 2 ===
> cgroup cpuset db/$auto cpus 1
> cgroup memory db/$auto limit_in_bytes 52428800
> cgroup blkio db/$auto throttle.write_bps_device "8:17 1048576"
> 
> === Job 3 ===
> cgroup cpuset db
> cgroup memory db
> 
> === Job 4 ===
> cgroup cpuset db/$auto cpus 2
> 
> === Result ===
> This is rather complex, so let's go job by job:
>  - Job 1 will start bound to CPU 0 and 1 with a 100MB memory limit and
>    1MB/s write limit to the 8:16 block device. It'll fail to start if
>    the system has less than 2 CPUs or less than 100MB of RAM.
> 
>  - Job 2 will start bound to CPU 1 and with a 50MB memory limit. It'll
>    inherit the 1MB/s write limit to 8:16 and on top of that also rate limit
>    writes to 8:17 also at 1MB/s.
>    The job will fail to start if the system has less than 50MB of RAM or
>    less than 2 CPUs.
> 
>  - Job 3 will start in the "db" cpuset and memory cgroups. If it starts
>    before Job 1, no limit will be applied at startup time. As soon as Job 1
>    starts however Job 3 will be limited to 2 CPUs and 100MB of memory.
>    As it doesn't have a blkio statement, it won't have rate limited I/Os.
> 
>  - Job 4 if started after Job 1 will fail to start as it's requesting a
>    CPU that the parent cgroup doesn't have access to. If started before
>    Job 1 however, it won't have a parent value set so will inherit the
>    default and so will start so long as the system has at least 3 CPUs.
> 
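
The cpuset part of the walkthrough above comes down to a subset check: once
the parent "db" cgroup has its cpus set, a child can only ask for CPUs the
parent has. A minimal Python sketch of that reasoning, with hypothetical
helper names (parse_cpu_list, child_cpus_allowed) and covering only the
case where the parent value is already set:

```python
def parse_cpu_list(spec: str) -> set:
    """Expand a cpuset list such as '0-1' or '0,2-3' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def child_cpus_allowed(parent_spec: str, child_spec: str) -> bool:
    """True when every CPU the child requests is available to the parent."""
    return parse_cpu_list(child_spec) <= parse_cpu_list(parent_spec)
```

With the parent at "0-1", Job 2's request for "1" passes the check while
Job 4's request for "2" does not.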
> 
> 
> I think this pretty much covers all I've got in mind at this point. I
> believe the above is flexible enough to work with all existing
> controllers.
> 
> Questions, comments and suggestions are very welcome!
> 
> -- 
> Stéphane Graber
> Ubuntu developer
> http://www.ubuntu.com



-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com