cgroup stanza - a proposal
Stéphane Graber
stgraber at ubuntu.com
Thu Nov 21 20:49:09 UTC 2013
On Thu, Nov 21, 2013 at 12:26:19PM -0800, Steve Langasek wrote:
> Hi Stéphane,
>
> On Wed, Nov 20, 2013 at 02:23:59PM -0500, Stéphane Graber wrote:
> > This morning at vUDS we discussed adding support for cgroups in Upstart.
> >
> > Before I go into details about the proposed stanza and overall
> > behaviour, I'd begin by saying that contrary to some other init systems,
> > our intent is solely related to resource controls which is the main goal
> > of cgroups. Process grouping and tracking will remain unaffected by the
> > addition of cgroup support.
> >
> > Cgroup support will be implemented by adding a new "cgroup" stanza which
> > will control the application of cgroup based restrictions to the job.
> > The limits will be applied to any of the scripts
> > (pre-start/post-start/job/pre-stop/post-stop) similar to what's done
> > with setuid/setgid/apparmor stanzas.
> >
> > Now my recommended format for the stanza, which I believe should be
> > flexible enough is:
> > cgroup <controller> <cgroup name|auto> [<key> <value>]
> >
> >
> > Detail on the fields:
> > == controller ==
> > Name of one of the cgroup controllers.
> >
> > Currently the valid values are (but won't be hardcoded into upstart):
> > - blkio
> > - cpu
> > - cpuacct
> > - cpuset
> > - devices
> > - freezer
> > - hugetlb
> > - memory
> > - perf_event
> >
> > == cgroup-name|$auto ==
> > Name of the cgroup to use (and create if non-existing)
> >
> > The name may contain a / (e.g. "db/pgsql" or "db/$auto") indicating that
> > it's requesting a sub-cgroup.
> >
> > "$auto" is the recommended name and will have upstart generate a name
> > based on the job instance name.
> >
> > The main use of that field is for cases where a set of jobs should share
> > limits. In that case the main job should declare the various values and
> > the others should just refer to the cgroup by name without defining any values.
> >
> > The name may be different for the various controllers but may not differ
> > within the same controller. Example:
> > valid => cgroup memory group1 limit_in_bytes 52428800
> > cgroup cpuset group2 cpus 0-1
> >
> > invalid => cgroup memory group1 limit_in_bytes 52428800
> > cgroup memory group1 soft_limit_in_bytes 1024
> >
> > == key ==
> > The cgroup control file minus the controller name, so for example
> > memory.soft_limit_in_bytes will become limit_in_bytes.
>
> FWIW, typo here as well, this should of course be 'soft_limit_in_bytes' in
> both cases.
Indeed, thanks for spotting it.
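
To make the corrected rule concrete, here is a minimal Python sketch of how a key relates to its control file. The function name stanza_key is hypothetical and only for illustration; it is not part of Upstart or of the cgroup manager.

```python
def stanza_key(control_file: str, controller: str) -> str:
    """Return the stanza key for a control file: the file name with the
    leading '<controller>.' removed. Hypothetical helper, illustration only."""
    prefix = controller + "."
    if not control_file.startswith(prefix):
        raise ValueError(
            f"{control_file!r} does not belong to controller {controller!r}"
        )
    return control_file[len(prefix):]


if __name__ == "__main__":
    # memory.soft_limit_in_bytes is written as "soft_limit_in_bytes" in the stanza
    print(stanza_key("memory.soft_limit_in_bytes", "memory"))
    # cpuset.cpus is written as "cpus" in the stanza
    print(stanza_key("cpuset.cpus", "cpuset"))
```
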
> > == value ==
> > Any value valid for the given control file, upstart itself won't perform
> > any validation.
>
> > If the value contains spaces, it should be put between double-quotes (e.g.):
> > cgroup devices auto allow "c 1:2 rwm"
>
>
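As a rough illustration of the stanza format and the double-quote rule above, the following Python sketch splits a stanza line into its fields. It assumes shell-like quoting (handled here with shlex), which the proposal does not spell out; the names CgroupStanza and parse_cgroup_stanza are hypothetical, and this is not how Upstart's own parser is written.

```python
import shlex
from typing import NamedTuple, Optional


class CgroupStanza(NamedTuple):
    controller: str
    name: str
    key: Optional[str]
    value: Optional[str]


def parse_cgroup_stanza(line: str) -> CgroupStanza:
    """Parse 'cgroup <controller> <name> [<key> <value>]'.

    Assumption: quoting follows shell-like rules, so a double-quoted
    value containing spaces becomes a single token.
    """
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "cgroup":
        raise ValueError("not a cgroup stanza")
    if len(tokens) not in (3, 5):
        raise ValueError("expected: cgroup <controller> <name> [<key> <value>]")
    controller, name = tokens[1], tokens[2]
    # The key/value pair is optional; no validation of the value is done,
    # in line with the proposal.
    key, value = (tokens[3], tokens[4]) if len(tokens) == 5 else (None, None)
    return CgroupStanza(controller, name, key, value)


if __name__ == "__main__":
    print(parse_cgroup_stanza('cgroup devices auto allow "c 1:2 rwm"'))
    print(parse_cgroup_stanza("cgroup memory group1"))
```

A job that only joins an existing cgroup would use the three-field form, and the sketch returns None for the key and value in that case.
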
> > Upstart won't have any controller aware logic in its code, instead,
> > it'll simply talk over dbus (using a private dbus socket) to the cgroup
> > manager which will take care of applying the various limits.
> > That cgroup manager will be started very early in the boot sequence. Any
> > job containing a cgroup stanza will be held until the manager is
> > started.
>
> > The cgroup will be destroyed when a job is stopped and the cgroup isn't
> > shared with another job (task count is 0 and it has no child cgroup).
>
> Is upstart responsible for destroying the cgroup, or is this done by the
> cgroup manager?
My initial thought was that Upstart would take care of it, so once it
thinks all jobs using a given cgroup are stopped, it'll ask the manager
to destroy the cgroup.

Serge mentioned that the cgroup manager may also be set as a release
agent which would let it automatically remove cgroups when the last task
exits, but I don't think we should use that capability with upstart as
it'd mean restarting a job would tear down the cgroup and set it up
again, which may not be what all our users want (if for whatever reason
they have changed the restrictions after the job started).
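
A small Python sketch of the bookkeeping this implies on the Upstart side, assuming Upstart keeps track of which running jobs use each cgroup. The CgroupState class and its fields are hypothetical; the task and child counts would in practice come from the cgroup manager.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CgroupState:
    """Hypothetical per-cgroup bookkeeping, for illustration only."""
    jobs: Set[str] = field(default_factory=set)  # running jobs using the cgroup
    task_count: int = 0                          # tasks still in the cgroup
    child_cgroups: int = 0                       # number of sub-cgroups


def should_destroy(state: CgroupState) -> bool:
    """True when no job shares the cgroup, it holds no tasks and has no children."""
    return not state.jobs and state.task_count == 0 and state.child_cgroups == 0


def on_job_stopped(state: CgroupState, job: str) -> bool:
    """Record that a job stopped; return whether the manager should now be
    asked to destroy the cgroup."""
    state.jobs.discard(job)
    return should_destroy(state)


if __name__ == "__main__":
    shared = CgroupState(jobs={"pgsql", "pgsql-backup"})
    print(on_job_stopped(shared, "pgsql"))         # another job still uses it
    print(on_job_stopped(shared, "pgsql-backup"))  # last user is gone
```

Because the decision is made when a job stops rather than when the last task exits, a plain restart can keep the cgroup and whatever changes were made to it after the job started.
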
> > All of the above is also meant to apply to user sessions. The cgroup
> > manager will allow unprivileged cgroup configuration, so as long as the
> > user has write access to a sub-section of a controller, it'll be allowed
> > to write entries there. Similarly to other restriction stanzas, failure
> > to apply a cgroup limit in a user session won't be fatal.
>
> Seems to leave open the question of how users are given access to the
> subsection. From what we discussed in the UDS session, I believe we expect
> logind to set this up for us, correct?
Correct, I'm currently testing enabling all the controllers in our
default logind config. In theory just setting that single option should
be enough to have user-writable cgroups set up at login time. The users
will then be able to alter that cgroup as they want through the cgroup
manager.
> Also, the implicit corollary to "cgroup limit failures in a user session
> aren't fatal" is that, in a system job, they *are* fatal. I know you know
> this, but it should be documented explicitly. :)
Correct. I mentioned this a few times in the examples I provided, and it is
in line with the behaviour of other limit stanzas, but we should indeed
make sure to have this documented in the handbook whenever we land the
new feature.
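
For the documentation, the difference between the two cases could be illustrated along these lines. This is only a Python sketch of the policy: apply_one stands in for whatever call ends up talking to the cgroup manager, and CgroupError and apply_limits are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple


class CgroupError(Exception):
    """Hypothetical error raised when a limit cannot be applied."""


def apply_limits(
    stanzas: Iterable[str],
    apply_one: Callable[[str], None],
    user_session: bool,
) -> List[Tuple[str, CgroupError]]:
    """Apply each stanza through apply_one.

    System job: the first failure propagates and the job fails to start.
    User session: failures are collected and returned, and the job goes on.
    """
    failures = []
    for stanza in stanzas:
        try:
            apply_one(stanza)
        except CgroupError as err:
            if not user_session:
                raise  # fatal for system jobs
            failures.append((stanza, err))  # non-fatal in a user session
    return failures


if __name__ == "__main__":
    def deny_devices(stanza: str) -> None:
        # Stand-in for the manager refusing a write the user has no access to.
        if stanza.startswith("cgroup devices"):
            raise CgroupError("permission denied")

    job = ["cgroup memory auto limit_in_bytes 52428800",
           'cgroup devices auto allow "c 1:2 rwm"']
    print(apply_limits(job, deny_devices, user_session=True))
```
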
> Do you intend this writeup to live under http://upstart.ubuntu.com/wiki/ for
> reference?
I can do that early next week to give more time for comments on my proposal.
I expect most of what I wrote to end up in one way or another in the
handbook once we release the feature in Upstart.
--
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com