cgroup stanza a proposal

Stéphane Graber stgraber at ubuntu.com
Thu Nov 21 20:51:51 UTC 2013


On Thu, Nov 21, 2013 at 02:09:03PM -0600, Serge Hallyn wrote:
> Quoting Stéphane Graber (stgraber at ubuntu.com):
> > On Thu, Nov 21, 2013 at 01:15:47PM -0600, Serge Hallyn wrote:
> > > Quoting Stéphane Graber (stgraber at ubuntu.com):
> > > > On Wed, Nov 20, 2013 at 02:23:59PM -0500, Stéphane Graber wrote:
> > > > > This morning at vUDS we discussed adding support for cgroups in Upstart.
> > > > > 
> > > > > Before I go into details about the proposed stanza and overall
> > > > > behaviour, I'd begin by saying that contrary to some other init systems,
> > > > > our intent is solely related to resource controls which is the main goal
> > > > > of cgroups. Process grouping and tracking will remain unaffected by the
> > > > > addition of cgroup support.
> > > > > 
> > > > > Cgroup support will be implemented by adding a new "cgroup" stanza which
> > > > > will control the application of cgroup based restrictions to the job.
> > > > > The limits will be applied to any of the scripts
> > > > > > (pre-start/post-start/job/pre-stop/post-stop) similar to what's done
> > > > > with setuid/setgid/apparmor stanzas.
> > > > > 
> > > > > > Now my recommended format for the stanza, which I believe should be
> > > > > > flexible enough, is:
> > > > > >  cgroup <controller> <cgroup-name|$auto> [<key> <value>]
> > > > > 
> > > > > 
> > > > > Detail on the fields:
> > > > > == controller ==
> > > > > > Name of one of the cgroup controllers
> > > > > 
> > > > > Currently the valid values are (but won't be hardcoded into upstart):
> > > > >  - blkio
> > > > >  - cpu
> > > > >  - cpuacct
> > > > >  - cpuset
> > > > >  - devices
> > > > >  - freezer
> > > > >  - hugetlb
> > > > >  - memory
> > > > >  - perf_event
> > > > > 
> > > > > == cgroup-name|$auto ==
> > > > > > Name of the cgroup to use (created if it does not already exist)
> > > > > 
> > > > > The name may contain a / (e.g. "db/pgsql" or "db/$auto") indicating that
> > > > > it's requesting a sub-cgroup.
> > > > > 
> > > > > "$auto" is the recommended name and will have upstart generate a name
> > > > > based on the job instance name.
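
A rough Python sketch of how "$auto" could be expanded, in case it helps
the discussion. The proposal only says the name is based on the job
instance name, so using the instance name verbatim is an assumption, and
resolve_cgroup_name is a hypothetical helper, not an existing Upstart
function.

```python
def resolve_cgroup_name(name: str, instance: str) -> str:
    """Expand "$auto" path components of a cgroup name.

    Assumption: "$auto" is replaced by the job instance name verbatim.
    The real naming scheme has not been decided in this thread.
    """
    parts = name.split("/")
    return "/".join(instance if part == "$auto" else part for part in parts)
```

With this sketch, "db/$auto" for an instance called "pgsql" would give
"db/pgsql", and a name without "$auto" is returned unchanged.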
> > > > > 
> > > > > > The main use of that field is for cases where a set of jobs should share
> > > > > > limits. In such a case the main job should declare the various values and
> > > > > > the others just refer to the cgroup by name without defining any values.
> > > > > 
> > > > > The name may be different for the various controllers but may not differ
> > > > > within the same controller. Example:
> > > > > valid =>    cgroup memory group1 limit_in_bytes 52428800
> > > > >             cgroup cpuset group2 cpus 0-1
> > > > > 
> > > > > invalid =>  cgroup memory group1 limit_in_bytes 52428800
> > > > >             cgroup memory group1 soft_limit_in_bytes 1024
> > > > 
> > > > The invalid entry above is actually valid... What I meant was:
> > > > 
> > > > invalid =>  cgroup memory group1 limit_in_bytes 52428800
> > > >             cgroup memory group2 soft_limit_in_bytes 1024
> > > > 
> > > > Thanks to Serge Hallyn for noticing!
> > > > 
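
To make the rule concrete, here is a minimal Python sketch of parsing the
stanza and rejecting a job that uses two different cgroup names for the
same controller. It is only an illustration under my own assumptions:
CgroupStanza, parse_cgroup_stanza and check_names are hypothetical names,
and the real implementation would live in Upstart's C parser.

```python
from typing import Iterable, NamedTuple, Optional


class CgroupStanza(NamedTuple):
    controller: str
    name: str
    key: Optional[str]
    value: Optional[str]


def parse_cgroup_stanza(line: str) -> CgroupStanza:
    """Parse 'cgroup <controller> <name> [<key> <value>]'."""
    fields = line.split()
    # Either no key/value pair (3 fields) or exactly one pair (5 fields).
    if len(fields) not in (3, 5) or fields[0] != "cgroup":
        raise ValueError(f"malformed cgroup stanza: {line!r}")
    controller, name = fields[1], fields[2]
    key, value = (fields[3], fields[4]) if len(fields) == 5 else (None, None)
    return CgroupStanza(controller, name, key, value)


def check_names(stanzas: Iterable[CgroupStanza]) -> None:
    """Raise if one controller is given two different cgroup names."""
    seen = {}
    for stanza in stanzas:
        previous = seen.setdefault(stanza.controller, stanza.name)
        if previous != stanza.name:
            raise ValueError(
                f"controller {stanza.controller!r} uses both "
                f"{previous!r} and {stanza.name!r}"
            )
```

Two "cgroup memory group1 ..." lines pass the check, while mixing
"memory group1" and "memory group2" in one job raises ValueError.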
> > > > > 
> > > > > == key ==
> > > > > The cgroup control file minus the controller name, so for example
> > > > > memory.soft_limit_in_bytes will become soft_limit_in_bytes.
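
As a small illustration of that mapping, a Python sketch follows. Both
function names are hypothetical, and it assumes the control file is always
"<controller>.<key>", which is how the proposal describes it.

```python
def control_file(controller: str, key: str) -> str:
    """Build the cgroup control file name from a stanza's controller and key."""
    return f"{controller}.{key}"


def stanza_key(control_file_name: str) -> str:
    """Strip the leading controller name from a control file name."""
    _controller, sep, key = control_file_name.partition(".")
    if not sep or not key:
        raise ValueError(f"not a controller-prefixed file: {control_file_name!r}")
    return key
```

So "cgroup memory group1 limit_in_bytes 52428800" would end up writing
52428800 to memory.limit_in_bytes in the group1 cgroup.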
> > > 
> > > One thing Tejun (kernel cgroups maintainer) has been big on is that
> > > userspace should not sit too close to the implementation, meaning it
> > > should not rely on the precise cgroup filenames.  Systemd addresses
> > > this by completely abstracting things into 'slices'.  lmctfy introduces
> > > more generic names, i.e. 'memory {limit: 100000}' instead of
> > > memory.limit = 100000.
> > > 
> > > It may be too early to decide this - but should the key/value pairs
> > > be in lmctfy format vs. the current lxc way, which is verbatim
> > > filenames and values?
> > 
> > So I don't think we want upstart to link against lmctfy as we try to
> > keep the number of libraries we link against to a bare minimum (for
> > obvious reasons since we're PID 1 and have to support things like
> > stateful re-exec).
> > 
> > I don't think we want to add a lot of cgroup internals logic to upstart
> > either, so unless that kind of abstraction is directly exposed by the
> > cgroup manager, I think we'll have to stick to exposing a rather raw
> > view of the underlying cgroups.
> 
> We shouldn't have to link against lmctfy, but we could still use its
> configuration format.  Or, we can build our own, as we'd likely have
> to extend lmctfy's format anyway - i.e. lmctfy doesn't know about
> blkio and net_cls.
> 
> Especially since these lines are going into upstart *jobs*, we don't
> want to risk upstart jobs specifying invalid keys and upstart then
> having to guess what to do with them.
> 
> -serge
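
For what it's worth, the "generic names, reject unknown keys" idea could
look roughly like the Python sketch below. The GENERIC_KEYS table and the
generic key names in it are made up for illustration; they are not
lmctfy's actual format, and in practice such a table would belong to
whatever sits behind the dbus API rather than to Upstart itself.

```python
# Hypothetical table mapping generic keys to cgroup control files.
# Illustrative only; neither lmctfy nor Upstart defines these names.
GENERIC_KEYS = {
    "memory": {
        "limit": "memory.limit_in_bytes",
        "soft_limit": "memory.soft_limit_in_bytes",
    },
    "cpuset": {
        "cpus": "cpuset.cpus",
    },
}


def translate_key(controller: str, key: str) -> str:
    """Map a generic key to its control file, refusing anything unknown."""
    try:
        return GENERIC_KEYS[controller][key]
    except KeyError:
        raise ValueError(
            f"unknown cgroup key {key!r} for controller {controller!r}"
        ) from None
```

An unknown key then fails at job parse time instead of leaving the init
daemon to guess what the job author meant.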

If we have an alternate, nicer way of setting limits through the dbus
API, I have no problem with using that in Upstart.

(I also think we'd then need _raw functions exported for those who just
want to write or read a value directly from cgroupfs bypassing the
abstractions).

I however want to avoid using an external library for that or adding
cgroup specific logic into Upstart.

-- 
Stéphane Graber
Ubuntu developer
http://www.ubuntu.com