cgroup stanza - a proposal

Serge Hallyn serge.hallyn at ubuntu.com
Thu Nov 21 20:09:03 UTC 2013


Quoting Stéphane Graber (stgraber at ubuntu.com):
> On Thu, Nov 21, 2013 at 01:15:47PM -0600, Serge Hallyn wrote:
> > Quoting Stéphane Graber (stgraber at ubuntu.com):
> > > On Wed, Nov 20, 2013 at 02:23:59PM -0500, Stéphane Graber wrote:
> > > > This morning at vUDS we discussed adding support for cgroups in Upstart.
> > > > 
> > > > Before I go into details about the proposed stanza and overall
> > > > behaviour, I'd begin by saying that contrary to some other init systems,
> > > > our intent is solely related to resource controls which is the main goal
> > > > of cgroups. Process grouping and tracking will remain unaffected by the
> > > > addition of cgroup support.
> > > > 
> > > > Cgroup support will be implemented by adding a new "cgroup" stanza which
> > > > will control the application of cgroup based restrictions to the job.
> > > > The limits will be applied to any of the scripts
> > > > (pre-start/post-start/job/pre-stop/post-stop) similar to what's done
> > > > with setuid/setgid/apparmor stanzas.
> > > > 
> > > > Now my recommended format for the stanza, which I believe should be
> > > > flexible enough is:
> > > >  cgroup <controller> <cgroup-name|$auto> [<key> <value>]
> > > > 
> > > > 
> > > > Detail on the fields:
> > > > == controller ==
> > > > Name of one of the cgroup controllers
> > > > 
> > > > Currently the valid values are (but won't be hardcoded into upstart):
> > > >  - blkio
> > > >  - cpu
> > > >  - cpuacct
> > > >  - cpuset
> > > >  - devices
> > > >  - freezer
> > > >  - hugetlb
> > > >  - memory
> > > >  - perf_event
> > > > 
> > > > == cgroup-name|$auto ==
> > > > Name of the cgroup to use (and create if non-existing)
> > > > 
> > > > The name may contain a / (e.g. "db/pgsql" or "db/$auto") indicating that
> > > > it's requesting a sub-cgroup.
> > > > 
> > > > "$auto" is the recommended name and will have upstart generate a name
> > > > based on the job instance name.
> > > > 
> > > > The main use of that field is for cases where a set of jobs should share
> > > > limits. In such a case, the main job should declare the various values and
> > > > the others should just refer to the cgroup by name without defining values.
> > > > 
> > > > The name may be different for the various controllers but may not differ
> > > > within the same controller. Example:
> > > > valid =>    cgroup memory group1 limit_in_bytes 52428800
> > > >             cgroup cpuset group2 cpus 0-1
> > > > 
> > > > invalid =>  cgroup memory group1 limit_in_bytes 52428800
> > > >             cgroup memory group1 soft_limit_in_bytes 1024
> > > 
> > > The invalid entry above is actually valid... What I meant was:
> > > 
> > > invalid =>  cgroup memory group1 limit_in_bytes 52428800
> > >             cgroup memory group2 soft_limit_in_bytes 1024
> > > 
> > > Thanks to Serge Hallyn for noticing!
> > > 
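For illustration only, here is a minimal Python sketch of the "one cgroup
name per controller" rule described above. The function name
check_group_names is hypothetical and not part of upstart; it assumes the
stanza fields are separated by whitespace.

```python
def check_group_names(stanzas):
    """Return the controllers that are bound to more than one cgroup name."""
    seen = {}        # controller -> first cgroup name seen for it
    conflicts = []   # controllers with differing names, in order found
    for line in stanzas:
        fields = line.split()
        # Skip anything that is not a "cgroup <controller> <name> ..." line.
        if len(fields) < 3 or fields[0] != "cgroup":
            continue
        controller, name = fields[1], fields[2]
        first = seen.setdefault(controller, name)
        if first != name and controller not in conflicts:
            conflicts.append(controller)
    return conflicts
```

With the examples from the proposal, the memory/cpuset pair yields no
conflicts, while group1 and group2 under memory reports "memory".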
> > > > 
> > > > == key ==
> > > > The cgroup control file minus the controller name, so for example
> > > > memory.soft_limit_in_bytes will become soft_limit_in_bytes.
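As a rough sketch of how a stanza line could map back to a control file
under the proposed format, assuming whitespace-separated fields and a value
that runs to the end of the line (the proposal does not specify this). The
names parse_cgroup_stanza and control_file are hypothetical.

```python
def control_file(controller, key):
    """Rebuild the control file name by prefixing the controller."""
    return f"{controller}.{key}"


def parse_cgroup_stanza(line):
    """Parse 'cgroup <controller> <name> [<key> <value>]'.

    Returns (controller, name, key, value); key and value are None
    when the job only refers to a cgroup by name.
    """
    # maxsplit=4 keeps everything after the key as a single value field.
    fields = line.split(None, 4)
    if not fields or fields[0] != "cgroup":
        raise ValueError("not a cgroup stanza: %r" % line)
    if len(fields) == 3:
        return fields[1], fields[2], None, None
    if len(fields) == 5:
        return fields[1], fields[2], fields[3], fields[4]
    raise ValueError("expected a name, or a name plus key and value: %r" % line)
```

For example, parsing "cgroup memory group1 soft_limit_in_bytes 1024" and
passing the controller and key to control_file gives
memory.soft_limit_in_bytes.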
> > 
> > One thing Tejun (kernel cgroups maintainer) has been big on is that
> > userspace should not sit too close to the implementation, meaning
> > it should not rely on the precise cgroup filenames.  Systemd addresses
> > this by completely abstracting things into 'slices'.  lmctfy introduces
> > more generic names, i.e. 'memory {limit: 100000}' instead of
> > memory.limit = 100000.
> > 
> > It may be too early to decide this - but should the key/value pairs
> > be in lmctfy format vs. the current lxc way, which is verbatim
> > filenames and values?
> 
> So I don't think we want upstart to link against lmctfy as we try to
> keep the number of libraries we link against to a bare minimum (for
> obvious reasons since we're PID 1 and have to support things like
> stateful re-exec).
> 
> I don't think we want to add a lot of cgroup internals logic to upstart
> either, so unless that kind of abstraction is directly exposed by the
> cgroup manager, I think we'll have to stick to exposing a rather raw
> view of the underlying cgroups.

We shouldn't have to link against lmctfy, but we could still use its
configuration format.  Or, we can build our own, as we'd likely have
to extend lmctfy's format anyway - i.e. lmctfy doesn't know about
blkio and net_cls.

Especially since these lines are going into upstart *jobs*, we don't
want to risk upstart jobs specifying invalid keys and leaving upstart
to guess what to do with them.
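
To make the concern concrete, here is a small Python sketch of rejecting
unknown keys up front rather than guessing. The KNOWN_KEYS table and the
validate_key name are purely illustrative assumptions; the real list would
have to come from wherever the format ends up being defined, and only a
few cgroup control files are listed here.

```python
# Illustrative subset only; not a complete or authoritative list.
KNOWN_KEYS = {
    "memory": {"limit_in_bytes", "soft_limit_in_bytes"},
    "cpuset": {"cpus", "mems"},
    "cpu": {"shares"},
    "blkio": {"weight"},
}


def validate_key(controller, key):
    """Return the control file name, or raise ValueError for unknown input."""
    allowed = KNOWN_KEYS.get(controller)
    if allowed is None:
        raise ValueError("unknown controller: %s" % controller)
    if key not in allowed:
        raise ValueError("unknown key for %s: %s" % (controller, key))
    return "%s.%s" % (controller, key)
```

A job with a misspelled key would then fail when the job file is parsed
instead of when the limit is applied.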

-serge


