[RFC] Syntax Proposal for Seccomp Filters in Upstart

Fri Dec 14 14:35:01 UTC 2012

Hi David,

On 12/12/12 23:27, David Gaarenstroom wrote:
> Starting with the Ubuntu 12.04 LTS kernel and the mainline 3.5 kernel,
> the security feature "Secure Computing" (seccomp) has been revamped
> into a BPF filtering mechanism that allows processes to install a
> filter definition for system calls. Individual system calls can be
> allowed, denied (killing the offending program), or bypassed while
> setting errno to a configurable value. ("trap" and "trace" are also
> supported as action types, see the kernel docs for details on these.)
> Such a seccomp filter not only restricts its process, but also any new
> child processes.  As long as a process either runs with root
> privileges or sets the prctl "NO_NEW_PRIVS" it can install a seccomp
> filter. (Setting NO_NEW_PRIVS implies that you can't gain new
> privileges, not even through suid binaries such as sudo or ping.)
> 
> As this filtering can be used to sandbox (child) processes, it is very
> suitable for Upstart, and since there appears to be no implementation
> yet, I'd like to propose a syntax for Upstart jobs. I've already
> implemented a draft into my own 1.5 based Upstart and I'd like to work
> towards getting it added to the Upstart trunk. (I figure it will take
> at least a few iterations.)
> 
> Systemd already has an implementation for Seccomp filtering using
> "SystemCallFilter" (see:
> http://0pointer.de/public/systemd-man/systemd.exec.html ). For Upstart
> I'd like to stay relatively close to Systemd's syntax, making it
> easier to write policies for both Upstart and Systemd. However, I'd
> like to extend its syntax a bit by adding an optional policy for every
> syscall listed. I'm thinking about the following syntax:
> 
> seccomp_filter
>     : "seccomp-filter" WS [ '~' ] seccomp_rules;
> 
> seccomp_rules
>     : seccomp_rule ( ',' seccomp_rule )*;
> 
> seccomp_rule
>     : systemcall ( ':' policy )?;
> 
> policy
>     : "allow"
>     | "errno" ( '(' errno ')' )?
>     | "kill"
>     | "trace"
>     | "trap" ( '(' errno ')' )?
>     ;
> 
> errno
>     : NUMBER
I think we should avoid allowing a numeric literal here, as it reduces
portability when jobs are copied between platforms: errno numbers are not
guaranteed to be the same everywhere. I appreciate that 'kill signal' also
accepts a number, but that is not documented and should probably be deprecated
for the same reason.

>     | errno_identifier
>     ;
> 
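
To illustrate the portability point, here is a minimal Python sketch (not
Upstart code; the function name resolve_errno is made up for this example) that
accepts only symbolic errno names and maps them to whatever number the local
platform uses. It leans on Python's standard errno module; a C implementation
would need its own name table.

```python
import errno


def resolve_errno(name):
    """Map a symbolic errno name (e.g. "EACCES") to this platform's number.

    Hypothetical helper for illustration only.
    """
    # Reject numeric literals outright: their meaning can vary by platform.
    if name.isdigit():
        raise ValueError("numeric errno %r not allowed; use a symbolic name" % name)
    # Only attributes starting with "E" are errno constants; this also keeps
    # the module's lowercase "errorcode" dict from being looked up by mistake.
    value = getattr(errno, name, None) if name.startswith("E") else None
    if not isinstance(value, int):
        raise ValueError("unknown errno name %r" % name)
    return value
```

A job written as errno(EACCES) would then resolve correctly on whichever
platform it is copied to, whereas a bare number would be rejected.
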
> The default policy is "allow explicitly listed syscalls, and use the
> kill policy for anything not explicitly listed". That is, unless the
> set of rules is preceded with "~", which inverts this policy, just as
> Systemd does ("deny explicitly listed syscalls, allow anything not
> explicitly listed").
> 
> E.g.:
>   seccomp-filter write
> 
> ...for "echo hello world".
> or:
> 
>   seccomp-filter getrlimit:allow,setrlimit:errno(EACCES)

It's worth noting that systemd's syntax does not use commas to separate rules,
and if we adopt such a delimiter it would be the first Upstart stanza to permit
a comma. That is not a problem per se, but given the parsing routines that
Upstart uses (and which the seccomp syntax would also need to use), it might
make sense to drop the comma and rely on whitespace as the separator, such that
your example would become:

seccomp-filter getrlimit:allow setrlimit:errno(EACCES)

The parser already handles line continuation, so a long list of syscalls can be
split across several lines using a trailing backslash.
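
As a rough illustration of the whitespace-separated form, the Python sketch
below splits a stanza into rules. It is not Upstart's parser; the names
parse_seccomp_filter, RULE_RE and the tuple layout are invented for the example,
and it assumes that only symbolic errno names are accepted, as suggested
earlier.

```python
import re

POLICIES = {"allow", "errno", "kill", "trace", "trap"}

# syscall, optionally followed by ":policy" and an optional "(ERRNO_NAME)".
RULE_RE = re.compile(
    r"^(?P<syscall>[a-z0-9_]+)"
    r"(?::(?P<policy>[a-z]+)(?:\((?P<arg>[A-Z][A-Z0-9_]*)\))?)?$"
)


def parse_seccomp_filter(stanza):
    """Return (inverted, [(syscall, policy, errno_name), ...])."""
    # Join lines that were broken with a trailing backslash.
    tokens = stanza.replace("\\\n", " ").split()
    if not tokens or tokens[0] != "seccomp-filter":
        raise ValueError("not a seccomp-filter stanza")
    tokens = tokens[1:]

    # A leading "~" (attached to the first rule or standing alone) inverts
    # the default policy.
    inverted = False
    if tokens and tokens[0].startswith("~"):
        inverted = True
        rest = tokens[0][1:]
        tokens = ([rest] if rest else []) + tokens[1:]
    if not tokens:
        raise ValueError("seccomp-filter needs at least one rule")

    rules = []
    for token in tokens:
        match = RULE_RE.match(token)
        if match is None:
            raise ValueError("malformed rule %r" % token)
        policy, arg = match.group("policy"), match.group("arg")
        if policy is not None and policy not in POLICIES:
            raise ValueError("unknown policy %r" % policy)
        # Only "errno" and "trap" take an argument in the proposed grammar.
        if arg is not None and policy not in ("errno", "trap"):
            raise ValueError("policy %r takes no argument" % policy)
        rules.append((match.group("syscall"), policy, arg))
    return inverted, rules
```

For the example above this returns
(False, [("getrlimit", "allow", None), ("setrlimit", "errno", "EACCES")]),
while the comma-separated form is rejected with a ValueError.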

> 
> ...for a fictional program that is allowed to call getrlimit, but
> calls to setrlimit are simply ignored and errno is set to EACCES.
> or:
> 
>   seccomp-filter ~setuid, socket
> 
> ...to prevent the usage of setuid and socket
> 
> 
> I would really like to know if there's any objection to this syntax,
> as my (first) Upstart patches will be based on this syntax. If anyone
> is interested, most of this syntax and functionality is already
> implemented in a seccomp exec wrapper I wrote, which can be found
> here:
> https://gitorious.org/guardian/guardian

I wonder if it would make sense for the seccomp handling code itself to live in
a library. As I understand it, if all the code were in Upstart, we would have to
put out a new Upstart release just to add support for a new Linux system call.
If so, could guardian be changed in this respect? Or is libseccomp a better fit
(it is also already in the Debian and Ubuntu archives)?

Other thoughts:

- What happens in a scenario where multiple system calls have been specified,
but only a subset are available on the platform the job is running on?

Presumably, the only safe option would be to fail to start the job. If so, we'd
need to find a way to tell the admin why the job is not starting (which could be
as simple as documenting that Upstart should be run in debug mode when first
developing or testing a job that uses seccomp-filter).

- What if the syscall is known, but cannot be filtered by the currently running
kernel (for example, when running an older kernel with a newer libc or seccomp
library, or when running in a chroot environment)?

- What would this do?

 seccomp-filter ~ setuid:allow
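
One way to make the first and last questions concrete is the sketch below, in
Python and purely hypothetical: check_filter and the "supported" set are
invented for illustration, and in practice the list of syscalls would have to
come from the kernel or a library such as libseccomp. It fails closed on
unknown syscalls and reports an explicit "allow" inside a "~" list instead of
guessing what was meant. That is only one possible answer, not a proposal for
the final behaviour.

```python
def check_filter(rules, inverted, supported):
    """Return a list of human-readable problems; an empty list means OK.

    rules     -- iterable of (syscall, policy) pairs, policy may be None
    inverted  -- True if the rule list was prefixed with "~"
    supported -- set of syscall names known on this platform (assumed input)
    """
    problems = []
    for syscall, policy in rules:
        # Fail closed: an unknown syscall means the filter cannot be trusted.
        if syscall not in supported:
            problems.append(
                "syscall '%s' is not available on this platform" % syscall)
        # In a "~" list unlisted syscalls are already allowed, so an explicit
        # "allow" is at best redundant and may indicate a mistake.
        if inverted and policy == "allow":
            problems.append(
                "'%s:allow' has no effect in a '~' list" % syscall)
    return problems
```

If the returned list is non-empty, the job would not be started and the
messages would be logged so that the admin can see why.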

> 
> Further references:
> https://lwn.net/Articles/498231/
> http://kernelnewbies.org/Linux_3.5#head-c48d6a7a26b6aae95139358285eee012d6212b9e
> http://outflux.net/teach-seccomp/
> http://0pointer.de/public/systemd-man/systemd.exec.html
> 
> 
> Best regards,
> David Gaarenstroom
> 

Kind regards,

James.
--
James Hunt
____________________________________
http://upstart.ubuntu.com/cookbook
http://upstart.ubuntu.com/cookbook/upstart_cookbook.pdf