[RFC] Syntax Proposal for Seccomp Filters in Upstart

James Hunt james.hunt at ubuntu.com
Mon Dec 17 17:34:44 UTC 2012


On 14/12/12 23:34, David Gaarenstroom wrote:
>>> [...]
>>> I'm thinking about the following syntax:
>>>
>>> seccomp_filter
>>>     : "seccomp-filter" WS [ '~' ] seccomp_rules;
>>>
>>> seccomp_rules
>>>     : seccomp_rule ( ',' seccomp_rule )*;
>>>
>>> seccomp_rule
>>>     : systemcall ( ':' policy )?;
>>>
>>> policy
>>>     : "allow"
>>>     | "errno" ( '(' errno ')' )?
>>>     | "kill"
>>>     | "trace"
>>>     | "trap" ( '(' errno ')' )?
>>>     ;
>>>
>>> errno
>>>     : NUMBER
>>>     | errno_identifier
>>>     ;
>> I think we should avoid allowing a literal here as it reduces portability if
>> jobs get copied between different platforms. I appreciate that 'kill signal'
>> also allows a numeric, but that is not documented and should probably be
>> deprecated for the same reason.
> 
> Actually, only the errno and trap policy support a numeric/errno
> according to my syntax description. Do you mean you want to avoid
> using e.g. EACCES and only allow an integer value? That seems less
> portable to me... A pair of Makefile rules I'd like to add will
> automatically generate a list of errnos and a perfect hash table for
> them using gperf. I'm using the same for syscalls, inspired by systemd.
I think that allowing the specification of an errno as a numeric rather than a
symbol should be avoided if possible. So, 'EACCES' is fine, but allowing '13'
to represent nominally the same value is potentially ambiguous, since the
numeric values are platform-specific.
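
To illustrate the symbol-only idea, here is a minimal sketch in C. The table
and the errno_from_symbol() name are made up for this example (in practice the
table would be the gperf-generated one discussed above); the point is only
that a numeric string is rejected rather than converted.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-written table; the thread proposes generating the
 * real one from errno.h with gperf. */
struct errno_name {
    const char *name;
    int         id;
};

static const struct errno_name errno_names[] = {
    { "EACCES", EACCES },
    { "ENOSYS", ENOSYS },
    { "EPERM",  EPERM  },
};

/* Return the errno value for a symbolic name, or -1 if the name is
 * not in the table. Numeric strings such as "13" are never converted,
 * so they are rejected like any other unknown name. */
static int
errno_from_symbol (const char *name)
{
    size_t i;

    if (! name)
        return -1;

    for (i = 0; i < sizeof (errno_names) / sizeof (errno_names[0]); i++) {
        if (strcmp (name, errno_names[i].name) == 0)
            return errno_names[i].id;
    }

    return -1;
}
```

With this, errno_from_symbol ("EACCES") yields the platform's own EACCES
value, while "13" and the misspelt "EACCESS" both fail.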

> 
> 
>> It's worth noting that systemd's syntax does not use commas to separate rules
>> and if we adopt such a delimiter it would be the first Upstart stanza to permit
>> a comma.
> 
> My mistake, I meant spaces. I copied the commas from the guardian
> syntax; guardian is a command-line utility and therefore uses a
> comma-separated list. As mentioned, I'd like to stay close to systemd's
> syntax, and that would imply space-separated rules.
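For reference, a rough sketch of how the space-separated form could be split
into rules, written as standalone C rather than against Upstart's real parser.
All names here (struct rule, parse_rule, parse_filter) are hypothetical, and
it assumes the '~' is given as its own token, as in "~ setuid:allow".

```c
#include <stddef.h>
#include <string.h>

enum policy { POLICY_NONE, POLICY_ALLOW, POLICY_ERRNO,
              POLICY_KILL, POLICY_TRACE, POLICY_TRAP };

struct rule {
    char        syscall[32];
    enum policy policy;      /* POLICY_NONE if no ":policy" was given */
    char        arg[32];     /* errno symbol, empty if absent */
};

static const struct { const char *name; enum policy policy; int takes_arg; }
policies[] = {
    { "allow", POLICY_ALLOW, 0 }, { "errno", POLICY_ERRNO, 1 },
    { "kill",  POLICY_KILL,  0 }, { "trace", POLICY_TRACE, 0 },
    { "trap",  POLICY_TRAP,  1 },
};

/* Parse one "syscall[:policy[(errno)]]" token. Returns 0 or -1. */
static int
parse_rule (const char *token, struct rule *rule)
{
    const char *colon = strchr (token, ':');
    size_t      len = colon ? (size_t) (colon - token) : strlen (token);
    const char *policy, *lparen, *rparen;
    size_t      plen, alen, i;

    memset (rule, 0, sizeof (*rule));
    if (len == 0 || len >= sizeof (rule->syscall))
        return -1;
    memcpy (rule->syscall, token, len);
    if (! colon)
        return 0;

    policy = colon + 1;
    lparen = strchr (policy, '(');
    plen = lparen ? (size_t) (lparen - policy) : strlen (policy);

    for (i = 0; i < sizeof (policies) / sizeof (policies[0]); i++) {
        if (strlen (policies[i].name) == plen
            && strncmp (policy, policies[i].name, plen) == 0)
            break;
    }
    if (i == sizeof (policies) / sizeof (policies[0]))
        return -1;              /* unknown policy name */
    rule->policy = policies[i].policy;
    if (! lparen)
        return 0;

    /* An argument is only valid for errno/trap and must be "(NAME)". */
    rparen = strchr (lparen, ')');
    if (! policies[i].takes_arg || ! rparen || rparen[1] != '\0')
        return -1;
    alen = (size_t) (rparen - lparen) - 1;
    if (alen == 0 || alen >= sizeof (rule->arg))
        return -1;
    memcpy (rule->arg, lparen + 1, alen);
    return 0;
}

/* Split the stanza arguments on whitespace. Returns the number of
 * rules parsed, or -1 on error; *inverted is set if a leading "~"
 * token was seen. */
static int
parse_filter (const char *args, struct rule *rules, int max, int *inverted)
{
    const char *p = args;
    char        tok[64];
    int         n = 0;

    *inverted = 0;
    for (;;) {
        size_t len;

        p += strspn (p, " \t");
        len = strcspn (p, " \t");
        if (len == 0)
            break;
        if (len >= sizeof (tok))
            return -1;
        memcpy (tok, p, len);
        tok[len] = '\0';
        p += len;

        if (n == 0 && ! *inverted && strcmp (tok, "~") == 0) {
            *inverted = 1;
            continue;
        }
        if (n >= max || parse_rule (tok, &rules[n]) < 0)
            return -1;
        n++;
    }
    return n;
}
```

The errno argument is kept as a string here; resolving it to a value would be
a separate lookup step.
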
> 
> 
>> I wonder if it would make sense for the seccomp handling code itself to be put
>> into a library since, as I understand it if all the code was in Upstart, that
>> would necessitate putting out new Upstart releases just to add support for a new
>> Linux system call?
> 
> Hmm yes, systemd will have the same problem. Systemd even has a bigger
> problem in that case: it uses a bitfield for all system calls it
> is aware of and either kills or allows these system calls. Any unknown
> system call will be killed implicitly and that cannot be prevented
> unless a newer systemd is installed. That's not what I would do: a
> ruleset starting with "~" shall always allow unknown syscalls, just
> like it describes (a catch-all "allow" policy). Of course, besides a
> syscall name, an integer could be supported as well for the syscall
> part, so that there's at least a workaround.
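To make the catch-all "allow" behaviour concrete, here is a minimal sketch of
such a BPF program in C. It only builds the instruction array; the function
name and the choice of "kill" for the listed syscalls are assumptions for the
example, and the check of seccomp_data.arch that a real filter should perform
is left out for brevity.

```c
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical helper: write a filter into prog[] that kills on each
 * syscall number in denied[] and allows every other number, including
 * ones this code has never heard of. Returns the number of
 * instructions written, or 0 if prog[] is too small. */
static size_t
build_default_allow_filter (struct sock_filter *prog, size_t max,
                            const int *denied, size_t ndenied)
{
    size_t n = 0, i;

    if (max < 2 * ndenied + 2)
        return 0;

    /* Load the syscall number from struct seccomp_data. */
    prog[n++] = (struct sock_filter)
        BPF_STMT (BPF_LD | BPF_W | BPF_ABS,
                  offsetof (struct seccomp_data, nr));

    for (i = 0; i < ndenied; i++) {
        /* On a match fall through to the KILL return, otherwise
         * skip over it to the next comparison. */
        prog[n++] = (struct sock_filter)
            BPF_JUMP (BPF_JMP | BPF_JEQ | BPF_K,
                      (unsigned int) denied[i], 0, 1);
        prog[n++] = (struct sock_filter)
            BPF_STMT (BPF_RET | BPF_K, SECCOMP_RET_KILL);
    }

    /* Catch-all: anything not matched above is allowed. */
    prog[n++] = (struct sock_filter)
        BPF_STMT (BPF_RET | BPF_K, SECCOMP_RET_ALLOW);

    return n;
}
```

Installing the result would then mean wrapping it in a struct sock_fprog and
calling prctl (PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog), with
PR_SET_NO_NEW_PRIVS set beforehand for unprivileged jobs.
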
> 
> But do note that such a new syscall would first have to actually be
> *used* by an (updated) application before it would make sense to
> update upstart to enforce a policy on that new syscall...
> 
> 
>> If so, could guardian be changed in this respect? Or is
>> libseccomp a better fit (it's also already in the Debian+Ubuntu archives).
> 
> I wrote guardian to get accustomed to seccomp mode 2 and to try out
> some ideas I had. Combined with strace, it is great for testing policy
> rules for different applications.
> But I did not intend for Upstart to use it; I only intend to reuse
> (most of) the "install_seccomp_filter" code.
> 
> Generating a BPF filter is so trivial that I didn't bother to actually
> look at libseccomp for this purpose. I have looked at it recently, but
> it didn't offer any features I missed. The only good reason would be
> that it can be updated separately from upstart...
> 
> 
>> Other thoughts:
>>
>> - What happens in a scenario where multiple system calls have been specified,
>> but only a subset are available on the platform the job is running on?
> 
> At compile time, a list of syscalls and a list of errnos for the
> target platform is generated, so everything that is available at that
> time for that platform should then be available. Listing a syscall
> unknown to Upstart should trigger a warning and be ignored.
> Syscall numbers that don't actually have a syscall connected to them
> could be inserted in the seccomp ruleset, but such a rule will never
> be triggered.
> 
> This is how I generate these lists at compile time for guardian (I'd
> like to do something similar for Upstart):
> ----
> errno-list.txt:
> 	$(CC) -E -dM -include errno.h -xc /dev/null | $(AWK) '/^#define[ \t]+E[^ \t]+[ \t]+/ { print $$2; }' > $@ || rm $@
> 
> syscall-list.txt:
> 	$(CC) -E -dM -include sys/syscall.h -xc /dev/null | $(AWK) '/^#define[ \t]+__NR_[^ ]+[ \t]+/ { sub(/__NR_/, "", $$2); print $$2; }' > $@ || rm $@
> 
> errno-from-name.gperf: errno-list.txt
> 	$(AWK) 'BEGIN{ print "struct errno_name { const char* name; int id; };"; print "%null-strings"; print "%%";} { printf "%s, %s\n", $$1, $$1 }' < $< > $@
> 
> syscall-from-name.gperf: syscall-list.txt
> 	$(AWK) 'BEGIN{ print "struct syscall_name { const char* name; int id; };"; print "%null-strings"; print "%%";} { printf "%s, __NR_%s\n", $$1, $$1 }' < $< > $@
> 
> errno-from-name.h: errno-from-name.gperf
> 	$(GPERF) -L ANSI-C -t -N lookup_errno -H errno_syscall_name -C -E < $< > $@
> 
> syscall-from-name.h: syscall-from-name.gperf
> 	$(GPERF) -L ANSI-C -t -N lookup_syscall -H hash_syscall_name -C -E < $< > $@
> 
> ----
>> Presumably, the only safe option would be to fail to start the job. If so, we'd
>> need to find a way to notify the admin as to why the job is not starting (which
>> could just be to document that when first developing/testing a new job using the
>> seccomp-filter, ensure Upstart is in debug mode).
>>
>> - What if the syscall is known, but cannot be filtered on by the currently
>> running kernel? (running back-level kernel with newer libc / seccomp libs, or
>> running in a chroot environment)
> 
> I'm not sure I understand your question correctly, but a syscall is
> invoked via a syscall number (__NR_<syscall>). Every known syscall has
> been assigned a number. (On x86, the syscall number is placed in
> %eax and then "int 0x80" is issued; that is how the syscall
> interface works...) If that number is unknown to the kernel, the
> kernel will react to that in some way, with or without seccomp (I
> guess by setting errno to ENOSYS and returning -1). Either way, the
> seccomp BPF rules will just compare the syscall number to the numbers
> in the ruleset and act accordingly... Does that answer your question,
> or could you rephrase it?

Upstart supports running jobs from within a chroot environment. So imagine a
system running Upstart 1.7 that hosts a chroot using Upstart 1.8. The
host environment only has knowledge of the system calls that the actual running
kernel provides (i.e. libc, the kernel version and Upstart's seccomp handling
are all in sync). However, in the chroot, jobs may attempt to use a newer
syscall than is provided by the running kernel (since they may have been
developed on a non-chroot system with a newer kernel).

So it sounds like you're proposing that the seccomp layer will identify the
unknown syscall and discard it?

> 
> 
>> - What would this do?
>>
>>  seccomp-filter ~ setuid:allow
> 
> setuid would be explicitly allowed, and all other calls implicitly.
> (And for completeness, NO_NEW_PRIVS is possibly set depending on the
> process-user and a no-new-privs stanza.)
> 
> BTW, I noticed my syntax missed something to control setting
> NO_NEW_PRIVS (if running a job as root). But how should I define/parse
> a boolean value for a no-new-privs stanza? Looking quickly through
> parse_job.c I couldn't find an existing example...
There aren't any existing booleans that spring to mind; the 'task' stanza might
be the best example since it's TRUE if specified, FALSE if not :-)
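
As a minimal sketch of that "TRUE if specified" idea, in standalone C rather
than Upstart's actual parse_job.c code (struct job_flags and
apply_flag_stanza are made-up names for illustration only):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the relevant job class fields. */
struct job_flags {
    bool task;
    bool no_new_privs;
};

/* Flag-style stanzas take no argument: seeing the stanza sets the
 * field to true, and a field that is never mentioned keeps its false
 * default. Returns 0 if the stanza was recognised, -1 otherwise. */
static int
apply_flag_stanza (struct job_flags *flags, const char *stanza)
{
    if (strcmp (stanza, "task") == 0)
        flags->task = true;
    else if (strcmp (stanza, "no-new-privs") == 0)
        flags->no_new_privs = true;
    else
        return -1;

    return 0;
}
```

A no-new-privs stanza handled this way would need no value syntax at all,
at the cost of not being able to switch the flag back off explicitly.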

> 
> 
> Kind regards,
> David Gaarenstroom
> 


Kind regards,

James.
--
James Hunt
____________________________________
http://upstart.ubuntu.com/cookbook
http://upstart.ubuntu.com/cookbook/upstart_cookbook.pdf


