Part 2! Request for help / ideas to debug issue
michael.hudson at canonical.com
Mon Mar 13 21:05:05 UTC 2017
On 14 March 2017 at 01:59, John Lenton <john.lenton at canonical.com> wrote:
> This one is slightly more interesting.
> You need 1.8 (or patched <1.8 as per the previous thread) for this one
> to make sense; without it you're just going to get drowned in warning
> messages and not see the real issue.
> This one is the real issue :-)
> In Go, when calling syscall.Exec on a setuid root binary, sometimes
> (about 4% of the time on my machine, but it's hardware- and
> load-dependent), the exec'ed process will find itself running with
> a non-zero effective uid. That is, a setuid root binary will
> find itself running as non-root. As the process that sets up
> confinement is setuid root (in distros where setuid is favoured over
> capabilities), this means the snap app falls on its face.
> TODO: check if something similar happens when using caps
> This is *probably* a bug in Go, but it only seems to arise when using
> syscall.Exec, which as far as I can tell is unsupported (the whole
> syscall package is unsupported -- not covered by the go1 compatibility
> promise -- and its replacement, golang.org/x/sys/unix, is ominously
> missing Exec).
> Having said that, it might be a bug in the kernel ;-)
> And I say this because if you pin the process to a single cpu, the
> issue doesn't arise.
> Anyway, code to repro this is at
> on my machine,
> $ for i in `seq -w 9999`; do ./a_c; done | wc -l
> $ for i in `seq -w 9999`; do ./a_go; done | wc -l
> $ for i in `seq -w 9999`; do taskset 2 ./a_go; done | wc -l
That's pretty exciting. I bet this is going to have the same underlying
cause as the other bug: some other thread in the Go process is doing
something that causes the kernel to ignore the setuid bit. If I add a
time.Sleep(1*time.Millisecond) to a_go.go before the exec, the setuid bit
is respected every time. Debugging is not helped by the fact that setuid
is ignored when tracing, or that strace likes to hang when you trace a_go.
I spent a while staring at the kernel source but I don't really have any
idea how this might be happening. It might be this code
but I don't know how to be sure (well, without building kernels to do
debugging-via-printk or whatever).