NACK: [PATCH 1/1] x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl
Jeffrey Lane
jeffrey.lane at canonical.com
Fri Oct 21 16:22:24 UTC 2022
So, just to chase this a bit so I can relay back to Intel: have you had
a chance to consider this request from them, @Stephane Graber or
@Michael Vogt?
On Wed, Oct 5, 2022 at 4:26 PM Thadeu Lima de Souza Cascardo
<cascardo at canonical.com> wrote:
>
> I was, at first, resistant to this change until someone did the work of making
> sure userspace on focal and jammy did the right thing by opting in or out via
> prctl. The list below is not exhaustive and I don't expect it to be. But it is
> certainly missing snapd and LXD. I would rather have the input from someone
> responsible for those than rush this into our LTS kernels.
>
> I am aware this is already the default on Kinetic, but perhaps this is a good
> time to understand the implications for these two cases.
>
> Cascardo.
>
> On Wed, Oct 05, 2022 at 03:34:20PM -0400, Jeff Lane wrote:
> > From: Andrea Arcangeli <aarcange at redhat.com>
> >
> > BugLink: http://bugs.launchpad.net/bug/1980160
> >
> > Switch the kernel default of SSBD and STIBP to the ones with
> > CONFIG_SECCOMP=n (i.e. spec_store_bypass_disable=prctl
> > spectre_v2_user=prctl) even if CONFIG_SECCOMP=y.
> >
> > Several motivations listed below:
> >
> > - If SMT is enabled, the seccomp jail can still attack the rest of the
> > system by using MDS-HT, even with spectre_v2_user=seccomp (except on
> > Xeon Phi, where MDS can be tamed with SMT left enabled, but that's a
> > special case). Setting STIBP became very expensive window dressing
> > after MDS-HT was discovered.
> >
> > - The seccomp jail cannot attack the kernel with spectre-v2-HT
> > regardless (even if STIBP is not set), but with MDS-HT the seccomp
> > jail can attack the kernel too.
> >
> > - With spec_store_bypass_disable=prctl the seccomp jail can attack the
> > other userland (guest or host mode) using spectre-v2-HT, but the
> > userland attack is already mitigated by both ASLR and pid namespaces
> > for host userland and through virt isolation with libkrun or
> > kata. (If somebody is worried about spectre-v2-HT, it's best to
> > mount proc with hidepid=2,gid=proc on workstations where not all
> > apps may run under container runtimes, rather than slowing down all
> > seccomp jails, but the best option is to add pid namespaces to the
> > seccomp jail.) By contrast, MDS-HT is not mitigated, and the seccomp
> > jail can still attack all other host and guest userland if SMT is
> > enabled, even with spec_store_bypass_disable=seccomp.
> >
> > - If full security is required then MDS-HT must also be mitigated with
> > nosmt and then spectre_v2_user=prctl and spectre_v2_user=seccomp
> > would become identical.
> >
> > - Setting spectre_v2_user=seccomp is overall lower priority than
> > setting javascript.options.wasm to false in about:config to protect
> > against remote wasm MDS-HT, rather than worrying about spectre-v2-HT
> > and STIBP, which again are already statistically well mitigated by
> > other means in userland and fully mitigated in the kernel with
> > retpolines (unlike the wasm assist call with MDS-HT).
> >
> > - SSBD is needed to prevent reading JIT memory, and its primary user
> > is OpenJDK. However, the primary user of SSBD wouldn't be covered by
> > spec_store_bypass_disable=seccomp because it doesn't use seccomp, and
> > it also explicitly declined to set
> > PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS even though it easily
> > could. In fact, it would need to set it only when the sandboxing
> > mechanism is enabled for javaws applets, but it still declined,
> > declaring security within the same user address space an untenable
> > objective for its JIT, even in the sandboxing case where
> > performance would be a lesser concern (for the record: I somewhat
> > disagree with not setting PR_SPEC_STORE_BYPASS in the sandbox case,
> > and I prefer to run javaws through a wrapper that sets
> > PR_SPEC_STORE_BYPASS if I need it). In turn, it can be inferred that
> > even if the primary user of SSBD used seccomp, it would invoke it
> > with SECCOMP_FILTER_FLAG_SPEC_ALLOW by now.
> >
> > - runc/crun already set SECCOMP_FILTER_FLAG_SPEC_ALLOW by default, and
> > k8s and podman have a default JSON seccomp allowlist that cannot be
> > slowed down, so for the #1 seccomp user this change is already a
> > noop.
> >
> > - If systemd, sshd, or other apps that use seccomp really need STIBP
> > or SSBD, they need to explicitly set PR_SET_SPECULATION_CTRL by
> > now. The blind stibp/ssbd seccomp catch-all approach was probably
> > adopted initially with the wishful objective of providing peace of
> > mind that it could magically fix it all. That was wishful thinking
> > before MDS-HT was discovered, but after MDS-HT was discovered it
> > became just window dressing.
> >
> > - For the qemu "-sandbox" seccomp jail it wouldn't make sense to set
> > STIBP or SSBD. SSBD doesn't help with KVM because there's no JIT (if
> > it's needed with TCG it should be an opt-in with
> > PR_SET_SPECULATION_CTRL+PR_SPEC_STORE_BYPASS, and it shouldn't
> > slow down KVM for nothing). For qemu+KVM, STIBP would be even more
> > window dressing than it is for all other apps, because in the
> > qemu+KVM case there's not only the MDS attack to worry about with
> > SMT enabled. Even after disabling SMT, there's still a theoretical
> > spectre-v2 attack possible within the same thread context from guest
> > mode to host ring3 that the host kernel retpoline mitigation has no
> > theoretical chance to mitigate. Some kernels provide an
> > ibrs-always/ibrs-retpoline opt-in model that enables IBRS in the
> > qemu host ring3 userland, which fixes this theoretical concern. Only
> > after enabling IBRS in the host userland would it make sense to
> > proceed and worry about STIBP and an attack on the other host
> > userland, but then again SMT would need to be disabled for full
> > security anyway, which would render STIBP a noop again.
> >
> > - Last but not least: the lack of "spec_store_bypass_disable=prctl
> > spectre_v2_user=prctl" means that the moment a guest boots and
> > sshd/systemd runs, the guest kernel will write to the SPEC_CTRL MSR,
> > which makes every subsequent guest vmexit slower, forcing KVM to
> > issue a very slow rdmsr instruction at every vmexit. So the end
> > result is that the SPEC_CTRL MSR is only available in GCE. Most other
> > public cloud providers don't expose SPEC_CTRL, which means that not
> > only are STIBP/SSBD unavailable, but IBPB isn't available either
> > (which would cause no overhead to the guest or the hypervisor
> > because it's write-only and requires no reading during vmexit). So
> > the current default is already a net loss in security (missing
> > IBPB), which means most public cloud providers cannot achieve a
> > fully secure guest with nosmt (and nosmt is enough to fully mitigate
> > MDS-HT). It also means GCE is unfairly penalized in performance
> > because it provides the option to enable full security in the guest
> > as an opt-in (i.e. nosmt and IBPB). So this change will allow all
> > cloud providers to expose SPEC_CTRL without incurring any
> > hypervisor slowdown; at the same time it will remove the unfair
> > penalization of GCE performance for doing the right thing, and it'll
> > allow full security to be achieved with nosmt, with IBPB available
> > (and STIBP becoming meaningless).
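A minimal C sketch of the kind of javaws wrapper mentioned above: it opts the current task in to SSBD with PR_SET_SPECULATION_CTRL/PR_SPEC_STORE_BYPASS and then execs the target program. It assumes glibc's prctl() and the UAPI constants from <linux/prctl.h> (fallback values are provided for older headers); exec_with_ssbd() and ssbd_needs_prctl() are hypothetical names, not an existing tool.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks for older headers; values match include/uapi/linux/prctl.h. */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#endif
#ifndef PR_SET_SPECULATION_CTRL
#define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_STORE_BYPASS
#define PR_SPEC_STORE_BYPASS 0
#endif
#ifndef PR_SPEC_PRCTL
#define PR_SPEC_PRCTL (1UL << 0)
#endif
#ifndef PR_SPEC_ENABLE
#define PR_SPEC_ENABLE (1UL << 1)
#endif
#ifndef PR_SPEC_DISABLE
#define PR_SPEC_DISABLE (1UL << 2)
#endif
#ifndef PR_SPEC_FORCE_DISABLE
#define PR_SPEC_FORCE_DISABLE (1UL << 3)
#endif

/* Hypothetical helper: given a PR_GET_SPECULATION_CTRL result, decide
 * whether a PR_SPEC_DISABLE request is both possible and still needed. */
int ssbd_needs_prctl(long state)
{
    if (state <= 0)                 /* prctl error, or CPU not affected */
        return 0;
    if (!(state & PR_SPEC_PRCTL))   /* no per-task control (e.g. forced globally) */
        return 0;
    /* Nothing to do if SSBD is already enabled for this task. */
    return !(state & (PR_SPEC_DISABLE | PR_SPEC_FORCE_DISABLE));
}

/* Hypothetical wrapper: enable SSBD for this task, then exec argv[0].
 * PR_SPEC_DISABLE (unlike PR_SPEC_DISABLE_NOEXEC) is kept across execve().
 * Returns only on failure. */
int exec_with_ssbd(char *const argv[])
{
    long state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
                       0UL, 0UL, 0UL);

    if (ssbd_needs_prctl(state) &&
        prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS,
              PR_SPEC_DISABLE, 0UL, 0UL) != 0) {
        perror("PR_SET_SPECULATION_CTRL");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

A main() for such a wrapper could simply call exec_with_ssbd(argv + 1) and exit with an error status if it returns.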
> >
> > Example to put things in perspective: the STIBP enabled in seccomp has
> > never been about protecting apps using seccomp, like sshd, from an
> > attack from malicious userland; on the contrary, it has always been
> > about protecting the system from an attack from sshd, after a
> > successful remote network exploit against sshd. In fact, initially it
> > wasn't obvious STIBP would work both ways (STIBP was about preventing
> > the task that runs with STIBP from being attacked with spectre-v2-HT,
> > but in the STIBP case it also accidentally prevents the attack in the
> > other direction). In the hypothetical case that sshd has been remotely
> > exploited, the last concern should be STIBP being set, because it'll
> > still be possible to obtain info even from the kernel by using MDS if
> > nosmt wasn't set (and if it was set, STIBP is a noop in the first
> > place). By contrast, the kernel cannot leak anything with
> > spectre-v2-HT because of retpolines, and userland is already
> > mitigated by ASLR and ideally PID namespaces too. If anything, it'd be
> > worth checking whether sshd runs the seccomp thread under pid
> > namespaces too, if available in the running kernel. SSBD would also
> > be a noop for sshd, since sshd uses no JIT. If sshd prefers to keep
> > doing the STIBP window dressing exercise, it still can even after
> > this change of defaults by opting in with PR_SPEC_INDIRECT_BRANCH.
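For illustration, the per-process opt-in with PR_SPEC_INDIRECT_BRANCH might look like the C sketch below. It assumes glibc's prctl() and the UAPI constants from <linux/prctl.h>, with fallbacks for older headers; spec_state_str() and restrict_indirect_branch() are hypothetical helpers, not code from sshd or any existing project.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; values match include/uapi/linux/prctl.h. */
#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#endif
#ifndef PR_SET_SPECULATION_CTRL
#define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH 1
#endif
#ifndef PR_SPEC_PRCTL
#define PR_SPEC_PRCTL (1UL << 0)
#endif
#ifndef PR_SPEC_ENABLE
#define PR_SPEC_ENABLE (1UL << 1)
#endif
#ifndef PR_SPEC_DISABLE
#define PR_SPEC_DISABLE (1UL << 2)
#endif
#ifndef PR_SPEC_FORCE_DISABLE
#define PR_SPEC_FORCE_DISABLE (1UL << 3)
#endif
#ifndef PR_SPEC_DISABLE_NOEXEC
#define PR_SPEC_DISABLE_NOEXEC (1UL << 4)
#endif

/* Hypothetical helper: readable form of a PR_GET_SPECULATION_CTRL result. */
const char *spec_state_str(long state)
{
    if (state < 0)
        return "unknown (prctl failed)";
    if (state == 0)
        return "not affected";
    if (state & PR_SPEC_FORCE_DISABLE)
        return "restricted (force)";
    if (state & (PR_SPEC_DISABLE | PR_SPEC_DISABLE_NOEXEC))
        return "restricted";
    if (state & PR_SPEC_ENABLE)
        return "unrestricted";
    return "no per-task control";
}

/* Hypothetical opt-in: restrict indirect branch speculation for this task. */
int restrict_indirect_branch(void)
{
    long state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                       0UL, 0UL, 0UL);

    printf("before: %s\n", spec_state_str(state));
    if (state == 0)
        return 0;       /* CPU not affected, nothing to do */
    if (state < 0 || !(state & PR_SPEC_PRCTL))
        return -1;      /* not controllable per task in the current mode */
    if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
              PR_SPEC_DISABLE, 0UL, 0UL) != 0) {
        perror("PR_SET_SPECULATION_CTRL");
        return -1;
    }
    state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                  0UL, 0UL, 0UL);
    printf("after: %s\n", spec_state_str(state));
    return 0;
}
```

Under the "prctl" default this request is honoured per task; under a strict/forced global mode PR_SPEC_PRCTL is not reported and the sketch leaves the state alone.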
> >
> > Ultimately, setting SSBD and STIBP by default for all seccomp jails is
> > a poor trade-off and a bad default, with more cons than pros, that ends
> > up reducing security in the public cloud (by giving a huge incentive
> > not to expose SPEC_CTRL, which would be needed to get full security
> > with IBPB after setting nosmt in the guest) and excessively hurting
> > the performance of more secure apps using seccomp, which end up
> > having to opt out with SECCOMP_FILTER_FLAG_SPEC_ALLOW.
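The opt-out mentioned above, which runc/crun apply by default, amounts to passing SECCOMP_FILTER_FLAG_SPEC_ALLOW when the filter is installed. A minimal C sketch, assuming Linux UAPI headers and the raw seccomp(2) syscall through syscall(SYS_seccomp, ...); the allow-all program is only a placeholder for a real allowlist, and install_filter_spec_allow() is a hypothetical name.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SECCOMP_FILTER_FLAG_SPEC_ALLOW
#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2)
#endif

/* Placeholder program: allow every syscall. A real jail would check
 * seccomp_data.arch and .nr and reject disallowed calls. */
struct sock_filter allow_all[] = {
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: install the filter without letting seccomp force
 * SSBD/STIBP on this task when the kernel runs in "seccomp" mode. */
int install_filter_spec_allow(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(allow_all) / sizeof(allow_all[0])),
        .filter = allow_all,
    };

    /* Without CAP_SYS_ADMIN, no_new_privs must be set before installing. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL) != 0) {
        perror("PR_SET_NO_NEW_PRIVS");
        return -1;
    }
    if (syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
                SECCOMP_FILTER_FLAG_SPEC_ALLOW, &prog) != 0) {
        perror("seccomp");
        return -1;
    }
    return 0;
}
```

With the new "prctl" default, omitting the flag would make no difference to speculation controls, which is exactly why the patch argues the seccomp default only penalizes apps that forget it.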
> >
> > The following is the verified result of the new default with SMT
> > enabled:
> >
> > (gdb) print spectre_v2_user_stibp
> > $1 = SPECTRE_V2_USER_PRCTL
> > (gdb) print spectre_v2_user_ibpb
> > $2 = SPECTRE_V2_USER_PRCTL
> > (gdb) print ssb_mode
> > $3 = SPEC_STORE_BYPASS_PRCTL
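The gdb check above requires debugging the kernel. On a running system the selected SSB mode should also be visible in /sys/devices/system/cpu/vulnerabilities/spec_store_bypass. The strings matched below ("... via prctl" versus "... via prctl and seccomp") are assumed to match ssb_strings[] in arch/x86/kernel/cpu/bugs.c and may differ on other kernels; the spectre_v2 file is not expected to distinguish prctl from seccomp mode for STIBP. classify_ssb() and read_sysfs_line() are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

enum ssb_sysfs_mode {
    SSB_UNKNOWN,
    SSB_NOT_AFFECTED,
    SSB_VULNERABLE,
    SSB_FORCED,         /* spec_store_bypass_disable=on */
    SSB_PRCTL,          /* new default: spec_store_bypass_disable=prctl */
    SSB_PRCTL_SECCOMP,  /* old default with CONFIG_SECCOMP=y */
};

/* Hypothetical helper: map the sysfs text to a mode. The longer
 * "via prctl and seccomp" must be tested before "via prctl". */
enum ssb_sysfs_mode classify_ssb(const char *line)
{
    if (strncmp(line, "Not affected", 12) == 0)
        return SSB_NOT_AFFECTED;
    if (strncmp(line, "Vulnerable", 10) == 0)
        return SSB_VULNERABLE;
    if (strstr(line, "via prctl and seccomp"))
        return SSB_PRCTL_SECCOMP;
    if (strstr(line, "via prctl"))
        return SSB_PRCTL;
    if (strstr(line, "Speculative Store Bypass disabled"))
        return SSB_FORCED;
    return SSB_UNKNOWN;
}

/* Hypothetical helper: read the first line of a sysfs file, newline stripped. */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

On an affected CPU running a kernel with this patch and no command-line override, reading the spec_store_bypass file and passing it to classify_ssb() would be expected to yield SSB_PRCTL rather than SSB_PRCTL_SECCOMP.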
> >
> > Signed-off-by: Andrea Arcangeli <aarcange at redhat.com>
> > Signed-off-by: Kees Cook <keescook at chromium.org>
> > Link: https://lore.kernel.org/r/20201104235054.5678-1-aarcange@redhat.com
> > Acked-by: Josh Poimboeuf <jpoimboe at redhat.com>
> > Link: https://lore.kernel.org/lkml/AAA2EF2C-293D-4D5B-BFA6-FF655105CD84@redhat.com
> > Acked-by: Waiman Long <longman at redhat.com>
> > Link: https://lore.kernel.org/lkml/c0722838-06f7-da6b-138f-e0f26362f16a@redhat.com
> > (cherry picked from commit 2f46993d83ff4abb310ef7b4beced56ba96f0d9d)
> > Signed-off-by: Jeff Lane <jeffrey.lane at canonical.com>
> > ---
> > Documentation/admin-guide/hw-vuln/spectre.rst | 10 ++++------
> > Documentation/admin-guide/kernel-parameters.txt | 5 ++---
> > arch/x86/kernel/cpu/bugs.c | 4 ++--
> > 3 files changed, 8 insertions(+), 11 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
> > index 6bd97cd50d62..7c3f5152ae0c 100644
> > --- a/Documentation/admin-guide/hw-vuln/spectre.rst
> > +++ b/Documentation/admin-guide/hw-vuln/spectre.rst
> > @@ -505,9 +505,8 @@ Spectre variant 2
> >
> > Restricting indirect branch speculation on a user program will
> > also prevent the program from launching a variant 2 attack
> > - on x86. All sand-boxed SECCOMP programs have indirect branch
> > - speculation restricted by default. Administrators can change
> > - that behavior via the kernel command line and sysfs control files.
> > + on x86. Administrators can change that behavior via the kernel
> > + command line and sysfs control files.
> > See :ref:`spectre_mitigation_control_command_line`.
> >
> > Programs that disable their indirect branch speculation will have
> > @@ -690,9 +689,8 @@ Mitigation selection guide
> > off by disabling their indirect branch speculation when they are run
> > (See :ref:`Documentation/userspace-api/spec_ctrl.rst <set_spec_ctrl>`).
> > This prevents untrusted programs from polluting the branch target
> > - buffer. All programs running in SECCOMP sandboxes have indirect
> > - branch speculation restricted by default. This behavior can be
> > - changed via the kernel command line and sysfs control files. See
> > + buffer. This behavior can be changed via the kernel command line
> > + and sysfs control files. See
> > :ref:`spectre_mitigation_control_command_line`.
> >
> > 3. High security mode
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index efb9e8b66652..51138f256eb1 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -5398,8 +5398,7 @@
> > auto - Kernel selects the mitigation depending on
> > the available CPU features and vulnerability.
> >
> > - Default mitigation:
> > - If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
> > + Default mitigation: "prctl"
> >
> > Not specifying this option is equivalent to
> > spectre_v2_user=auto.
> > @@ -5443,7 +5442,7 @@
> > will disable SSB unless they explicitly opt out.
> >
> > Default mitigations:
> > - X86: If CONFIG_SECCOMP=y "seccomp", otherwise "prctl"
> > + X86: "prctl"
> >
> > On powerpc the options are:
> >
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index fc22f6d646e0..fe3bb03c07b7 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -1113,11 +1113,11 @@ spectre_v2_user_select_mitigation(void)
> > case SPECTRE_V2_USER_CMD_FORCE:
> > mode = SPECTRE_V2_USER_STRICT;
> > break;
> > + case SPECTRE_V2_USER_CMD_AUTO:
> > case SPECTRE_V2_USER_CMD_PRCTL:
> > case SPECTRE_V2_USER_CMD_PRCTL_IBPB:
> > mode = SPECTRE_V2_USER_PRCTL;
> > break;
> > - case SPECTRE_V2_USER_CMD_AUTO:
> > case SPECTRE_V2_USER_CMD_SECCOMP:
> > case SPECTRE_V2_USER_CMD_SECCOMP_IBPB:
> > if (IS_ENABLED(CONFIG_SECCOMP))
> > @@ -1716,7 +1716,6 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
> > return mode;
> >
> > switch (cmd) {
> > - case SPEC_STORE_BYPASS_CMD_AUTO:
> > case SPEC_STORE_BYPASS_CMD_SECCOMP:
> > /*
> > * Choose prctl+seccomp as the default mode if seccomp is
> > @@ -1730,6 +1729,7 @@ static enum ssb_mitigation __init __ssb_select_mitigation(void)
> > case SPEC_STORE_BYPASS_CMD_ON:
> > mode = SPEC_STORE_BYPASS_DISABLE;
> > break;
> > + case SPEC_STORE_BYPASS_CMD_AUTO:
> > case SPEC_STORE_BYPASS_CMD_PRCTL:
> > mode = SPEC_STORE_BYPASS_PRCTL;
> > break;
> > --
> > 2.34.1
> >
> >
> > --
> > kernel-team mailing list
> > kernel-team at lists.ubuntu.com
> > https://lists.ubuntu.com/mailman/listinfo/kernel-team
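The bugs.c hunks only move the AUTO case labels so that they fall through to the prctl modes. A simplified, hypothetical userspace model of the command-to-mode mapping in __ssb_select_mitigation() after the patch (names shortened; the real function's earlier CPU feature checks are omitted, so this is not the kernel code) makes the resulting behaviour explicit:

```c
/* Hypothetical, simplified model of __ssb_select_mitigation() after the patch. */
enum ssb_cmd { SSB_CMD_NONE, SSB_CMD_AUTO, SSB_CMD_ON, SSB_CMD_PRCTL, SSB_CMD_SECCOMP };
enum ssb_mode { SSB_MODE_NONE, SSB_MODE_DISABLE, SSB_MODE_PRCTL, SSB_MODE_SECCOMP };

enum ssb_mode select_ssb_mode(enum ssb_cmd cmd, int config_seccomp)
{
    switch (cmd) {
    case SSB_CMD_SECCOMP:
        /* An explicit "seccomp" still means prctl+seccomp if CONFIG_SECCOMP=y. */
        return config_seccomp ? SSB_MODE_SECCOMP : SSB_MODE_PRCTL;
    case SSB_CMD_ON:
        return SSB_MODE_DISABLE;
    case SSB_CMD_AUTO:   /* moved by the patch: "auto" now falls through to prctl */
    case SSB_CMD_PRCTL:
        return SSB_MODE_PRCTL;
    case SSB_CMD_NONE:
    default:
        return SSB_MODE_NONE;
    }
}
```

spectre_v2_user_select_mitigation() changes the same way in the hunk: SPECTRE_V2_USER_CMD_AUTO now selects SPECTRE_V2_USER_PRCTL, while an explicit spectre_v2_user=seccomp still selects seccomp mode when CONFIG_SECCOMP=y.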
--
Jeff Lane
Engineering Manager
IHV/OEM Alliances and Server Certification
"Entropy isn't what it used to be."