[PATCH][ARTFUL]UBUNTU SAUCE: exec: ensure file system accounting in check_unsafe_exec is correct

Colin King colin.king at canonical.com
Fri May 12 14:49:53 UTC 2017

From: Colin Ian King <colin.king at canonical.com>

BugLink: http://bugs.launchpad.net/bugs/1672819

There are two very race windows that need to be taken into consideration
when check_unsafe_exec  performs the file system accounting comparing the
number of fs->users against the number threads that share the same fs.

The first race can occur when a pthread creates a new pthread and the
the fs->users count is incremented before the new pthread is associated with
the pthread performing the exec. When this occurs, the pthread performing
the exec has flags with bit PF_FORKNOEXEC set.

The second race can occur when a pthread is terminating and the fs->users
count has been decremented by the pthread is still associated with the
pthread that is performing the exec. When this occurs, the pthread
peforming the exec has flags with bit PF_EXITING set.

This fix keeps track of any pthreads that may be in the race window
(when PF_FORKNOEXEC or PF_EXITING) are set and if the fs count does
not match the expected count we retry the count as we may have hit
this small race windows.  Tests on an 8 thread server with the
reproducer (see below) show that this retry occurs rarely, so the
overhead of the retry is very small.

Below is a reproducer of the race condition.

The bug manifests itself because the current check_unsafe_exec
hits this race and indicates it is not a safe exec, and the
exec'd suid program fails to setuid.

$ cat Makefile
ALL=a b
all: $(ALL)

a: LDFLAGS=-pthread

b: b.c
	$(CC) b.c -o b
	sudo chown root:root b
	sudo chmod u+s b

	for I in $$(seq 1000); do echo $I; ./a ; done

	rm -vf $(ALL)

$ cat a.c

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/types.h>

void *nothing(void *p)
	return NULL;

void *target(void *p) {
	for (;;) {
		pthread_t t;
		if (pthread_create(&t, NULL, nothing, NULL) == 0)
			pthread_join(t, NULL);
	return NULL;

int main(void)
	struct timespec tv;
	int i;

	for (i = 0; i < 10; i++) {
		pthread_t t;
		pthread_create(&t, NULL, target, NULL);
	tv.tv_sec = 0;
	tv.tv_nsec = 100000;
	nanosleep(&tv, NULL);
	if (execl("./b", "./b", NULL) < 0)
	return 0;

$ cat b.c

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(void)
	const uid_t euid = geteuid();
	if (euid != 0) {
		printf("Failed, got euid %d (expecting 0)\n", euid);
        	return 1;
	return 0;

$ make
cc   -pthread  a.c   -o a
cc b.c -o b
sudo chown root:root b
sudo chmod u+s b
$ for i in $(seq 1000); do ./a; done

Without the fix, one will see 'Failed, got euid 1000 (expecting 0)' messages

Signed-off-by: Colin Ian King <colin.king at canonical.com>
 fs/exec.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index bfbe8fb55611..2dc60b53afc7 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1442,6 +1442,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 	struct task_struct *p = current, *t;
 	unsigned n_fs;
+	bool fs_recheck;
 	if (p->ptrace)
 		bprm->unsafe |= LSM_UNSAFE_PTRACE;
@@ -1453,6 +1454,8 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 	if (task_no_new_privs(current))
 		bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS;
+	fs_recheck = false;
 	t = p;
 	n_fs = 1;
@@ -1460,12 +1463,18 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 	while_each_thread(p, t) {
 		if (t->fs == p->fs)
+		if (t->flags & (PF_EXITING | PF_FORKNOEXEC))
+			fs_recheck  = true;
-	if (p->fs->users > n_fs)
+	if (p->fs->users > n_fs) {
+		if (fs_recheck) {
+			spin_unlock(&p->fs->lock);
+			goto recheck;
+		}
 		bprm->unsafe |= LSM_UNSAFE_SHARE;
-	else
+	} else
 		p->fs->in_exec = 1;

More information about the kernel-team mailing list