[Bug 997217] Re: salsauthd maxes cpu
Andreas Hasenack
andreas at canonical.com
Wed Oct 24 20:55:17 UTC 2018
** Description changed:
[Impact]
The rimap authentication mechanism in saslauthd can hit a condition
where it will start spinning and using all available CPU. This condition
can be easily encountered when an authentication is happening and the
imap service is being restarted.
Furthermore, the saslauthd child process that picked up that
authentication request and that is spinning now won't be reaped nor can
it service further requests. If all children are left in this state, the
authentication service as a whole won't be working anymore.
[Test Case]
This test can be performed in a LXD or VM.
* install the needed packages. mail-stack-delivery is used to have an
imap server available on localhost that needs no further configuration.
Accept the defaults for all debconf prompts:
sudo apt update
sudo apt install sasl2-bin mail-stack-delivery
* set the password "ubuntu" for the ubuntu user
echo ubuntu:ubuntu | sudo chpasswd
* start saslauthd like this, with just one child:
sudo /usr/sbin/saslauthd -a rimap -O localhost -r -n 1
* restart dovecot
sudo service dovecot restart
* test saslauthd authentication:
$ sudo testsaslauthd -u ubuntu -p ubuntu
0: OK "Success."
* Now let's break it. In one terminal watch the output of top:
top
* in another terminal, run the following:
sudo testsaslauthd -u ubuntu -p ubuntu & sleep 1; sudo service dovecot stop
* observe in the "top" terminal that saslauthd is consuming a lot of
cpu. If that's not happening, try starting dovecot again and adjusting
- the sleep value in the previous test command.
+ the sleep value in the previous test command, but 1s was enough in all
+ my runs.
- * repeat the authentication request. Since the only saslauthd child is now spinning, this will block:
+ * start dovecot and repeat the authentication request. Since the only saslauthd child is now spinning, this will block:
+ sudo service dovecot start
$ sudo testsaslauthd -u ubuntu -p ubuntu
<blocks>
-
[Regression Potential]
* discussion of how regressions are most likely to manifest as a result
of this change.
* It is assumed that any SRU candidate patch is well-tested before
upload and has a low overall risk of regression, but it's important
to make the effort to think about what ''could'' happen in the
event of a regression.
* This both shows the SRU team that the risks have been considered,
and provides guidance to testers in regression-testing the SRU.
[Other Info]
* Anything else you think is useful to include
* Anticipate questions from users, SRU, +1 maintenance, security teams and the Technical Board
* and address these questions in advance
[Original Description]
sasl2-bin version 2.1.24~rc1.dfsg1+cvs2011-05-23-4ubuntu contains a bug
that causes heavy cpu utilization, impacting normal operation of one of
our mail servers following an upgrade to Ubuntu 12.04.
We are running the daemon with the following options:
/usr/sbin/saslauthd -a rimap -O our.imap.server -r -m
/var/spool/postfix/var/run/saslauthd -n 5
We noticed that users were unable to send mail and that the saslauthd
processes were using approximately 100% of each cpu core. An strace of
one of the runaway process showed that it was stuck in the following
behaviour:
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
read(8, "", 940) = 0
.....
with further inspection showing that the file descriptor in question was
a socket connected to our imap server in CLOSE_WAIT.
Browsing saslauthd/auth_rimap.c in the source package for sasl2-bin, we
came across the following code, repeated in two locations:
while( select (fds, &perm, NULL, NULL, &timeout ) >0 ) {
if ( FD_ISSET(s, &perm) ) {
ret = read(s, rbuf+rc, sizeof(rbuf)-rc);
if ( ret<0 ) {
rc = ret;
break;
} else {
rc += ret;
}
}
}
It looks like this loop is expected to run until a read error is
encountered or the timeout of 1 second is reached. There is no test to
check that 0 bytes were read, indicating that the connection was closed
by the remote peer. Since select() will immediately return the size of
the set of the partially closed descriptor (1, which is >0), and calls
to read() will always yield 0 bytes, there's the potential for execution
to get stuck in this non blocking loop and I'm presuming that that's
what's happening here.
We've not performed any further analysis to prove that this is really
what's happening but if my intuition is correct then our IMAP server (an
nginx imap proxy) most liklely closes the connection at an unexpected
time under as yet undetermined conditions.
--
You received this bug notification because you are a member of Ubuntu
Server, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/997217
Title:
salsauthd maxes cpu
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cyrus-sasl2/+bug/997217/+subscriptions
More information about the Ubuntu-server-bugs
mailing list