ACK: [SRU][Bionic][PATCH 0/2] Fix nf_conntrack races when dealing with same origin requests in NAT environments
Connor Kuehl
connor.kuehl at canonical.com
Fri Jul 19 15:54:58 UTC 2019
On 7/16/19 5:30 PM, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1836816
>
> [Impact]
>
> There are a number of races in nf_conntrack when multiple packets are sent from
> the same origin to the same destination at the same time, where there is no
> pre-existing confirmed conntrack entry, in a NAT environment.
>
> This is most frequently seen in users with kubernetes workloads, where dns
> requests are sent from the same socket at the same time, from different threads.
> One request is for a A record, the other AAAA. A conntrack race happens, which
> leads to one request being dropped, which leads to 5 second dns timeouts.
>
> This problem is not specific to kubernetes, as any multi-threaded process
> which sends UDP packets from the same socket at the same time from different
> threads is affected.
>
> Note that since UDP is connectionless, no packet is sent when connect() is
> called, so no conntrack entries are created until a packet is first sent.
>
> In the scenario where two UDP packets are sent at the same time from the same
> socket on different threads, there are the following possible races:
>
> 1.) Neither of the packets find a confirmed conntrack entry. This leads to two
> conntrack entries being created with the same tuples.
> 2.) Same as 1), but a conntrack entry is confirmed for one of the packets before
> the other calls get_unique_tuple(). The other packet gets a different reply
> tuple with the source port changed.
>
> The outcomes of the races are the same, one of the packets is dropped when it
> comes time to confirm conntrack entries with __nf_conntrack_confirm().
>
> For more information, read the technical blog post:
> https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
>
> [Fix]
>
> The fix to race 1) was included in 4.19 upstream by this commit:
>
> netfilter: nf_conntrack: resolve clash for matching conntracks
> commit ed07d9a021df6da53456663a76999189badc432a
> Author: Martynas Pumputis <martynas at weave.works>
> Date: Mon Jul 2 16:52:14 2018 +0200
>
> This commit adds an extra check to see if the two entries are the same and to
> merge the entries if they are.
>
> The fix to race 2) was included in 5.0 upstream by this commit:
>
> netfilter: nf_nat: skip nat clash resolution for same-origin entries
> commit 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
> Author: Martynas Pumputis <martynas at weave.works>
> Date: Tue Jan 29 15:51:42 2019 +0100
>
> This commit ensures that a new source port is not allocated to a duplicated
> entry, and forwards the duplicated entries to be resolved later on.
>
> Note that 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 was also backported to stable
> releases 4.9.163, 4.14.106, 4.19.29, 4.20.16.
>
> Both commits cherry-pick to 4.15 bionic cleanly. Please cherry-pick both commits
> to all bionic kernels.
>
> [Testcase]
>
> The reproducer has been provided by the author of the fixes, and is best viewed
> here:
>
> https://github.com/brb/conntrack-race
>
> The dmesg logs and packet traces in the above github repo are best viewed while
> reading upstream discussion:
>
> https://patchwork.ozlabs.org/patch/937963/
>
> I have built a test kernel for xenial HWE, which can be found here:
> https://launchpad.net/~mruffell/+archive/ubuntu/sf00227747-test-kernel
>
> The test kernel has been tested with a kubernetes workload and aided with
> resolution of kubernetes problems.
>
> [Regression Potential]
>
> As noted previously, 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 has been
> backported to stable releases 4.9.163, 4.14.106, 4.19.29, 4.20.16 and has no
> changes to the primary commit. This commit is well tested and considered stable
> by the community.
>
> Both commits have also been present in the -azure kernels since 4.15.0-1030.31
> and 4.18.0-1004.4, as per LP #1795493, http://bugs.launchpad.net/bugs/1795493
> They have received wide testing, and no problems have been reported.
>
> The changes are specifically limited to resolving clashes between duplicate
> conntrack entries, and any regressions will be limited to that scenario. The
> overall risk of regression is very low.
>
> [Notes]
> The commits in the -azure kernels will need to be reverted before importing the
> generic bionic updates, since they are the pre-upstream commits, and have had
> several revisions since they were imported. Replacing them with the
> actual upstream commits will make things easier in the future to maintain.
>
> http://bugs.launchpad.net/bugs/1795493
>
> For race 1):
> Azure: https://patchwork.ozlabs.org/patch/937963/
> Upstream: ed07d9a021df6da53456663a76999189badc432a
> Commits are the same.
>
> For race 2):
> Azure: https://patchwork.ozlabs.org/patch/952939/
> Upstream: 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
> Note the difference, old commit must be reverted first.
>
> Martynas Pumputis (2):
> netfilter: nf_conntrack: resolve clash for matching conntracks
> netfilter: nf_nat: skip nat clash resolution for same-origin entries
>
> net/netfilter/nf_conntrack_core.c | 46 +++++++++++++++++++++++++------
> 1 file changed, 38 insertions(+), 8 deletions(-)
>
Acked-by: Connor Kuehl <connor.kuehl at canonical.com>
More information about the kernel-team
mailing list