APPLIED: [SRU][Bionic][PATCH 0/2] Fix nf_conntrack races when dealing with same origin requests in NAT environments

Khaled Elmously khalid.elmously at canonical.com
Tue Jul 23 05:23:37 UTC 2019


On 2019-07-17 12:30:39 , Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1836816
> 
> [Impact]
> 
> There are a number of races in nf_conntrack when multiple packets are sent from
> the same origin to the same destination at the same time, where there is no
> pre-existing confirmed conntrack entry, in a NAT environment.
> 
> This is most frequently seen in users with kubernetes workloads, where dns
> requests are sent from the same socket at the same time, from different threads.
> One request is for a A record, the other AAAA. A conntrack race happens, which
> leads to one request being dropped, which leads to 5 second dns timeouts.
> 
> This problem is not specific to kubernetes, as any multi-threaded process
> which sends UDP packets from the same socket at the same time from different
> threads is affected.
> 
> Note that since UDP is connectionless, no packet is sent when connect() is
> called, so no conntrack entries are created until a packet is first sent.
> 
> In the scenario where two UDP packets are sent at the same time from the same
> socket on different threads, there are the following possible races:
> 
> 1.) Neither of the packets find a confirmed conntrack entry. This leads to two
>     conntrack entries being created with the same tuples.
> 2.) Same as 1), but a conntrack entry is confirmed for one of the packets before
>     the other calls get_unique_tuple(). The other packet gets a different reply
>     tuple with the source port changed.
> 
> The outcomes of the races are the same, one of the packets is dropped when it
> comes time to confirm conntrack entries with __nf_conntrack_confirm().
> 
> For more information, read the technical blog post: 
> https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
> 
> [Fix]
> 
> The fix to race 1) was included in 4.19 upstream by this commit:
> 
> netfilter: nf_conntrack: resolve clash for matching conntracks
> commit ed07d9a021df6da53456663a76999189badc432a
> Author: Martynas Pumputis <martynas at weave.works>
> Date: Mon Jul 2 16:52:14 2018 +0200 
> 
> This commit adds an extra check to see if the two entries are the same and to
> merge the entries if they are.
> 
> The fix to race 2) was included in 5.0 upstream by this commit:
> 
> netfilter: nf_nat: skip nat clash resolution for same-origin entries
> commit 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
> Author: Martynas Pumputis <martynas at weave.works>
> Date: Tue Jan 29 15:51:42 2019 +0100 
> 
> This commit ensures that a new source port is not allocated to a duplicated
> entry, and forwards the duplicated entries to be resolved later on.
> 
> Note that 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 was also backported to stable
> releases 4.9.163, 4.14.106, 4.19.29, 4.20.16.
> 
> Both commits cherry-pick to 4.15 bionic cleanly. Please cherry-pick both commits
> to all bionic kernels.
> 
> [Testcase]
> 
> The reproducer has been provided by the author of the fixes, and is best viewed
> here:
> 
> https://github.com/brb/conntrack-race
> 
> The dmesg logs and packet traces in the above github repo are best viewed while
> reading upstream discussion: 
> 
> https://patchwork.ozlabs.org/patch/937963/
> 
> I have built a test kernel for xenial HWE, which can be found here:
> https://launchpad.net/~mruffell/+archive/ubuntu/sf00227747-test-kernel
> 
> The test kernel has been tested with a kubernetes workload and aided with
> resolution of kubernetes problems.
> 
> [Regression Potential]
> 
> As noted previously, 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 has been 
> backported to stable releases 4.9.163, 4.14.106, 4.19.29, 4.20.16 and has no
> changes to the primary commit. This commit is well tested and considered stable
> by the community.
> 
> Both commits have also been present in the -azure kernels since 4.15.0-1030.31
> and 4.18.0-1004.4, as per LP #1795493, http://bugs.launchpad.net/bugs/1795493
> They have received wide testing, and no problems have been reported.
> 
> The changes are specifically limited to resolving clashes between duplicate
> conntrack entries, and any regressions will be limited to that scenario. The
> overall risk of regression is very low.
> 
> [Notes]
> The commits in the -azure kernels will need to be reverted before importing the
> generic bionic updates, since they are the pre-upstream commits, and have had 
> several revisions since they were imported. Replacing them with the 
> actual upstream commits will make things easier in the future to maintain.
> 
> http://bugs.launchpad.net/bugs/1795493
> 
> For race 1):
> Azure: https://patchwork.ozlabs.org/patch/937963/
> Upstream: ed07d9a021df6da53456663a76999189badc432a
> Commits are the same.
> 
> For race 2):
> Azure: https://patchwork.ozlabs.org/patch/952939/
> Upstream: 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
> Note the difference, old commit must be reverted first.
> 
> Martynas Pumputis (2):
>   netfilter: nf_conntrack: resolve clash for matching conntracks
>   netfilter: nf_nat: skip nat clash resolution for same-origin entries
> 
>  net/netfilter/nf_conntrack_core.c | 46 +++++++++++++++++++++++++------
>  1 file changed, 38 insertions(+), 8 deletions(-)
> 
> -- 
> 2.20.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team



More information about the kernel-team mailing list