[PATCH 3.19.y-ckt 145/146] tcp: fix child sockets to use system default congestion control if not set
Kamal Mostafa
kamal at canonical.com
Wed Jun 17 22:24:10 UTC 2015
3.19.8-ckt2 -stable review patch. If anyone has any objections, please let me know.
------------------
From: Neal Cardwell <ncardwell at google.com>
[ Upstream commit 9f950415e4e28e7cfae2e416b43e862e8101d996 ]
Linux 3.17 and earlier are explicitly engineered so that if the app
doesn't specifically request a CC module on a listener before the SYN
arrives, then the child gets the system default CC when the connection
is established. See tcp_init_congestion_control() in 3.17 or earlier,
which says "if no choice made yet assign the current value set as
default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
created") altered these semantics, so that children got their parent
listener's congestion control even if the system default had changed
after the listener was created.
This commit returns to those original semantics from 3.17 and earlier,
since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]:
Congestion control initialization."), and some Linux congestion
control workflows depend on that.
In summary, if a listener socket specifically sets TCP_CONGESTION to
"x", or the route locks the CC module to "x", then the child gets
"x". Otherwise the child gets current system default from
net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
earlier, and this commit restores that.
Fixes: 55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is created")
Cc: Florian Westphal <fw at strlen.de>
Cc: Daniel Borkmann <dborkman at redhat.com>
Cc: Glenn Judd <glenn.judd at morganstanley.com>
Cc: Stephen Hemminger <stephen at networkplumber.org>
Signed-off-by: Neal Cardwell <ncardwell at google.com>
Signed-off-by: Eric Dumazet <edumazet at google.com>
Signed-off-by: Yuchung Cheng <ycheng at google.com>
Acked-by: Daniel Borkmann <daniel at iogearbox.net>
Signed-off-by: David S. Miller <davem at davemloft.net>
Signed-off-by: Kamal Mostafa <kamal at canonical.com>
---
include/net/inet_connection_sock.h | 3 ++-
net/ipv4/tcp_cong.c | 6 ++++--
net/ipv4/tcp_minisocks.c | 3 ++-
3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 848e85c..24d5c09 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -98,7 +98,8 @@ struct inet_connection_sock {
const struct tcp_congestion_ops *icsk_ca_ops;
const struct inet_connection_sock_af_ops *icsk_af_ops;
unsigned int (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
- __u8 icsk_ca_state;
+ __u8 icsk_ca_state:7,
+ icsk_ca_setsockopt:1;
__u8 icsk_retransmits;
__u8 icsk_pending;
__u8 icsk_backoff;
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index f2d4097..61f329f 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -248,9 +248,10 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
ca = tcp_ca_find(name);
/* no change asking for existing value */
- if (ca == icsk->icsk_ca_ops)
+ if (ca == icsk->icsk_ca_ops) {
+ icsk->icsk_ca_setsockopt = 1;
goto out;
-
+ }
#ifdef CONFIG_MODULES
/* not found attempt to autoload module */
if (!ca && capable(CAP_NET_ADMIN)) {
@@ -273,6 +274,7 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
else {
tcp_cleanup_congestion_control(sk);
icsk->icsk_ca_ops = ca;
+ icsk->icsk_ca_setsockopt = 1;
if (sk->sk_state != TCP_CLOSE && icsk->icsk_ca_ops->init)
icsk->icsk_ca_ops->init(sk);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 71d001d..2f66671 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -451,7 +451,8 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
newtp->snd_cwnd = TCP_INIT_CWND;
newtp->snd_cwnd_cnt = 0;
- if (!try_module_get(newicsk->icsk_ca_ops->owner))
+ if (!newicsk->icsk_ca_setsockopt ||
+ !try_module_get(newicsk->icsk_ca_ops->owner))
tcp_assign_congestion_control(newsk);
tcp_set_ca_state(newsk, TCP_CA_Open);
--
1.9.1
More information about the kernel-team
mailing list