NACK/Cmnt: [SRU][J][PATCH v1 1/1] tcp: fix forever orphan socket caused by tcp_abort

Fri Jun 20 04:05:18 UTC 2025

On Thu, Jun 19, 2025 at 05:39:14PM +0000, Stav Aviram wrote:
> From d41ec707dbe3fb3fa5b124a31fb9a8fd401fb1b3 Mon Sep 17 00:00:00 2001
> Message-Id: <d41ec707dbe3fb3fa5b124a31fb9a8fd401fb1b3.1750244904.git.saviram at nvidia.com>
> In-Reply-To: <cover.1750244904.git.saviram at nvidia.com>
> References: <cover.1750244904.git.saviram at nvidia.com>
> From: Xueming Feng <kuro at kuroa.me>
> Date: Mon, 26 Aug 2024 18:23:27 +0800
> To: kernel-team at lists.ubuntu.com
> Subject: [SRU][J][PATCH v1 1/1] tcp: fix forever orphan socket caused by tcp_abort
> 
> BugLink: https://bugs.launchpad.net/bugs/2114965
> 
> We have some problem closing zero-window fin-wait-1 tcp sockets in our
> environment. This patch come from the investigation.
> 
> Previously tcp_abort only sends out reset and calls tcp_done when the
> socket is not SOCK_DEAD, aka orphan. For orphan socket, it will only
> purging the write queue, but not close the socket and left it to the
> timer.
> 
> While purging the write queue, tp->packets_out and sk->sk_write_queue
> is cleared along the way. However tcp_retransmit_timer have early
> return based on !tp->packets_out and tcp_probe_timer have early
> return based on !sk->sk_write_queue.
> 
> This caused ICSK_TIME_RETRANS and ICSK_TIME_PROBE0 not being resched
> and socket not being killed by the timers, converting a zero-windowed
> orphan into a forever orphan.
> 
> This patch removes the SOCK_DEAD check in tcp_abort, making it send
> reset to peer and close the socket accordingly. Preventing the
> timer-less orphan from happening.
> 
> According to Lorenzo's email in the v1 thread, the check was there to
> prevent force-closing the same socket twice. That situation is handled
> by testing for TCP_CLOSE inside lock, and returning -ENOENT if it is
> already closed.
> 
> The -ENOENT code comes from the associate patch Lorenzo made for
> iproute2-ss; link attached below, which also conform to RFC 9293.
> 
> At the end of the patch, tcp_write_queue_purge(sk) is removed because it
> was already called in tcp_done_with_error().
> 
> p.s. This is the same patch with v2. Resent due to mis-labeled "changes
> requested" on patchwork.kernel.org.
> 
> Conflict Resolution:
> The conflict arose due to differences in error handling and logging
> around tcp_send_active_reset(). The if (!sock_flag(sk, SOCK_DEAD)) check
> was removed as in the upstream, while preserving the surrounding logic
> from HEAD. The upstream removal of tcp_write_queue_purge(sk) was not
> applied, since tcp_done_with_error(), which makes it redundant, is not
> used in HEAD. Lastly, !has_current_bpf_ctx() was replaced with
> !current->bpf_ctx for compatibility, as the helper is unavailable in
> this kernel version.
> 
> Link: https://patchwork.ozlabs.org/project/netdev/patch/1450773094-7978-3-git-send-email-lorenzo@google.com/
> Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
> Signed-off-by: Xueming Feng <kuro at kuroa.me>
> Tested-by: Lorenzo Colitti <lorenzo at google.com>
> Reviewed-by: Jason Xing <kerneljasonxing at gmail.com>
> Reviewed-by: Eric Dumazet <edumazet at google.com>
> Link: https://patch.msgid.link/20240826102327.1461482-1-kuro@kuroa.me
> Signed-off-by: Jakub Kicinski <kuba at kernel.org>
> (backported from commit bac76cf89816bff06c4ec2f3df97dc34e150a1c4)
> Signed-off-by: Stav Aviram <saviram at nvidia.com>
> Change-Id: I8fabcb0e781f4e08d7b114f956321955df46af36
> ---
>  net/ipv4/tcp.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 3c85ecab1445..ddb3331577d9 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4504,6 +4504,13 @@ int tcp_abort(struct sock *sk, int err)
>       /* Don't race with userspace socket closes such as tcp_close. */
>       lock_sock(sk);
> 
> +     /* Avoid closing the same socket twice. */
> +     if (sk->sk_state == TCP_CLOSE) {
> +           if (!current->bpf_ctx)
> +                 release_sock(sk);
> +           return -ENOENT;
> +     }
> +
>       if (sk->sk_state == TCP_LISTEN) {
>             tcp_set_state(sk, TCP_CLOSE);
>             inet_csk_listen_stop(sk);
> @@ -4513,15 +4520,13 @@ int tcp_abort(struct sock *sk, int err)
>       local_bh_disable();
>       bh_lock_sock(sk);
> 
> -     if (!sock_flag(sk, SOCK_DEAD)) {
> -           WRITE_ONCE(sk->sk_err, err);
> -           /* This barrier is coupled with smp_rmb() in tcp_poll() */
> -           smp_wmb();
> -           sk_error_report(sk);
> -           if (tcp_need_reset(sk->sk_state))
> -                 tcp_send_active_reset(sk, GFP_ATOMIC);
> -           tcp_done(sk);
> -     }
> +     WRITE_ONCE(sk->sk_err, err);
> +     /* This barrier is coupled with smp_rmb() in tcp_poll() */
> +     smp_wmb();
> +     sk_error_report(sk);
> +     if (tcp_need_reset(sk->sk_state))
> +           tcp_send_active_reset(sk, GFP_ATOMIC);
> +     tcp_done(sk);

This patch can't be submitted in this form; I can't apply it directly.

The code snippet is quite different from the current Jammy kernel code:
```
        if (!sock_flag(sk, SOCK_DEAD)) {
                if (tcp_need_reset(sk->sk_state))
                        tcp_send_active_reset(sk, GFP_ATOMIC);
                tcp_done_with_error(sk, err);
        }
```

Please make sure the patch applies cleanly on top of the
master-next branch.
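
As a rough illustration of the kind of check meant here, the sketch below
uses `git apply --check` to dry-run a patch against a tree without modifying
it. Everything in it is a hypothetical stand-in: the scratch directory,
`file.txt`, `good.patch` and `stale.patch` are invented for the example and
are not the Jammy tree or this patch. On the real tree the same check would
be run from a checkout of master-next against the patch file.

```shell
#!/bin/sh
# Toy demonstration: a patch whose context lines do not match the tree
# is rejected by "git apply --check" (a dry run that changes nothing).
# All file names and contents are hypothetical.
set -u
work=$(mktemp -d)
cd "$work" || exit 1

# Stand-in for the target tree.
printf 'alpha\nbeta\ngamma\n' > file.txt

# Patch generated against the same base as the tree.
cat > good.patch <<'EOF'
--- a/file.txt
+++ b/file.txt
@@ -1,3 +1,3 @@
 alpha
-beta
+BETA
 gamma
EOF

# Patch generated against a different base: the line it removes
# ("delta") does not exist in the tree.
cat > stale.patch <<'EOF'
--- a/file.txt
+++ b/file.txt
@@ -1,3 +1,3 @@
 alpha
-delta
+DELTA
 gamma
EOF

# Report whether a patch would apply, without touching the tree.
check() {
    if git apply --check "$1" 2>/dev/null; then
        echo "applies"
    else
        echo "does not apply"
    fi
}

good_result=$(check good.patch)
stale_result=$(check stale.patch)
echo "good.patch:  $good_result"
echo "stale.patch: $stale_result"
```

A patch that passes this dry run can then be applied with `git am`, which
also keeps the commit message and authorship.
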
Thanks.

> 
>       bh_unlock_sock(sk);
>       local_bh_enable();
> --
> 2.34.1
> 

> -- 
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team