APPLIED[D]: [SRU B/C] [PATCH 0/4] Line discipline buffer flush/tty_reopen() race fix

Wed Jan 9 17:17:28 UTC 2019

On Tue, Jan 08, 2019 at 06:28:07PM -0200, Guilherme G. Piccoli wrote:
> BugLink: https://bugs.launchpad.net/bugs/1791758
> 
> ** NOTICE: Patch 1 in this SRU request should be applied only to 4.15,
> since it is already present in 4.18.
> 
> 
> [Impact]
> 
> * Line discipline code is racy when we have buffer being flush while the
> tty is being initialized or reinitialized. For the first problem, we have
> an upstream patch since January 2018:
> b027e2298bd5 ("tty: fix data race between tty_init_dev and flush of buf");
> although it is not in Ubuntu kernel 4.4, only in kernels 4.15 and subsequent
> ones.
> 
> * For the race between the buffer flush while tty is being reopened, we
> have a patch that addresses this issue recently merged for 5.0-rc1:
> 83d817f41070 ("tty: Hold tty_ldisc_lock() during tty_reopen()").
> No Ubuntu kernel currently contains this patch, hence we're hereby
> submitting the SRU request. The upstream complete patch series for
> this is in [0].
> 
> * The approach of both patches are similar - they rely in locking/semaphore
> to prevent race conditions. Some additional patches are necessary to prevent
> correlated issues, like preventing a potential deadlock due to bad
> prioritization in servicing I/O over releasing tty_ldisc_lock() - refer to
> c96cf923a98d ("tty: Don't block on IO when ldisc change is pending").
> All the necessary fixes are grouped here in this SRU request.
> 
> * The symptom of the race condition between the buffer flush and the
> tty reopen routine is a kernel crash with the following trace:
> 
> 
> BUG: unable to handle kernel paging request at 0000000000002268
> IP: [<addr>] n_tty_receive_buf_common+0x6a/0xae0
> [...]
> Call Trace:
> [<addr>] ? kvm_sched_clock_read+0x1e/0x30
> [<addr>] n_tty_receive_buf2+0x14/0x20
> [<addr>] flush_to_ldisc+0xd5/0x120
> [<addr>] process_one_work+0x156/0x400
> [<addr>] worker_thread+0x11a/0x480
> [...]
> 
> 
> * A kernel crash was collected from an user, analysis is present in
> comment #4 in LP #1791758.
> 
> 
> [Test Case]
> 
> * It is not trivial to trigger this fault, but the usual recipe is to keep
> accessing a machine through SSH (or keep killing getty when in IPMI serial
> console) and in some way run commands before the terminal is ready in that
> machine (like hacking some echo into ttySx or pts in an infinite loop).
> 
> * We have reports of users that could reproduce this issue in their
> production environment, and with the patches present in this SRU request
> the problem was fixed.
> 
> 
> [Regression Potential]
> 
> * tty subsystem is highly central and patches in that area are always
> delicate. For example, the upstream series [0] is a re-spin (V6) due to
> a hard to reproduce issue reported in the PA-RISC architecture, which was
> found in the V5 iteration [1] but was fixed by the patch c96cf923a98d,
> present in this SRU request.
> 
> * The patchset [0] is present in tty-next tree since mid-November, so
> the overall likelihood of regressions is low.
> 
> * These patches were sniff-tested for the 3 versions (4.4, 4.15 and 4.18)
> and didn't show any issues.
> 
> 
> [0] https://marc.info/?l=linux-kernel&m=154103190111795
> [1] https://marc.info/?l=linux-kernel&m=153737852618183

Please remember to include the development kernel when it does not yet
have the patches. Applied to disco/master-next, thanks!