[LSN-0108-1] Linux kernel vulnerability
benjamin.romer at canonical.com
Thu Dec 19 16:26:08 UTC 2024
Linux kernel vulnerabilities
A security issue affects these releases of Ubuntu and its derivatives:
- Ubuntu 20.04 LTS
- Ubuntu 18.04 LTS
- Ubuntu 16.04 LTS
- Ubuntu 22.04 LTS
- Ubuntu 14.04 LTS
Summary
Several security issues were fixed in the kernel.
Software Description
- linux - Linux kernel
- linux-aws - Linux kernel for Amazon Web Services (AWS) systems
- linux-azure - Linux kernel for Microsoft Azure Cloud systems
- linux-gcp - Linux kernel for Google Cloud Platform (GCP) systems
- linux-gke - Linux kernel for Google Container Engine (GKE) systems
- linux-gkeop - Linux kernel for Google Container Engine (GKE) systems
- linux-ibm - Linux kernel for IBM cloud systems
- linux-oracle - Linux kernel for Oracle Cloud systems
Details
In the Linux kernel, the following vulnerability has been resolved:
tls: fix use-after-free on failed backlog decryption
When the decrypt request goes to the backlog and crypto_aead_decrypt
returns -EBUSY, tls_do_decryption will wait until all async decryptions
have completed. If one of them fails, tls_do_decryption will return
-EBADMSG and tls_decrypt_sg jumps to the error path, releasing all the
pages. But the pages have been passed to the async callback, and have
already been released by tls_decrypt_done. The only true async case is
when crypto_aead_decrypt returns -EINPROGRESS. With -EBUSY, we already
waited, so we can tell tls_sw_recvmsg that the data is available for
immediate copy, but we need to notify tls_decrypt_sg (via the new
->async_done flag) that the memory has already been released.
(CVE-2024-26800)
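The double release described above can be modeled in user space. This is a minimal sketch, not the kernel code: `struct decrypt_req`, `do_decryption()`, `decrypt_done()` and `decrypt_sg()` are hypothetical stand-ins for the kernel's request, `tls_do_decryption()`, `tls_decrypt_done()` and `tls_decrypt_sg()`, and only the `async_done` flag name comes from the text. It shows why the caller's error path must skip freeing pages that the completion callback has already released.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a decrypt request whose buffer may be released
 * either by the completion callback or by the caller's error path. */
struct decrypt_req {
    char *pages;      /* buffer handed to the "crypto" layer */
    bool async_done;  /* set when the callback already released it */
    int frees;        /* number of releases, to detect a double free */
};

/* Stand-in for the completion callback (cf. tls_decrypt_done()). */
static void decrypt_done(struct decrypt_req *req)
{
    free(req->pages);
    req->pages = NULL;
    req->frees++;
}

/* Stand-in for tls_do_decryption(): in the backlogged (-EBUSY) case the
 * caller waits for completion, simulated here by running the callback,
 * and then reports that one of the decryptions failed. */
static int do_decryption(struct decrypt_req *req, bool backlogged)
{
    if (backlogged) {
        decrypt_done(req);       /* callback ran while we waited */
        req->async_done = true;  /* memory is already gone */
        return -EBADMSG;
    }
    return 0;
}

/* Stand-in for tls_decrypt_sg(): it keeps its own pointer to the buffer,
 * so without the flag its error path would free the buffer a second time. */
static int decrypt_sg(struct decrypt_req *req, bool backlogged)
{
    char *pages = req->pages;    /* caller's own reference */
    int err = do_decryption(req, backlogged);

    if (err < 0 && !req->async_done) {
        free(pages);             /* only when the callback did not run */
        req->pages = NULL;
        req->frees++;
    }
    return err;
}
```

With the flag in place, a failed backlogged decryption releases the buffer exactly once.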
In the Linux kernel, the following vulnerability has been resolved:
inet: inet_defrag: prevent sk release while still in use
ip_local_out() and other functions can pass skb->sk as a function
argument. If the skb is a fragment and reassembly happens before such a
function call returns, the sk must not be released. This affects skb
fragments reassembled via netfilter or similar modules, e.g. openvswitch
or ct_act.c, when run as part of the tx pipeline. Eric Dumazet made an
initial analysis of this bug. Quoting Eric: Calling ip_defrag() in
output path is also implying skb_orphan(), which is buggy because output
path relies on sk not disappearing. A relevant old patch about the issue
was: 8282f27449bf (“inet: frag: Always orphan skbs inside ip_defrag()”)
[..] net/ipv4/ip_output.c depends on skb->sk being set, and probably to
an inet socket, not an arbitrary one. If we orphan the packet in ipvlan,
then downstream things like FQ packet scheduler will not work properly.
We need to change ip_defrag() to only use skb_orphan() when really
needed, i.e. whenever frag_list is going to be used. Eric suggested
stashing sk in the fragment queue and made an initial patch. However,
there is a problem with this: if the skb is refragmented again right
after, ip_do_fragment() will copy head->sk to the new fragments and set
up the destructor to sock_wfree. IOW, we have no choice but to fix up
sk_wmem accounting to reflect the fully reassembled skb, else wmem will
underflow. This change moves the orphan down into the core, to the last
possible moment. As ip_defrag_offset is aliased with the sk_buff->sk
member, we must move the offset into the FRAG_CB, else skb->sk gets
clobbered. This allows delaying the orphaning long enough to learn if
the skb has to be queued or if the skb is completing the reasm queue. In
the former case, things work as before: the skb is orphaned. This is
safe because the skb gets queued/stolen and won’t continue past the reasm
engine. In the latter case, we will steal the skb->sk reference,
reattach it to the head skb, and fix up wmem accounting when inet_frag
inflates truesize. (CVE-2024-26921)
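Why the offset had to move can be shown with a toy layout. This is a minimal sketch assuming a heavily simplified buffer: `struct model_skb`, `struct frag_cb`, `MODEL_FRAG_CB()` and the two setter functions are hypothetical and are not the kernel's `sk_buff` or `FRAG_CB` definitions. It only demonstrates that storing the offset in storage shared with `sk` clobbers the socket pointer, while storing it in the control buffer leaves `sk` intact.

```c
#include <assert.h>
#include <stddef.h>

struct sock;  /* opaque, as in the kernel */

/* Hypothetical, heavily simplified buffer: 'sk' and the defrag offset
 * share storage, mirroring the aliasing described in the text. */
struct model_skb {
    union {
        struct sock *sk;
        int ip_defrag_offset;
    };
    _Alignas(8) char cb[48];  /* per-layer control buffer, like skb->cb */
};

/* Hypothetical FRAG_CB-style view of the control buffer. */
struct frag_cb {
    int ip_defrag_offset;
};
#define MODEL_FRAG_CB(skb) ((struct frag_cb *)((skb)->cb))

/* Old layout: writing the offset overwrites (part of) skb->sk. */
static void set_offset_aliased(struct model_skb *skb, int off)
{
    skb->ip_defrag_offset = off;
}

/* New layout: the offset lives in the control buffer, so skb->sk survives
 * and the orphaning decision can be delayed. */
static void set_offset_in_cb(struct model_skb *skb, int off)
{
    MODEL_FRAG_CB(skb)->ip_defrag_offset = off;
}
```

In the usage below, only the control-buffer variant keeps the socket pointer it started with.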
In the Linux kernel, the following vulnerability has been resolved:
mm: swap: fix race between free_swap_and_cache() and swapoff()
There was previously a theoretical window where swapoff() could run and
tear down a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map. This is
a theoretical problem and I haven’t been able to provoke it from a test
case. But there has been agreement based on code review that this is
possible (see link below). Fix it by using
get_swap_device()/put_swap_device(), which will stall swapoff(). There
was an extra check in _swap_info_get() to confirm that the swap entry
was not free. This isn’t present in get_swap_device() because it doesn’t
make sense in general due to the race between getting the reference and
swapoff. So I’ve added an equivalent check directly in
free_swap_and_cache(). Details of how to provoke one possible issue
(thanks to David Hildenbrand for deriving this):
--8<-----
__swap_entry_free() might be the last user and result in “count ==
SWAP_HAS_CACHE”. swapoff->try_to_unuse() will stop as soon as
si->inuse_pages==0. So the question is: could someone reclaim the folio
and turn si->inuse_pages==0 before we completed
swap_page_trans_huge_swapped()? Imagine the following: 2 MiB folio in
the swapcache. Only 2 subpages are still referenced by swap entries.
Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.
Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]
Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
__try_to_reclaim_swap().
__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
…
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
What stops swapoff from succeeding after process 2 reclaimed the swap
cache but before process 1 finished its call to
swap_page_trans_huge_swapped()?
--8<-----
(CVE-2024-26960)
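The get/put pattern in the fix can be illustrated with a single-threaded model. This is a minimal sketch under stated assumptions: `struct swap_dev`, `get_dev()`, `put_dev()`, `try_swapoff()` and `free_entry()` are hypothetical names, the locking and per-CPU reference counting a real kernel needs are omitted, and the real swapoff() waits for users to drain rather than failing as `try_swapoff()` does here. It shows a held reference stalling teardown, and the extra "entry is not already free" re-check that the text says was moved into free_swap_and_cache().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of a swap device. */
struct swap_dev {
    bool live;           /* cleared when swapoff starts */
    int users;           /* references taken via get_dev() */
    unsigned char *map;  /* stands in for swap_map */
};

/* Cf. get_swap_device(): only hand out a reference while the device is live. */
static struct swap_dev *get_dev(struct swap_dev *d)
{
    if (!d->live)
        return NULL;
    d->users++;
    return d;
}

/* Cf. put_swap_device(). */
static void put_dev(struct swap_dev *d)
{
    d->users--;
}

/* Cf. swapoff(): stop new users, then tear down only when none remain.
 * The kernel blocks here; this sketch just reports whether it could proceed. */
static bool try_swapoff(struct swap_dev *d)
{
    d->live = false;
    if (d->users > 0)
        return false;  /* an in-flight caller may still touch d->map */
    d->map = NULL;     /* safe to free swap_map now */
    return true;
}

/* Cf. free_swap_and_cache(): hold a reference for the whole operation and
 * re-check that the entry is still in use, since get_dev() does not. */
static bool free_entry(struct swap_dev *d, size_t off)
{
    struct swap_dev *ref = get_dev(d);
    if (!ref)
        return false;
    bool was_used = ref->map[off] != 0;
    if (was_used)
        ref->map[off]--;
    put_dev(ref);
    return was_used;
}
```

Because teardown only happens once `users` reaches zero, a caller holding a reference can never observe a freed `map`.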
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: Fix use-after-free bugs caused by sco_sock_timeout
When the SCO connection is established and the SCO socket is then
released, timeout_work is scheduled to determine whether the SCO
disconnection has timed out. The sock will be deallocated later, but it
is dereferenced again in sco_sock_timeout. As a result, use-after-free
bugs occur. The root cause is shown below:
Cleanup Thread               |      Worker Thread
sco_sock_release             |
  sco_sock_close             |
    __sco_sock_close         |
      sco_sock_set_timer     |
        schedule_delayed_work |
  sco_sock_kill              |    (wait a time)
    sock_put(sk) //FREE      |  sco_sock_timeout
                             |    sock_hold(sk) //USE
The KASAN report triggered by the POC is shown below:
[ 95.890016] ==================================================================
[ 95.890496] BUG: KASAN: slab-use-after-free in sco_sock_timeout+0x5e/0x1c0
[ 95.890755] Write of size 4 at addr ffff88800c388080 by task kworker/0:0/7
…
[ 95.890755] Workqueue: events sco_sock_timeout
[ 95.890755] Call Trace:
[ 95.890755]
[ 95.890755] dump_stack_lvl+0x45/0x110
[ 95.890755] print_address_description+0x78/0x390
[ 95.890755] print_report+0x11b/0x250
[ 95.890755] ? __virt_addr_valid+0xbe/0xf0
[ 95.890755] ? sco_sock_timeout+0x5e/0x1c0
[ 95.890755] kasan_report+0x139/0x170
[ 95.890755] ? update_load_avg+0xe5/0x9f0
[ 95.890755] ? sco_sock_timeout+0x5e/0x1c0
[ 95.890755] kasan_check_range+0x2c3/0x2e0
[ 95.890755] sco_sock_timeout+0x5e/0x1c0
[ 95.890755] process_one_work+0x561/0xc50
[ 95.890755] worker_thread+0xab2/0x13c0
[ 95.890755] ? pr_cont_work+0x490/0x490
[ 95.890755] kthread+0x279/0x300
[ 95.890755] ? pr_cont_work+0x490/0x490
[ 95.890755] ? kthread_blkcg+0xa0/0xa0
[ 95.890755] ret_from_fork+0x34/0x60
[ 95.890755] ? kthread_blkcg+0xa0/0xa0
[ 95.890755] ret_from_fork_asm+0x11/0x20
[ 95.890755]
[ 95.890755]
[ 95.890755] Allocated by task 506:
[ 95.890755] kasan_save_track+0x3f/0x70
[ 95.890755] __kasan_kmalloc+0x86/0x90
[ 95.890755] __kmalloc+0x17f/0x360
[ 95.890755] sk_prot_alloc+0xe1/0x1a0
[ 95.890755] sk_alloc+0x31/0x4e0
[ 95.890755] bt_sock_alloc+0x2b/0x2a0
[ 95.890755] sco_sock_create+0xad/0x320
[ 95.890755] bt_sock_create+0x145/0x320
[ 95.890755] __sock_create+0x2e1/0x650
[ 95.890755] __sys_socket+0xd0/0x280
[ 95.890755] __x64_sys_socket+0x75/0x80
[ 95.890755] do_syscall_64+0xc4/0x1b0
[ 95.890755] entry_SYSCALL_64_after_hwframe+0x67/0x6f
[ 95.890755]
[ 95.890755] Freed by task 506:
[ 95.890755] kasan_save_track+0x3f/0x70
[ 95.890755] kasan_save_free_info+0x40/0x50
[ 95.890755] poison_slab_object+0x118/0x180
[ 95.890755] __kasan_slab_free+0x12/0x30
[ 95.890755] kfree+0xb2/0x240
[ 95.890755] __sk_destruct+0x317/0x410
[ 95.890755] sco_sock_release+0x232/0x280
[ 95.890755] sock_close+0xb2/0x210
[ 95.890755] __fput+0x37f/0x770
[ 95.890755] task_work_run+0x1ae/0x210
[ 95.890755] get_signal+0xe17/0xf70
[ 95.890755] arch_do_signal_or_restart+0x3f/0x520
[ 95.890755] syscall_exit_to_user_mode+0x55/0x120
[ 95.890755] do_syscall_64+0xd1/0x1b0
[ 95.890755] entry_SYSCALL_64_after_hwframe+0x67/0x6f
[ 95.890755]
[ 95.890755] The buggy address belongs to the object at ffff88800c388000
[ 95.890755]  which belongs to the cache kmalloc-1k of size 1024
[ 95.890755] The buggy address is located 128 bytes inside of
[ 95.890755]  freed 1024-byte region [ffff88800c388000, ffff88800c388400)
[ 95.890755]
[ 95.890755] The buggy address belongs to the physical page:
[ 95.890755] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88800c38a800 pfn:0xc388
[ 95.890755] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 95.890755] ano —truncated—
(CVE-2024-27398)
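The diagram shows deferred work touching a socket after the last sock_put(). One common way to rule out this class of bug, offered here as an illustration and not necessarily what the upstream fix does, is to have the pending work item own its own reference: take it when the work is scheduled and drop it when the handler finishes. `struct obj`, `struct delayed_work_model` and all functions below are hypothetical, single-threaded stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a socket. */
struct obj {
    int refcnt;
    int *freed_counter;  /* lets callers observe when it is freed */
};

static struct obj *obj_new(int *freed_counter)
{
    struct obj *o = malloc(sizeof(*o));
    if (!o)
        return NULL;
    o->refcnt = 1;  /* reference owned by the creator */
    o->freed_counter = freed_counter;
    return o;
}

static void obj_hold(struct obj *o) { o->refcnt++; }

static void obj_put(struct obj *o)
{
    if (--o->refcnt == 0) {
        (*o->freed_counter)++;
        free(o);
    }
}

/* Hypothetical deferred work item. */
struct delayed_work_model {
    struct obj *target;
    bool pending;
};

/* Scheduling takes a reference on behalf of the work item... */
static void schedule_timeout_work(struct delayed_work_model *w, struct obj *o)
{
    obj_hold(o);
    w->target = o;
    w->pending = true;
}

/* ...and the handler drops it when done, so the owner releasing its own
 * reference first cannot free the object underneath the handler. */
static void run_timeout_work(struct delayed_work_model *w)
{
    if (!w->pending)
        return;
    w->pending = false;
    /* w->target is still valid here: the work item holds a reference */
    obj_put(w->target);
    w->target = NULL;
}
```

In the usage below, the owner's release (the "Cleanup Thread" side) no longer frees the object; the free happens only after the timeout handler has finished with it.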
In the Linux kernel, the following vulnerability has been resolved:
watchdog: cpu5wdt.c: Fix use-after-free bug caused by cpu5wdt_trigger
When the cpu5wdt module is being removed, the original code uses
del_timer() to deactivate the timer. If the timer handler is already
running, del_timer() cannot stop it and returns immediately. If the port
region is then released by release_region() while the timer handler
cpu5wdt_trigger() calls outb() to write into that region, a
use-after-free bug occurs. Change del_timer() to timer_shutdown_sync()
so that the timer handler is guaranteed to finish before the port region
is released. (CVE-2024-38630)
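The teardown ordering the fix enforces can be sketched in user space with a POSIX thread standing in for the timer. This is a hedged model, assuming a pthreads platform: `struct wd_model`, `timer_handler()`, `wd_start()` and `wd_remove()` are hypothetical, and joining the thread plays the role that timer_shutdown_sync() plays in the driver (wait for a running handler and prevent re-arming). The region is released only after that wait, so the handler can never write to it afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model: a periodic "timer" thread writes to a region, much
 * as cpu5wdt_trigger() writes to an I/O port region with outb(). */
struct wd_model {
    pthread_t timer;
    atomic_bool stop;             /* shutdown requested: do not re-arm */
    atomic_bool region_released;  /* set once the region is given back */
    atomic_int writes;            /* writes while the region is owned */
    atomic_int bad_writes;        /* writes after release (use-after-free) */
};

static void *timer_handler(void *arg)
{
    struct wd_model *m = arg;
    const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000 };  /* 1 ms */

    while (!atomic_load(&m->stop)) {
        if (atomic_load(&m->region_released))
            atomic_fetch_add(&m->bad_writes, 1);  /* the bug being avoided */
        else
            atomic_fetch_add(&m->writes, 1);
        nanosleep(&tick, NULL);
    }
    return NULL;
}

static int wd_start(struct wd_model *m)
{
    atomic_init(&m->stop, false);
    atomic_init(&m->region_released, false);
    atomic_init(&m->writes, 0);
    atomic_init(&m->bad_writes, 0);
    return pthread_create(&m->timer, NULL, timer_handler, m);
}

/* Correct order: stop the timer and wait for any running handler to
 * return, and only then release the region. */
static void wd_remove(struct wd_model *m)
{
    atomic_store(&m->stop, true);
    pthread_join(m->timer, NULL);
    atomic_store(&m->region_released, true);
}
```

Swapping the last two statements of `wd_remove()` would reintroduce the window the advisory describes, because a handler iteration already in progress could observe the released region.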
In the Linux kernel, the following vulnerability has been resolved:
exec: Fix ToCToU between perm check and set-uid/gid usage
When opening a file for exec via do_filp_open(), permission checking is
done against the file’s metadata at that moment, and on success, a file
pointer is passed back. Much later in the execve() code path, the file
metadata (specifically mode, uid, and gid) is used to determine if/how to
set the uid and gid. However, those values may have changed since the
permissions check, meaning the execution may gain unintended privileges.
For example, if a file could change permissions from executable and not
set-id:
---------x 1 root root 16048 Aug 7 13:16 target
to set-id and non-executable:
---S------ 1 root root 16048 Aug 7 13:16 target
it is possible to gain root privileges when execution should have been
disallowed. While this race condition is rare in real-world scenarios,
it has been observed (and proven exploitable) when package managers are
updating the setuid bits of installed programs. Such files start out
world-executable but then are adjusted to be group-exec with a set-uid
bit. For example, “chmod o-x,u+s target” makes “target” executable only
by uid “root” and gid “cdrom”, while also becoming setuid-root:
-rwxr-xr-x 1 root cdrom 16048 Aug 7 13:16 target
becomes:
-rwsr-xr-- 1 root cdrom 16048 Aug 7 13:16 target
But racing the chmod means users without group “cdrom” membership can
get the permission to execute “target” just before the chmod, and when
the chmod finishes, the exec reaches bprm_fill_uid() and performs the
setuid to root, violating the expressed authorization of “only cdrom
group members can setuid to root”. Re-check that we still have execute
permissions in case the metadata has changed. It would be better to keep
a copy from the perm-check time, but until we can do that refactoring,
the least-bad option is to do a full inode_permission() call (under
inode lock). It is understood that this is safe against deadlocks, but
hardly optimal. (CVE-2024-43882)
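The check-then-use gap, and why re-checking at use time closes it, can be modeled with plain metadata values. This is a minimal sketch, not inode_permission(): `struct file_meta`, `may_exec()` and `commit_setuid()` are hypothetical, the permission logic ignores supplementary groups, ACLs and root's override, and gid 24 for “cdrom” is an assumed value used only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical snapshot of the inode fields the text mentions. */
struct file_meta {
    mode_t mode;
    uid_t uid;
    gid_t gid;
};

/* Simplified execute check: owner bits, else group bits, else other bits. */
static bool may_exec(const struct file_meta *m, uid_t uid, gid_t gid)
{
    if (uid == m->uid)
        return m->mode & S_IXUSR;
    if (gid == m->gid)
        return m->mode & S_IXGRP;
    return m->mode & S_IXOTH;
}

/* Late stage of the modeled exec: re-check against the *current* metadata
 * before honouring set-uid, instead of trusting the open-time check. */
static bool commit_setuid(const struct file_meta *live, uid_t uid, gid_t gid)
{
    if (!may_exec(live, uid, gid))
        return false;  /* permissions changed since open: refuse */
    return live->mode & S_ISUID;
}
```

The usage below replays the “cdrom” scenario: the open-time check passes for an ordinary user, the chmod lands in between, and the late re-check refuses the set-uid while a group member is still allowed.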
In the Linux kernel, the following vulnerability has been resolved:
vsock/virtio: Initialization of the dangling pointer occurring in vsk->trans
During loopback communication, a dangling pointer can be created in
vsk->trans, potentially leading to a use-after-free condition. This
issue is resolved by initializing vsk->trans to NULL. (CVE-2024-50264)
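The general idea of the fix, clearing a pointer once the memory it refers to is released, can be sketched as follows. `struct vsock_model`, `struct transport_state`, `transport_init()` and `transport_destruct()` are hypothetical and do not reflect the virtio vsock transport's real structures; they show that code which tests the pointer afterwards sees NULL instead of freed memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a vsock socket and its transport state. */
struct transport_state { int credit; };
struct vsock_model { struct transport_state *trans; };

/* Release the transport state and clear the pointer, so later paths that
 * test or free vsk->trans see NULL rather than freed memory. */
static void transport_destruct(struct vsock_model *vsk)
{
    free(vsk->trans);
    vsk->trans = NULL;
}

/* Allocate state only if none is attached; a stale (dangling) pointer here
 * would make this skip allocation and keep using freed memory. */
static int transport_init(struct vsock_model *vsk)
{
    if (vsk->trans)
        return 0;
    vsk->trans = calloc(1, sizeof(*vsk->trans));
    return vsk->trans ? 0 : -1;
}
```

Because `free(NULL)` is a no-op, a repeated destruct is harmless, and a later init allocates fresh state instead of reusing the released object.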
Update instructions
The problem can be corrected by updating your kernel livepatch to the
following versions:
Ubuntu 20.04 LTS
aws - 108.1
aws - 108.2
aws - 108.3
azure - 108.1
azure - 108.2
azure - 108.3
gcp - 108.1
gcp - 108.3
generic - 108.1
generic - 108.2
generic - 108.3
gkeop - 108.1
gkeop - 108.2
gkeop - 108.3
ibm - 108.1
ibm - 108.3
lowlatency - 108.1
lowlatency - 108.2
lowlatency - 108.3
oracle - 108.1
oracle - 108.2
oracle - 108.3
Ubuntu 18.04 LTS
aws - 108.2
azure - 108.2
gcp - 108.2
generic - 108.1
generic - 108.2
generic - 108.3
lowlatency - 108.1
lowlatency - 108.2
lowlatency - 108.3
oracle - 108.2
Ubuntu 16.04 LTS
aws - 108.1
aws - 108.2
azure - 108.2
gcp - 108.1
generic - 108.1
generic - 108.2
lowlatency - 108.1
lowlatency - 108.2
Ubuntu 22.04 LTS
aws - 108.1
aws - 108.3
azure - 108.1
azure - 108.3
gcp - 108.1
gcp - 108.3
generic - 108.1
generic - 108.3
gke - 108.1
gke - 108.3
ibm - 108.1
ibm - 108.3
oracle - 108.1
oracle - 108.3
Ubuntu 14.04 LTS
generic - 108.1
lowlatency - 108.1
Support Information
Livepatches for supported LTS kernels will receive upgrades for a period
of up to 13 months after the build date of the kernel.
Livepatches for supported HWE kernels which are not based on an LTS
kernel version will receive upgrades for a period of up to 9 months
after the build date of the kernel, or until the end of support for that
kernel’s non-LTS distro release version, whichever is sooner.
References
- CVE-2024-26800
- CVE-2024-26921
- CVE-2024-26960
- CVE-2024-27398
- CVE-2024-38630
- CVE-2024-43882
- CVE-2024-50264