Synopsis
The remote TencentOS Server 4 host is missing one or more security updates.
Description
The version of Tencent Linux installed on the remote TencentOS Server 4 host is prior to tested version. It is, therefore, affected by multiple vulnerabilities as referenced in the TSSA-2024:0980 advisory.
Package updates are available for TencentOS Server 4 that fix the following vulnerabilities:
CVE-2024-46693:
In the Linux kernel, the following vulnerability has been resolved:
soc: qcom: pmic_glink: Fix race during initialization
As pointed out by Stephen Boyd it is possible that during initialization of the pmic_glink child drivers, the protection-domain notifiers fires, and the associated work is scheduled, before the client registration returns and as a result the local client pointer has been initialized.
The outcome of this is a NULL pointer dereference as the client pointer is blindly dereferenced.
Timeline provided by Stephen:
CPU0 CPU1
---- ---- ucsi->client = NULL;
devm_pmic_glink_register_client() client->pdr_notify(client->priv, pg->client_state) pmic_glink_ucsi_pdr_notify() schedule_work(&ucsi->register_work) <schedule away> pmic_glink_ucsi_register() ucsi_register() pmic_glink_ucsi_read_version() pmic_glink_ucsi_read() pmic_glink_ucsi_read() pmic_glink_send(ucsi->client) <client is NULL BAD> ucsi->client = client // Too late!
This code is identical across the altmode, battery manager and usci child drivers.
Resolve this by splitting the allocation of the client object and the registration thereof into two operations.
This only happens if the protection domain registry is populated at the time of registration, which by the introduction of commit '1ebcde047c54 (soc: qcom: add pd-mapper implementation)' became much more likely.
CVE-2024-44966:
In the Linux kernel, the following vulnerability has been resolved:
binfmt_flat: Fix corruption when not offsetting data start
Commit 04d82a6d0881 (binfmt_flat: allow not offsetting data start) introduced a RISC-V specific variant of the FLAT format which does not allocate any space for the (obsolete) array of shared library pointers. However, it did not disable the code which initializes the array, resulting in the corruption of sizeof(long) bytes before the DATA segment, generally the end of the TEXT segment.
Introduce MAX_SHARED_LIBS_UPDATE which depends on the state of CONFIG_BINFMT_FLAT_NO_DATA_START_OFFSET to guard the initialization of the shared library pointer region so that it will only be initialized if space is reserved for it.
CVE-2024-44959:
In the Linux kernel, the following vulnerability has been resolved:
tracefs: Use generic inode RCU for synchronizing freeing
With structure layout randomization enabled for 'struct inode' we need to avoid overlapping any of the RCU-used / initialized-only-once members, e.g. i_lru or i_sb_list to not corrupt related list traversals when making use of the rcu_head.
For an unlucky structure layout of 'struct inode' we may end up with the following splat when running the ftrace selftests:
[<...>] list_del corruption, ffff888103ee2cb0->next (tracefs_inode_cache+0x0/0x4e0 [slab object]) is NULL (prev is tracefs_inode_cache+0x78/0x4e0 [slab object]) [<...>] ------------[ cut here ]------------ [<...>] kernel BUG at lib/list_debug.c:54! [<...>] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [<...>] CPU: 3 PID: 2550 Comm: mount Tainted: G N 6.8.12-grsec+ #122 ed2f536ca62f28b087b90e3cc906a8d25b3ddc65 [<...>] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [<...>] RIP: 0010:[<ffffffff84656018>] __list_del_entry_valid_or_report+0x138/0x3e0 [<...>] Code: 48 b8 99 fb 65 f2 ff ff ff ff e9 03 5c d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff e9 33 5a d9 fc cc 48 b8 99 fb 65 f2 ff ff ff ff <0f> 0b 4c 89 e9 48 89 ea 48 89 ee 48 c7 c7 60 8f dd 89 31 c0 e8 2f [<...>] RSP: 0018:fffffe80416afaf0 EFLAGS: 00010283 [<...>] RAX: 0000000000000098 RBX: ffff888103ee2cb0 RCX: 0000000000000000 [<...>] RDX: ffffffff84655fe8 RSI: ffffffff89dd8b60 RDI: 0000000000000001 [<...>] RBP: ffff888103ee2cb0 R08: 0000000000000001 R09: fffffbd0082d5f25 [<...>] R10: fffffe80416af92f R11: 0000000000000001 R12: fdf99c16731d9b6d [<...>] R13: 0000000000000000 R14: ffff88819ad4b8b8 R15: 0000000000000000 [<...>] RBX: tracefs_inode_cache+0x0/0x4e0 [slab object] [<...>] RDX: __list_del_entry_valid_or_report+0x108/0x3e0 [<...>] RSI: __func__.47+0x4340/0x4400 [<...>] RBP: tracefs_inode_cache+0x0/0x4e0 [slab object] [<...>] RSP: process kstack fffffe80416afaf0+0x7af0/0x8000 [mount 2550 2550] [<...>] R09: kasan shadow of process kstack fffffe80416af928+0x7928/0x8000 [mount 2550 2550] [<...>] R10: process kstack fffffe80416af92f+0x792f/0x8000 [mount 2550 2550] [<...>] R14: tracefs_inode_cache+0x78/0x4e0 [slab object] [<...>] FS: 00006dcb380c1840(0000) GS:ffff8881e0600000(0000) knlGS:0000000000000000 [<...>] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [<...>] CR2: 000076ab72b30e84 CR3: 000000000b088004 CR4: 0000000000360ef0 shadow CR4: 0000000000360ef0 [<...>] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [<...>] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [<...>] ASID: 0003 [<...>] Stack:
[<...>] ffffffff818a2315 00000000f5c856ee ffffffff896f1840 ffff888103ee2cb0 [<...>] ffff88812b6b9750 0000000079d714b6 fffffbfff1e9280b ffffffff8f49405f [<...>] 0000000000000001 0000000000000000 ffff888104457280 ffffffff8248b392 [<...>] Call Trace:
[<...>] <TASK> [<...>] [<ffffffff818a2315>] ? lock_release+0x175/0x380 fffffe80416afaf0 [<...>] [<ffffffff8248b392>] list_lru_del+0x152/0x740 fffffe80416afb48 [<...>] [<ffffffff8248ba93>] list_lru_del_obj+0x113/0x280 fffffe80416afb88 [<...>] [<ffffffff8940fd19>] ? _atomic_dec_and_lock+0x119/0x200 fffffe80416afb90 [<...>] [<ffffffff8295b244>] iput_final+0x1c4/0x9a0 fffffe80416afbb8 [<...>] [<ffffffff8293a52b>] dentry_unlink_inode+0x44b/0xaa0 fffffe80416afbf8 [<...>] [<ffffffff8293fefc>] __dentry_kill+0x23c/0xf00 fffffe80416afc40 [<...>] [<ffffffff8953a85f>] ? __this_cpu_preempt_check+0x1f/0xa0 fffffe80416afc48 [<...>] [<ffffffff82949ce5>] ? shrink_dentry_list+0x1c5/0x760 fffffe80416afc70 [<...>] [<ffffffff82949b71>] ? shrink_dentry_list+0x51/0x760 fffffe80416afc78 [<...>] [<ffffffff82949da8>] shrink_dentry_list+0x288/0x760 fffffe80416afc80 [<...>] [<ffffffff8294ae75>] shrink_dcache_sb+0x155/0x420 fffffe80416afcc8 [<...>] [<ffffffff8953a7c3>] ? debug_smp_processor_id+0x23/0xa0 fffffe80416afce0 [<...>] [<ffffffff8294ad20>] ? do_one_tre
---truncated---
CVE-2024-44942:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
syzbot reports a f2fs bug as below:
------------[ cut here ]------------ kernel BUG at fs/f2fs/inline.c:258! CPU: 1 PID: 34 Comm: kworker/u8:2 Not tainted 6.9.0-rc6-syzkaller-00012-g9e4bc4bcae01 #0 RIP: 0010:f2fs_write_inline_data+0x781/0x790 fs/f2fs/inline.c:258 Call Trace:
f2fs_write_single_data_page+0xb65/0x1d60 fs/f2fs/data.c:2834 f2fs_write_cache_pages fs/f2fs/data.c:3133 [inline]
__f2fs_write_data_pages fs/f2fs/data.c:3288 [inline] f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3315 do_writepages+0x35b/0x870 mm/page-writeback.c:2612
__writeback_single_inode+0x165/0x10b0 fs/fs-writeback.c:1650 writeback_sb_inodes+0x905/0x1260 fs/fs-writeback.c:1941 wb_writeback+0x457/0xce0 fs/fs-writeback.c:2117 wb_do_writeback fs/fs-writeback.c:2264 [inline] wb_workfn+0x410/0x1090 fs/fs-writeback.c:2304 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa12/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f2/0x390 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
The root cause is: inline_data inode can be fuzzed, so that there may be valid blkaddr in its direct node, once f2fs triggers background GC to migrate the block, it will hit f2fs_bug_on() during dirty page writeback.
Let's add sanity check on F2FS_INLINE_DATA flag in inode during GC, so that, it can forbid migrating inline_data inode's data block for fixing.
CVE-2024-43885:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix double inode unlock for direct IO sync writes
If we do a direct IO sync write, at btrfs_sync_file(), and we need to skip inode logging or we get an error starting a transaction or an error when flushing delalloc, we end up unlocking the inode when we shouldn't under the 'out_release_extents' label, and then unlock it again at btrfs_direct_write().
Fix that by checking if we have to skip inode unlocking under that label.
CVE-2024-43837:
In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT
When loading a EXT program without specifying `attr->attach_prog_fd`, the `prog->aux->dst_prog` will be null. At this time, calling resolve_prog_type() anywhere will result in a null pointer dereference.
Example stack trace:
[ 8.107863] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004 [ 8.108262] Mem abort info:
[ 8.108384] ESR = 0x0000000096000004 [ 8.108547] EC = 0x25: DABT (current EL), IL = 32 bits [ 8.108722] SET = 0, FnV = 0 [ 8.108827] EA = 0, S1PTW = 0 [ 8.108939] FSC = 0x04: level 0 translation fault [ 8.109102] Data abort info:
[ 8.109203] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 8.109399] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 8.109614] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 8.109836] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101354000 [ 8.110011] [0000000000000004] pgd=0000000000000000, p4d=0000000000000000 [ 8.112624] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 8.112783] Modules linked in:
[ 8.113120] CPU: 0 PID: 99 Comm: may_access_dire Not tainted 6.10.0-rc3-next-20240613-dirty #1 [ 8.113230] Hardware name: linux,dummy-virt (DT) [ 8.113390] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 8.113429] pc : may_access_direct_pkt_data+0x24/0xa0 [ 8.113746] lr : add_subprog_and_kfunc+0x634/0x8e8 [ 8.113798] sp : ffff80008283b9f0 [ 8.113813] x29: ffff80008283b9f0 x28: ffff800082795048 x27: 0000000000000001 [ 8.113881] x26: ffff0000c0bb2600 x25: 0000000000000000 x24: 0000000000000000 [ 8.113897] x23: ffff0000c1134000 x22: 000000000001864f x21: ffff0000c1138000 [ 8.113912] x20: 0000000000000001 x19: ffff0000c12b8000 x18: ffffffffffffffff [ 8.113929] x17: 0000000000000000 x16: 0000000000000000 x15: 0720072007200720 [ 8.113944] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720 [ 8.113958] x11: 0720072007200720 x10: 0000000000f9fca4 x9 : ffff80008021f4e4 [ 8.113991] x8 : 0101010101010101 x7 : 746f72705f6d656d x6 : 000000001e0e0f5f [ 8.114006] x5 : 000000000001864f x4 : ffff0000c12b8000 x3 : 000000000000001c [ 8.114020] x2 : 0000000000000002 x1 : 0000000000000000 x0 : 0000000000000000 [ 8.114126] Call trace:
[ 8.114159] may_access_direct_pkt_data+0x24/0xa0 [ 8.114202] bpf_check+0x3bc/0x28c0 [ 8.114214] bpf_prog_load+0x658/0xa58 [ 8.114227] __sys_bpf+0xc50/0x2250 [ 8.114240] __arm64_sys_bpf+0x28/0x40 [ 8.114254] invoke_syscall.constprop.0+0x54/0xf0 [ 8.114273] do_el0_svc+0x4c/0xd8 [ 8.114289] el0_svc+0x3c/0x140 [ 8.114305] el0t_64_sync_handler+0x134/0x150 [ 8.114331] el0t_64_sync+0x168/0x170 [ 8.114477] Code: 7100707f 54000081 f9401c00 f9403800 (b9400403) [ 8.118672] ---[ end trace 0000000000000000 ]---
One way to fix it is by forcing `attach_prog_fd` non-empty when bpf_prog_load(). But this will lead to `libbpf_probe_bpf_prog_type` API broken which use verifier log to probe prog type and will log nothing if we reject invalid EXT prog before bpf_check().
Another way is by adding null check in resolve_prog_type().
The issue was introduced by commit 4a9c7bbe2ed4 (bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT) which wanted to correct type resolution for BPF_PROG_TYPE_TRACING programs. Before that, the type resolution of BPF_PROG_TYPE_EXT prog actually follows the logic below:
prog->aux->dst_prog ? prog->aux->dst_prog->type : prog->type;
It implies that when EXT program is not yet attached to `dst_prog`, the prog type should be EXT itself. This code worked fine in the past.
So just keep using it.
Fix this by returning `prog->type` for BPF_PROG_TYPE_EXT if `dst_prog` is not present in resolve_prog_type().
CVE-2024-43817:
In the Linux kernel, the following vulnerability has been resolved:
net: missing check virtio
Two missing check in virtio_net_hdr_to_skb() allowed syzbot to crash kernels again
1. After the skb_segment function the buffer may become non-linear (nr_frags != 0), but since the SKBTX_SHARED_FRAG flag is not set anywhere the __skb_linearize function will not be executed, then the buffer will remain non-linear. Then the condition (offset >= skb_headlen(skb)) becomes true, which causes WARN_ON_ONCE in skb_checksum_help.
2. The struct sk_buff and struct virtio_net_hdr members must be mathematically related.
(gso_size) must be greater than (needed) otherwise WARN_ON_ONCE.
(remainder) must be greater than (needed) otherwise WARN_ON_ONCE.
(remainder) may be 0 if division is without remainder.
offset+2 (4191) > skb_headlen() (1116) WARNING: CPU: 1 PID: 5084 at net/core/dev.c:3303 skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303 Modules linked in:
CPU: 1 PID: 5084 Comm: syz-executor336 Not tainted 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 RIP: 0010:skb_checksum_help+0x5e2/0x740 net/core/dev.c:3303 Code: 89 e8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 52 01 00 00 44 89 e2 2b 53 74 4c 89 ee 48 c7 c7 40 57 e9 8b e8 af 8f dd f8 90 <0f> 0b 90 90 e9 87 fe ff ff e8 40 0f 6e f9 e9 4b fa ff ff 48 89 ef RSP: 0018:ffffc90003a9f338 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff888025125780 RCX: ffffffff814db209 RDX: ffff888015393b80 RSI: ffffffff814db216 RDI: 0000000000000001 RBP: ffff8880251257f4 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000045c R13: 000000000000105f R14: ffff8880251257f0 R15: 000000000000105d FS: 0000555555c24380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000002000f000 CR3: 0000000023151000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace:
<TASK> ip_do_fragment+0xa1b/0x18b0 net/ipv4/ip_output.c:777 ip_fragment.constprop.0+0x161/0x230 net/ipv4/ip_output.c:584 ip_finish_output_gso net/ipv4/ip_output.c:286 [inline]
__ip_finish_output net/ipv4/ip_output.c:308 [inline]
__ip_finish_output+0x49c/0x650 net/ipv4/ip_output.c:295 ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip_output+0x13b/0x2a0 net/ipv4/ip_output.c:433 dst_output include/net/dst.h:451 [inline] ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:129 iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82 ipip6_tunnel_xmit net/ipv6/sit.c:1034 [inline] sit_tunnel_xmit+0xed2/0x28f0 net/ipv6/sit.c:1076
__netdev_start_xmit include/linux/netdevice.h:4940 [inline] netdev_start_xmit include/linux/netdevice.h:4954 [inline] xmit_one net/core/dev.c:3545 [inline] dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3561
__dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4346 dev_queue_xmit include/linux/netdevice.h:3134 [inline] packet_xmit+0x257/0x380 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3087 [inline] packet_sendmsg+0x24ca/0x5240 net/packet/af_packet.c:3119 sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
__do_sys_sendto net/socket.c:2202 [inline]
__se_sys_sendto net/socket.c:2198 [inline]
__x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Found by Linux Verification Center (linuxtesting.org) with Syzkaller
CVE-2024-42318:
In the Linux kernel, the following vulnerability has been resolved:
landlock: Don't lose track of restrictions on cred_transfer
When a process' cred struct is replaced, this _almost_ always invokes the cred_prepare LSM hook; but in one special case (when KEYCTL_SESSION_TO_PARENT updates the parent's credentials), the cred_transfer LSM hook is used instead. Landlock only implements the cred_prepare hook, not cred_transfer, so KEYCTL_SESSION_TO_PARENT causes all information on Landlock restrictions to be lost.
This basically means that a process with the ability to use the fork() and keyctl() syscalls can get rid of all Landlock restrictions on itself.
Fix it by adding a cred_transfer hook that does the same thing as the existing cred_prepare hook. (Implemented by having hook_cred_prepare() call hook_cred_transfer() so that the two functions are less likely to accidentally diverge in the future.)
CVE-2024-42312:
In the Linux kernel, the following vulnerability has been resolved:
sysctl: always initialize i_uid/i_gid
Always initialize i_uid/i_gid inside the sysfs core so set_ownership() can safely skip setting them.
Commit 5ec27ec735ba (fs/proc/proc_sysctl.c: fix the default values of i_uid/i_gid on /proc/sys inodes.) added defaults for i_uid/i_gid when set_ownership() was not implemented. It also missed adjusting net_ctl_set_ownership() to use the same default values in case the computation of a better value failed.
CVE-2024-42270:
In the Linux kernel, the following vulnerability has been resolved:
netfilter: iptables: Fix null-ptr-deref in iptable_nat_table_init().
We had a report that iptables-restore sometimes triggered null-ptr-deref at boot time. [0]
The problem is that iptable_nat_table_init() is exposed to user space before the kernel fully initialises netns.
In the small race window, a user could call iptable_nat_table_init() that accesses net_generic(net, iptable_nat_net_id), which is available only after registering iptable_nat_net_ops.
Let's call register_pernet_subsys() before xt_register_template().
[0]:
bpfilter: Loaded bpfilter_umh pid 11702 Started bpfilter BUG: kernel NULL pointer dereference, address: 0000000000000013 PF: supervisor write access in kernel mode PF: error_code(0x0002) - not-present page PGD 0 P4D 0 PREEMPT SMP NOPTI CPU: 2 PID: 11879 Comm: iptables-restor Not tainted 6.1.92-99.174.amzn2023.x86_64 #1 Hardware name: Amazon EC2 c6i.4xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:iptable_nat_table_init (net/ipv4/netfilter/iptable_nat.c:87 net/ipv4/netfilter/iptable_nat.c:121) iptable_nat Code: 10 4c 89 f6 48 89 ef e8 0b 19 bb ff 41 89 c4 85 c0 75 38 41 83 c7 01 49 83 c6 28 41 83 ff 04 75 dc 48 8b 44 24 08 48 8b 0c 24 <48> 89 08 4c 89 ef e8 a2 3b a2 cf 48 83 c4 10 44 89 e0 5b 5d 41 5c RSP: 0018:ffffbef902843cd0 EFLAGS: 00010246 RAX: 0000000000000013 RBX: ffff9f4b052caa20 RCX: ffff9f4b20988d80 RDX: 0000000000000000 RSI: 0000000000000064 RDI: ffffffffc04201c0 RBP: ffff9f4b29394000 R08: ffff9f4b07f77258 R09: ffff9f4b07f77240 R10: 0000000000000000 R11: ffff9f4b09635388 R12: 0000000000000000 R13: ffff9f4b1a3c6c00 R14: ffff9f4b20988e20 R15: 0000000000000004 FS: 00007f6284340000(0000) GS:ffff9f51fe280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000013 CR3: 00000001d10a6005 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace:
<TASK> ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) ? xt_find_table_lock (net/netfilter/x_tables.c:1259) ? __die_body.cold (arch/x86/kernel/dumpstack.c:478 arch/x86/kernel/dumpstack.c:420) ? page_fault_oops (arch/x86/mm/fault.c:727) ? exc_page_fault (./arch/x86/include/asm/irqflags.h:40 ./arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1470 arch/x86/mm/fault.c:1518) ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570) ? iptable_nat_table_init (net/ipv4/netfilter/iptable_nat.c:87 net/ipv4/netfilter/iptable_nat.c:121) iptable_nat xt_find_table_lock (net/netfilter/x_tables.c:1259) xt_request_find_table_lock (net/netfilter/x_tables.c:1287) get_info (net/ipv4/netfilter/ip_tables.c:965) ? security_capable (security/security.c:809 (discriminator 13)) ? ns_capable (kernel/capability.c:376 kernel/capability.c:397) ? do_ipt_get_ctl (net/ipv4/netfilter/ip_tables.c:1656) ? bpfilter_send_req (net/bpfilter/bpfilter_kern.c:52) bpfilter nf_getsockopt (net/netfilter/nf_sockopt.c:116) ip_getsockopt (net/ipv4/ip_sockglue.c:1827)
__sys_getsockopt (net/socket.c:2327)
__x64_sys_getsockopt (net/socket.c:2342 net/socket.c:2339 net/socket.c:2339) do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:81) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) RIP: 0033:0x7f62844685ee Code: 48 8b 0d 45 28 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 09 RSP: 002b:00007ffd1f83d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000037 RAX: ffffffffffffffda RBX: 00007ffd1f83d680 RCX: 00007f62844685ee RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 0000000000000004 R08: 00007ffd1f83d670 R09: 0000558798ffa2a0 R10: 00007ffd1f83d680 R11: 0000000000000246 R12: 00007ffd1f83e3b2 R13: 00007f6284
---truncated---
CVE-2024-42269:
In the Linux kernel, the following vulnerability has been resolved:
netfilter: iptables: Fix potential null-ptr-deref in ip6table_nat_table_init().
ip6table_nat_table_init() accesses net->gen->ptr[ip6table_nat_net_ops.id], but the function is exposed to user space before the entry is allocated via register_pernet_subsys().
Let's call register_pernet_subsys() before xt_register_template().
CVE-2024-42268:
In the Linux kernel, the following vulnerability has been resolved:
net/mlx5: Fix missing lock on sync reset reload
On sync reset reload work, when remote host updates devlink on reload actions performed on that host, it misses taking devlink lock before calling devlink_remote_reload_actions_performed() which results in triggering lock assert like the following:
WARNING: CPU: 4 PID: 1164 at net/devlink/core.c:261 devl_assert_locked+0x3e/0x50 CPU: 4 PID: 1164 Comm: kworker/u96:6 Tainted: G S W 6.10.0-rc2+ #116 Hardware name: Supermicro SYS-2028TP-DECTR/X10DRT-PT, BIOS 2.0 12/18/2015 Workqueue: mlx5_fw_reset_events mlx5_sync_reset_reload_work [mlx5_core] RIP: 0010:devl_assert_locked+0x3e/0x50 Call Trace:
<TASK> ? __warn+0xa4/0x210 ? devl_assert_locked+0x3e/0x50 ? report_bug+0x160/0x280 ? handle_bug+0x3f/0x80 ? exc_invalid_op+0x17/0x40 ? asm_exc_invalid_op+0x1a/0x20 ? devl_assert_locked+0x3e/0x50 devlink_notify+0x88/0x2b0 ? mlx5_attach_device+0x20c/0x230 [mlx5_core] ? __pfx_devlink_notify+0x10/0x10 ? process_one_work+0x4b6/0xbb0 process_one_work+0x4b6/0xbb0 []
CVE-2024-42265:
In the Linux kernel, the following vulnerability has been resolved:
protect the fetch of ->fd[fd] in do_dup2() from mispredictions
both callers have verified that fd is not greater than ->max_fds;
however, misprediction might end up with tofree = fdt->fd[fd];
being speculatively executed. That's wrong for the same reasons why it's wrong in close_fd()/file_close_fd_locked(); the same solution applies - array_index_nospec(fd, fdt->max_fds) could differ from fd only in case of speculative execution on mispredicted path.
CVE-2024-42251:
In the Linux kernel, the following vulnerability has been resolved:
mm: page_ref: remove folio_try_get_rcu()
The below bug was reported on a non-SMP kernel:
[ 275.267158][ T4335] ------------[ cut here ]------------ [ 275.267949][ T4335] kernel BUG at include/linux/page_ref.h:275! [ 275.268526][ T4335] invalid opcode: 0000 [#1] KASAN PTI [ 275.269001][ T4335] CPU: 0 PID: 4335 Comm: trinity-c3 Not tainted 6.7.0-rc4-00061-gefa7df3e3bb5 #1 [ 275.269787][ T4335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 275.270679][ T4335] RIP: 0010:try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.272813][ T4335] RSP: 0018:ffffc90005dcf650 EFLAGS: 00010202 [ 275.273346][ T4335] RAX: 0000000000000246 RBX: ffffea00066e0000 RCX: 0000000000000000 [ 275.274032][ T4335] RDX: fffff94000cdc007 RSI: 0000000000000004 RDI: ffffea00066e0034 [ 275.274719][ T4335] RBP: ffffea00066e0000 R08: 0000000000000000 R09: fffff94000cdc006 [ 275.275404][ T4335] R10: ffffea00066e0037 R11: 0000000000000000 R12: 0000000000000136 [ 275.276106][ T4335] R13: ffffea00066e0034 R14: dffffc0000000000 R15: ffffea00066e0008 [ 275.276790][ T4335] FS: 00007fa2f9b61740(0000) GS:ffffffff89d0d000(0000) knlGS:0000000000000000 [ 275.277570][ T4335] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 275.278143][ T4335] CR2: 00007fa2f6c00000 CR3: 0000000134b04000 CR4: 00000000000406f0 [ 275.278833][ T4335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 275.279521][ T4335] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 275.280201][ T4335] Call Trace:
[ 275.280499][ T4335] <TASK> [ 275.280751][ T4335] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447) [ 275.281087][ T4335] ? do_trap (arch/x86/kernel/traps.c:112 arch/x86/kernel/traps.c:153) [ 275.281463][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.281884][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.282300][ T4335] ? do_error_trap (arch/x86/kernel/traps.c:174) [ 275.282711][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.283129][ T4335] ? handle_invalid_op (arch/x86/kernel/traps.c:212) [ 275.283561][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.283990][ T4335] ? exc_invalid_op (arch/x86/kernel/traps.c:264) [ 275.284415][ T4335] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568) [ 275.284859][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3)) [ 275.285278][ T4335] try_grab_folio (mm/gup.c:148) [ 275.285684][ T4335] __get_user_pages (mm/gup.c:1297 (discriminator 1)) [ 275.286111][ T4335] ? __pfx___get_user_pages (mm/gup.c:1188) [ 275.286579][ T4335] ? __pfx_validate_chain (kernel/locking/lockdep.c:3825) [ 275.287034][ T4335] ? mark_lock (kernel/locking/lockdep.c:4656 (discriminator 1)) [ 275.287416][ T4335] __gup_longterm_locked (mm/gup.c:1509 mm/gup.c:2209) [ 275.288192][ T4335] ? __pfx___gup_longterm_locked (mm/gup.c:2204) [ 275.288697][ T4335] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722) [ 275.289135][ T4335] ? __pfx___might_resched (kernel/sched/core.c:10106) [ 275.289595][ T4335] pin_user_pages_remote (mm/gup.c:3350) [ 275.290041][ T4335] ? __pfx_pin_user_pages_remote (mm/gup.c:3350) [ 275.290545][ T4335] ? find_held_lock (kernel/locking/lockdep.c:5244 (discriminator 1)) [ 275.290961][ T4335] ? mm_access (kernel/fork.c:1573) [ 275.291353][ T4335] process_vm_rw_single_vec+0x142/0x360 [ 275.291900][ T4335] ? __pfx_process_vm_rw_single_vec+0x10/0x10 [ 275.292471][ T4335] ? mm_access (kernel/fork.c:1573) [ 275.292859][ T4335] process_vm_rw_core+0x272/0x4e0 [ 275.293384][ T4335] ? hlock_class (a
---truncated---
CVE-2024-42243:
In the Linux kernel, the following vulnerability has been resolved:
mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray
Patch series mm/filemap: Limit page cache size to that supported by xarray, v2.
Currently, xarray can't support arbitrary page cache size. More details can be found from the WARN_ON() statement in xas_split_alloc(). In our test whose code is attached below, we hit the WARN_ON() on ARM64 system where the base page size is 64KB and huge page size is 512MB. The issue was reported long time ago and some discussions on it can be found here [1].
[1] https://www.spinics.net/lists/linux-xfs/msg75404.html
In order to fix the issue, we need to adjust MAX_PAGECACHE_ORDER to one supported by xarray and avoid PMD-sized page cache if needed. The code changes are suggested by David Hildenbrand.
PATCH[1] adjusts MAX_PAGECACHE_ORDER to that supported by xarray PATCH[2-3] avoids PMD-sized page cache in the synchronous readahead path PATCH[4] avoids PMD-sized page cache for shmem files if needed
Test program ============ # cat test.c #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <string.h> #include <fcntl.h> #include <errno.h> #include <sys/syscall.h> #include <sys/mman.h>
#define TEST_XFS_FILENAME /tmp/data #define TEST_SHMEM_FILENAME /dev/shm/data #define TEST_MEM_SIZE 0x20000000
int main(int argc, char **argv) { const char *filename;
int fd = 0;
void *buf = (void *)-1, *p;
int pgsize = getpagesize();
int ret;
if (pgsize != 0x10000) { fprintf(stderr, 64KB base page size is required\n);
return -EPERM;
}
system(echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled);
system(rm -fr /tmp/data);
system(rm -fr /dev/shm/data);
system(echo 1 > /proc/sys/vm/drop_caches);
/* Open xfs or shmem file */ filename = TEST_XFS_FILENAME;
if (argc > 1 && !strcmp(argv[1], shmem)) filename = TEST_SHMEM_FILENAME;
fd = open(filename, O_CREAT | O_RDWR | O_TRUNC);
if (fd < 0) { fprintf(stderr, Unable to open <%s>\n, filename);
return -EIO;
}
/* Extend file size */ ret = ftruncate(fd, TEST_MEM_SIZE);
if (ret) { fprintf(stderr, Error %d to ftruncate()\n, ret);
goto cleanup;
}
/* Create VMA */ buf = mmap(NULL, TEST_MEM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (buf == (void *)-1) { fprintf(stderr, Unable to mmap <%s>\n, filename);
goto cleanup;
}
fprintf(stdout, mapped buffer at 0x%p\n, buf);
ret = madvise(buf, TEST_MEM_SIZE, MADV_HUGEPAGE);
if (ret) { fprintf(stderr, Unable to madvise(MADV_HUGEPAGE)\n);
goto cleanup;
}
/* Populate VMA */ ret = madvise(buf, TEST_MEM_SIZE, MADV_POPULATE_WRITE);
if (ret) { fprintf(stderr, Error %d to madvise(MADV_POPULATE_WRITE)\n, ret);
goto cleanup;
}
/* Punch the file to enforce xarray split */ ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, TEST_MEM_SIZE - pgsize, pgsize);
if (ret) fprintf(stderr, Error %d to fallocate()\n, ret);
cleanup:
if (buf != (void *)-1) munmap(buf, TEST_MEM_SIZE);
if (fd > 0) close(fd);
return 0;
}
# gcc test.c -o test # cat /proc/1/smaps | grep KernelPageSize | head -n 1 KernelPageSize: 64 kB # ./test shmem :
------------[ cut here ]------------ WARNING: CPU: 17 PID: 5253 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128 Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib \ nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct \ nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 \ ip_set nf_tables rfkill nfnetlink vfat fat virtio_balloon \ drm fuse xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 \ virtio_net sha1_ce net_failover failover virtio_console virtio_blk \ dimlib virtio_mmio CPU: 17 PID: 5253 Comm: test Kdump: loaded Tainted: G W 6.10.0-rc5-gavin+ #12 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024 pstate: 83400005 (Nzcv daif +PAN -UAO +TC
---truncated---
CVE-2024-42240:
In the Linux kernel, the following vulnerability has been resolved:
x86/bhi: Avoid warning in #DB handler due to BHI mitigation
When BHI mitigation is enabled, if SYSENTER is invoked with the TF flag set then entry_SYSENTER_compat() uses CLEAR_BRANCH_HISTORY and calls the clear_bhb_loop() before the TF flag is cleared. This causes the #DB handler (exc_debug_kernel()) to issue a warning because single-step is used outside the entry_SYSENTER_compat() function.
To address this issue, entry_SYSENTER_compat() should use CLEAR_BRANCH_HISTORY after making sure the TF flag is cleared.
The problem can be reproduced with the following sequence:
$ cat sysenter_step.c int main() { asm(pushf; pop %ax; bts $8,%ax; push %ax; popf; sysenter); }
$ gcc -o sysenter_step sysenter_step.c
$ ./sysenter_step Segmentation fault (core dumped)
The program is expected to crash, and the #DB handler will issue a warning.
Kernel log:
WARNING: CPU: 27 PID: 7000 at arch/x86/kernel/traps.c:1009 exc_debug_kernel+0xd2/0x160 ...
RIP: 0010:exc_debug_kernel+0xd2/0x160 ...
Call Trace:
<#DB> ? show_regs+0x68/0x80 ? __warn+0x8c/0x140 ? exc_debug_kernel+0xd2/0x160 ? report_bug+0x175/0x1a0 ? handle_bug+0x44/0x90 ? exc_invalid_op+0x1c/0x70 ? asm_exc_invalid_op+0x1f/0x30 ? exc_debug_kernel+0xd2/0x160 exc_debug+0x43/0x50 asm_exc_debug+0x1e/0x40 RIP: 0010:clear_bhb_loop+0x0/0xb0 ...
</#DB> <TASK> ? entry_SYSENTER_compat_after_hwframe+0x6e/0x8d </TASK>
[ bp: Massage commit message. ]
CVE-2024-42229:
In the Linux kernel, the following vulnerability has been resolved:
crypto: aead,cipher - zeroize key buffer after use
I.G 9.7.B for FIPS 140-3 specifies that variables temporarily holding cryptographic information should be zeroized once they are no longer needed. Accomplish this by using kfree_sensitive for buffers that previously held the private key.
CVE-2024-42141:
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: ISO: Check socket flag instead of hcon
This fixes the following Smatch static checker warning:
net/bluetooth/iso.c:1364 iso_sock_recvmsg() error: we previously assumed 'pi->conn->hcon' could be null (line 1359)
net/bluetooth/iso.c 1347 static int iso_sock_recvmsg(struct socket *sock, struct msghdr *msg, 1348 size_t len, int flags) 1349 { 1350 struct sock *sk = sock->sk;
1351 struct iso_pinfo *pi = iso_pi(sk);
1352 1353 BT_DBG(sk %p, sk);
1354 1355 if (test_and_clear_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags)) { 1356 lock_sock(sk);
1357 switch (sk->sk_state) { 1358 case BT_CONNECT2:
1359 if (pi->conn->hcon && ^^^^^^^^^^^^^^ If ->hcon is NULL
1360 test_bit(HCI_CONN_PA_SYNC, &pi->conn->hcon->flags)) { 1361 iso_conn_big_sync(sk);
1362 sk->sk_state = BT_LISTEN;
1363 } else {
--> 1364 iso_conn_defer_accept(pi->conn->hcon);
^^^^^^^^^^^^^^ then we're toast
1365 sk->sk_state = BT_CONFIG;
1366 } 1367 release_sock(sk);
1368 return 0;
1369 case BT_CONNECTED:
1370 if (test_bit(BT_SK_PA_SYNC,
CVE-2024-42115:
In the Linux kernel, the following vulnerability has been resolved:
jffs2: Fix potential illegal address access in jffs2_free_inode
During the stress testing of the jffs2 file system,the following abnormal printouts were found:
[ 2430.649000] Unable to handle kernel paging request at virtual address 0069696969696948 [ 2430.649622] Mem abort info:
[ 2430.649829] ESR = 0x96000004 [ 2430.650115] EC = 0x25: DABT (current EL), IL = 32 bits [ 2430.650564] SET = 0, FnV = 0 [ 2430.650795] EA = 0, S1PTW = 0 [ 2430.651032] FSC = 0x04: level 0 translation fault [ 2430.651446] Data abort info:
[ 2430.651683] ISV = 0, ISS = 0x00000004 [ 2430.652001] CM = 0, WnR = 0 [ 2430.652558] [0069696969696948] address between user and kernel address ranges [ 2430.653265] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 2430.654512] CPU: 2 PID: 20919 Comm: cat Not tainted 5.15.25-g512f31242bf6 #33 [ 2430.655008] Hardware name: linux,dummy-virt (DT) [ 2430.655517] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2430.656142] pc : kfree+0x78/0x348 [ 2430.656630] lr : jffs2_free_inode+0x24/0x48 [ 2430.657051] sp : ffff800009eebd10 [ 2430.657355] x29: ffff800009eebd10 x28: 0000000000000001 x27: 0000000000000000 [ 2430.658327] x26: ffff000038f09d80 x25: 0080000000000000 x24: ffff800009d38000 [ 2430.658919] x23: 5a5a5a5a5a5a5a5a x22: ffff000038f09d80 x21: ffff8000084f0d14 [ 2430.659434] x20: ffff0000bf9a6ac0 x19: 0169696969696940 x18: 0000000000000000 [ 2430.659969] x17: ffff8000b6506000 x16: ffff800009eec000 x15: 0000000000004000 [ 2430.660637] x14: 0000000000000000 x13: 00000001000820a1 x12: 00000000000d1b19 [ 2430.661345] x11: 0004000800000000 x10: 0000000000000001 x9 : ffff8000084f0d14 [ 2430.662025] x8 : ffff0000bf9a6b40 x7 : ffff0000bf9a6b48 x6 : 0000000003470302 [ 2430.662695] x5 : ffff00002e41dcc0 x4 : ffff0000bf9aa3b0 x3 : 0000000003470342 [ 2430.663486] x2 : 0000000000000000 x1 : ffff8000084f0d14 x0 : fffffc0000000000 [ 2430.664217] Call trace:
[ 2430.664528] kfree+0x78/0x348 [ 2430.664855] jffs2_free_inode+0x24/0x48 [ 2430.665233] i_callback+0x24/0x50 [ 2430.665528] rcu_do_batch+0x1ac/0x448 [ 2430.665892] rcu_core+0x28c/0x3c8 [ 2430.666151] rcu_core_si+0x18/0x28 [ 2430.666473] __do_softirq+0x138/0x3cc [ 2430.666781] irq_exit+0xf0/0x110 [ 2430.667065] handle_domain_irq+0x6c/0x98 [ 2430.667447] gic_handle_irq+0xac/0xe8 [ 2430.667739] call_on_irq_stack+0x28/0x54 The parameter passed to kfree was 5a5a5a5a, which corresponds to the target field of the jffs_inode_info structure. It was found that all variables in the jffs_inode_info structure were 5a5a5a5a, except for the first member sem. It is suspected that these variables are not initialized because they were set to 5a5a5a5a during memory testing, which is meant to detect uninitialized memory.The sem variable is initialized in the function jffs2_i_init_once, while other members are initialized in the function jffs2_init_inode_info.
The function jffs2_init_inode_info is called after iget_locked, but in the iget_locked function, the destroy_inode process is triggered, which releases the inode and consequently, the target member of the inode is not initialized.In concurrent high pressure scenarios, iget_locked may enter the destroy_inode branch as described in the code.
Since the destroy_inode functionality of jffs2 only releases the target, the fix method is to set target to NULL in jffs2_i_init_once.
CVE-2024-42113:
In the Linux kernel, the following vulnerability has been resolved:
net: txgbe: initialize num_q_vectors for MSI/INTx interrupts
When using MSI/INTx interrupts, wx->num_q_vectors is uninitialized.
Thus there will be kernel panic in wx_alloc_q_vectors() to allocate queue vectors.
CVE-2024-42105:
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix inode number range checks
Patch series nilfs2: fix potential issues related to reserved inodes.
This series fixes one use-after-free issue reported by syzbot, caused by nilfs2's internal inode being exposed in the namespace on a corrupted filesystem, and a couple of flaws that cause problems if the starting number of non-reserved inodes written in the on-disk super block is intentionally (or corruptly) changed from its default value.
This patch (of 3):
In the current implementation of nilfs2, nilfs->ns_first_ino, which gives the first non-reserved inode number, is read from the superblock, but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of reserved inodes such as the root directory or metadata files is set in the super block parameter, the inode number test macros (NILFS_MDT_INODE and NILFS_VALID_INODE) will not function properly.
In addition, these test macros use left bit-shift calculations using with the inode number as the shift count via the BIT macro, but the result of a shift calculation that exceeds the bit width of an integer is undefined in the C specification, so if ns_first_ino is set to a large value other than the default value NILFS_USER_INO (=11), the macros may potentially malfunction depending on the environment.
Fix these issues by checking the lower bound of nilfs->ns_first_ino and by preventing bit shifts equal to or greater than the NILFS_USER_INO constant in the inode number test macros.
Also, change the type of ns_first_ino from signed integer to unsigned integer to avoid the need for type casting in comparisons such as the lower bound check introduced this time.
CVE-2024-42104:
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: add missing check for inode numbers on directory entries
Syzbot reported that mounting and unmounting a specific pattern of corrupted nilfs2 filesystem images causes a use-after-free of metadata file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata file gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(), tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode numbers of these metadata files that should not be visible in the namespace are read without checking.
Fix this issue by treating the inode numbers of these internal files as errors in the sanity check helper when reading directory folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer analysis.
CVE-2024-42103:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix adding block group to a reclaim list and the unused list during reclaim
There is a potential parallel list adding for retrying in btrfs_reclaim_bgs_work and adding to the unused list. Since the block group is removed from the reclaim list and it is on a relocation work, it can be added into the unused list in parallel. When that happens, adding it to the reclaim list will corrupt the list head and trigger list corruption like below.
Fix it by taking fs_info->unused_bgs_lock.
[177.504][T2585409] BTRFS error (device nullb1): error relocating ch= unk 2415919104 [177.514][T2585409] list_del corruption. next->prev should be ff1100= 0344b119c0, but was ff11000377e87c70. (next=3Dff110002390cd9c0) [177.529][T2585409] ------------[ cut here ]------------ [177.537][T2585409] kernel BUG at lib/list_debug.c:65! [177.545][T2585409] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI [177.555][T2585409] CPU: 9 PID: 2585409 Comm: kworker/u128:2 Tainted: G W 6.10.0-rc5-kts #1 [177.568][T2585409] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2 02/14/2022 [177.579][T2585409] Workqueue: events_unbound btrfs_reclaim_bgs_work[btrfs] [177.589][T2585409] RIP: 0010:__list_del_entry_valid_or_report.cold+0x70/0x72 [177.624][T2585409] RSP: 0018:ff11000377e87a70 EFLAGS: 00010286 [177.633][T2585409] RAX: 000000000000006d RBX: ff11000344b119c0 RCX:0000000000000000 [177.644][T2585409] RDX: 000000000000006d RSI: 0000000000000008 RDI:ffe21c006efd0f40 [177.655][T2585409] RBP: ff110002e0509f78 R08: 0000000000000001 R09:ffe21c006efd0f08 [177.665][T2585409] R10: ff11000377e87847 R11: 0000000000000000 R12:ff110002390cd9c0 [177.676][T2585409] R13: ff11000344b119c0 R14: ff110002e0508000 R15:dffffc0000000000 [177.687][T2585409] FS: 0000000000000000(0000) GS:ff11000fec880000(0000) knlGS:0000000000000000 [177.700][T2585409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [177.709][T2585409] CR2: 00007f06bc7b1978 CR3: 0000001021e86005 CR4:0000000000771ef0 [177.720][T2585409] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000 [177.731][T2585409] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400 [177.742][T2585409] PKRU: 55555554 [177.748][T2585409] Call Trace:
[177.753][T2585409] <TASK> [177.759][T2585409] ? __die_body.cold+0x19/0x27 [177.766][T2585409] ? die+0x2e/0x50 [177.772][T2585409] ? do_trap+0x1ea/0x2d0 [177.779][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72 [177.788][T2585409] ? do_error_trap+0xa3/0x160 [177.795][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72 [177.805][T2585409] ? handle_invalid_op+0x2c/0x40 [177.812][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72 [177.820][T2585409] ? exc_invalid_op+0x2d/0x40 [177.827][T2585409] ? asm_exc_invalid_op+0x1a/0x20 [177.834][T2585409] ? __list_del_entry_valid_or_report.cold+0x70/0x72 [177.843][T2585409] btrfs_delete_unused_bgs+0x3d9/0x14c0 [btrfs]
There is a similar retry_list code in btrfs_delete_unused_bgs(), but it is safe, AFAICS. Since the block group was in the unused list, the used bytes should be 0 when it was added to the unused list. Then, it checks block_group->{used,reserved,pinned} are still 0 under the block_group->lock. So, they should be still eligible for the unused list, not the reclaim list.
The reason it is safe there it's because because we're holding space_info->groups_sem in write mode.
That means no other task can allocate from the block group, so while we are at deleted_unused_bgs() it's not possible for other tasks to allocate and deallocate extents from the block group, so it can't be added to the unused list or the reclaim list by anyone else.
The bug can be reproduced by btrfs/166 after a few rounds. In practice this can be hit when relocation cannot find more chunk space and ends with ENOSPC.
CVE-2024-42098:
In the Linux kernel, the following vulnerability has been resolved:
crypto: ecdh - explicitly zeroize private_key
private_key is overwritten with the key parameter passed in by the caller (if present), or alternatively a newly generated private key.
However, it is possible that the caller provides a key (or the newly generated key) which is shorter than the previous key. In that scenario, some key material from the previous key would not be overwritten. The easiest solution is to explicitly zeroize the entire private_key array first.
Note that this patch slightly changes the behavior of this function:
previously, if the ecc_gen_privkey failed, the old private_key would remain. Now, the private_key is always zeroized. This behavior is consistent with the case where params.key is set and ecc_is_key_valid fails.
CVE-2024-42096:
In the Linux kernel, the following vulnerability has been resolved:
x86: stop playing stack games in profile_pc()
The 'profile_pc()' function is used for timer-based profiling, which isn't really all that relevant any more to begin with, but it also ends up making assumptions based on the stack layout that aren't necessarily valid.
Basically, the code tries to account the time spent in spinlocks to the caller rather than the spinlock, and while I support that as a concept, it's not worth the code complexity or the KASAN warnings when no serious profiling is done using timers anyway these days.
And the code really does depend on stack layout that is only true in the simplest of cases. We've lost the comment at some point (I think when the 32-bit and 64-bit code was unified), but it used to say:
Assume the lock function has either no stack frame or a copy of eflags from PUSHF.
which explains why it just blindly loads a word or two straight off the stack pointer and then takes a minimal look at the values to just check if they might be eflags or the return pc:
Eflags always has bits 22 and up cleared unlike kernel addresses
but that basic stack layout assumption assumes that there isn't any lock debugging etc going on that would complicate the code and cause a stack frame.
It causes KASAN unhappiness reported for years by syzkaller [1] and others [2].
With no real practical reason for this any more, just remove the code.
Just for historical interest, here's some background commits relating to this code from 2006:
0cb91a229364 (i386: Account spinlocks to the caller during profiling for !FP kernels) 31679f38d886 (Simplify profile_pc on x86-64)
and a code unification from 2009:
ef4512882dbe (x86: time_32/64.c unify profile_pc)
but the basics of this thing actually goes back to before the git tree.
CVE-2024-42084:
In the Linux kernel, the following vulnerability has been resolved:
ftruncate: pass a signed offset
The old ftruncate() syscall, using the 32-bit off_t misses a sign extension when called in compat mode on 64-bit architectures. As a result, passing a negative length accidentally succeeds in truncating to file size between 2GiB and 4GiB.
Changing the type of the compat syscall to the signed compat_off_t changes the behavior so it instead returns -EINVAL.
The native entry point, the truncate() syscall and the corresponding loff_t based variants are all correct already and do not suffer from this mistake.
CVE-2024-42079:
In the Linux kernel, the following vulnerability has been resolved:
gfs2: Fix NULL pointer dereference in gfs2_log_flush
In gfs2_jindex_free(), set sdp->sd_jdesc to NULL under the log flush lock to provide exclusion against gfs2_log_flush().
In gfs2_log_flush(), check if sdp->sd_jdesc is non-NULL before dereferencing it. Otherwise, we could run into a NULL pointer dereference when outstanding glock work races with an unmount (glock_work_func -> run_queue -> do_xmote -> inode_go_sync -> gfs2_log_flush).
CVE-2024-42077:
In the Linux kernel, the following vulnerability has been resolved:
ocfs2: fix DIO failure due to insufficient transaction credits
The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents.
Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem.
To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written().
Heming Zhao said:
------ PANIC: Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error
PID: xxx TASK: xxxx CPU: 5 COMMAND: SubmitThread-CA #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] #11 dio_complete at ffffffff8c2b9fa7 #12 do_blockdev_direct_IO at ffffffff8c2bc09f #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] #14 generic_file_direct_write at ffffffff8c1dcf14 #15 __generic_file_write_iter at ffffffff8c1dd07b #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] #17 aio_write at ffffffff8c2cc72e #18 kmem_cache_alloc at ffffffff8c248dde #19 do_io_submit at ffffffff8c2ccada #20 do_syscall_64 at ffffffff8c004984 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba
CVE-2024-42074:
In the Linux kernel, the following vulnerability has been resolved:
ASoC: amd: acp: add a null check for chip_pdev structure
When acp platform device creation is skipped, chip->chip_pdev value will remain NULL. Add NULL check for chip->chip_pdev structure in snd_acp_resume() function to avoid null pointer dereference.
CVE-2024-41083:
In the Linux kernel, the following vulnerability has been resolved:
netfs: Fix netfs_page_mkwrite() to check folio->mapping is valid
Fix netfs_page_mkwrite() to check that folio->mapping is valid once it has taken the folio lock (as filemap_page_mkwrite() does). Without this, generic/247 occasionally oopses with something like the following:
BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page
RIP: 0010:trace_event_raw_event_netfs_folio+0x61/0xc0 ...
Call Trace:
<TASK> ? __die_body+0x1a/0x60 ? page_fault_oops+0x6e/0xa0 ? exc_page_fault+0xc2/0xe0 ? asm_exc_page_fault+0x22/0x30 ? trace_event_raw_event_netfs_folio+0x61/0xc0 trace_netfs_folio+0x39/0x40 netfs_page_mkwrite+0x14c/0x1d0 do_page_mkwrite+0x50/0x90 do_pte_missing+0x184/0x200
__handle_mm_fault+0x42d/0x500 handle_mm_fault+0x121/0x1f0 do_user_addr_fault+0x23e/0x3c0 exc_page_fault+0xc2/0xe0 asm_exc_page_fault+0x22/0x30
This is due to the invalidate_inode_pages2_range() issued at the end of the DIO write interfering with the mmap'd writes.
CVE-2024-41078:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: qgroup: fix quota root leak after quota disable failure
If during the quota disable we fail when cleaning the quota tree or when deleting the root from the root tree, we jump to the 'out' label without ever dropping the reference on the quota root, resulting in a leak of the root since fs_info->quota_root is no longer pointing to the root (we have set it to NULL just before those steps).
Fix this by always doing a btrfs_put_root() call under the 'out' label.
This is a problem that exists since qgroups were first added in 2012 by commit bed92eae26cc (Btrfs: qgroup implementation and prototypes), but back then we missed a kfree on the quota root and free_extent_buffer() calls on its root and commit root nodes, since back then roots were not yet reference counted.
CVE-2024-41076:
In the Linux kernel, the following vulnerability has been resolved:
NFSv4: Fix memory leak in nfs4_set_security_label
We leak nfs_fattr and nfs4_label every time we set a security xattr.
CVE-2024-41075:
In the Linux kernel, the following vulnerability has been resolved:
cachefiles: add consistency check for copen/cread
This prevents malicious processes from completing random copen/cread requests and crashing the system. Added checks are listed below:
* Generic, copen can only complete open requests, and cread can only complete read requests.
* For copen, ondemand_id must not be 0, because this indicates that the request has not been read by the daemon.
* For cread, the object corresponding to fd and req should be the same.
CVE-2024-41074:
In the Linux kernel, the following vulnerability has been resolved:
cachefiles: Set object to close if ondemand_id < 0 in copen
If copen is maliciously called in the user mode, it may delete the request corresponding to the random id. And the request may have not been read yet.
Note that when the object is set to reopen, the open request will be done with the still reopen state in above case. As a result, the request corresponding to this object is always skipped in select_req function, so the read request is never completed and blocks other process.
Fix this issue by simply set object to close if its id < 0 in copen.
CVE-2024-41069:
In the Linux kernel, the following vulnerability has been resolved:
ASoC: topology: Fix references to freed memory
Most users after parsing a topology file, release memory used by it, so having pointer references directly into topology file contents is wrong.
Use devm_kmemdup(), to allocate memory as needed.
CVE-2024-41059:
In the Linux kernel, the following vulnerability has been resolved:
hfsplus: fix uninit-value in copy_name
[syzbot reported] BUG: KMSAN: uninit-value in sized_strscpy+0xc4/0x160 sized_strscpy+0xc4/0x160 copy_name+0x2af/0x320 fs/hfsplus/xattr.c:411 hfsplus_listxattr+0x11e9/0x1a50 fs/hfsplus/xattr.c:750 vfs_listxattr fs/xattr.c:493 [inline] listxattr+0x1f3/0x6b0 fs/xattr.c:840 path_listxattr fs/xattr.c:864 [inline]
__do_sys_listxattr fs/xattr.c:876 [inline]
__se_sys_listxattr fs/xattr.c:873 [inline]
__x64_sys_listxattr+0x16b/0x2f0 fs/xattr.c:873 x64_sys_call+0x2ba0/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:195 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3877 [inline] slab_alloc_node mm/slub.c:3918 [inline] kmalloc_trace+0x57b/0xbe0 mm/slub.c:4065 kmalloc include/linux/slab.h:628 [inline] hfsplus_listxattr+0x4cc/0x1a50 fs/hfsplus/xattr.c:699 vfs_listxattr fs/xattr.c:493 [inline] listxattr+0x1f3/0x6b0 fs/xattr.c:840 path_listxattr fs/xattr.c:864 [inline]
__do_sys_listxattr fs/xattr.c:876 [inline]
__se_sys_listxattr fs/xattr.c:873 [inline]
__x64_sys_listxattr+0x16b/0x2f0 fs/xattr.c:873 x64_sys_call+0x2ba0/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:195 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [Fix] When allocating memory to strbuf, initialize memory to 0.
CVE-2024-41058:
In the Linux kernel, the following vulnerability has been resolved:
cachefiles: fix slab-use-after-free in fscache_withdraw_volume()
We got the following issue in our fault injection stress test:
================================================================== BUG: KASAN: slab-use-after-free in fscache_withdraw_volume+0x2e1/0x370 Read of size 4 at addr ffff88810680be08 by task ondemand-04-dae/5798
CPU: 0 PID: 5798 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #565 Call Trace:
kasan_check_range+0xf6/0x1b0 fscache_withdraw_volume+0x2e1/0x370 cachefiles_withdraw_volume+0x31/0x50 cachefiles_withdraw_cache+0x3ad/0x900 cachefiles_put_unbind_pincount+0x1f6/0x250 cachefiles_daemon_release+0x13b/0x290
__fput+0x204/0xa00 task_work_run+0x139/0x230
Allocated by task 5820:
__kmalloc+0x1df/0x4b0 fscache_alloc_volume+0x70/0x600
__fscache_acquire_volume+0x1c/0x610 erofs_fscache_register_volume+0x96/0x1a0 erofs_fscache_register_fs+0x49a/0x690 erofs_fc_fill_super+0x6c0/0xcc0 vfs_get_super+0xa9/0x140 vfs_get_tree+0x8e/0x300 do_new_mount+0x28c/0x580 [...]
Freed by task 5820:
kfree+0xf1/0x2c0 fscache_put_volume.part.0+0x5cb/0x9e0 erofs_fscache_unregister_fs+0x157/0x1b0 erofs_kill_sb+0xd9/0x1c0 deactivate_locked_super+0xa3/0x100 vfs_get_super+0x105/0x140 vfs_get_tree+0x8e/0x300 do_new_mount+0x28c/0x580 [...] ==================================================================
Following is the process that triggers the issue:
mount failed | daemon exit
------------------------------------------------------------ deactivate_locked_super cachefiles_daemon_release erofs_kill_sb erofs_fscache_unregister_fs fscache_relinquish_volume
__fscache_relinquish_volume fscache_put_volume(fscache_volume, fscache_volume_put_relinquish) zero = __refcount_dec_and_test(&fscache_volume->ref, &ref);
cachefiles_put_unbind_pincount cachefiles_daemon_unbind cachefiles_withdraw_cache cachefiles_withdraw_volumes list_del_init(&volume->cache_link) fscache_free_volume(fscache_volume) cache->ops->free_volume cachefiles_free_volume list_del_init(&cachefiles_volume->cache_link);
kfree(fscache_volume) cachefiles_withdraw_volume fscache_withdraw_volume fscache_volume->n_accesses // fscache_volume UAF !!!
The fscache_volume in cache->volumes must not have been freed yet, but its reference count may be 0. So use the new fscache_try_get_volume() helper function try to get its reference count.
If the reference count of fscache_volume is 0, fscache_put_volume() is freeing it, so wait for it to be removed from cache->volumes.
If its reference count is not 0, call cachefiles_withdraw_volume() with reference count protection to avoid the above issue.
CVE-2024-41057:
In the Linux kernel, the following vulnerability has been resolved:
cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie()
We got the following issue in our fault injection stress test:
================================================================== BUG: KASAN: slab-use-after-free in cachefiles_withdraw_cookie+0x4d9/0x600 Read of size 8 at addr ffff888118efc000 by task kworker/u78:0/109
CPU: 13 PID: 109 Comm: kworker/u78:0 Not tainted 6.8.0-dirty #566 Call Trace:
<TASK> kasan_report+0x93/0xc0 cachefiles_withdraw_cookie+0x4d9/0x600 fscache_cookie_state_machine+0x5c8/0x1230 fscache_cookie_worker+0x91/0x1c0 process_one_work+0x7fa/0x1800 [...]
Allocated by task 117:
kmalloc_trace+0x1b3/0x3c0 cachefiles_acquire_volume+0xf3/0x9c0 fscache_create_volume_work+0x97/0x150 process_one_work+0x7fa/0x1800 [...]
Freed by task 120301:
kfree+0xf1/0x2c0 cachefiles_withdraw_cache+0x3fa/0x920 cachefiles_put_unbind_pincount+0x1f6/0x250 cachefiles_daemon_release+0x13b/0x290
__fput+0x204/0xa00 task_work_run+0x139/0x230 do_exit+0x87a/0x29b0 [...] ==================================================================
Following is the process that triggers the issue:
p1 | p2
------------------------------------------------------------ fscache_begin_lookup fscache_begin_volume_access fscache_cache_is_live(fscache_cache) cachefiles_daemon_release cachefiles_put_unbind_pincount cachefiles_daemon_unbind cachefiles_withdraw_cache fscache_withdraw_cache fscache_set_cache_state(cache, FSCACHE_CACHE_IS_WITHDRAWN);
cachefiles_withdraw_objects(cache) fscache_wait_for_objects(fscache) atomic_read(&fscache_cache->object_count) == 0 fscache_perform_lookup cachefiles_lookup_cookie cachefiles_alloc_object refcount_set(&object->ref, 1);
object->volume = volume fscache_count_object(vcookie->cache);
atomic_inc(&fscache_cache->object_count) cachefiles_withdraw_volumes cachefiles_withdraw_volume fscache_withdraw_volume
__cachefiles_free_volume kfree(cachefiles_volume) fscache_cookie_state_machine cachefiles_withdraw_cookie cache = object->volume->cache;
// cachefiles_volume UAF !!!
After setting FSCACHE_CACHE_IS_WITHDRAWN, wait for all the cookie lookups to complete first, and then wait for fscache_cache->object_count == 0 to avoid the cookie exiting after the volume has been freed and triggering the above issue. Therefore call fscache_withdraw_volume() before calling cachefiles_withdraw_objects().
This way, after setting FSCACHE_CACHE_IS_WITHDRAWN, only the following two cases will occur:
1) fscache_begin_lookup fails in fscache_begin_volume_access().
2) fscache_withdraw_volume() will ensure that fscache_count_object() has been executed before calling fscache_wait_for_objects().
CVE-2024-41055:
In the Linux kernel, the following vulnerability has been resolved:
mm: prevent derefencing NULL ptr in pfn_section_valid()
Commit 5ec8e8ea8b77 (mm/sparsemem: fix race in accessing memory_section->usage) changed pfn_section_valid() to add a READ_ONCE() call around ms->usage to fix a race with section_deactivate() where ms->usage can be cleared. The READ_ONCE() call, by itself, is not enough to prevent NULL pointer dereference. We need to check its value before dereferencing it.
CVE-2024-41051:
In the Linux kernel, the following vulnerability has been resolved:
cachefiles: wait for ondemand_object_worker to finish when dropping object
When queuing ondemand_object_worker() to re-open the object, cachefiles_object is not pinned. The cachefiles_object may be freed when the pending read request is completed intentionally and the related erofs is umounted. If ondemand_object_worker() runs after the object is freed, it will incur use-after-free problem as shown below.
process A processs B process C process D
cachefiles_ondemand_send_req() // send a read req X // wait for its completion
// close ondemand fd cachefiles_ondemand_fd_release() // set object as CLOSE
cachefiles_ondemand_daemon_read() // set object as REOPENING queue_work(fscache_wq, &info->ondemand_work)
// close /dev/cachefiles cachefiles_daemon_release cachefiles_flush_reqs complete(&req->done)
// read req X is completed // umount the erofs fs cachefiles_put_object() // object will be freed cachefiles_ondemand_deinit_obj_info() kmem_cache_free(object) // both info and object are freed ondemand_object_worker()
When dropping an object, it is no longer necessary to reopen the object, so use cancel_work_sync() to cancel or wait for ondemand_object_worker() to finish.
CVE-2024-41031:
In the Linux kernel, the following vulnerability has been resolved:
mm/filemap: skip to create PMD-sized page cache if needed
On ARM64, HPAGE_PMD_ORDER is 13 when the base page size is 64KB. The PMD-sized page cache can't be supported by xarray as the following error messages indicate.
------------[ cut here ]------------ WARNING: CPU: 35 PID: 7484 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128 Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib \ nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct \ nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 \ ip_set rfkill nf_tables nfnetlink vfat fat virtio_balloon drm \ fuse xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 \ sha1_ce virtio_net net_failover virtio_console virtio_blk failover \ dimlib virtio_mmio CPU: 35 PID: 7484 Comm: test Kdump: loaded Tainted: G W 6.10.0-rc5-gavin+ #9 Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024 pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : xas_split_alloc+0xf8/0x128 lr : split_huge_page_to_list_to_order+0x1c4/0x720 sp : ffff800087a4f6c0 x29: ffff800087a4f6c0 x28: ffff800087a4f720 x27: 000000001fffffff x26: 0000000000000c40 x25: 000000000000000d x24: ffff00010625b858 x23: ffff800087a4f720 x22: ffffffdfc0780000 x21: 0000000000000000 x20: 0000000000000000 x19: ffffffdfc0780000 x18: 000000001ff40000 x17: 00000000ffffffff x16: 0000018000000000 x15: 51ec004000000000 x14: 0000e00000000000 x13: 0000000000002000 x12: 0000000000000020 x11: 51ec000000000000 x10: 51ece1c0ffff8000 x9 : ffffbeb961a44d28 x8 : 0000000000000003 x7 : ffffffdfc0456420 x6 : ffff0000e1aa6eb8 x5 : 20bf08b4fe778fca x4 : ffffffdfc0456420 x3 : 0000000000000c40 x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000 Call trace:
xas_split_alloc+0xf8/0x128 split_huge_page_to_list_to_order+0x1c4/0x720 truncate_inode_partial_folio+0xdc/0x160 truncate_inode_pages_range+0x1b4/0x4a8 truncate_pagecache_range+0x84/0xa0 xfs_flush_unmap_range+0x70/0x90 [xfs] xfs_file_fallocate+0xfc/0x4d8 [xfs] vfs_fallocate+0x124/0x2e8 ksys_fallocate+0x4c/0xa0
__arm64_sys_fallocate+0x24/0x38 invoke_syscall.constprop.0+0x7c/0xd8 do_el0_svc+0xb4/0xd0 el0_svc+0x44/0x1d8 el0t_64_sync_handler+0x134/0x150 el0t_64_sync+0x17c/0x180
Fix it by skipping to allocate PMD-sized page cache when its size is larger than MAX_PAGECACHE_ORDER. For this specific case, we will fall to regular path where the readahead window is determined by BDI's sysfs file (read_ahead_kb).
CVE-2024-41002:
In the Linux kernel, the following vulnerability has been resolved:
crypto: hisilicon/sec - Fix memory leak for sec resource release
The AIV is one of the SEC resources. When releasing resources, it need to release the AIV resources at the same time.
Otherwise, memory leakage occurs.
The aiv resource release is added to the sec resource release function.
CVE-2024-40964:
In the Linux kernel, the following vulnerability has been resolved:
ALSA: hda: cs35l41: Possible null pointer dereference in cs35l41_hda_unbind()
The cs35l41_hda_unbind() function clears the hda_component entry matching it's index and then dereferences the codec pointer held in the first element of the hda_component array, this is an issue when the device index was 0.
Instead use the codec pointer stashed in the cs35l41_hda structure as it will still be valid.
CVE-2024-40953:
In the Linux kernel, the following vulnerability has been resolved:
KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin()
Use {READ,WRITE}_ONCE() to access kvm->last_boosted_vcpu to ensure the loads and stores are atomic. In the extremely unlikely scenario the compiler tears the stores, it's theoretically possible for KVM to attempt to get a vCPU using an out-of-bounds index, e.g. if the write is split into multiple 8-bit stores, and is paired with a 32-bit load on a VM with 257 vCPUs:
CPU0 CPU1 last_boosted_vcpu = 0xff;
(last_boosted_vcpu = 0x100) last_boosted_vcpu[15:8] = 0x01;
i = (last_boosted_vcpu = 0x1ff) last_boosted_vcpu[7:0] = 0x00;
vcpu = kvm->vcpu_array[0x1ff];
As detected by KCSAN:
BUG: KCSAN: data-race in kvm_vcpu_on_spin [kvm] / kvm_vcpu_on_spin [kvm]
write to 0xffffc90025a92344 of 4 bytes by task 4340 on cpu 16:
kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4112) kvm handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:? arch/x86/kvm/vmx/vmx.c:6606) kvm_intel vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
__se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
__x64_sys_ioctl (fs/ioctl.c:890) x64_sys_call (arch/x86/entry/syscall_64.c:33) do_syscall_64 (arch/x86/entry/common.c:?) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
read to 0xffffc90025a92344 of 4 bytes by task 4342 on cpu 4:
kvm_vcpu_on_spin (arch/x86/kvm/../../../virt/kvm/kvm_main.c:4069) kvm handle_pause (arch/x86/kvm/vmx/vmx.c:5929) kvm_intel vmx_handle_exit (arch/x86/kvm/vmx/vmx.c:? arch/x86/kvm/vmx/vmx.c:6606) kvm_intel vcpu_run (arch/x86/kvm/x86.c:11107 arch/x86/kvm/x86.c:11211) kvm kvm_arch_vcpu_ioctl_run (arch/x86/kvm/x86.c:?) kvm kvm_vcpu_ioctl (arch/x86/kvm/../../../virt/kvm/kvm_main.c:?) kvm
__se_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:904 fs/ioctl.c:890)
__x64_sys_ioctl (fs/ioctl.c:890) x64_sys_call (arch/x86/entry/syscall_64.c:33) do_syscall_64 (arch/x86/entry/common.c:?) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
value changed: 0x00000012 -> 0x00000000
CVE-2024-40945:
In the Linux kernel, the following vulnerability has been resolved:
iommu: Return right value in iommu_sva_bind_device()
iommu_sva_bind_device() should return either a sva bond handle or an ERR_PTR value in error cases. Existing drivers (idxd and uacce) only check the return value with IS_ERR(). This could potentially lead to a kernel NULL pointer dereference issue if the function returns NULL instead of an error pointer.
In reality, this doesn't cause any problems because iommu_sva_bind_device() only returns NULL when the kernel is not configured with CONFIG_IOMMU_SVA.
In this case, iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) will return an error, and the device drivers won't call iommu_sva_bind_device() at all.
CVE-2024-40938:
In the Linux kernel, the following vulnerability has been resolved:
landlock: Fix d_parent walk
The WARN_ON_ONCE() in collect_domain_accesses() can be triggered when trying to link a root mount point. This cannot work in practice because this directory is mounted, but the VFS check is done after the call to security_path_link().
Do not use source directory's d_parent when the source directory is the mount point.
[mic: Fix commit message]
CVE-2024-40935:
In the Linux kernel, the following vulnerability has been resolved:
cachefiles: flush all requests after setting CACHEFILES_DEAD
In ondemand mode, when the daemon is processing an open request, if the kernel flags the cache as CACHEFILES_DEAD, the cachefiles_daemon_write() will always return -EIO, so the daemon can't pass the copen to the kernel.
Then the kernel process that is waiting for the copen triggers a hung_task.
Since the DEAD state is irreversible, it can only be exited by closing /dev/cachefiles. Therefore, after calling cachefiles_io_error() to mark the cache as CACHEFILES_DEAD, if in ondemand mode, flush all requests to avoid the above hungtask. We may still be able to read some of the cached data before closing the fd of /dev/cachefiles.
Note that this relies on the patch that adds reference counting to the req, otherwise it may UAF.
CVE-2024-40914:
In the Linux kernel, the following vulnerability has been resolved:
mm/huge_memory: don't unpoison huge_zero_folio
When I did memory failure tests recently, below panic occurs:
kernel BUG at include/linux/mm.h:1135! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00 FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0 Call Trace:
<TASK> do_shrink_slab+0x14f/0x6a0 shrink_slab+0xca/0x8c0 shrink_node+0x2d0/0x7d0 balance_pgdat+0x33a/0x720 kswapd+0x1f3/0x410 kthread+0xd5/0x100 ret_from_fork+0x2f/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]--- RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00 FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
The root cause is that HWPoison flag will be set for huge_zero_folio without increasing the folio refcnt. But then unpoison_memory() will decrease the folio refcnt unexpectedly as it appears like a successfully hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when releasing huge_zero_folio.
Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
We're not prepared to unpoison huge_zero_folio yet.
CVE-2024-40913:
In the Linux kernel, the following vulnerability has been resolved:
cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
After installing the anonymous fd, we can now see it in userland and close it. However, at this point we may not have gotten the reference count of the cache, but we will put it during colse fd, so this may cause a cache UAF.
So grab the cache reference count before fd_install(). In addition, by kernel convention, fd is taken over by the user land after fd_install(), and the kernel should not call close_fd() after that, i.e., it should call fd_install() after everything is ready, thus fd_install() is called after copy_to_user() succeeds.
CVE-2024-40902:
In the Linux kernel, the following vulnerability has been resolved:
jfs: xattr: fix buffer overflow for invalid xattr
When an xattr size is not what is expected, it is printed out to the kernel log in hex format as a form of debugging. But when that xattr size is bigger than the expected size, printing it out can cause an access off the end of the buffer.
Fix this all up by properly restricting the size of the debug hex dump in the kernel log.
CVE-2024-40900:
In the Linux kernel, the following vulnerability has been resolved:
cachefiles: remove requests from xarray during flushing requests
Even with CACHEFILES_DEAD set, we can still read the requests, so in the following concurrency the request may be used after it has been freed:
mount | daemon_thread1 | daemon_thread2
------------------------------------------------------------ cachefiles_ondemand_init_object cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done) cachefiles_daemon_read cachefiles_ondemand_daemon_read // close dev fd cachefiles_flush_reqs complete(&REQ_A->done) kfree(REQ_A) xa_lock(&cache->reqs);
cachefiles_ondemand_select_req req->msg.opcode != CACHEFILES_OP_READ // req use-after-free !!! xa_unlock(&cache->reqs);
xa_destroy(&cache->reqs)
Hence remove requests from cache->reqs when flushing them to avoid accessing freed requests.
CVE-2024-40899:
In the Linux kernel, the following vulnerability has been resolved:
cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
We got the following issue in a fuzz test of randomly issuing the restore command:
================================================================== BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0 Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542 Call Trace:
kasan_report+0x94/0xc0 cachefiles_ondemand_daemon_read+0x609/0xab0 vfs_read+0x169/0xb50 ksys_read+0xf5/0x1e0
Allocated by task 626:
__kmalloc+0x1df/0x4b0 cachefiles_ondemand_send_req+0x24d/0x690 cachefiles_create_tmpfile+0x249/0xb30 cachefiles_create_file+0x6f/0x140 cachefiles_look_up_object+0x29c/0xa60 cachefiles_lookup_cookie+0x37d/0xca0 fscache_cookie_state_machine+0x43c/0x1230 [...]
Freed by task 626:
kfree+0xf1/0x2c0 cachefiles_ondemand_send_req+0x568/0x690 cachefiles_create_tmpfile+0x249/0xb30 cachefiles_create_file+0x6f/0x140 cachefiles_look_up_object+0x29c/0xa60 cachefiles_lookup_cookie+0x37d/0xca0 fscache_cookie_state_machine+0x43c/0x1230 [...] ==================================================================
Following is the process that triggers the issue:
mount | daemon_thread1 | daemon_thread2
------------------------------------------------------------ cachefiles_ondemand_init_object cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done)
cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req cachefiles_ondemand_get_fd copy_to_user(_buffer, msg, n) process_open_req(REQ_A)
------ restore ------ cachefiles_ondemand_restore xas_for_each(&xas, req, ULONG_MAX) xas_set_mark(&xas, CACHEFILES_REQ_NEW);
cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req
write(devfd, (copen %u,%llu, msg->msg_id, size));
cachefiles_ondemand_copen xa_erase(&cache->reqs, id) complete(&REQ_A->done) kfree(REQ_A) cachefiles_ondemand_get_fd(REQ_A) fd = get_unused_fd_flags file = anon_inode_getfile fd_install(fd, file) load = (void *)REQ_A->msg.data;
load->fd = fd;
// load UAF !!!
This issue is caused by issuing a restore command when the daemon is still alive, which results in a request being processed multiple times thus triggering a UAF. So to avoid this problem, add an additional reference count to cachefiles_req, which is held while waiting and reading, and then released when the waiting and reading is over.
Note that since there is only one reference count for waiting, we need to avoid the same request being completed multiple times, so we can only complete the request if it is successfully removed from the xarray.
CVE-2024-39510:
In the Linux kernel, the following vulnerability has been resolved:
cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
We got the following issue in a fuzz test of randomly issuing the restore command:
================================================================== BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0xb41/0xb60 Read of size 8 at addr ffff888122e84088 by task ondemand-04-dae/963
CPU: 13 PID: 963 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #564 Call Trace:
kasan_report+0x93/0xc0 cachefiles_ondemand_daemon_read+0xb41/0xb60 vfs_read+0x169/0xb50 ksys_read+0xf5/0x1e0
Allocated by task 116:
kmem_cache_alloc+0x140/0x3a0 cachefiles_lookup_cookie+0x140/0xcd0 fscache_cookie_state_machine+0x43c/0x1230 [...]
Freed by task 792:
kmem_cache_free+0xfe/0x390 cachefiles_put_object+0x241/0x480 fscache_cookie_state_machine+0x5c8/0x1230 [...] ==================================================================
Following is the process that triggers the issue:
mount | daemon_thread1 | daemon_thread2
------------------------------------------------------------ cachefiles_withdraw_cookie cachefiles_ondemand_clean_object(object) cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done)
cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req msg->object_id = req->object->ondemand->ondemand_id
------ restore ------ cachefiles_ondemand_restore xas_for_each(&xas, req, ULONG_MAX) xas_set_mark(&xas, CACHEFILES_REQ_NEW)
cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req copy_to_user(_buffer, msg, n) xa_erase(&cache->reqs, id) complete(&REQ_A->done)
------ close(fd) ------ cachefiles_ondemand_fd_release cachefiles_put_object cachefiles_put_object kmem_cache_free(cachefiles_object_jar, object) REQ_A->object->ondemand->ondemand_id // object UAF !!!
When we see the request within xa_lock, req->object must not have been freed yet, so grab the reference count of object before xa_unlock to avoid the above issue.
CVE-2024-39491:
In the Linux kernel, the following vulnerability has been resolved:
ALSA: hda: cs35l56: Fix lifetime of cs_dsp instance
The cs_dsp instance is initialized in the driver probe() so it should be freed in the driver remove(). Also fix a missing call to cs_dsp_remove() in the error path of cs35l56_hda_common_probe().
The call to cs_dsp_remove() was being done in the component unbind callback cs35l56_hda_unbind(). This meant that if the driver was unbound and then re-bound it would be using an uninitialized cs_dsp instance.
It is best to initialize the cs_dsp instance in probe() so that it can return an error if it fails. The component binding API doesn't have any error handling so there's no way to handle a failure if cs_dsp was initialized in the bind.
CVE-2024-39483:
In the Linux kernel, the following vulnerability has been resolved:
KVM: SVM: WARN on vNMI + NMI window iff NMIs are outright masked
When requesting an NMI window, WARN on vNMI support being enabled if and only if NMIs are actually masked, i.e. if the vCPU is already handling an NMI. KVM's ABI for NMIs that arrive simultanesouly (from KVM's point of view) is to inject one NMI and pend the other. When using vNMI, KVM pends the second NMI simply by setting V_NMI_PENDING, and lets the CPU do the rest (hardware automatically sets V_NMI_BLOCKING when an NMI is injected).
However, if KVM can't immediately inject an NMI, e.g. because the vCPU is in an STI shadow or is running with GIF=0, then KVM will request an NMI window and trigger the WARN (but still function correctly).
Whether or not the GIF=0 case makes sense is debatable, as the intent of KVM's behavior is to provide functionality that is as close to real hardware as possible. E.g. if two NMIs are sent in quick succession, the probability of both NMIs arriving in an STI shadow is infinitesimally low on real hardware, but significantly larger in a virtual environment, e.g.
if the vCPU is preempted in the STI shadow. For GIF=0, the argument isn't as clear cut, because the window where two NMIs can collide is much larger in bare metal (though still small).
That said, KVM should not have divergent behavior for the GIF=0 case based on whether or not vNMI support is enabled. And KVM has allowed simultaneous NMIs with GIF=0 for over a decade, since commit 7460fb4a3400 (KVM: Fix simultaneous NMIs). I.e. KVM's GIF=0 handling shouldn't be modified without a *really* good reason to do so, and if KVM's behavior were to be modified, it should be done irrespective of vNMI support.
CVE-2024-39470:
In the Linux kernel, the following vulnerability has been resolved:
eventfs: Fix a possible null pointer dereference in eventfs_find_events()
In function eventfs_find_events,there is a potential null pointer that may be caused by calling update_events_attr which will perform some operations on the members of the ei struct when ei is NULL.
Hence,When ei->is_freed is set,return NULL directly.
CVE-2024-39469:
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors
The error handling in nilfs_empty_dir() when a directory folio/page read fails is incorrect, as in the old ext2 implementation, and if the folio/page cannot be read or nilfs_check_folio() fails, it will falsely determine the directory as empty and corrupt the file system.
In addition, since nilfs_empty_dir() does not immediately return on a failed folio/page read, but continues to loop, this can cause a long loop with I/O if i_size of the directory's inode is also corrupted, causing the log writer thread to wait and hang, as reported by syzbot.
Fix these issues by making nilfs_empty_dir() immediately return a false value (0) if it fails to get a directory folio/page.
CVE-2024-39468:
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix deadlock in smb2_find_smb_tcon()
Unlock cifs_tcp_ses_lock before calling cifs_put_smb_ses() to avoid such deadlock.
CVE-2024-39467:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()
syzbot reports a kernel bug as below:
F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4 ================================================================== BUG: KASAN: slab-out-of-bounds in f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] BUG: KASAN: slab-out-of-bounds in current_nat_addr fs/f2fs/node.h:213 [inline] BUG: KASAN: slab-out-of-bounds in f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 Read of size 1 at addr ffff88807a58c76c by task syz-executor280/5076
CPU: 1 PID: 5076 Comm: syz-executor280 Not tainted 6.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] current_nat_addr fs/f2fs/node.h:213 [inline] f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 f2fs_xattr_fiemap fs/f2fs/data.c:1848 [inline] f2fs_fiemap+0x55d/0x1ee0 fs/f2fs/data.c:1925 ioctl_fiemap fs/ioctl.c:220 [inline] do_vfs_ioctl+0x1c07/0x2e50 fs/ioctl.c:838
__do_sys_ioctl fs/ioctl.c:902 [inline]
__se_sys_ioctl+0x81/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
The root cause is we missed to do sanity check on i_xattr_nid during f2fs_iget(), so that in fiemap() path, current_nat_addr() will access nat_bitmap w/ offset from invalid i_xattr_nid, result in triggering kasan bug report, fix it.
CVE-2024-39463:
In the Linux kernel, the following vulnerability has been resolved:
9p: add missing locking around taking dentry fid list
Fix a use-after-free on dentry's d_fsdata fid list when a thread looks up a fid through dentry while another thread unlinks it:
UAF thread:
refcount_t: addition on 0; use-after-free.
p9_fid_get linux/./include/net/9p/client.h:262 v9fs_fid_find+0x236/0x280 linux/fs/9p/fid.c:129 v9fs_fid_lookup_with_uid linux/fs/9p/fid.c:181 v9fs_fid_lookup+0xbf/0xc20 linux/fs/9p/fid.c:314 v9fs_vfs_getattr_dotl+0xf9/0x360 linux/fs/9p/vfs_inode_dotl.c:400 vfs_statx+0xdd/0x4d0 linux/fs/stat.c:248
Freed by:
p9_fid_destroy (inlined) p9_client_clunk+0xb0/0xe0 linux/net/9p/client.c:1456 p9_fid_put linux/./include/net/9p/client.h:278 v9fs_dentry_release+0xb5/0x140 linux/fs/9p/vfs_dentry.c:55 v9fs_remove+0x38f/0x620 linux/fs/9p/vfs_inode.c:518 vfs_unlink+0x29a/0x810 linux/fs/namei.c:4335
The problem is that d_fsdata was not accessed under d_lock, because d_release() normally is only called once the dentry is otherwise no longer accessible but since we also call it explicitly in v9fs_remove that lock is required:
move the hlist out of the dentry under lock then unref its fids once they are no longer accessible.
CVE-2024-39276:
In the Linux kernel, the following vulnerability has been resolved:
ext4: fix mb_cache_entry's e_refcnt leak in ext4_xattr_block_cache_find()
Syzbot reports a warning as follows:
============================================ WARNING: CPU: 0 PID: 5075 at fs/mbcache.c:419 mb_cache_destroy+0x224/0x290 Modules linked in:
CPU: 0 PID: 5075 Comm: syz-executor199 Not tainted 6.9.0-rc6-gb947cc5bf6d7 RIP: 0010:mb_cache_destroy+0x224/0x290 fs/mbcache.c:419 Call Trace:
<TASK> ext4_put_super+0x6d4/0xcd0 fs/ext4/super.c:1375 generic_shutdown_super+0x136/0x2d0 fs/super.c:641 kill_block_super+0x44/0x90 fs/super.c:1675 ext4_kill_sb+0x68/0xa0 fs/ext4/super.c:7327 [...] ============================================
This is because when finding an entry in ext4_xattr_block_cache_find(), if ext4_sb_bread() returns -ENOMEM, the ce's e_refcnt, which has already grown in the __entry_find(), won't be put away, and eventually trigger the above issue in mb_cache_destroy() due to reference count leakage.
So call mb_cache_entry_put() on the -ENOMEM error branch as a quick fix.
CVE-2024-38599:
In the Linux kernel, the following vulnerability has been resolved:
jffs2: prevent xattr node from overflowing the eraseblock
Add a check to make sure that the requested xattr node size is no larger than the eraseblock minus the cleanmarker.
Unlike the usual inode nodes, the xattr nodes aren't split into parts and spread across multiple eraseblocks, which means that a xattr node must not occupy more than one eraseblock. If the requested xattr value is too large, the xattr node can spill onto the next eraseblock, overwriting the nodes and causing errors such as:
jffs2: argh. node added in wrong place at 0x0000b050(2) jffs2: nextblock 0x0000a000, expected at 0000b00c jffs2: error: (823) do_verify_xattr_datum: node CRC failed at 0x01e050, read=0xfc892c93, calc=0x000000 jffs2: notice: (823) jffs2_get_inode_nodes: Node header CRC failed at 0x01e00c. {848f,2fc4,0fef511f,59a3d171} jffs2: Node at 0x0000000c with length 0x00001044 would run over the end of the erase block jffs2: Perhaps the file system was created with the wrong erase size? jffs2: jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x00000010: 0x1044 instead
This breaks the filesystem and can lead to KASAN crashes such as:
BUG: KASAN: slab-out-of-bounds in jffs2_sum_add_kvec+0x125e/0x15d0 Read of size 4 at addr ffff88802c31e914 by task repro/830 CPU: 0 PID: 830 Comm: repro Not tainted 6.9.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 Call Trace:
<TASK> dump_stack_lvl+0xc6/0x120 print_report+0xc4/0x620 ? __virt_addr_valid+0x308/0x5b0 kasan_report+0xc1/0xf0 ? jffs2_sum_add_kvec+0x125e/0x15d0 ? jffs2_sum_add_kvec+0x125e/0x15d0 jffs2_sum_add_kvec+0x125e/0x15d0 jffs2_flash_direct_writev+0xa8/0xd0 jffs2_flash_writev+0x9c9/0xef0 ? __x64_sys_setxattr+0xc4/0x160 ? do_syscall_64+0x69/0x140 ? entry_SYSCALL_64_after_hwframe+0x76/0x7e [...]
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
CVE-2024-38583:
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix use-after-free of timer for log writer thread
Patch series nilfs2: fix log writer related issues.
This bug fix series covers three nilfs2 log writer-related issues, including a timer use-after-free issue and potential deadlock issue on unmount, and a potential freeze issue in event synchronization found during their analysis. Details are described in each commit log.
This patch (of 3):
A use-after-free issue has been reported regarding the timer sc_timer on the nilfs_sc_info structure.
The problem is that even though it is used to wake up a sleeping log writer thread, sc_timer is not shut down until the nilfs_sc_info structure is about to be freed, and is used regardless of the thread's lifetime.
Fix this issue by limiting the use of sc_timer only while the log writer thread is alive.
CVE-2024-38582:
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix potential hang in nilfs_detach_log_writer()
Syzbot has reported a potential hang in nilfs_detach_log_writer() called during nilfs2 unmount.
Analysis revealed that this is because nilfs_segctor_sync(), which synchronizes with the log writer thread, can be called after nilfs_segctor_destroy() terminates that thread, as shown in the call trace below:
nilfs_detach_log_writer nilfs_segctor_destroy nilfs_segctor_kill_thread --> Shut down log writer thread flush_work nilfs_iput_work_func nilfs_dispose_list iput nilfs_evict_inode nilfs_transaction_commit nilfs_construct_segment (if inode needs sync) nilfs_segctor_sync --> Attempt to synchronize with log writer thread
*** DEADLOCK ***
Fix this issue by changing nilfs_segctor_sync() so that the log writer thread returns normally without synchronizing after it terminates, and by forcing tasks that are already waiting to complete once after the thread terminates.
The skipped inode metadata flushout will then be processed together in the subsequent cleanup work in nilfs_segctor_destroy().
CVE-2024-38580:
In the Linux kernel, the following vulnerability has been resolved:
epoll: be better about file lifetimes
epoll can call out to vfs_poll() with a file pointer that may race with the last 'fput()'. That would make f_count go down to zero, and while the ep->mtx locking means that the resulting file pointer tear-down will be blocked until the poll returns, it means that f_count is already dead, and any use of it won't actually get a reference to the file any more: it's dead regardless.
Make sure we have a valid ref on the file pointer before we call down to vfs_poll() from the epoll routines.
CVE-2024-38578:
In the Linux kernel, the following vulnerability has been resolved:
ecryptfs: Fix buffer size for tag 66 packet
The 'TAG 66 Packet Format' description is missing the cipher code and checksum fields that are packed into the message packet. As a result, the buffer allocated for the packet is 3 bytes too small and write_tag_66_packet() will write up to 3 bytes past the end of the buffer.
Fix this by increasing the size of the allocation so the whole packet will always fit in the buffer.
This fixes the below kasan slab-out-of-bounds bug:
BUG: KASAN: slab-out-of-bounds in ecryptfs_generate_key_packet_set+0x7d6/0xde0 Write of size 1 at addr ffff88800afbb2a5 by task touch/181
CPU: 0 PID: 181 Comm: touch Not tainted 6.6.13-gnu #1 4c9534092be820851bb687b82d1f92a426598dc6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2/GNU Guix 04/01/2014 Call Trace:
<TASK> dump_stack_lvl+0x4c/0x70 print_report+0xc5/0x610 ? ecryptfs_generate_key_packet_set+0x7d6/0xde0 ? kasan_complete_mode_report_info+0x44/0x210 ? ecryptfs_generate_key_packet_set+0x7d6/0xde0 kasan_report+0xc2/0x110 ? ecryptfs_generate_key_packet_set+0x7d6/0xde0
__asan_store1+0x62/0x80 ecryptfs_generate_key_packet_set+0x7d6/0xde0 ? __pfx_ecryptfs_generate_key_packet_set+0x10/0x10 ? __alloc_pages+0x2e2/0x540 ? __pfx_ovl_open+0x10/0x10 [overlay 30837f11141636a8e1793533a02e6e2e885dad1d] ? dentry_open+0x8f/0xd0 ecryptfs_write_metadata+0x30a/0x550 ? __pfx_ecryptfs_write_metadata+0x10/0x10 ? ecryptfs_get_lower_file+0x6b/0x190 ecryptfs_initialize_file+0x77/0x150 ecryptfs_create+0x1c2/0x2f0 path_openat+0x17cf/0x1ba0 ? __pfx_path_openat+0x10/0x10 do_filp_open+0x15e/0x290 ? __pfx_do_filp_open+0x10/0x10 ? __kasan_check_write+0x18/0x30 ? _raw_spin_lock+0x86/0xf0 ? __pfx__raw_spin_lock+0x10/0x10 ? __kasan_check_write+0x18/0x30 ? alloc_fd+0xf4/0x330 do_sys_openat2+0x122/0x160 ? __pfx_do_sys_openat2+0x10/0x10
__x64_sys_openat+0xef/0x170 ? __pfx___x64_sys_openat+0x10/0x10 do_syscall_64+0x60/0xd0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f00a703fd67 Code: 25 00 00 41 00 3d 00 00 41 00 74 37 64 8b 04 25 18 00 00 00 85 c0 75 5b 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 85 00 00 00 48 83 c4 68 5d 41 5c c3 0f 1f RSP: 002b:00007ffc088e30b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 00007ffc088e3368 RCX: 00007f00a703fd67 RDX: 0000000000000941 RSI: 00007ffc088e48d7 RDI: 00000000ffffff9c RBP: 00007ffc088e48d7 R08: 0000000000000001 R09: 0000000000000000 R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941 R13: 0000000000000000 R14: 00007ffc088e48d7 R15: 00007f00a7180040 </TASK>
Allocated by task 181:
kasan_save_stack+0x2f/0x60 kasan_set_track+0x29/0x40 kasan_save_alloc_info+0x25/0x40
__kasan_kmalloc+0xc5/0xd0
__kmalloc+0x66/0x160 ecryptfs_generate_key_packet_set+0x6d2/0xde0 ecryptfs_write_metadata+0x30a/0x550 ecryptfs_initialize_file+0x77/0x150 ecryptfs_create+0x1c2/0x2f0 path_openat+0x17cf/0x1ba0 do_filp_open+0x15e/0x290 do_sys_openat2+0x122/0x160
__x64_sys_openat+0xef/0x170 do_syscall_64+0x60/0xd0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
CVE-2024-38570:
In the Linux kernel, the following vulnerability has been resolved:
gfs2: Fix potential glock use-after-free on unmount
When a DLM lockspace is released and there ares still locks in that lockspace, DLM will unlock those locks automatically. Commit fb6791d100d1b started exploiting this behavior to speed up filesystem unmount: gfs2 would simply free glocks it didn't want to unlock and then release the lockspace. This didn't take the bast callbacks for asynchronous lock contention notifications into account, which remain active until until a lock is unlocked or its lockspace is released.
To prevent those callbacks from accessing deallocated objects, put the glocks that should not be unlocked on the sd_dead_glocks list, release the lockspace, and only then free those glocks.
As an additional measure, ignore unexpected ast and bast callbacks if the receiving glock is dead.
CVE-2024-37354:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix crash on racing fsync and size-extending write into prealloc
We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe():
BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192)
------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs]
With the following stack trace:
#0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) #1 btrfs_drop_extents (fs/btrfs/file.c:411:4) #2 log_one_extent (fs/btrfs/tree-log.c:4732:9) #3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) #4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) #5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) #6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) #7 btrfs_sync_file (fs/btrfs/file.c:1933:8) #8 vfs_fsync_range (fs/sync.c:188:9) #9 vfs_fsync (fs/sync.c:202:9) #10 do_fsync (fs/sync.c:212:9) #11 __do_sys_fdatasync (fs/sync.c:225:9) #12 __se_sys_fdatasync (fs/sync.c:223:1) #13 __x64_sys_fdatasync (fs/sync.c:223:1) #14 do_syscall_x64 (arch/x86/entry/common.c:52:14) #15 do_syscall_64 (arch/x86/entry/common.c:83:7) #16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121)
So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG().
This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us:
>>> print_extent_buffer(prog.crashed_thread().stack_trace()[0][eb]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ...
So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size.
Here is the state of
---truncated---
CVE-2024-37078:
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix potential kernel bug due to lack of writeback flag waiting
Destructive writes to a block device on which nilfs2 is mounted can cause a kernel bug in the folio/page writeback start routine or writeback end routine (__folio_start_writeback in the log below):
kernel BUG at mm/page-writeback.c:3070! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI ...
RIP: 0010:__folio_start_writeback+0xbaa/0x10e0 Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f> 0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00 ...
Call Trace:
<TASK> nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2] nilfs_segctor_construct+0x181/0x6b0 [nilfs2] nilfs_segctor_thread+0x548/0x11c0 [nilfs2] kthread+0x2f0/0x390 ret_from_fork+0x4b/0x80 ret_from_fork_asm+0x1a/0x30 </TASK>
This is because when the log writer starts a writeback for segment summary blocks or a super root block that use the backing device's page cache, it does not wait for the ongoing folio/page writeback, resulting in an inconsistent writeback state.
Fix this issue by waiting for ongoing writebacks when putting folios/pages on the backing device into writeback state.
CVE-2024-36966:
In the Linux kernel, the following vulnerability has been resolved:
erofs: reliably distinguish block based and fscache mode
When erofs_kill_sb() is called in block dev based mode, s_bdev may not have been initialised yet, and if CONFIG_EROFS_FS_ONDEMAND is enabled, it will be mistaken for fscache mode, and then attempt to free an anon_dev that has never been allocated, triggering the following warning:
============================================ ida_free called for id=0 which is not allocated.
WARNING: CPU: 14 PID: 926 at lib/idr.c:525 ida_free+0x134/0x140 Modules linked in:
CPU: 14 PID: 926 Comm: mount Not tainted 6.9.0-rc3-dirty #630 RIP: 0010:ida_free+0x134/0x140 Call Trace:
<TASK> erofs_kill_sb+0x81/0x90 deactivate_locked_super+0x35/0x80 get_tree_bdev+0x136/0x1e0 vfs_get_tree+0x2c/0xf0 do_new_mount+0x190/0x2f0 [...] ============================================
Now when erofs_kill_sb() is called, erofs_sb_info must have been initialised, so use sbi->fsid to distinguish between the two modes.
CVE-2024-36964:
In the Linux kernel, the following vulnerability has been resolved:
fs/9p: only translate RWX permissions for plain 9P2000
Garbage in plain 9P2000's perm bits is allowed through, which causes it to be able to set (among others) the suid bit. This was presumably not the intent since the unix extended bits are handled explicitly and conditionally on .u.
CVE-2024-36963:
In the Linux kernel, the following vulnerability has been resolved:
tracefs: Reset permissions on remount if permissions are options
There's an inconsistency with the way permissions are handled in tracefs.
Because the permissions are generated when accessed, they default to the root inode's permission if they were never set by the user. If the user sets the permissions, then a flag is set and the permissions are saved via the inode (for tracefs files) or an internal attribute field (for eventfs).
But if a remount happens that specify the permissions, all the files that were not changed by the user gets updated, but the ones that were are not.
If the user were to remount the file system with a given permission, then all files and directories within that file system should be updated.
This can cause security issues if a file's permission was updated but the admin forgot about it. They could incorrectly think that remounting with permissions set would update all files, but miss some.
For example:
# cd /sys/kernel/tracing # chgrp 1002 current_tracer # ls -l [..]
-rw-r----- 1 root root 0 May 1 21:25 buffer_size_kb
-rw-r----- 1 root root 0 May 1 21:25 buffer_subbuf_size_kb
-r--r----- 1 root root 0 May 1 21:25 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:25 current_tracer
-rw-r----- 1 root root 0 May 1 21:25 dynamic_events
-r--r----- 1 root root 0 May 1 21:25 dyn_ftrace_total_info
-r--r----- 1 root root 0 May 1 21:25 enabled_functions
Where current_tracer now has group lkp.
# mount -o remount,gid=1001 .
# ls -l
-rw-r----- 1 root tracing 0 May 1 21:25 buffer_size_kb
-rw-r----- 1 root tracing 0 May 1 21:25 buffer_subbuf_size_kb
-r--r----- 1 root tracing 0 May 1 21:25 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:25 current_tracer
-rw-r----- 1 root tracing 0 May 1 21:25 dynamic_events
-r--r----- 1 root tracing 0 May 1 21:25 dyn_ftrace_total_info
-r--r----- 1 root tracing 0 May 1 21:25 enabled_functions
Everything changed but the current_tracer.
Add a new link list that keeps track of all the tracefs_inodes which has the permission flags that tell if the file/dir should use the root inode's permission or not. Then on remount, clear all the flags so that the default behavior of using the root inode's permission is done for all files and directories.
CVE-2024-36939:
In the Linux kernel, the following vulnerability has been resolved:
nfs: Handle error of rpc_proc_register() in nfs_net_init().
syzkaller reported a warning [0] triggered while destroying immature netns.
rpc_proc_register() was called in init_nfs_fs(), but its error has been ignored since at least the initial commit 1da177e4c3f4 (Linux-2.6.12-rc2).
Recently, commit d47151b79e32 (nfs: expose /proc/net/sunrpc/nfs in net namespaces) converted the procfs to per-netns and made the problem more visible.
Even when rpc_proc_register() fails, nfs_net_init() could succeed, and thus nfs_net_exit() will be called while destroying the netns.
Then, remove_proc_entry() will be called for non-existing proc directory and trigger the warning below.
Let's handle the error of rpc_proc_register() properly in nfs_net_init().
[0]:
name 'nfs' WARNING: CPU: 1 PID: 1710 at fs/proc/generic.c:711 remove_proc_entry+0x1bb/0x2d0 fs/proc/generic.c:711 Modules linked in:
CPU: 1 PID: 1710 Comm: syz-executor.2 Not tainted 6.8.0-12822-gcd51db110a7e #12 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:remove_proc_entry+0x1bb/0x2d0 fs/proc/generic.c:711 Code: 41 5d 41 5e c3 e8 85 09 b5 ff 48 c7 c7 88 58 64 86 e8 09 0e 71 02 e8 74 09 b5 ff 4c 89 e6 48 c7 c7 de 1b 80 84 e8 c5 ad 97 ff <0f> 0b eb b1 e8 5c 09 b5 ff 48 c7 c7 88 58 64 86 e8 e0 0d 71 02 eb RSP: 0018:ffffc9000c6d7ce0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff8880422b8b00 RCX: ffffffff8110503c RDX: ffff888030652f00 RSI: ffffffff81105045 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: ffffffff81bb62cb R12: ffffffff84807ffc R13: ffff88804ad6fcc0 R14: ffffffff84807ffc R15: ffffffff85741ff8 FS: 00007f30cfba8640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff51afe8000 CR3: 000000005a60a005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace:
<TASK> rpc_proc_unregister+0x64/0x70 net/sunrpc/stats.c:310 nfs_net_exit+0x1c/0x30 fs/nfs/inode.c:2438 ops_exit_list+0x62/0xb0 net/core/net_namespace.c:170 setup_net+0x46c/0x660 net/core/net_namespace.c:372 copy_net_ns+0x244/0x590 net/core/net_namespace.c:505 create_new_namespaces+0x2ed/0x770 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xae/0x160 kernel/nsproxy.c:228 ksys_unshare+0x342/0x760 kernel/fork.c:3322
__do_sys_unshare kernel/fork.c:3393 [inline]
__se_sys_unshare kernel/fork.c:3391 [inline]
__x64_sys_unshare+0x1f/0x30 kernel/fork.c:3391 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0x7f30d0febe5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007f30cfba7cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f30d0febe5d RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000006c020600 RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002 R13: 000000000000000b R14: 00007f30d104c530 R15: 0000000000000000 </TASK>
CVE-2024-36884:
In the Linux kernel, the following vulnerability has been resolved:
iommu/arm-smmu: Use the correct type in nvidia_smmu_context_fault()
This was missed because of the function pointer indirection.
nvidia_smmu_context_fault() is also installed as a irq function, and the 'void *' was changed to a struct arm_smmu_domain. Since the iommu_domain is embedded at a non-zero offset this causes nvidia_smmu_context_fault() to miscompute the offset. Fixup the types.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000120 Mem abort info:
ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000107c9f000 [0000000000000120] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] SMP Modules linked in:
CPU: 1 PID: 47 Comm: kworker/u25:0 Not tainted 6.9.0-0.rc7.58.eln136.aarch64 #1 Hardware name: Unknown NVIDIA Jetson Orin NX/NVIDIA Jetson Orin NX, BIOS 3.1-32827747 03/19/2023 Workqueue: events_unbound deferred_probe_work_func pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : nvidia_smmu_context_fault+0x1c/0x158 lr : __free_irq+0x1d4/0x2e8 sp : ffff80008044b6f0 x29: ffff80008044b6f0 x28: ffff000080a60b18 x27: ffffd32b5172e970 x26: 0000000000000000 x25: ffff0000802f5aac x24: ffff0000802f5a30 x23: ffff0000802f5b60 x22: 0000000000000057 x21: 0000000000000000 x20: ffff0000802f5a00 x19: ffff000087d4cd80 x18: ffffffffffffffff x17: 6234362066666666 x16: 6630303078302d30 x15: ffff00008156d888 x14: 0000000000000000 x13: ffff0000801db910 x12: ffff00008156d6d0 x11: 0000000000000003 x10: ffff0000801db918 x9 : ffffd32b50f94d9c x8 : 1fffe0001032fda1 x7 : ffff00008197ed00 x6 : 000000000000000f x5 : 000000000000010e x4 : 000000000000010e x3 : 0000000000000000 x2 : ffffd32b51720cd8 x1 : ffff000087e6f700 x0 : 0000000000000057 Call trace:
nvidia_smmu_context_fault+0x1c/0x158
__free_irq+0x1d4/0x2e8 free_irq+0x3c/0x80 devm_free_irq+0x64/0xa8 arm_smmu_domain_free+0xc4/0x158 iommu_domain_free+0x44/0xa0 iommu_deinit_device+0xd0/0xf8
__iommu_group_remove_device+0xcc/0xe0 iommu_bus_notifier+0x64/0xa8 notifier_call_chain+0x78/0x148 blocking_notifier_call_chain+0x4c/0x90 bus_notify+0x44/0x70 device_del+0x264/0x3e8 pci_remove_bus_device+0x84/0x120 pci_remove_root_bus+0x5c/0xc0 dw_pcie_host_deinit+0x38/0xe0 tegra_pcie_config_rp+0xc0/0x1f0 tegra_pcie_dw_probe+0x34c/0x700 platform_probe+0x70/0xe8 really_probe+0xc8/0x3a0
__driver_probe_device+0x84/0x160 driver_probe_device+0x44/0x130
__device_attach_driver+0xc4/0x170 bus_for_each_drv+0x90/0x100
__device_attach+0xa8/0x1c8 device_initial_probe+0x1c/0x30 bus_probe_device+0xb0/0xc0 deferred_probe_work_func+0xbc/0x120 process_one_work+0x194/0x490 worker_thread+0x284/0x3b0 kthread+0xf4/0x108 ret_from_fork+0x10/0x20 Code: a9b97bfd 910003fd a9025bf5 f85a0035 (b94122a1)
CVE-2024-36478:
In the Linux kernel, the following vulnerability has been resolved:
null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
Writing 'power' and 'submit_queues' concurrently will trigger kernel panic:
Test script:
modprobe null_blk nr_devices=0 mkdir -p /sys/kernel/config/nullb/nullb0 while true; do echo 1 > submit_queues; echo 4 > submit_queues; done & while true; do echo 1 > power; echo 0 > power; done
Test result:
BUG: kernel NULL pointer dereference, address: 0000000000000148 Oops: 0000 [#1] PREEMPT SMP RIP: 0010:__lock_acquire+0x41d/0x28f0 Call Trace:
<TASK> lock_acquire+0x121/0x450 down_write+0x5f/0x1d0 simple_recursive_removal+0x12f/0x5c0 blk_mq_debugfs_unregister_hctxs+0x7c/0x100 blk_mq_update_nr_hw_queues+0x4a3/0x720 nullb_update_nr_hw_queues+0x71/0xf0 [null_blk] nullb_device_submit_queues_store+0x79/0xf0 [null_blk] configfs_write_iter+0x119/0x1e0 vfs_write+0x326/0x730 ksys_write+0x74/0x150
This is because del_gendisk() can concurrent with blk_mq_update_nr_hw_queues():
nullb_device_power_store nullb_apply_submit_queues null_del_dev del_gendisk nullb_update_nr_hw_queues if (!dev->nullb) // still set while gendisk is deleted return 0 blk_mq_update_nr_hw_queues dev->nullb = NULL
Fix this problem by resuing the global mutex to protect nullb_device_power_store() and nullb_update_nr_hw_queues() from configfs.
CVE-2024-35870:
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix UAF in smb2_reconnect_server()
The UAF bug is due to smb2_reconnect_server() accessing a session that is already being teared down by another thread that is executing
__cifs_put_smb_ses(). This can happen when (a) the client has connection to the server but no session or (b) another thread ends up setting @ses->ses_status again to something different than SES_EXITING.
To fix this, we need to make sure to unconditionally set @ses->ses_status to SES_EXITING and prevent any other threads from setting a new status while we're still tearing it down.
The following can be reproduced by adding some delay to right after the ipc is freed in __cifs_put_smb_ses() - which will give smb2_reconnect_server() worker a chance to run and then accessing @ses->ipc:
kinit ...
mount.cifs //srv/share /mnt/1 -o sec=krb5,nohandlecache,echo_interval=10 [disconnect srv] ls /mnt/1 &>/dev/null sleep 30 kdestroy [reconnect srv] sleep 10 umount /mnt/1 ...
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed CIFS: VFS: \\srv Send error in SessSetup = -126 CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed CIFS: VFS: \\srv Send error in SessSetup = -126 general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc2 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 Workqueue: cifsiod smb2_reconnect_server [cifs] RIP: 0010:__list_del_entry_valid_or_report+0x33/0xf0 Code: 4f 08 48 85 d2 74 42 48 85 c9 74 59 48 b8 00 01 00 00 00 00 ad de 48 39 c2 74 61 48 b8 22 01 00 00 00 00 74 69 <48> 8b 01 48 39 f8 75 7b 48 8b 72 08 48 39 c6 0f 85 88 00 00 00 b8 RSP: 0018:ffffc900001bfd70 EFLAGS: 00010a83 RAX: dead000000000122 RBX: ffff88810da53838 RCX: 6b6b6b6b6b6b6b6b RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc02f6878 RDI: ffff88810da53800 RBP: ffff88810da53800 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: ffff88810c064000 R13: 0000000000000001 R14: ffff88810c064000 R15: ffff8881039cc000 FS: 0000000000000000(0000) GS:ffff888157c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe3728b1000 CR3: 000000010caa4000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace:
<TASK> ? die_addr+0x36/0x90 ? exc_general_protection+0x1c1/0x3f0 ? asm_exc_general_protection+0x26/0x30 ? __list_del_entry_valid_or_report+0x33/0xf0
__cifs_put_smb_ses+0x1ae/0x500 [cifs] smb2_reconnect_server+0x4ed/0x710 [cifs] process_one_work+0x205/0x6b0 worker_thread+0x191/0x360 ? __pfx_worker_thread+0x10/0x10 kthread+0xe2/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK>
CVE-2024-35868:
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix potential UAF in cifs_stats_proc_write()
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF.
CVE-2024-35867:
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix potential UAF in cifs_stats_proc_show()
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF.
CVE-2024-35866:
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix potential UAF in cifs_dump_full_key()
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF.
CVE-2024-35865:
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix potential UAF in smb2_is_valid_oplock_break()
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF.
CVE-2024-35864:
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix potential UAF in smb2_is_valid_lease_break()
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF.
CVE-2024-35863:
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix potential UAF in is_valid_oplock_break()
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF.
CVE-2024-35862:
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix potential UAF in smb2_is_network_name_deleted()
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF.
CVE-2024-35861:
In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect()
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF.
CVE-2024-35849:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix information leak in btrfs_ioctl_logical_to_ino()
Syzbot reported the following information leak for in btrfs_ioctl_logical_to_ino():
BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_user+0xbc/0x110 lib/usercopy.c:40 instrument_copy_to_user include/linux/instrumented.h:114 [inline]
_copy_to_user+0xbc/0x110 lib/usercopy.c:40 copy_to_user include/linux/uaccess.h:191 [inline] btrfs_ioctl_logical_to_ino+0x440/0x750 fs/btrfs/ioctl.c:3499 btrfs_ioctl+0x714/0x1260 vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0x261/0x450 fs/ioctl.c:890
__x64_sys_ioctl+0x96/0xe0 fs/ioctl.c:890 x64_sys_call+0x1883/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:17 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at:
__kmalloc_large_node+0x231/0x370 mm/slub.c:3921
__do_kmalloc_node mm/slub.c:3954 [inline]
__kmalloc_node+0xb07/0x1060 mm/slub.c:3973 kmalloc_node include/linux/slab.h:648 [inline] kvmalloc_node+0xc0/0x2d0 mm/util.c:634 kvmalloc include/linux/slab.h:766 [inline] init_data_container+0x49/0x1e0 fs/btrfs/backref.c:2779 btrfs_ioctl_logical_to_ino+0x17c/0x750 fs/btrfs/ioctl.c:3480 btrfs_ioctl+0x714/0x1260 vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:904 [inline]
__se_sys_ioctl+0x261/0x450 fs/ioctl.c:890
__x64_sys_ioctl+0x96/0xe0 fs/ioctl.c:890 x64_sys_call+0x1883/0x3b50 arch/x86/include/generated/asm/syscalls_64.h:17 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Bytes 40-65535 of 65536 are uninitialized Memory access of size 65536 starts at ffff888045a40000
This happens, because we're copying a 'struct btrfs_data_container' back to user-space. This btrfs_data_container is allocated in 'init_data_container()' via kvmalloc(), which does not zero-fill the memory.
Fix this by using kvzalloc() which zeroes out the memory on allocation.
CVE-2024-35844:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: compress: fix reserve_cblocks counting error when out of space
When a file only needs one direct_node, performing the following operations will cause the file to be unrepairable:
unisoc # ./f2fs_io compress test.apk unisoc #df -h | grep dm-48 /dev/block/dm-48 112G 112G 1.2M 100% /data
unisoc # ./f2fs_io release_cblocks test.apk 924 unisoc # df -h | grep dm-48 /dev/block/dm-48 112G 112G 4.8M 100% /data
unisoc # dd if=/dev/random of=file4 bs=1M count=3 3145728 bytes (3.0 M) copied, 0.025 s, 120 M/s unisoc # df -h | grep dm-48 /dev/block/dm-48 112G 112G 1.8M 100% /data
unisoc # ./f2fs_io reserve_cblocks test.apk F2FS_IOC_RESERVE_COMPRESS_BLOCKS failed: No space left on device
adb reboot unisoc # df -h | grep dm-48 /dev/block/dm-48 112G 112G 11M 100% /data unisoc # ./f2fs_io reserve_cblocks test.apk 0
This is because the file has only one direct_node. After returning to -ENOSPC, reserved_blocks += ret will not be executed. As a result, the reserved_blocks at this time is still 0, which is not the real number of reserved blocks. Therefore, fsck cannot be set to repair the file.
After this patch, the fsck flag will be set to fix this problem.
unisoc # df -h | grep dm-48 /dev/block/dm-48 112G 112G 1.8M 100% /data unisoc # ./f2fs_io reserve_cblocks test.apk F2FS_IOC_RESERVE_COMPRESS_BLOCKS failed: No space left on device
adb reboot then fsck will be executed unisoc # df -h | grep dm-48 /dev/block/dm-48 112G 112G 11M 100% /data unisoc # ./f2fs_io reserve_cblocks test.apk 924
CVE-2024-35821:
In the Linux kernel, the following vulnerability has been resolved:
ubifs: Set page uptodate in the correct place
Page cache reads are lockless, so setting the freshly allocated page uptodate before we've overwritten it with the data it's supposed to have in it will allow a simultaneous reader to see old data. Move the call to SetPageUptodate into ubifs_write_end(), which is after we copied the new data into the page.
CVE-2024-35815:
In the Linux kernel, the following vulnerability has been resolved:
fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
The first kiocb_set_cancel_fn() argument may point at a struct kiocb that is not embedded inside struct aio_kiocb. With the current code, depending on the compiler, the req->ki_ctx read happens either before the IOCB_AIO_RW test or after that test. Move the req->ki_ctx read such that it is guaranteed that the IOCB_AIO_RW test happens first.
CVE-2024-35807:
In the Linux kernel, the following vulnerability has been resolved:
ext4: fix corruption during on-line resize
We observed a corruption during on-line resize of a file system that is larger than 16 TiB with 4k block size. With having more then 2^32 blocks resize_inode is turned off by default by mke2fs. The issue can be reproduced on a smaller file system for convenience by explicitly turning off resize_inode. An on-line resize across an 8 GiB boundary (the size of a meta block group in this setup) then leads to a corruption:
dev=/dev/<some_dev> # should be >= 16 GiB mkdir -p /corruption /sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15)) mount -t ext4 $dev /corruption
dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15)) sha1sum /corruption/test # 79d2658b39dcfd77274e435b0934028adafaab11 /corruption/test
/sbin/resize2fs $dev $((2*2**21)) # drop page cache to force reload the block from disk echo 1 > /proc/sys/vm/drop_caches
sha1sum /corruption/test # 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3 /corruption/test
2^21 = 2^15*2^6 equals 8 GiB whereof 2^15 is the number of blocks per block group and 2^6 are the number of block groups that make a meta block group.
The last checksum might be different depending on how the file is laid out across the physical blocks. The actual corruption occurs at physical block 63*2^15 = 2064384 which would be the location of the backup of the meta block group's block descriptor. During the on-line resize the file system will be converted to meta_bg starting at s_first_meta_bg which is 2 in the example - meaning all block groups after 16 GiB. However, in ext4_flex_group_add we might add block groups that are not part of the first meta block group yet. In the reproducer we achieved this by substracting the size of a whole block group from the point where the meta block group would start. This must be considered when updating the backup block group descriptors to follow the non-meta_bg layout. The fix is to add a test whether the group to add is already part of the meta block group or not.
CVE-2024-35798:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix race in read_extent_buffer_pages()
There are reports from tree-checker that detects corrupted nodes, without any obvious pattern so possibly an overwrite in memory.
After some debugging it turns out there's a race when reading an extent buffer the uptodate status can be missed.
To prevent concurrent reads for the same extent buffer, read_extent_buffer_pages() performs these checks:
/* (1) */ if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags)) return 0;
/* (2) */ if (test_and_set_bit(EXTENT_BUFFER_READING, &eb->bflags)) goto done;
At this point, it seems safe to start the actual read operation. Once that completes, end_bbio_meta_read() does
/* (3) */ set_extent_buffer_uptodate(eb);
/* (4) */ clear_bit(EXTENT_BUFFER_READING, &eb->bflags);
Normally, this is enough to ensure only one read happens, and all other callers wait for it to finish before returning. Unfortunately, there is a racey interleaving:
Thread A | Thread B | Thread C
---------+----------+--------- (1) | | | (1) | (2) | | (3) | | (4) | | | (2) | | | (1)
When this happens, thread B kicks of an unnecessary read. Worse, thread C will see UPTODATE set and return immediately, while the read from thread B is still in progress. This race could result in tree-checker errors like this as the extent buffer is concurrently modified:
BTRFS critical (device dm-0): corrupted node, root=256 block=8550954455682405139 owner mismatch, have 11858205567642294356 expect [256, 18446744073709551360]
Fix it by testing UPTODATE again after setting the READING bit, and if it's been set, skip the unnecessary read.
[ minor update of changelog ]
CVE-2024-35784:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix deadlock with fiemap and extent locking
While working on the patchset to remove extent locking I got a lockdep splat with fiemap and pagefaulting with my new extent lock replacement lock.
This deadlock exists with our normal code, we just don't have lockdep annotations with the extent locking so we've never noticed it.
Since we're copying the fiemap extent to user space on every iteration we have the chance of pagefaulting. Because we hold the extent lock for the entire range we could mkwrite into a range in the file that we have mmap'ed. This would deadlock with the following stack trace
[<0>] lock_extent+0x28d/0x2f0 [<0>] btrfs_page_mkwrite+0x273/0x8a0 [<0>] do_page_mkwrite+0x50/0xb0 [<0>] do_fault+0xc1/0x7b0 [<0>] __handle_mm_fault+0x2fa/0x460 [<0>] handle_mm_fault+0xa4/0x330 [<0>] do_user_addr_fault+0x1f4/0x800 [<0>] exc_page_fault+0x7c/0x1e0 [<0>] asm_exc_page_fault+0x26/0x30 [<0>] rep_movs_alternative+0x33/0x70 [<0>] _copy_to_user+0x49/0x70 [<0>] fiemap_fill_next_extent+0xc8/0x120 [<0>] emit_fiemap_extent+0x4d/0xa0 [<0>] extent_fiemap+0x7f8/0xad0 [<0>] btrfs_fiemap+0x49/0x80 [<0>] __x64_sys_ioctl+0x3e1/0xb50 [<0>] do_syscall_64+0x94/0x1a0 [<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
I wrote an fstest to reproduce this deadlock without my replacement lock and verified that the deadlock exists with our existing locking.
To fix this simply don't take the extent lock for the entire duration of the fiemap. This is safe in general because we keep track of where we are when we're searching the tree, so if an ordered extent updates in the middle of our fiemap call we'll still emit the correct extents because we know what offset we were on before.
The only place we maintain the lock is searching delalloc. Since the delalloc stuff can change during writeback we want to lock the extent range so we have a consistent view of delalloc at the time we're checking to see if we need to set the delalloc flag.
With this patch applied we no longer deadlock with my testcase.
CVE-2024-33847:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: compress: don't allow unaligned truncation on released compress inode
f2fs image may be corrupted after below testcase:
- mkfs.f2fs -O extra_attr,compression -f /dev/vdb
- mount /dev/vdb /mnt/f2fs
- touch /mnt/f2fs/file
- f2fs_io setflags compression /mnt/f2fs/file
- dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=4
- f2fs_io release_cblocks /mnt/f2fs/file
- truncate -s 8192 /mnt/f2fs/file
- umount /mnt/f2fs
- fsck.f2fs /dev/vdb
[ASSERT] (fsck_chk_inode_blk:1256) --> ino: 0x5 has i_blocks: 0x00000002, but has 0x3 blocks [FSCK] valid_block_count matching with CP [Fail] [0x4, 0x5] [FSCK] other corrupted bugs [Fail]
The reason is: partial truncation assume compressed inode has reserved blocks, after partial truncation, valid block count may change w/o .i_blocks and .total_valid_block_count update, result in corruption.
This patch only allow cluster size aligned truncation on released compress inode for fixing.
CVE-2024-27389:
In the Linux kernel, the following vulnerability has been resolved:
pstore: inode: Only d_invalidate() is needed
Unloading a modular pstore backend with records in pstorefs would trigger the dput() double-drop warning:
WARNING: CPU: 0 PID: 2569 at fs/dcache.c:762 dput.part.0+0x3f3/0x410
Using the combo of d_drop()/dput() (as mentioned in Documentation/filesystems/vfs.rst) isn't the right approach here, and leads to the reference counting problem seen above. Use d_invalidate() and update the code to not bother checking for error codes that can never happen.
---
CVE-2024-27080:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix race when detecting delalloc ranges during fiemap
For fiemap we recently stopped locking the target extent range for the whole duration of the fiemap call, in order to avoid a deadlock in a scenario where the fiemap buffer happens to be a memory mapped range of the same file. This use case is very unlikely to be useful in practice but it may be triggered by fuzz testing (syzbot, etc).
This however introduced a race that makes us miss delalloc ranges for file regions that are currently holes, so the caller of fiemap will not be aware that there's data for some file regions. This can be quite serious for some use cases - for example in coreutils versions before 9.0, the cp program used fiemap to detect holes and data in the source file, copying only regions with data (extents or delalloc) from the source file to the destination file in order to preserve holes (see the documentation for its --sparse command line option). This means that if cp was used with a source file that had delalloc in a hole, the destination file could end up without that data, which is effectively a data loss issue, if it happened to hit the race described below.
The race happens like this:
1) Fiemap is called, without the FIEMAP_FLAG_SYNC flag, for a file that has delalloc in the file range [64M, 65M[, which is currently a hole;
2) Fiemap locks the inode in shared mode, then starts iterating the inode's subvolume tree searching for file extent items, without having the whole fiemap target range locked in the inode's io tree - the change introduced recently by commit b0ad381fa769 (btrfs: fix deadlock with fiemap and extent locking). It only locks ranges in the io tree when it finds a hole or prealloc extent since that commit;
3) Note that fiemap clones each leaf before using it, and this is to avoid deadlocks when locking a file range in the inode's io tree and the fiemap buffer is memory mapped to some file, because writing to the page with btrfs_page_mkwrite() will wait on any ordered extent for the page's range and the ordered extent needs to lock the range and may need to modify the same leaf, therefore leading to a deadlock on the leaf;
4) While iterating the file extent items in the cloned leaf before finding the hole in the range [64M, 65M[, the delalloc in that range is flushed and its ordered extent completes - meaning the corresponding file extent item is in the inode's subvolume tree, but not present in the cloned leaf that fiemap is iterating over;
5) When fiemap finds the hole in the [64M, 65M[ range by seeing the gap in the cloned leaf (or a file extent item with disk_bytenr == 0 in case the NO_HOLES feature is not enabled), it will lock that file range in the inode's io tree and then search for delalloc by checking for the EXTENT_DELALLOC bit in the io tree for that range and ordered extents (with btrfs_find_delalloc_in_range()). But it finds nothing since the delalloc in that range was already flushed and the ordered extent completed and is gone - as a result fiemap will not report that there's delalloc or an extent for the range [64M, 65M[, so user space will be mislead into thinking that there's a hole in that range.
This could actually be sporadically triggered with test case generic/094 from fstests, which reports a missing extent/delalloc range like this:
generic/094 2s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad)
--- tests/generic/094.out 2020-06-10 19:29:03.830519425 +0100 +++ /home/fdmanana/git/hub/xfstests/results//generic/094.out.bad 2024-02-28 11:00:00.381071525 +0000 @@ -1,3 +1,9 @@ QA output created by 094 fiemap run with sync fiemap run without sync +ERROR: couldn't find extent at 7 +map is 'HHDDHPPDPHPH' +logical: [ 5.. 6] phys:
---truncated---
CVE-2024-27036:
In the Linux kernel, the following vulnerability has been resolved:
cifs: Fix writeback data corruption
cifs writeback doesn't correctly handle the case where cifs_extend_writeback() hits a point where it is considering an additional folio, but this would overrun the wsize - at which point it drops out of the xarray scanning loop and calls xas_pause(). The problem is that xas_pause() advances the loop counter - thereby skipping that page.
What needs to happen is for xas_reset() to be called any time we decide we don't want to process the page we're looking at, but rather send the request we are building and start a new one.
Fix this by copying and adapting the netfslib writepages code as a temporary measure, with cifs writeback intending to be offloaded to netfslib in the near future.
This also fixes the issue with the use of filemap_get_folios_tag() causing retry of a bunch of pages which the extender already dealt with.
This can be tested by creating, say, a 64K file somewhere not on cifs (otherwise copy-offload may get underfoot), mounting a cifs share with a wsize of 64000, copying the file to it and then comparing the original file and the copy:
dd if=/dev/urandom of=/tmp/64K bs=64k count=1 mount //192.168.6.1/test /mnt -o user=...,pass=...,wsize=64000 cp /tmp/64K /mnt/64K cmp /tmp/64K /mnt/64K
Without the fix, the cmp fails at position 64000 (or shortly thereafter).
CVE-2024-27035:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: compress: fix to guarantee persisting compressed blocks by CP
If data block in compressed cluster is not persisted with metadata during checkpoint, after SPOR, the data may be corrupted, let's guarantee to write compressed page by checkpoint.
CVE-2024-27034:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: compress: fix to cover normal cluster write with cp_rwsem
When we overwrite compressed cluster w/ normal cluster, we should not unlock cp_rwsem during f2fs_write_raw_pages(), otherwise data will be corrupted if partial blocks were persisted before CP & SPOR, due to cluster metadata wasn't updated atomically.
CVE-2024-27033:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to remove unnecessary f2fs_bug_on() to avoid panic
verify_blkaddr() will trigger panic once we inject fault into f2fs_is_valid_blkaddr(), fix to remove this unnecessary f2fs_bug_on().
CVE-2024-27032:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to avoid potential panic during recovery
During recovery, if FAULT_BLOCK is on, it is possible that f2fs_reserve_new_block() will return -ENOSPC during recovery, then it may trigger panic.
Also, if fault injection rate is 1 and only FAULT_BLOCK fault type is on, it may encounter deadloop in loop of block reservation.
Let's change as below to fix these issues:
- remove bug_on() to avoid panic.
- limit the loop count of block reservation to avoid potential deadloop.
CVE-2024-27031:
In the Linux kernel, the following vulnerability has been resolved:
NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt
The loop inside nfs_netfs_issue_read() currently does not disable interrupts while iterating through pages in the xarray to submit for NFS read. This is not safe though since after taking xa_lock, another page in the mapping could be processed for writeback inside an interrupt, and deadlock can occur. The fix is simple and clean if we use xa_for_each_range(), which handles the iteration with RCU while reducing code complexity.
The problem is easily reproduced with the following test:
mount -o vers=3,fsc 127.0.0.1:/export /mnt/nfs dd if=/dev/zero of=/mnt/nfs/file1.bin bs=4096 count=1 echo 3 > /proc/sys/vm/drop_caches dd if=/mnt/nfs/file1.bin of=/dev/null umount /mnt/nfs
On the console with a lockdep-enabled kernel a message similar to the following will be seen:
================================ WARNING: inconsistent lock state 6.7.0-lockdbg+ #10 Not tainted
-------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
test5/1708 [HC0[0]:SC0[0]:HE1:SE1] takes:
ffff888127baa598 (&xa->xa_lock#4){+.?.}-{3:3}, at:
nfs_netfs_issue_read+0x1b2/0x4b0 [nfs] {IN-SOFTIRQ-W} state was registered at:
lock_acquire+0x144/0x380
_raw_spin_lock_irqsave+0x4e/0xa0
__folio_end_writeback+0x17e/0x5c0 folio_end_writeback+0x93/0x1b0 iomap_finish_ioend+0xeb/0x6a0 blk_update_request+0x204/0x7f0 blk_mq_end_request+0x30/0x1c0 blk_complete_reqs+0x7e/0xa0
__do_softirq+0x113/0x544
__irq_exit_rcu+0xfe/0x120 irq_exit_rcu+0xe/0x20 sysvec_call_function_single+0x6f/0x90 asm_sysvec_call_function_single+0x1a/0x20 pv_native_safe_halt+0xf/0x20 default_idle+0x9/0x20 default_idle_call+0x67/0xa0 do_idle+0x2b5/0x300 cpu_startup_entry+0x34/0x40 start_secondary+0x19d/0x1c0 secondary_startup_64_no_verify+0x18f/0x19b irq event stamp: 176891 hardirqs last enabled at (176891): [<ffffffffa67a0be4>]
_raw_spin_unlock_irqrestore+0x44/0x60 hardirqs last disabled at (176890): [<ffffffffa67a0899>]
_raw_spin_lock_irqsave+0x79/0xa0 softirqs last enabled at (176646): [<ffffffffa515d91e>]
__irq_exit_rcu+0xfe/0x120 softirqs last disabled at (176633): [<ffffffffa515d91e>]
__irq_exit_rcu+0xfe/0x120
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
---- lock(&xa->xa_lock#4);
<Interrupt> lock(&xa->xa_lock#4);
*** DEADLOCK ***
2 locks held by test5/1708:
#0: ffff888127baa498 (&sb->s_type->i_mutex_key#22){++++}-{4:4}, at:
nfs_start_io_read+0x28/0x90 [nfs] #1: ffff888127baa650 (mapping.invalidate_lock#3){.+.+}-{4:4}, at:
page_cache_ra_unbounded+0xa4/0x280
stack backtrace:
CPU: 6 PID: 1708 Comm: test5 Kdump: loaded Not tainted 6.7.0-lockdbg+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39 04/01/2014 Call Trace:
dump_stack_lvl+0x5b/0x90 mark_lock+0xb3f/0xd20
__lock_acquire+0x77b/0x3360
_raw_spin_lock+0x34/0x80 nfs_netfs_issue_read+0x1b2/0x4b0 [nfs] netfs_begin_read+0x77f/0x980 [netfs] nfs_netfs_readahead+0x45/0x60 [nfs] nfs_readahead+0x323/0x5a0 [nfs] read_pages+0xf3/0x5c0 page_cache_ra_unbounded+0x1c8/0x280 filemap_get_pages+0x38c/0xae0 filemap_read+0x206/0x5e0 nfs_file_read+0xb7/0x140 [nfs] vfs_read+0x2a9/0x460 ksys_read+0xb7/0x140
CVE-2024-26993:
In the Linux kernel, the following vulnerability has been resolved:
fs: sysfs: Fix reference leak in sysfs_break_active_protection()
The sysfs_break_active_protection() routine has an obvious reference leak in its error path. If the call to kernfs_find_and_get() fails then kn will be NULL, so the companion sysfs_unbreak_active_protection() routine won't get called (and would only cause an access violation by trying to dereference kn->parent if it was called). As a result, the reference to kobj acquired at the start of the function will never be released.
Fix the leak by adding an explicit kobject_put() call when kn is NULL.
CVE-2024-26904:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve
At btrfs_use_block_rsv() we read the size of a block reserve without locking its spinlock, which makes KCSAN complain because the size of a block reserve is always updated while holding its spinlock. The report from KCSAN is the following:
[653.313148] BUG: KCSAN: data-race in btrfs_update_delayed_refs_rsv [btrfs] / btrfs_use_block_rsv [btrfs]
[653.314755] read to 0x000000017f5871b8 of 8 bytes by task 7519 on cpu 0:
[653.314779] btrfs_use_block_rsv+0xe4/0x2f8 [btrfs] [653.315606] btrfs_alloc_tree_block+0xdc/0x998 [btrfs] [653.316421] btrfs_force_cow_block+0x220/0xe38 [btrfs] [653.317242] btrfs_cow_block+0x1ac/0x568 [btrfs] [653.318060] btrfs_search_slot+0xda2/0x19b8 [btrfs] [653.318879] btrfs_del_csums+0x1dc/0x798 [btrfs] [653.319702] __btrfs_free_extent.isra.0+0xc24/0x2028 [btrfs] [653.320538] __btrfs_run_delayed_refs+0xd3c/0x2390 [btrfs] [653.321340] btrfs_run_delayed_refs+0xae/0x290 [btrfs] [653.322140] flush_space+0x5e4/0x718 [btrfs] [653.322958] btrfs_preempt_reclaim_metadata_space+0x102/0x2f8 [btrfs] [653.323781] process_one_work+0x3b6/0x838 [653.323800] worker_thread+0x75e/0xb10 [653.323817] kthread+0x21a/0x230 [653.323836] __ret_from_fork+0x6c/0xb8 [653.323855] ret_from_fork+0xa/0x30
[653.323887] write to 0x000000017f5871b8 of 8 bytes by task 576 on cpu 3:
[653.323906] btrfs_update_delayed_refs_rsv+0x1a4/0x250 [btrfs] [653.324699] btrfs_add_delayed_data_ref+0x468/0x6d8 [btrfs] [653.325494] btrfs_free_extent+0x76/0x120 [btrfs] [653.326280] __btrfs_mod_ref+0x6a8/0x6b8 [btrfs] [653.327064] btrfs_dec_ref+0x50/0x70 [btrfs] [653.327849] walk_up_proc+0x236/0xa50 [btrfs] [653.328633] walk_up_tree+0x21c/0x448 [btrfs] [653.329418] btrfs_drop_snapshot+0x802/0x1328 [btrfs] [653.330205] btrfs_clean_one_deleted_snapshot+0x184/0x238 [btrfs] [653.330995] cleaner_kthread+0x2b0/0x2f0 [btrfs] [653.331781] kthread+0x21a/0x230 [653.331800] __ret_from_fork+0x6c/0xb8 [653.331818] ret_from_fork+0xa/0x30
So add a helper to get the size of a block reserve while holding the lock.
Reading the field while holding the lock instead of using the data_race() annotation is used in order to prevent load tearing.
CVE-2024-26901:
In the Linux kernel, the following vulnerability has been resolved:
do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleak
syzbot identified a kernel information leak vulnerability in do_sys_name_to_handle() and issued the following report [1].
[1] BUG: KMSAN: kernel-infoleak in instrument_copy_to_user include/linux/instrumented.h:114 [inline] BUG: KMSAN: kernel-infoleak in _copy_to_user+0xbc/0x100 lib/usercopy.c:40 instrument_copy_to_user include/linux/instrumented.h:114 [inline]
_copy_to_user+0xbc/0x100 lib/usercopy.c:40 copy_to_user include/linux/uaccess.h:191 [inline] do_sys_name_to_handle fs/fhandle.c:73 [inline]
__do_sys_name_to_handle_at fs/fhandle.c:112 [inline]
__se_sys_name_to_handle_at+0x949/0xb10 fs/fhandle.c:94
__x64_sys_name_to_handle_at+0xe4/0x140 fs/fhandle.c:94 ...
Uninit was created at:
slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768 slab_alloc_node mm/slub.c:3478 [inline]
__kmem_cache_alloc_node+0x5c9/0x970 mm/slub.c:3517
__do_kmalloc_node mm/slab_common.c:1006 [inline]
__kmalloc+0x121/0x3c0 mm/slab_common.c:1020 kmalloc include/linux/slab.h:604 [inline] do_sys_name_to_handle fs/fhandle.c:39 [inline]
__do_sys_name_to_handle_at fs/fhandle.c:112 [inline]
__se_sys_name_to_handle_at+0x441/0xb10 fs/fhandle.c:94
__x64_sys_name_to_handle_at+0xe4/0x140 fs/fhandle.c:94 ...
Bytes 18-19 of 20 are uninitialized Memory access of size 20 starts at ffff888128a46380 Data copied to user address 0000000020000240
Per Chuck Lever's suggestion, use kzalloc() instead of kmalloc() to solve the problem.
CVE-2024-26878:
In the Linux kernel, the following vulnerability has been resolved:
quota: Fix potential NULL pointer dereference
Below race may cause NULL pointer dereference
P1 P2 dquot_free_inode quota_off drop_dquot_ref remove_dquot_ref dquots = i_dquot(inode) dquots = i_dquot(inode) srcu_read_lock dquots[cnt]) != NULL (1) dquots[type] = NULL (2) spin_lock(&dquots[cnt]->dq_dqb_lock) (3) ....
If dquot_free_inode(or other routines) checks inode's quota pointers (1) before quota_off sets it to NULL(2) and use it (3) after that, NULL pointer dereference will be triggered.
So let's fix it by using a temporary pointer to avoid this issue.
CVE-2024-26871:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix NULL pointer dereference in f2fs_submit_page_write()
BUG: kernel NULL pointer dereference, address: 0000000000000014 RIP: 0010:f2fs_submit_page_write+0x6cf/0x780 [f2fs] Call Trace:
<TASK> ? show_regs+0x6e/0x80 ? __die+0x29/0x70 ? page_fault_oops+0x154/0x4a0 ? prb_read_valid+0x20/0x30 ? __irq_work_queue_local+0x39/0xd0 ? irq_work_queue+0x36/0x70 ? do_user_addr_fault+0x314/0x6c0 ? exc_page_fault+0x7d/0x190 ? asm_exc_page_fault+0x2b/0x30 ? f2fs_submit_page_write+0x6cf/0x780 [f2fs] ? f2fs_submit_page_write+0x736/0x780 [f2fs] do_write_page+0x50/0x170 [f2fs] f2fs_outplace_write_data+0x61/0xb0 [f2fs] f2fs_do_write_data_page+0x3f8/0x660 [f2fs] f2fs_write_single_data_page+0x5bb/0x7a0 [f2fs] f2fs_write_cache_pages+0x3da/0xbe0 [f2fs] ...
It is possible that other threads have added this fio to io->bio and submitted the io->bio before entering f2fs_submit_page_write().
At this point io->bio = NULL.
If is_end_zone_blkaddr(sbi, fio->new_blkaddr) of this fio is true, then an NULL pointer dereference error occurs at bio_get(io->bio).
The original code for determining zone end was after out:, which would have missed some fio who is zone end. I've moved this code before skip: to make sure it's done for each fio.
CVE-2024-26870:
In the Linux kernel, the following vulnerability has been resolved:
NFSv4.2: fix nfs4_listxattr kernel BUG at mm/usercopy.c:102
A call to listxattr() with a buffer size = 0 returns the actual size of the buffer needed for a subsequent call. When size > 0, nfs4_listxattr() does not return an error because either generic_listxattr() or nfs4_listxattr_nfs4_label() consumes exactly all the bytes then size is 0 when calling nfs4_listxattr_nfs4_user() which then triggers the following kernel BUG:
[ 99.403778] kernel BUG at mm/usercopy.c:102! [ 99.404063] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP [ 99.408463] CPU: 0 PID: 3310 Comm: python3 Not tainted 6.6.0-61.fc40.aarch64 #1 [ 99.415827] Call trace:
[ 99.415985] usercopy_abort+0x70/0xa0 [ 99.416227] __check_heap_object+0x134/0x158 [ 99.416505] check_heap_object+0x150/0x188 [ 99.416696] __check_object_size.part.0+0x78/0x168 [ 99.416886] __check_object_size+0x28/0x40 [ 99.417078] listxattr+0x8c/0x120 [ 99.417252] path_listxattr+0x78/0xe0 [ 99.417476] __arm64_sys_listxattr+0x28/0x40 [ 99.417723] invoke_syscall+0x78/0x100 [ 99.417929] el0_svc_common.constprop.0+0x48/0xf0 [ 99.418186] do_el0_svc+0x24/0x38 [ 99.418376] el0_svc+0x3c/0x110 [ 99.418554] el0t_64_sync_handler+0x120/0x130 [ 99.418788] el0t_64_sync+0x194/0x198 [ 99.418994] Code: aa0003e3 d000a3e0 91310000 97f49bdb (d4210000)
Issue is reproduced when generic_listxattr() returns 'system.nfs4_acl', thus calling lisxattr() with size = 16 will trigger the bug.
Add check on nfs4_listxattr() to return ERANGE error when it is called with size > 0 and the return value is greater than size.
CVE-2024-26869:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to truncate meta inode pages forcely
Below race case can cause data corruption:
Thread A GC thread
- gc_data_segment
- ra_data_block
- locked meta_inode page
- f2fs_inplace_write_data
- invalidate_mapping_pages : fail to invalidate meta_inode page due to lock failure or dirty|writeback status
- f2fs_submit_page_bio : write last dirty data to old blkaddr
- move_data_block
- load old data from meta_inode page
- f2fs_submit_page_write : write old data to new blkaddr
Because invalidate_mapping_pages() will skip invalidating page which has unclear status including locked, dirty, writeback and so on, so we need to use truncate_inode_pages_range() instead of invalidate_mapping_pages() to make sure meta_inode page will be dropped.
CVE-2024-26868:
In the Linux kernel, the following vulnerability has been resolved:
nfs: fix panic when nfs4_ff_layout_prepare_ds() fails
We've been seeing the following panic in production
BUG: kernel NULL pointer dereference, address: 0000000000000065 PGD 2f485f067 P4D 2f485f067 PUD 2cc5d8067 PMD 0 RIP: 0010:ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles] Call Trace:
<TASK> ? __die+0x78/0xc0 ? page_fault_oops+0x286/0x380 ? __rpc_execute+0x2c3/0x470 [sunrpc] ? rpc_new_task+0x42/0x1c0 [sunrpc] ? exc_page_fault+0x5d/0x110 ? asm_exc_page_fault+0x22/0x30 ? ff_layout_free_layoutreturn+0x110/0x110 [nfs_layout_flexfiles] ? ff_layout_cancel_io+0x3a/0x90 [nfs_layout_flexfiles] ? ff_layout_cancel_io+0x6f/0x90 [nfs_layout_flexfiles] pnfs_mark_matching_lsegs_return+0x1b0/0x360 [nfsv4] pnfs_error_mark_layout_for_return+0x9e/0x110 [nfsv4] ? ff_layout_send_layouterror+0x50/0x160 [nfs_layout_flexfiles] nfs4_ff_layout_prepare_ds+0x11f/0x290 [nfs_layout_flexfiles] ff_layout_pg_init_write+0xf0/0x1f0 [nfs_layout_flexfiles]
__nfs_pageio_add_request+0x154/0x6c0 [nfs] nfs_pageio_add_request+0x26b/0x380 [nfs] nfs_do_writepage+0x111/0x1e0 [nfs] nfs_writepages_callback+0xf/0x30 [nfs] write_cache_pages+0x17f/0x380 ? nfs_pageio_init_write+0x50/0x50 [nfs] ? nfs_writepages+0x6d/0x210 [nfs] ? nfs_writepages+0x6d/0x210 [nfs] nfs_writepages+0x125/0x210 [nfs] do_writepages+0x67/0x220 ? generic_perform_write+0x14b/0x210 filemap_fdatawrite_wbc+0x5b/0x80 file_write_and_wait_range+0x6d/0xc0 nfs_file_fsync+0x81/0x170 [nfs] ? nfs_file_mmap+0x60/0x60 [nfs]
__x64_sys_fsync+0x53/0x90 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0
Inspecting the core with drgn I was able to pull this
>>> prog.crashed_thread().stack_trace()[0] #0 at 0xffffffffa079657a (ff_layout_cancel_io+0x3a/0x84) in ff_layout_cancel_io at fs/nfs/flexfilelayout/flexfilelayout.c:2021:27 >>> prog.crashed_thread().stack_trace()[0]['idx'] (u32)1 >>> prog.crashed_thread().stack_trace()[0]['flseg'].mirror_array[1].mirror_ds (struct nfs4_ff_layout_ds *)0xffffffffffffffed
This is clear from the stack trace, we call nfs4_ff_layout_prepare_ds() which could error out initializing the mirror_ds, and then we go to clean it all up and our check is only for if (!mirror->mirror_ds). This is inconsistent with the rest of the users of mirror_ds, which have
if (IS_ERR_OR_NULL(mirror_ds))
to keep from tripping over this exact scenario. Fix this up in ff_layout_cancel_io() to make sure we don't panic when we get an error.
I also spot checked all the other instances of checking mirror_ds and we appear to be doing the correct checks everywhere, only unconditionally dereferencing mirror_ds when we know it would be valid.
CVE-2024-26773:
In the Linux kernel, the following vulnerability has been resolved:
ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
Determine if the group block bitmap is corrupted before using ac_b_ex in ext4_mb_try_best_found() to avoid allocating blocks from a group with a corrupted block bitmap in the following concurrency and making the situation worse.
ext4_mb_regular_allocator ext4_lock_group(sb, group) ext4_mb_good_group // check if the group bbitmap is corrupted ext4_mb_complex_scan_group // Scan group gets ac_b_ex but doesn't use it ext4_unlock_group(sb, group) ext4_mark_group_bitmap_corrupted(group) // The block bitmap was corrupted during // the group unlock gap.
ext4_mb_try_best_found ext4_lock_group(ac->ac_sb, group) ext4_mb_use_best_found mb_mark_used // Allocating blocks in block bitmap corrupted group
CVE-2024-26764:
In the Linux kernel, the following vulnerability has been resolved:
fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the following kernel warning appears:
WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8 Call trace:
kiocb_set_cancel_fn+0x9c/0xa8 ffs_epfile_read_iter+0x144/0x1d0 io_read+0x19c/0x498 io_issue_sqe+0x118/0x27c io_submit_sqes+0x25c/0x5fc
__arm64_sys_io_uring_enter+0x104/0xab0 invoke_syscall+0x58/0x11c el0_svc_common+0xb4/0xf4 do_el0_svc+0x2c/0xb0 el0_svc+0x2c/0xa4 el0t_64_sync_handler+0x68/0xb4 el0t_64_sync+0x1a4/0x1a8
Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is submitted by libaio.
CVE-2024-26736:
In the Linux kernel, the following vulnerability has been resolved:
afs: Increase buffer size in afs_update_volume_status()
The max length of volume->vid value is 20 characters.
So increase idbuf[] size up to 24 to avoid overflow.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
[DH: Actually, it's 20 + NUL, so increase it to 24 and use snprintf()]
CVE-2024-26697:
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix data corruption in dsync block recovery for small block sizes
The helper function nilfs_recovery_copy_block() of nilfs_recovery_dsync_blocks(), which recovers data from logs created by data sync writes during a mount after an unclean shutdown, incorrectly calculates the on-page offset when copying repair data to the file's page cache. In environments where the block size is smaller than the page size, this flaw can cause data corruption and leak uninitialized memory bytes during the recovery process.
Fix these issues by correcting this byte offset calculation on the page.
CVE-2024-26688:
In the Linux kernel, the following vulnerability has been resolved:
fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
When configuring a hugetlb filesystem via the fsconfig() syscall, there is a possible NULL dereference in hugetlbfs_fill_super() caused by assigning NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize is non valid.
E.g: Taking the following steps:
fd = fsopen(hugetlbfs, FSOPEN_CLOEXEC);
fsconfig(fd, FSCONFIG_SET_STRING, pagesize, 1024, 0);
fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
Given that the requested pagesize is invalid, ctxt->hstate will be replaced with NULL, losing its previous value, and we will print an error:
...
...
case Opt_pagesize:
ps = memparse(param->string, &rest);
ctx->hstate = h;
if (!ctx->hstate) { pr_err(Unsupported page size %lu MB\n, ps / SZ_1M);
return -EINVAL;
} return 0;
...
...
This is a problem because later on, we will dereference ctxt->hstate in hugetlbfs_fill_super()
...
...
sb->s_blocksize = huge_page_size(ctx->hstate);
...
...
Causing below Oops.
Fix this by replacing cxt->hstate value only when then pagesize is known to be valid.
kernel: hugetlbfs: Unsupported page size 0 MB kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0 kernel: Oops: 0000 [#1] PREEMPT SMP PTI kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G E 6.8.0-rc2-default+ #22 5a47c3fef76212addcc6eb71344aabc35190ae8f kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 kernel: Call Trace:
kernel: <TASK> kernel: ? __die_body+0x1a/0x60 kernel: ? page_fault_oops+0x16f/0x4a0 kernel: ? search_bpf_extables+0x65/0x70 kernel: ? fixup_exception+0x22/0x310 kernel: ? exc_page_fault+0x69/0x150 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: ? hugetlbfs_fill_super+0xb4/0x1a0 kernel: ? hugetlbfs_fill_super+0x28/0x1a0 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: vfs_get_super+0x40/0xa0 kernel: ? __pfx_bpf_lsm_capable+0x10/0x10 kernel: vfs_get_tree+0x25/0xd0 kernel: vfs_cmd_create+0x64/0xe0 kernel: __x64_sys_fsconfig+0x395/0x410 kernel: do_syscall_64+0x80/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? exc_page_fault+0x69/0x150 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 kernel: RIP: 0033:0x7ffbc0cb87c9 kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48 kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af kernel: RAX: fffffffffff
---truncated---
CVE-2024-26685:
In the Linux kernel, the following vulnerability has been resolved:
nilfs2: fix potential bug in end_buffer_async_write
According to a syzbot report, end_buffer_async_write(), which handles the completion of block device writes, may detect abnormal condition of the buffer async_write flag and cause a BUG_ON failure when using nilfs2.
Nilfs2 itself does not use end_buffer_async_write(). But, the async_write flag is now used as a marker by commit 7f42ec394156 (nilfs2: fix issue with race condition of competition between segments for dirty blocks) as a means of resolving double list insertion of dirty blocks in nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the resulting crash.
This modification is safe as long as it is used for file data and b-tree node blocks where the page caches are independent. However, it was irrelevant and redundant to also introduce async_write for segment summary and super root blocks that share buffers with the backing device. This led to the possibility that the BUG_ON check in end_buffer_async_write would fail as described above, if independent writebacks of the backing device occurred in parallel.
The use of async_write for segment summary buffers has already been removed in a previous change.
Fix this issue by removing the manipulation of the async_write flag for the remaining super root block buffer.
CVE-2024-0841:
A null pointer dereference flaw was found in the hugetlbfs_fill_super function in the Linux kernel hugetlbfs (HugeTLB pages) functionality. This issue may allow a local user to crash the system or potentially escalate their privileges on the system.
CVE-2023-52902:
In the Linux kernel, the following vulnerability has been resolved:
nommu: fix memory leak in do_mmap() error path
The preallocation of the maple tree nodes may leak if the error path to error_just_free is taken. Fix this by moving the freeing of the maple tree nodes to a shared location for all error paths.
CVE-2023-52869:
In the Linux kernel, the following vulnerability has been resolved:
pstore/platform: Add check for kstrdup
Add check for the return value of kstrdup() and return the error if it fails in order to avoid NULL pointer dereference.
CVE-2023-52813:
In the Linux kernel, the following vulnerability has been resolved:
crypto: pcrypt - Fix hungtask for PADATA_RESET
We found a hungtask bug in test_aead_vec_cfg as follows:
INFO: task cryptomgr_test:391009 blocked for more than 120 seconds.
echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
Call trace:
__switch_to+0x98/0xe0
__schedule+0x6c4/0xf40 schedule+0xd8/0x1b4 schedule_timeout+0x474/0x560 wait_for_common+0x368/0x4e0 wait_for_completion+0x20/0x30 wait_for_completion+0x20/0x30 test_aead_vec_cfg+0xab4/0xd50 test_aead+0x144/0x1f0 alg_test_aead+0xd8/0x1e0 alg_test+0x634/0x890 cryptomgr_test+0x40/0x70 kthread+0x1e0/0x220 ret_from_fork+0x10/0x18 Kernel panic - not syncing: hung_task: blocked tasks
For padata_do_parallel, when the return err is 0 or -EBUSY, it will call wait_for_completion(&wait->completion) in test_aead_vec_cfg. In normal case, aead_request_complete() will be called in pcrypt_aead_serial and the return err is 0 for padata_do_parallel. But, when pinst->flags is PADATA_RESET, the return err is -EBUSY for padata_do_parallel, and it won't call aead_request_complete(). Therefore, test_aead_vec_cfg will hung at wait_for_completion(&wait->completion), which will cause hungtask.
The problem comes as following:
(padata_do_parallel) | rcu_read_lock_bh(); | err = -EINVAL; | (padata_replace) | pinst->flags |= PADATA_RESET;
err = -EBUSY | if (pinst->flags & PADATA_RESET) | rcu_read_unlock_bh() | return err
In order to resolve the problem, we replace the return err -EBUSY with
-EAGAIN, which means parallel_data is changing, and the caller should call it again.
v3:
remove retry and just change the return err.
v2:
introduce padata_try_do_parallel() in pcrypt_aead_encrypt and pcrypt_aead_decrypt to solve the hungtask.
CVE-2023-52794:
In the Linux kernel, the following vulnerability has been resolved:
thermal: intel: powerclamp: fix mismatch in get function for max_idle
KASAN reported this
[ 444.853098] BUG: KASAN: global-out-of-bounds in param_get_int+0x77/0x90 [ 444.853111] Read of size 4 at addr ffffffffc16c9220 by task cat/2105 ...
[ 444.853442] The buggy address belongs to the variable:
[ 444.853443] max_idle+0x0/0xffffffffffffcde0 [intel_powerclamp]
There is a mismatch between the param_get_int and the definition of max_idle. Replacing param_get_int with param_get_byte resolves this issue.
CVE-2023-52759:
In the Linux kernel, the following vulnerability has been resolved:
gfs2: ignore negated quota changes
When lots of quota changes are made, there may be cases in which an inode's quota information is increased and then decreased, such as when blocks are added to a file, then deleted from it. If the timing is right, function do_qc can add pending quota changes to a transaction, then later, another call to do_qc can negate those changes, resulting in a net gain of 0. The quota_change information is recorded in the qc buffer (and qd element of the inode as well). The buffer is added to the transaction by the first call to do_qc, but a subsequent call changes the value from non-zero back to zero. At that point it's too late to remove the buffer_head from the transaction. Later, when the quota sync code is called, the zero-change qd element is discovered and flagged as an assert warning. If the fs is mounted with errors=panic, the kernel will panic.
This is usually seen when files are truncated and the quota changes are negated by punch_hole/truncate which uses gfs2_quota_hold and gfs2_quota_unhold rather than block allocations that use gfs2_quota_lock and gfs2_quota_unlock which automatically do quota sync.
This patch solves the problem by adding a check to qd_check_sync such that net-zero quota changes already added to the transaction are no longer deemed necessary to be synced, and skipped.
In this case references are taken for the qd and the slot from do_qc so those need to be put. The normal sequence of events for a normal non-zero quota change is as follows:
gfs2_quota_change do_qc qd_hold slot_hold
Later, when the changes are to be synced:
gfs2_quota_sync qd_fish qd_check_sync gets qd ref via lockref_get_not_dead do_sync do_qc(QC_SYNC) qd_put lockref_put_or_lock qd_unlock qd_put lockref_put_or_lock
In the net-zero change case, we add a check to qd_check_sync so it puts the qd and slot references acquired in gfs2_quota_change and skip the unneeded sync.
CVE-2023-52622:
In the Linux kernel, the following vulnerability has been resolved:
ext4: avoid online resizing failures due to oversized flex bg
When we online resize an ext4 filesystem with a oversized flexbg_size,
mkfs.ext4 -F -G 67108864 $dev -b 4096 100M mount $dev $dir resize2fs $dev 16G
the following WARN_ON is triggered:
================================================================== WARNING: CPU: 0 PID: 427 at mm/page_alloc.c:4402 __alloc_pages+0x411/0x550 Modules linked in: sg(E) CPU: 0 PID: 427 Comm: resize2fs Tainted: G E 6.6.0-rc5+ #314 RIP: 0010:__alloc_pages+0x411/0x550 Call Trace:
<TASK>
__kmalloc_large_node+0xa2/0x200
__kmalloc+0x16e/0x290 ext4_resize_fs+0x481/0xd80
__ext4_ioctl+0x1616/0x1d90 ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0xf0/0x150 do_syscall_64+0x3b/0x90 ==================================================================
This is because flexbg_size is too large and the size of the new_group_data array to be allocated exceeds MAX_ORDER. Currently, the minimum value of MAX_ORDER is 8, the minimum value of PAGE_SIZE is 4096, the corresponding maximum number of groups that can be allocated is:
(PAGE_SIZE << MAX_ORDER) / sizeof(struct ext4_new_group_data) 21845
And the value that is down-aligned to the power of 2 is 16384. Therefore, this value is defined as MAX_RESIZE_BG, and the number of groups added each time does not exceed this value during resizing, and is added multiple times to complete the online resizing. The difference is that the metadata in a flex_bg may be more dispersed.
CVE-2023-52604:
In the Linux kernel, the following vulnerability has been resolved:
FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTree
Syzkaller reported the following issue:
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6 index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]') CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348 dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867 dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834 dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331 dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline] dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402 txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534 txUpdateMap+0x342/0x9e0 txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline] jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732 kthread+0x2d3/0x370 kernel/kthread.c:388 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 </TASK> ================================================================================ Kernel panic - not syncing: UBSAN: panic_on_warn set ...
CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 panic+0x30f/0x770 kernel/panic.c:340 check_panic_on_warn+0x82/0xa0 kernel/panic.c:236 ubsan_epilogue lib/ubsan.c:223 [inline]
__ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348 dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867 dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834 dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331 dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline] dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402 txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534 txUpdateMap+0x342/0x9e0 txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline] jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732 kthread+0x2d3/0x370 kernel/kthread.c:388 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 </TASK> Kernel Offset: disabled Rebooting in 86400 seconds..
The issue is caused when the value of lp becomes greater than CTLTREESIZE which is the max size of stree. Adding a simple check solves this issue.
Dave:
As the function returns a void, good error handling would require a more intrusive code reorganization, so I modified Osama's patch at use WARN_ON_ONCE for lack of a cleaner option.
The patch is tested via syzbot.
CVE-2023-52603:
In the Linux kernel, the following vulnerability has been resolved:
UBSAN: array-index-out-of-bounds in dtSplitRoot
Syzkaller reported the following issue:
oop0: detected capacity change from 0 to 32768
UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:1971:9 index -2 is out of range for type 'struct dtslot [128]' CPU: 0 PID: 3613 Comm: syz-executor270 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline]
__ubsan_handle_out_of_bounds+0xdb/0x130 lib/ubsan.c:283 dtSplitRoot+0x8d8/0x1900 fs/jfs/jfs_dtree.c:1971 dtSplitUp fs/jfs/jfs_dtree.c:985 [inline] dtInsert+0x1189/0x6b80 fs/jfs/jfs_dtree.c:863 jfs_mkdir+0x757/0xb00 fs/jfs/namei.c:270 vfs_mkdir+0x3b3/0x590 fs/namei.c:4013 do_mkdirat+0x279/0x550 fs/namei.c:4038
__do_sys_mkdirat fs/namei.c:4053 [inline]
__se_sys_mkdirat fs/namei.c:4051 [inline]
__x64_sys_mkdirat+0x85/0x90 fs/namei.c:4051 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fcdc0113fd9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffeb8bc67d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000102 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcdc0113fd9 RDX: 0000000000000000 RSI: 0000000020000340 RDI: 0000000000000003 RBP: 00007fcdc00d37a0 R08: 0000000000000000 R09: 00007fcdc00d37a0 R10: 00005555559a72c0 R11: 0000000000000246 R12: 00000000f8008000 R13: 0000000000000000 R14: 00083878000000f8 R15: 0000000000000000 </TASK>
The issue is caused when the value of fsi becomes less than -1.
The check to break the loop when fsi value becomes -1 is present but syzbot was able to produce value less than -1 which cause the error.
This patch simply add the change for the values less than 0.
The patch is tested via syzbot.
CVE-2023-52602:
In the Linux kernel, the following vulnerability has been resolved:
jfs: fix slab-out-of-bounds Read in dtSearch
Currently while searching for current page in the sorted entry table of the page there is a out of bound access. Added a bound check to fix the error.
Dave:
Set return code to -EIO
CVE-2023-52601:
In the Linux kernel, the following vulnerability has been resolved:
jfs: fix array-index-out-of-bounds in dbAdjTree
Currently there is a bound check missing in the dbAdjTree while accessing the dmt_stree. To add the required check added the bool is_ctl which is required to determine the size as suggest in the following commit.
https://lore.kernel.org/linux-kernel-mentees/[email protected]/
CVE-2023-52448:
In the Linux kernel, the following vulnerability has been resolved:
gfs2: Fix kernel NULL pointer dereference in gfs2_rgrp_dump
Syzkaller has reported a NULL pointer dereference when accessing rgd->rd_rgl in gfs2_rgrp_dump(). This can happen when creating rgd->rd_gl fails in read_rindex_entry(). Add a NULL pointer check in gfs2_rgrp_dump() to prevent that.
CVE-2023-52444:
In the Linux kernel, the following vulnerability has been resolved: f2fs: fix to avoid dirent corruption As Al reported in link[1]: f2fs_rename() ... if (old_dir != new_dir && !whiteout) f2fs_set_link(old_inode, old_dir_entry, old_dir_page, new_dir); else f2fs_put_page(old_dir_page, 0); You want correct inumber in the .. link. And cross-directory rename does move the source to new parent, even if you'd been asked to leave a whiteout in the old place. [1] https://lore.kernel.org/all/20231017055040.GN800259@ZenIV/ With below testcase, it may cause dirent corruption, due to it missed to call f2fs_set_link() to update ..
link to new directory. - mkdir -p dir/foo - renameat2 -w dir/foo bar [ASSERT] (__chk_dots_dentries:1421)
--> Bad inode number[0x4] for '..', parent parent ino is [0x3] [FSCK] other corrupted bugs [Fail]
CVE-2023-52436:
In the Linux kernel, the following vulnerability has been resolved:
f2fs: explicitly null-terminate the xattr list
When setting an xattr, explicitly null-terminate the xattr list. This eliminates the fragile assumption that the unused xattr space is always zeroed.
CVE-2023-45896:
ntfs3 in the Linux kernel before 6.5.11 allows a physically proximate attacker to read kernel memory by mounting a filesystem (e.g., if a Linux distribution is configured to allow unprivileged mounts of removable media) and then leveraging local access to trigger an out-of-bounds read. A length value can be larger than the amount of memory allocated. NOTE: the supplier's perspective is that there is no vulnerability when an attack requires an attacker-modified filesystem image.
CVE-2023-1611:
A use-after-free flaw was found in btrfs_search_slot in fs/btrfs/ctree.c in btrfs in the Linux Kernel.This flaw allows an attacker to crash the system and possibly cause a kernel information lea
CVE-2022-48931:
In the Linux kernel, the following vulnerability has been resolved:
configfs: fix a race in configfs_{,un}register_subsystem()
When configfs_register_subsystem() or configfs_unregister_subsystem() is executing link_group() or unlink_group(), it is possible that two processes add or delete list concurrently.
Some unfortunate interleavings of them can cause kernel panic.
One of cases is:
A --> B --> C --> D A <-- B <-- C <-- D
delete list_head *B | delete list_head *C
--------------------------------|----------------------------------- configfs_unregister_subsystem | configfs_unregister_subsystem unlink_group | unlink_group unlink_obj | unlink_obj list_del_init | list_del_init
__list_del_entry | __list_del_entry
__list_del | __list_del // next == C | next->prev = prev | | next->prev = prev prev->next = next | | // prev == B | prev->next = next
Fix this by adding mutex when calling link_group() or unlink_group(), but parent configfs_subsystem is NULL when config_item is root.
So I create a mutex configfs_subsystem_mutex.
CVE-2022-48923:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: prevent copying too big compressed lzo segment
Compressed length can be corrupted to be a lot larger than memory we have allocated for buffer.
This will cause memcpy in copy_compressed_segment to write outside of allocated memory.
This mostly results in stuck read syscall but sometimes when using btrfs send can get #GP
kernel: general protection fault, probably for non-canonical address 0x841551d5c1000: 0000 [#1] PREEMPT SMP NOPTI kernel: CPU: 17 PID: 264 Comm: kworker/u256:7 Tainted: P OE 5.17.0-rc2-1 #12 kernel: Workqueue: btrfs-endio btrfs_work_helper [btrfs] kernel: RIP: 0010:lzo_decompress_bio (./include/linux/fortify-string.h:225 fs/btrfs/lzo.c:322 fs/btrfs/lzo.c:394) btrfs Code starting with the faulting instruction =========================================== 0:* 48 8b 06 mov (%rsi),%rax <-- trapping instruction 3: 48 8d 79 08 lea 0x8(%rcx),%rdi 7: 48 83 e7 f8 and $0xfffffffffffffff8,%rdi b: 48 89 01 mov %rax,(%rcx) e: 44 89 f0 mov %r14d,%eax 11: 48 8b 54 06 f8 mov -0x8(%rsi,%rax,1),%rdx kernel: RSP: 0018:ffffb110812efd50 EFLAGS: 00010212 kernel: RAX: 0000000000001000 RBX: 000000009ca264c8 RCX: ffff98996e6d8ff8 kernel: RDX: 0000000000000064 RSI: 000841551d5c1000 RDI: ffffffff9500435d kernel: RBP: ffff989a3be856c0 R08: 0000000000000000 R09: 0000000000000000 kernel: R10: 0000000000000000 R11: 0000000000001000 R12: ffff98996e6d8000 kernel: R13: 0000000000000008 R14: 0000000000001000 R15: 000841551d5c1000 kernel: FS: 0000000000000000(0000) GS:ffff98a09d640000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 00001e9f984d9ea8 CR3: 000000014971a000 CR4: 00000000003506e0 kernel: Call Trace:
kernel: <TASK> kernel: end_compressed_bio_read (fs/btrfs/compression.c:104 fs/btrfs/compression.c:1363 fs/btrfs/compression.c:323) btrfs kernel: end_workqueue_fn (fs/btrfs/disk-io.c:1923) btrfs kernel: btrfs_work_helper (fs/btrfs/async-thread.c:326) btrfs kernel: process_one_work (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:212 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2312) kernel: worker_thread (./include/linux/list.h:292 kernel/workqueue.c:2455) kernel: ? process_one_work (kernel/workqueue.c:2397) kernel: kthread (kernel/kthread.c:377) kernel: ? kthread_complete_and_exit (kernel/kthread.c:332) kernel: ret_from_fork (arch/x86/entry/entry_64.S:301) kernel: </TASK>
CVE-2022-48902:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: do not WARN_ON() if we have PageError set
Whenever we do any extent buffer operations we call assert_eb_page_uptodate() to complain loudly if we're operating on an non-uptodate page. Our overnight tests caught this warning earlier this week
WARNING: CPU: 1 PID: 553508 at fs/btrfs/extent_io.c:6849 assert_eb_page_uptodate+0x3f/0x50 CPU: 1 PID: 553508 Comm: kworker/u4:13 Tainted: G W 5.17.0-rc3+ #564 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Workqueue: btrfs-cache btrfs_work_helper RIP: 0010:assert_eb_page_uptodate+0x3f/0x50 RSP: 0018:ffffa961440a7c68 EFLAGS: 00010246 RAX: 0017ffffc0002112 RBX: ffffe6e74453f9c0 RCX: 0000000000001000 RDX: ffffe6e74467c887 RSI: ffffe6e74453f9c0 RDI: ffff8d4c5efc2fc0 RBP: 0000000000000d56 R08: ffff8d4d4a224000 R09: 0000000000000000 R10: 00015817fa9d1ef0 R11: 000000000000000c R12: 00000000000007b1 R13: ffff8d4c5efc2fc0 R14: 0000000001500000 R15: 0000000001cb1000 FS: 0000000000000000(0000) GS:ffff8d4dbbd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff31d3448d8 CR3: 0000000118be8004 CR4: 0000000000370ee0 Call Trace:
extent_buffer_test_bit+0x3f/0x70 free_space_test_bit+0xa6/0xc0 load_free_space_tree+0x1f6/0x470 caching_thread+0x454/0x630 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? rcu_read_lock_sched_held+0x12/0x60 ? lock_release+0x1f0/0x2d0 btrfs_work_helper+0xf2/0x3e0 ? lock_release+0x1f0/0x2d0 ? finish_task_switch.isra.0+0xf9/0x3a0 process_one_work+0x26d/0x580 ? process_one_work+0x580/0x580 worker_thread+0x55/0x3b0 ? process_one_work+0x580/0x580 kthread+0xf0/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30
This was partially fixed by c2e39305299f01 (btrfs: clear extent buffer uptodate when we fail to write it), however all that fix did was keep us from finding extent buffers after a failed writeout. It didn't keep us from continuing to use a buffer that we already had found.
In this case we're searching the commit root to cache the block group, so we can start committing the transaction and switch the commit root and then start writing. After the switch we can look up an extent buffer that hasn't been written yet and start processing that block group. Then we fail to write that block out and clear Uptodate on the page, and then we start spewing these errors.
Normally we're protected by the tree lock to a certain degree here. If we read a block we have that block read locked, and we block the writer from locking the block before we submit it for the write. However this isn't necessarily fool proof because the read could happen before we do the submit_bio and after we locked and unlocked the extent buffer.
Also in this particular case we have path->skip_locking set, so that won't save us here. We'll simply get a block that was valid when we read it, but became invalid while we were using it.
What we really want is to catch the case where we've read a block but it's not marked Uptodate. On read we ClearPageError(), so if we're !Uptodate and !Error we know we didn't do the right thing for reading the page.
Fix this by checking !Uptodate && !Error, this way we will not complain if our buffer gets invalidated while we're using it, and we'll maintain the spirit of the check which is to make sure we have a fully in-cache block while we're messing with it.
CVE-2022-48833:
In the Linux kernel, the following vulnerability has been resolved:
btrfs: skip reserved bytes warning on unmount after log cleanup failure
After the recent changes made by commit c2e39305299f01 (btrfs: clear extent buffer uptodate when we fail to write it) and its followup fix, commit 651740a5024117 (btrfs: check WRITE_ERR when trying to read an extent buffer), we can now end up not cleaning up space reservations of log tree extent buffers after a transaction abort happens, as well as not cleaning up still dirty extent buffers.
This happens because if writeback for a log tree extent buffer failed, then we have cleared the bit EXTENT_BUFFER_UPTODATE from the extent buffer and we have also set the bit EXTENT_BUFFER_WRITE_ERR on it. Later on, when trying to free the log tree with free_log_tree(), which iterates over the tree, we can end up getting an -EIO error when trying to read a node or a leaf, since read_extent_buffer_pages() returns -EIO if an extent buffer does not have EXTENT_BUFFER_UPTODATE set and has the EXTENT_BUFFER_WRITE_ERR bit set. Getting that -EIO means that we return immediately as we can not iterate over the entire tree.
In that case we never update the reserved space for an extent buffer in the respective block group and space_info object.
When this happens we get the following traces when unmounting the fs:
[174957.284509] BTRFS: error (device dm-0) in cleanup_transaction:1913: errno=-5 IO failure [174957.286497] BTRFS: error (device dm-0) in free_log_tree:3420: errno=-5 IO failure [174957.399379] ------------[ cut here ]------------ [174957.402497] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:127 btrfs_put_block_group+0x77/0xb0 [btrfs] [174957.407523] Modules linked in: btrfs overlay dm_zero (...) [174957.424917] CPU: 2 PID: 3206883 Comm: umount Tainted: G W 5.16.0-rc5-btrfs-next-109 #1 [174957.426689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [174957.428716] RIP: 0010:btrfs_put_block_group+0x77/0xb0 [btrfs] [174957.429717] Code: 21 48 8b bd (...) [174957.432867] RSP: 0018:ffffb70d41cffdd0 EFLAGS: 00010206 [174957.433632] RAX: 0000000000000001 RBX: ffff8b09c3848000 RCX: ffff8b0758edd1c8 [174957.434689] RDX: 0000000000000001 RSI: ffffffffc0b467e7 RDI: ffff8b0758edd000 [174957.436068] RBP: ffff8b0758edd000 R08: 0000000000000000 R09: 0000000000000000 [174957.437114] R10: 0000000000000246 R11: 0000000000000000 R12: ffff8b09c3848148 [174957.438140] R13: ffff8b09c3848198 R14: ffff8b0758edd188 R15: dead000000000100 [174957.439317] FS: 00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000 [174957.440402] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [174957.441164] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0 [174957.442117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [174957.443076] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [174957.443948] Call Trace:
[174957.444264] <TASK> [174957.444538] btrfs_free_block_groups+0x255/0x3c0 [btrfs] [174957.445238] close_ctree+0x301/0x357 [btrfs] [174957.445803] ? call_rcu+0x16c/0x290 [174957.446250] generic_shutdown_super+0x74/0x120 [174957.446832] kill_anon_super+0x14/0x30 [174957.447305] btrfs_kill_super+0x12/0x20 [btrfs] [174957.447890] deactivate_locked_super+0x31/0xa0 [174957.448440] cleanup_mnt+0x147/0x1c0 [174957.448888] task_work_run+0x5c/0xa0 [174957.449336] exit_to_user_mode_prepare+0x1e5/0x1f0 [174957.449934] syscall_exit_to_user_mode+0x16/0x40 [174957.450512] do_syscall_64+0x48/0xc0 [174957.450980] entry_SYSCALL_64_after_hwframe+0x44/0xae [174957.451605] RIP: 0033:0x7f328fdc4a97 [174957.452059] Code: 03 0c 00 f7 (...) [174957.454320] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [174957.455262] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97 [174957.456131] RDX: 0000000000000000 RSI: 00000000000000
---truncated---
CVE-2022-48829:
In the Linux kernel, the following vulnerability has been resolved:
NFSD: Fix NFSv3 SETATTR/CREATE's handling of large file sizes
iattr::ia_size is a loff_t, so these NFSv3 procedures must be careful to deal with incoming client size values that are larger than s64_max without corrupting the value.
Silently capping the value results in storing a different value than the client passed in which is unexpected behavior, so remove the min_t() check in decode_sattr3().
Note that RFC 1813 permits only the WRITE procedure to return NFS3ERR_FBIG. We believe that NFSv3 reference implementations also return NFS3ERR_FBIG when ia_size is too large.
CVE-2022-48828:
In the Linux kernel, the following vulnerability has been resolved:
NFSD: Fix ia_size underflow
iattr::ia_size is a loff_t, which is a signed 64-bit type. NFSv3 and NFSv4 both define file size as an unsigned 64-bit type. Thus there is a range of valid file size values an NFS client can send that is already larger than Linux can handle.
Currently decode_fattr4() dumps a full u64 value into ia_size. If that value happens to be larger than S64_MAX, then ia_size underflows. I'm about to fix up the NFSv3 behavior as well, so let's catch the underflow in the common code path: nfsd_setattr().
CVE-2022-48827:
In the Linux kernel, the following vulnerability has been resolved:
NFSD: Fix the behavior of READ near OFFSET_MAX
Dan Aloni reports:
> Due to commit 8cfb9015280d (NFS: Always provide aligned buffers to > the RPC read layers) on the client, a read of 0xfff is aligned up > to server rsize of 0x1000.
> > As a result, in a test where the server has a file of size > 0x7fffffffffffffff, and the client tries to read from the offset > 0x7ffffffffffff000, the read causes loff_t overflow in the server > and it returns an NFS code of EINVAL to the client. The client as > a result indefinitely retries the request.
The Linux NFS client does not handle NFS?ERR_INVAL, even though all NFS specifications permit servers to return that status code for a READ.
Instead of NFS?ERR_INVAL, have out-of-range READ requests succeed and return a short result. Set the EOF flag in the result to prevent the client from retrying the READ request. This behavior appears to be consistent with Solaris NFS servers.
Note that NFSv3 and NFSv4 use u64 offset values on the wire. These must be converted to loff_t internally before use -- an implicit type cast is not adequate for this purpose. Otherwise VFS checks against sb->s_maxbytes do not work properly.
CVE-2022-48793:
In the Linux kernel, the following vulnerability has been resolved:
KVM: x86: nSVM: fix potential NULL derefernce on nested migration
Turns out that due to review feedback and/or rebases I accidentally moved the call to nested_svm_load_cr3 to be too early, before the NPT is enabled, which is very wrong to do.
KVM can't even access guest memory at that point as nested NPT is needed for that, and of course it won't initialize the walk_mmu, which is main issue the patch was addressing.
Fix this for real.
CVE-2022-48767:
In the Linux kernel, the following vulnerability has been resolved:
ceph: properly put ceph_string reference after async create attempt
The reference acquired by try_prep_async_create is currently leaked.
Ensure we put it.
CVE-2022-48741:
In the Linux kernel, the following vulnerability has been resolved:
ovl: fix NULL pointer dereference in copy up warning
This patch is fixing a NULL pointer dereference to get a recently introduced warning message working.
CVE-2022-48740:
In the Linux kernel, the following vulnerability has been resolved:
selinux: fix double free of cond_list on error paths
On error path from cond_read_list() and duplicate_policydb_cond_list() the cond_list_destroy() gets called a second time in caller functions, resulting in NULL pointer deref. Fix this by resetting the cond_list_len to 0 in cond_list_destroy(), making subsequent calls a noop.
Also consistently reset the cond_list pointer to NULL after freeing.
[PM: fix line lengths in the description]
CVE-2022-36123:
The Linux kernel before 5.18.13 lacks a certain clear operation for the block starting symbol (.bss). This allows Xen PV guest OS users to cause a denial of service or gain privileges.
CVE-2022-2327:
io_uring use work_flags to determine which identity need to grab from the calling process to make sure it is consistent with the calling process when executing IORING_OP. Some operations are missing some types, which can lead to incorrect reference counts which can then lead to a double free. We recommend upgrading the kernel past commit df3f3bb5059d20ef094d6b2f0256c4bf4127a859
CVE-2022-1205:
A NULL pointer dereference flaw was found in the Linux kernel's Amateur Radio AX.25 protocol functionality in the way a user connects with the protocol. This flaw allows a local user to crash the system.
CVE-2022-1204:
A use-after-free flaw was found in the Linux kernel's Amateur Radio AX.25 protocol functionality in the way a user connects with the protocol. This flaw allows a local user to crash the system.
CVE-2022-0480:
A flaw was found in the filelock_init in fs/locks.c function in the Linux kernel. This issue can lead to host memory exhaustion due to memcg not limiting the number of Portable Operating System Interface (POSIX) file locks.
CVE-2021-47218:
In the Linux kernel, the following vulnerability has been resolved:
selinux: fix NULL-pointer dereference when hashtab allocation fails
When the hash table slot array allocation fails in hashtab_init(), h->size is left initialized with a non-zero value, but the h->htable pointer is NULL. This may then cause a NULL pointer dereference, since the policydb code relies on the assumption that even after a failed hashtab_init(), hashtab_map() and hashtab_destroy() can be safely called on it. Yet, these detect an empty hashtab only by looking at the size.
Fix this by making sure that hashtab_init() always leaves behind a valid empty hashtab when the allocation fails.
CVE-2021-4440:
In the Linux kernel, the following vulnerability has been resolved:
x86/xen: Drop USERGS_SYSRET64 paravirt call
commit afd30525a659ac0ae0904f0cb4a2ca75522c3123 upstream.
USERGS_SYSRET64 is used to return from a syscall via SYSRET, but a Xen PV guest will nevertheless use the IRET hypercall, as there is no sysret PV hypercall defined.
So instead of testing all the prerequisites for doing a sysret and then mangling the stack for Xen PV again for doing an iret just use the iret exit from the beginning.
This can easily be done via an ALTERNATIVE like it is done for the sysenter compat case already.
It should be noted that this drops the optimization in Xen for not restoring a few registers when returning to user mode, but it seems as if the saved instructions in the kernel more than compensate for this drop (a kernel build in a Xen PV guest was slightly faster with this patch applied).
While at it remove the stale sysret32 remnants.
[ pawan: Brad Spengler and Salvatore Bonaccorso <[email protected]> reported a problem with the 5.10 backport commit edc702b4a820 (x86/entry_64: Add VERW just before userspace transition).
When CONFIG_PARAVIRT_XXL=y, CLEAR_CPU_BUFFERS is not executed in syscall_return_via_sysret path as USERGS_SYSRET64 is runtime patched to:
.cpu_usergs_sysret64 = { 0x0f, 0x01, 0xf8, 0x48, 0x0f, 0x07 }, // swapgs; sysretq
which is missing CLEAR_CPU_BUFFERS. It turns out dropping USERGS_SYSRET64 simplifies the code, allowing CLEAR_CPU_BUFFERS to be explicitly added to syscall_return_via_sysret path. Below is with CONFIG_PARAVIRT_XXL=y and this patch applied:
syscall_return_via_sysret:
...
<+342>: swapgs <+345>: xchg %ax,%ax <+347>: verw -0x1a2(%rip) <------ <+354>: sysretq ]
CVE-2021-4095:
A NULL pointer dereference was found in the Linux kernel's KVM when dirty ring logging is enabled without an active vCPU context. An unprivileged local attacker on the host may use this flaw to cause a kernel oops condition and thus a denial of service by issuing a KVM_XEN_HVM_SET_ATTR ioctl. This flaw affects Linux kernel versions prior to 5.17-rc1.
CVE-2021-4032:
A vulnerability was found in the Linux kernel's KVM subsystem in arch/x86/kvm/lapic.c kvm_free_lapic when a failure allocation was detected. In this flaw the KVM subsystem may crash the kernel due to mishandling of memory errors that happens during VCPU construction, which allows an attacker with special user privilege to cause a denial of service. This flaw affects kernel versions prior to 5.15 rc7.
Tenable has extracted the preceding description block directly from the Tencent Linux security advisory.
Note that Nessus has not tested for these issues but has instead relied only on the application's self-reported version number.
Solution
Update the affected packages.