<div dir="ltr"><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Wed, Feb 12, 2025 at 9:36 AM Koichiro Den <<a href="mailto:koichiro.den@canonical.com">koichiro.den@canonical.com</a>> wrote: </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, Feb 12, 2025 at 09:14:27AM GMT, Heitor Alves de Siqueira wrote: > Hi Koichiro, > > thanks for looking into this! Yes, I've used the attached scripts to > reproduce the issue successfully, although only in aarch64 systems > (specifically, I've used Grace-Grace for my tests). > I've not been able to reproduce this reliably in x86 or other > architectures, and using 64k page sizes also makes this much faster/easier > to reproduce. Thanks for the reply. Just let me confirm; when you verified that you reproduced it, you confirmed that there were large number of dirty folios in the LRU list for the coldest gen for FILE (not ANON), right?</blockquote><div> </div><div>Here's a stack trace from the latest reproducer run I did earlier this morning, using kernel 6.8.0-53-generic-64k from Noble:</div><div> </div><div>[ 124.550628] alloc_and_crash: page allocation failure: order:0, mode:0x141cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP|__GFP_WRITE), nodemask=0,cpuset=/,mems_allowed=0-1<br>
[ 124.550648] CPU: 135 PID: 3406 Comm: alloc_and_crash Not tainted 6.8.0-53-generic-64k #55-Ubuntu<br>
[ 124.550651] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 1.0c 12/28/2023<br>
[ 124.550653] Call trace:<br>
[ 124.550656] dump_backtrace+0xa4/0x150<br>
[ 124.550665] show_stack+0x24/0x50<br>
[ 124.550667] dump_stack_lvl+0xc8/0x138<br>
[ 124.550671] dump_stack+0x1c/0x38<br>
[ 124.550672] warn_alloc+0x16c/0x1f0<br>
[ 124.550677] __alloc_pages_slowpath.constprop.0+0x8e4/0x9f0<br>
[ 124.550679] __alloc_pages+0x2f0/0x3a8<br>
[ 124.550680] alloc_pages_mpol+0x94/0x290<br>
[ 124.550685] alloc_pages+0x6c/0x118<br>
[ 124.550687] folio_alloc+0x24/0x98<br>
[ 124.550689] 
filemap_alloc_folio+0x168/0x188<br>
[ 124.550692] __filemap_get_folio+0x1bc/0x3f8<br>
[ 124.550694] ext4_da_write_begin+0x144/0x300<br>
[ 124.550697] generic_perform_write+0xc4/0x228<br>
[ 124.550699] ext4_buffered_write_iter+0x78/0x180<br>
[ 124.550701] ext4_file_write_iter+0x44/0xf0<br>
[ 124.550702] __kernel_write_iter+0x10c/0x2c0<br>
[ 124.550704] dump_user_range+0xe0/0x240<br>
[ 124.550707] elf_core_dump+0x4cc/0x538<br>
[ 124.550709] do_coredump+0x574/0x988<br>
[ 124.550711] get_signal+0x7dc/0x8f0<br>
[ 124.550713] do_signal+0x138/0x1f8<br>
[ 124.550715] do_notify_resume+0x114/0x298<br>
[ 124.550716] el0_da+0xdc/0x178<br>
[ 124.550719] el0t_64_sync_handler+0xdc/0x158<br>
[ 124.550721] el0t_64_sync+0x1b0/0x1b8<br>
[ 124.550723] Mem-Info:<br>
[ 124.550728] active_anon:3921 inactive_anon:3473262 isolated_anon:0 active_file:933 inactive_file:252531 isolated_file:0 unevictable:609 dirty:241262 writeback:0 slab_reclaimable:9234 slab_unreclaimable:35922 mapped:3472425 shmem:3474488 pagetables:624 sec_pagetables:0 bounce:0 kernel_misc_reclaimable:0 free:4031494 free_pcp:0 free_cma:48<br>
[ 124.550733] Node 0 active_anon:206656kB inactive_anon:222288768kB active_file:1728kB inactive_file:15437504kB unevictable:9024kB isolated(anon):0kB isolated(file):0kB mapped:222210880kB dirty:15437568kB writeback:0kB shmem:222337216kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:51584kB shadow_call_stack:66368kB pagetables:38016kB sec_pagetables:0kB all_unreclaimable? 
yes<br>
[ 124.550738] Node 0 DMA free:1041984kB boost:0kB min:69888kB low:87360kB high:104832kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:393472kB unevictable:0kB writepending:394112kB present:2097152kB managed:2029632kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:3072kB<br>
[ 124.550742] lowmem_reserve[]: 0 0 15189 15189 15189<br>
[ 124.550747] Node 0 Normal free:8574848kB boost:0kB min:8575808kB low:10719744kB high:12863680kB reserved_highatomic:0KB active_anon:206656kB inactive_anon:222288768kB active_file:1728kB inactive_file:15044032kB unevictable:9024kB writepending:15043456kB present:249244544kB managed:248932800kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB<br>
[ 124.550750] lowmem_reserve[]: 0 0 0 0 0<br>
[ 124.550754] Node 0 DMA: 5*64kB (ME) 4*128kB (ME) 1*256kB (U) 7*512kB (UE) 5*1024kB (UMEC) 2*2048kB (UC) 3*4096kB (UME) 2*8192kB (ME) 3*16384kB (UME) 3*32768kB (UME) 1*65536kB (U) 2*131072kB (UE) 2*262144kB (UE) 0*524288kB = 1041984kB<br>
[ 124.550769] Node 0 Normal: 726*64kB (UME) 392*128kB (UME) 246*256kB (UE) 138*512kB (UME) 65*1024kB (UE) 48*2048kB (UME) 19*4096kB (UE) 7*8192kB (UME) 5*16384kB (U) 3*32768kB (UM) 2*65536kB (ME) 1*131072kB (E) 1*262144kB (M) 14*524288kB (M) = 8574848kB<br>
[ 124.550786] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB<br>
[ 124.550788] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB<br>
[ 124.550789] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB<br>
[ 124.550790] 3729522 total pagecache pages<br>
[ 124.550792] 1406 pages in swap cache<br>
[ 124.550793] Free swap = 0kB<br>
[ 124.550794] Total swap = 8388544kB<br>
[ 124.550795] 7858556 pages RAM<br>
[ 124.550796] 0 pages HighMem/MovableOnly<br>
[ 124.550796] 12342 pages reserved<br>
[ 124.550797] 8192 pages cma reserved<br>
[ 124.550798] 0 pages hwpoisoned </div><div> </div><div>And here's /proc/meminfo from just before the 
crash:</div><div>MemTotal: 502157696 kB<br>
MemFree: 258273600 kB<br>
MemAvailable: 236229312 kB<br>
Buffers: 29632 kB<br>
Cached: 237187456 kB<br>
SwapCached: 1374848 kB<br>
Active: 9878912 kB<br>
Inactive: 228723776 kB<br>
Active(anon): 1307520 kB<br>
Inactive(anon): 227959296 kB<br>
Active(file): 8571392 kB<br>
Inactive(file): 764480 kB<br>
Unevictable: 38976 kB<br>
Mlocked: 29952 kB<br>
SwapTotal: 8388544 kB<br>
SwapFree: 5436224 kB<br>
Zswap: 0 kB<br>
Zswapped: 0 kB<br>
Dirty: 8519168 kB<br>
Writeback: 1250368 kB<br>
AnonPages: 79424 kB<br>
Mapped: 227857920 kB<br>
Shmem: 227861632 kB<br>
KReclaimable: 423680 kB<br>
Slab: 2767232 kB<br>
SReclaimable: 423680 kB<br>
SUnreclaim: 2343552 kB<br>
KernelStack: 93440 kB<br>
ShadowCallStack: 121088 kB<br>
PageTables: 40640 kB<br>
SecPageTables: 0 kB<br>
NFS_Unstable: 0 kB<br>
Bounce: 0 kB<br>
WritebackTmp: 0 kB<br>
CommitLimit: 259467392 kB<br>
Committed_AS: 231067456 kB<br>
VmallocTotal: 137168158720 kB<br>
VmallocUsed: 567680 kB<br>
VmallocChunk: 0 kB<br>
Percpu: 156672 kB<br>
HardwareCorrupted: 0 kB<br>
AnonHugePages: 0 kB<br>
ShmemHugePages: 0 kB<br>
ShmemPmdMapped: 0 kB<br>
FileHugePages: 0 kB<br>
FilePmdMapped: 0 kB<br>
CmaTotal: 524288 kB<br>
CmaFree: 3072 kB<br>
HugePages_Total: 0<br>
HugePages_Free: 0<br>
HugePages_Rsvd: 0<br>
HugePages_Surp: 0<br>
Hugepagesize: 524288 kB<br>
Hugetlb: 0 kB</div><div> </div><div>So while the number of ANON pages is much higher (due to how we set up the reproducer), the Mem-Info dump above shows that essentially all inactive FILE pages on node 0 are dirty (dirty:15437568kB vs. inactive_file:15437504kB), and we can still trigger the page allocation failures with enough pressure on the LRU lists.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Could you answer the rest of my questions in the previous email? </blockquote><div> </div><div>Sure!</div><div>I did use the scripts attached to the LP bug to reproduce it successfully, with the caveats I mentioned previously (only on aarch64, and easier with 64k pages).</div><div>I landed on the mentioned fix commit by bisecting the upstream kernel (Linus' tree), and confirmed the issue no longer happens after cherry-picking commit 1bc542c6a0d1 into Ubuntu kernels. 
I've validated this for Noble, Oracular and Plucky.</div><div> </div><div>Let me know if you need any more info on this!</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> > [...] > Also, did you confirm that the issue was resolved after applying the patch > for Noble/Oracular/Plucky? It seems to me that it's just stressing lru > list for ANON, not FILE. > > On Wed, Feb 12, 2025 at 1:37 AM Koichiro Den <<a href="mailto:koichiro.den@canonical.com" target="_blank">koichiro.den@canonical.com</a>> > wrote: > > > On Sun, Feb 02, 2025 at 12:21:50PM GMT, Heitor Alves de Siqueira wrote: > > > BugLink: <a href="https://bugs.launchpad.net/bugs/2097214" rel="noreferrer" target="_blank">https://bugs.launchpad.net/bugs/2097214</a> > > > > > > [Impact] > > > * On MGLRU-enabled systems, high memory pressure on NUMA nodes will > > cause page > > > allocation failures > > > * This happens due to page reclaim not waking up flusher threads > > > * OOM can be triggered even if the system has enough available memory > > > > > > [Test Plan] > > > * For the bug to properly trigger, we should uninstall apport and use > > the > > > attached alloc_and_crash.c reproducer > > > * alloc_and_crash will mmap a huge range of memory, memset it and > > forcibly SEGFAULT > > > * The attached bash script will membind alloc_and_crash to NUMA node 0, > > so we > > > can see the allocation failures in dmesg > > > $ sudo apt remove --purge apport > > > $ sudo dmesg -c; ./lp2097214-repro.sh; sleep 2; sudo dmesg > > > > I looked over the attached files (alloc_and_crash.c and > > lp2097214-repro.sh). > > > > Question: > > Did you use them to reproduce the issue that you want to resolve here? > > Also, did you confirm that the issue was resolved after applying the patch > > for Noble/Oracular/Plucky? It seems to me that it's just stressing lru > > list for ANON, not FILE. 
> > > > > > > > [Fix] > > > * The upstream patch wakes up flusher threads if there are too many > > dirty > > > entries in the coldest LRU generation > > > * This happens when trying to shrink lruvecs, so reclaim only gets > > woken up > > > during high memory pressure > > > * Fix was introduced by commit: > > > 1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid > > cgroup OOM > > > > > > [Regression Potential] > > > * This commit fixes the memory reclaim path, so regressions would > > likely show > > > up during increased system memory pressure > > > * According to the upstream patch, increased SSD/disk wearing is > > possible due > > > to waking up flusher threads, although these have not been noted in > > testing > > > > > > Zeng Jingxiang (1): > > > mm/vmscan: wake up flushers conditionally to avoid cgroup OOM > > > > > > mm/vmscan.c | 25 ++++++++++++++++++++++--- > > > 1 file changed, 22 insertions(+), 3 deletions(-) > > > > > > -- > > > 2.48.1 > > > > > > > > > -- > > > kernel-team mailing list > > > <a href="mailto:kernel-team@lists.ubuntu.com" target="_blank">kernel-team@lists.ubuntu.com</a> > > > <a href="https://lists.ubuntu.com/mailman/listinfo/kernel-team" rel="noreferrer" target="_blank">https://lists.ubuntu.com/mailman/listinfo/kernel-team</a> > > </blockquote></div></div>