[Bug 1929591] Re: MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault
Matt Thompson
1929591 at bugs.launchpad.net
Mon Aug 2 15:44:15 UTC 2021
I am experiencing this crash on an AWS i3.metal instance using mdadm.
There appear to be upstream patches for this issue:
https://lore.kernel.org/linux-raid/CAPhsuW6V4-ujDZJopCyAfTpLqDuW1bXX+SGgxqwnbFmR3uEWGQ@mail.gmail.com/T/
http://lkml.iu.edu/hypermail/linux/kernel/2107.1/04478.html
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/1929591
Title:
MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault
Status in mdadm package in Ubuntu:
New
Bug description:
Hello:
Every few days I get a kernel panic on my Ubuntu Server 20.10 box, which was recently upgraded to a Ryzen 3700X. I have 7 WD Red Pro HDDs in a RAID 6 array with Linux MD, and they're all attached to an LSI 9211-8ik PCIe card. The motherboard is currently a Gigabyte B550M Aorus Pro. My Ubuntu install is running the latest 5.8.0-53 kernel.
This is the second hardware configuration to produce the exact same kernel panic text. Previously I had these HDDs directly connected to the SATA controller of an ASRock X570 Pro4 ATX mobo with the same 3700X. I was also previously using Ubuntu Server 20.04 LTS. I upgraded to 20.10 in the hope that the newer kernel would fix it, but it did not.
I posted the whole story of this journey on Super User, if you're interested: https://superuser.com/questions/1615400/md-raid-6-periodic-kernel-panic-possible-kernel-bug
However, I am now convinced this is a Linux kernel bug in the MD
driver.
Example 1 kernel panic:
[406005.583315] BUG: stack guard page was hit at 000000007cbff150 (stack is 000000003b7072a2..00000000dac5ed08)
[406005.583315] kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
[406005.583315] CPU: 15 PID: 514 Comm: md0_raid6 Tainted: P OE 5.8.0-36-generic #40-Ubuntu
[406005.583316] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS PRO/B550M AORUS PRO, BIOS F1 05/19/2020
[406005.583316] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[406005.583316] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[406005.583316] RSP: 0018:ffffa620c06e3ff8 EFLAGS: 00010246
[406005.583317] RAX: ffff9aaf36f54720 RBX: ffff9ab34b407800 RCX: 0000000000000001
[406005.583317] RDX: ffffa620c06e4040 RSI: ffffa620c06e4038 RDI: ffff9ab34b407800
[406005.583317] RBP: ffffa620c06e4028 R08: 0000000000000001 R09: ffffffffb9c54500
[406005.583318] R10: ffff9aaf36f54fe0 R11: 0000000000000001 R12: ffffa620c06e4038
[406005.583318] R13: ffffa620c06e4040 R14: ffff9aaf36f54720 R15: ffff9ab2925cbd10
[406005.583318] FS: 0000000000000000(0000) GS:ffff9ab34edc0000(0000) knlGS:0000000000000000
[406005.583318] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[406005.583318] CR2: ffffa620c06e3fe8 CR3: 00000005d52ac000 CR4: 0000000000340ee0
[406005.583319] Call Trace:
[406005.583319] ? mempool_kfree+0xe/0x10
[406005.583319] ? kfree+0xb8/0x220
[406005.583319] ? mempool_kfree+0xe/0x10
[406005.583319] ? mempool_free+0x2f/0x80
[406005.583319] ? md_end_io+0x4b/0x70
[406005.583319] ? bio_endio+0xe6/0x150
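The register values in Example 1 can be sanity-checked against the "stack guard page was hit" message with a little address arithmetic. The sketch below assumes 4 KiB pages (an assumption, not something the log states) and uses only the RSP, RBP and CR2 values printed above; the helper names are made up for illustration. It is a rough consistency check, not a diagnosis.

```python
# Values copied from the Example 1 register dump above.
RSP = 0xFFFFA620C06E3FF8
RBP = 0xFFFFA620C06E4028
CR2 = 0xFFFFA620C06E3FE8  # faulting address reported by the CPU

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_base(addr):
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


def same_page(a, b):
    """True if both addresses fall inside the same page."""
    return page_base(a) == page_base(b)


if __name__ == "__main__":
    print("RSP page:", hex(page_base(RSP)))
    print("RBP page:", hex(page_base(RBP)))
    print("CR2 page:", hex(page_base(CR2)))
    print("RSP - CR2 (bytes):", RSP - CR2)
    print("RSP and CR2 share a page:", same_page(RSP, CR2))
    print("pages between RBP and RSP:",
          (page_base(RBP) - page_base(RSP)) // PAGE_SIZE)
```

Under that assumption, the stack pointer and the faulting address sit in the same page, one page below the page that holds the frame pointer. That would be consistent with the stack having grown down past its lowest page.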
Example 2 kernel panic with old mobo:
[161342.301305] BUG: stack guard page was hit at 00000000fc60f228 (stack is 00000000875efe77..000000003f38a379)
[161342.301306] kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
[161342.301306] CPU: 10 PID: 465 Comm: md0_raid6 Tainted: P OE 5.8.0-33-generic #36-Ubuntu
[161342.301307] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.60 12/01/2020
[161342.301307] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[161342.301308] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[161342.301308] RSP: 0018:ffffa86b00c6fff8 EFLAGS: 00010246
[161342.301309] RAX: ffff98edc21cac40 RBX: ffff98ef0b407800 RCX: 0000000000000001
[161342.301310] RDX: ffffa86b00c70040 RSI: ffffa86b00c70038 RDI: ffff98ef0b407800
[161342.301310] RBP: ffffa86b00c70028 R08: 0000000000000001 R09: ffffffff85854500
[161342.301311] R10: ffff98edc21ca100 R11: 0000000000000001 R12: ffffa86b00c70038
[161342.301311] R13: ffffa86b00c70040 R14: ffff98edc21cac40 R15: ffff98e9b53d74d8
[161342.301311] FS: 0000000000000000(0000) GS:ffff98ef0ec80000(0000) knlGS:0000000000000000
[161342.301312] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[161342.301312] CR2: ffffa86b00c6ffe8 CR3: 00000007fa766000 CR4: 0000000000340ee0
[161342.301312] Call Trace:
[161342.301313] ? mempool_kfree+0xe/0x10
[161342.301313] ? kfree+0xb8/0x220
[161342.301313] ? mempool_kfree+0xe/0x10
[161342.301313] ? mempool_free+0x2f/0x80
[161342.301314] ? md_end_io+0x4b/0x70
[161342.301314] ? bio_endio+0xe6/0x150
[161342.301314] ? bio_chain_endio+0x2d/0x40
[161342.301315] ? md_end_io+0x5d/0x70
[161342.301315] ? bio_endio+0xe6/0x150
[161342.301315] ? bio_chain_endio+0x2d/0x40
[161342.301315] ? md_end_io+0x5d/0x70
[161342.301316] ? bio_endio+0xe6/0x150
[161342.301316] ? bio_chain_endio+0x2d/0x40
[161342.301316] ? md_end_io+0x5d/0x70
[161342.301316] ? bio_endio+0xe6/0x150
[161342.301317] ? bio_chain_endio+0x2d/0x40
[161342.301317] ? md_end_io+0x5d/0x70
[161342.301317] ? bio_endio+0xe6/0x150
[161342.301317] ? bio_chain_endio+0x2d/0x40
...
[161342.301379] ? md_end_io+0x5d/0x70
[161342.301379] ? bio_endio+0xe6/0x150
[161342.301380] ? bio_chain_endio+0x2d/0x40
[161342.301380] ? md_end_io+0x5d/0x70
[161342.301380] ? bio_endio+0xe6/0x150
[161342.301380] ? bio_ch
[161342.301381] Lost 296 message(s)!
[ 0.000000] Linux version 5.8.0-33-generic (buildd at lgw01-amd64-036) (gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0, GNU ld (GNU Binutils for Ubuntu) 2.35.1) #36-Ubuntu SMP Wed Dec 9 09:14:40 UTC 2020 (Ubuntu 5.8.0-33.36-generic 5.8.17)
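To make the repetition in the Example 2 trace easier to see, here is a minimal sketch that counts how often each function name appears in a call trace. The regular expression and the helper names (`frame_names`, `count_frames`) are assumptions made for this illustration, and `SAMPLE` is just a few lines copied from the trace above rather than the full log.

```python
import re
from collections import Counter

# Matches trace lines such as "[161342.301314] ? md_end_io+0x4b/0x70".
# Truncated lines (e.g. "? bio_ch") and register lines do not match.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\??\s*([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)


def frame_names(log_text):
    """Return the function names of all call-trace frames, in order."""
    names = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def count_frames(log_text):
    """Count how many times each function appears in the trace."""
    return Counter(frame_names(log_text))


SAMPLE = """\
[161342.301313] ? mempool_free+0x2f/0x80
[161342.301314] ? md_end_io+0x4b/0x70
[161342.301314] ? bio_endio+0xe6/0x150
[161342.301314] ? bio_chain_endio+0x2d/0x40
[161342.301315] ? md_end_io+0x5d/0x70
[161342.301315] ? bio_endio+0xe6/0x150
[161342.301315] ? bio_chain_endio+0x2d/0x40
[161342.301380] ? bio_ch
"""

if __name__ == "__main__":
    for name, count in count_frames(SAMPLE).most_common():
        print(f"{count:4d}  {name}")
```

Run against a complete log, a large and roughly equal count for md_end_io, bio_endio and bio_chain_endio would point at the same repeating cycle that is visible in the trace above.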
I can provide newer kernel panics or other info if needed. Thanks!
ProblemType: Bug
DistroRelease: Ubuntu 20.10
Package: mdadm 4.1-5ubuntu5
ProcVersionSignature: Ubuntu 5.8.0-53.60-generic 5.8.18
Uname: Linux 5.8.0-53-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu50.5
Architecture: amd64
CasperMD5CheckResult: pass
Date: Tue May 25 12:11:44 2021
InstallationDate: Installed on 2020-11-23 (182 days ago)
InstallationMedia: Ubuntu-Server 20.10 "Groovy Gorilla" - Release amd64 (20201022)
MachineType: Gigabyte Technology Co., Ltd. B550M AORUS PRO
ProcEnviron:
TERM=screen-256color
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.8.0-53-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro console=tty1 console=ttyS0,115200 processor.max_cstate=5 rcu_nocbs=0-15
SourcePackage: mdadm
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/19/2020
dmi.bios.release: 5.17
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F1
dmi.board.asset.tag: Default string
dmi.board.name: B550M AORUS PRO
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF1:bd05/19/2020:br5.17:svnGigabyteTechnologyCo.,Ltd.:pnB550MAORUSPRO:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnB550MAORUSPRO:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: Default string
dmi.product.name: B550M AORUS PRO
dmi.product.sku: Default string
dmi.product.version: Default string
dmi.sys.vendor: Gigabyte Technology Co., Ltd.
etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'
mtime.conffile..etc.apport.crashdb.conf: 2020-11-24T13:52:10.563946
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1929591/+subscriptions