[Bug 1929591] Re: MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault

Matt Thompson 1929591 at bugs.launchpad.net
Mon Aug 2 15:44:15 UTC 2021


I am experiencing this crash on an AWS i3.metal instance using mdadm.

There appear to be upstream patches for this issue:

https://lore.kernel.org/linux-raid/CAPhsuW6V4-ujDZJopCyAfTpLqDuW1bXX+SGgxqwnbFmR3uEWGQ@mail.gmail.com/T/

http://lkml.iu.edu/hypermail/linux/kernel/2107.1/04478.html

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/1929591

Title:
  MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault

Status in mdadm package in Ubuntu:
  New

Bug description:
  Hello:
  Every few days I get a kernel panic on my Ubuntu Server 20.10 box, which was recently upgraded to a Ryzen 3700X. I have 7 WD Red Pro HDDs in a RAID 6 array with Linux MD, all attached to an LSI 9211-8ik PCIe card. The motherboard is currently a Gigabyte B550M Aorus Pro. My Ubuntu install is running the latest 5.8.0-53 kernel.

  This is the 2nd hardware configuration to produce the exact same kernel panic text. Previously I had these HDDs directly connected to the SATA controller of an ASRock X570 Pro4 ATX mobo with the same 3700X. I was also previously using Ubuntu Server 20.04 LTS; I upgraded to 20.10 in the hope that the newer kernel would fix it, which it did not.
   
  I posted the whole story of this journey on Super User, if you're interested: https://superuser.com/questions/1615400/md-raid-6-periodic-kernel-panic-possible-kernel-bug

  However, I am now convinced this is a Linux kernel bug in the MD
  driver.

  Example 1 kernel panic:

  [406005.583315] BUG: stack guard page was hit at 000000007cbff150 (stack is 000000003b7072a2..00000000dac5ed08)
  [406005.583315] kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
  [406005.583315] CPU: 15 PID: 514 Comm: md0_raid6 Tainted: P           OE     5.8.0-36-generic #40-Ubuntu
  [406005.583316] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS PRO/B550M AORUS PRO, BIOS F1 05/19/2020
  [406005.583316] RIP: 0010:slab_free_freelist_hook+0x35/0x120
  [406005.583316] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
  [406005.583316] RSP: 0018:ffffa620c06e3ff8 EFLAGS: 00010246
  [406005.583317] RAX: ffff9aaf36f54720 RBX: ffff9ab34b407800 RCX: 0000000000000001
  [406005.583317] RDX: ffffa620c06e4040 RSI: ffffa620c06e4038 RDI: ffff9ab34b407800
  [406005.583317] RBP: ffffa620c06e4028 R08: 0000000000000001 R09: ffffffffb9c54500
  [406005.583318] R10: ffff9aaf36f54fe0 R11: 0000000000000001 R12: ffffa620c06e4038
  [406005.583318] R13: ffffa620c06e4040 R14: ffff9aaf36f54720 R15: ffff9ab2925cbd10
  [406005.583318] FS:  0000000000000000(0000) GS:ffff9ab34edc0000(0000) knlGS:0000000000000000
  [406005.583318] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [406005.583318] CR2: ffffa620c06e3fe8 CR3: 00000005d52ac000 CR4: 0000000000340ee0
  [406005.583319] Call Trace:
  [406005.583319]  ? mempool_kfree+0xe/0x10
  [406005.583319]  ? kfree+0xb8/0x220
  [406005.583319]  ? mempool_kfree+0xe/0x10
  [406005.583319]  ? mempool_free+0x2f/0x80
  [406005.583319]  ? md_end_io+0x4b/0x70
  [406005.583319]  ? bio_endio+0xe6/0x150


  Example 2 kernel panic with old mobo:

  [161342.301305] BUG: stack guard page was hit at 00000000fc60f228 (stack is 00000000875efe77..000000003f38a379)
  [161342.301306] kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
  [161342.301306] CPU: 10 PID: 465 Comm: md0_raid6 Tainted: P           OE     5.8.0-33-generic #36-Ubuntu
  [161342.301307] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.60 12/01/2020
  [161342.301307] RIP: 0010:slab_free_freelist_hook+0x35/0x120
  [161342.301308] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
  [161342.301308] RSP: 0018:ffffa86b00c6fff8 EFLAGS: 00010246
  [161342.301309] RAX: ffff98edc21cac40 RBX: ffff98ef0b407800 RCX: 0000000000000001
  [161342.301310] RDX: ffffa86b00c70040 RSI: ffffa86b00c70038 RDI: ffff98ef0b407800
  [161342.301310] RBP: ffffa86b00c70028 R08: 0000000000000001 R09: ffffffff85854500
  [161342.301311] R10: ffff98edc21ca100 R11: 0000000000000001 R12: ffffa86b00c70038
  [161342.301311] R13: ffffa86b00c70040 R14: ffff98edc21cac40 R15: ffff98e9b53d74d8
  [161342.301311] FS:  0000000000000000(0000) GS:ffff98ef0ec80000(0000) knlGS:0000000000000000
  [161342.301312] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [161342.301312] CR2: ffffa86b00c6ffe8 CR3: 00000007fa766000 CR4: 0000000000340ee0
  [161342.301312] Call Trace:
  [161342.301313]  ? mempool_kfree+0xe/0x10
  [161342.301313]  ? kfree+0xb8/0x220
  [161342.301313]  ? mempool_kfree+0xe/0x10
  [161342.301313]  ? mempool_free+0x2f/0x80
  [161342.301314]  ? md_end_io+0x4b/0x70
  [161342.301314]  ? bio_endio+0xe6/0x150
  [161342.301314]  ? bio_chain_endio+0x2d/0x40
  [161342.301315]  ? md_end_io+0x5d/0x70
  [161342.301315]  ? bio_endio+0xe6/0x150
  [161342.301315]  ? bio_chain_endio+0x2d/0x40
  [161342.301315]  ? md_end_io+0x5d/0x70
  [161342.301316]  ? bio_endio+0xe6/0x150
  [161342.301316]  ? bio_chain_endio+0x2d/0x40
  [161342.301316]  ? md_end_io+0x5d/0x70
  [161342.301316]  ? bio_endio+0xe6/0x150
  [161342.301317]  ? bio_chain_endio+0x2d/0x40
  [161342.301317]  ? md_end_io+0x5d/0x70
  [161342.301317]  ? bio_endio+0xe6/0x150
  [161342.301317]  ? bio_chain_endio+0x2d/0x40
  ...
  [161342.301379]  ? md_end_io+0x5d/0x70
  [161342.301379]  ? bio_endio+0xe6/0x150
  [161342.301380]  ? bio_chain_endio+0x2d/0x40
  [161342.301380]  ? md_end_io+0x5d/0x70
  [161342.301380]  ? bio_endio+0xe6/0x150
  [161342.301380]  ? bio_ch
  [161342.301381] Lost 296 message(s)!
  [    0.000000] Linux version 5.8.0-33-generic (buildd@lgw01-amd64-036) (gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0, GNU ld (GNU Binutils for Ubuntu) 2.35.1) #36-Ubuntu SMP Wed Dec 9 09:14:40 UTC 2020 (Ubuntu 5.8.0-33.36-generic 5.8.17)
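
The repeating md_end_io / bio_endio / bio_chain_endio frames in Example 2 look consistent with recursion through chained bio completions until the kernel stack runs out. As a rough aid for comparing traces like the two above, here is a minimal Python sketch that extracts the function names from call-trace lines and reports the sequence the trace ends by repeating. The helper names (parse_frames, find_cycle), the max_period limit, and the regular expression are illustrative assumptions, not part of any existing tool, and the sample is a short excerpt copied from Example 2.

```python
import re

# Matches call-trace lines such as "[161342.301313]  ? md_end_io+0x5d/0x70".
# Register lines (e.g. "RIP: 0010:...") and truncated frames do not match.
FRAME_RE = re.compile(r"\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_frames(log_text):
    """Return the function names found on call-trace lines, in order."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.group(1))
    return frames


def find_cycle(frames, max_period=8):
    """Return the shortest sequence that the trace ends by repeating
    back-to-back at least twice, or None if there is no such sequence.
    max_period is an arbitrary illustrative limit."""
    for period in range(1, max_period + 1):
        if len(frames) >= 2 * period and frames[-period:] == frames[-2 * period:-period]:
            return frames[-period:]
    return None


# Short excerpt from the Example 2 trace above.
SAMPLE = """\
[161342.301313]  ? mempool_free+0x2f/0x80
[161342.301314]  ? md_end_io+0x4b/0x70
[161342.301314]  ? bio_endio+0xe6/0x150
[161342.301314]  ? bio_chain_endio+0x2d/0x40
[161342.301315]  ? md_end_io+0x5d/0x70
[161342.301315]  ? bio_endio+0xe6/0x150
[161342.301315]  ? bio_chain_endio+0x2d/0x40
[161342.301315]  ? md_end_io+0x5d/0x70
[161342.301316]  ? bio_endio+0xe6/0x150
[161342.301316]  ? bio_chain_endio+0x2d/0x40
"""

print(find_cycle(parse_frames(SAMPLE)))
# prints ['md_end_io', 'bio_endio', 'bio_chain_endio']
```

Running the same functions over a full saved dmesg or serial-console capture would show whether a new panic ends in the same three-frame loop; this only summarizes the trace text and says nothing about the root cause.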

  
  I can provide newer kernel panics or other info if needed. Thanks!

  ProblemType: Bug
  DistroRelease: Ubuntu 20.10
  Package: mdadm 4.1-5ubuntu5
  ProcVersionSignature: Ubuntu 5.8.0-53.60-generic 5.8.18
  Uname: Linux 5.8.0-53-generic x86_64
  NonfreeKernelModules: nvidia_modeset nvidia
  ApportVersion: 2.20.11-0ubuntu50.5
  Architecture: amd64
  CasperMD5CheckResult: pass
  Date: Tue May 25 12:11:44 2021
  InstallationDate: Installed on 2020-11-23 (182 days ago)
  InstallationMedia: Ubuntu-Server 20.10 "Groovy Gorilla" - Release amd64 (20201022)
  MachineType: Gigabyte Technology Co., Ltd. B550M AORUS PRO
  ProcEnviron:
   TERM=screen-256color
   PATH=(custom, no user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.8.0-53-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro console=tty1 console=ttyS0,115200 processor.max_cstate=5 rcu_nocbs=0-15
  SourcePackage: mdadm
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 05/19/2020
  dmi.bios.release: 5.17
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: F1
  dmi.board.asset.tag: Default string
  dmi.board.name: B550M AORUS PRO
  dmi.board.vendor: Gigabyte Technology Co., Ltd.
  dmi.board.version: x.x
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 3
  dmi.chassis.vendor: Default string
  dmi.chassis.version: Default string
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF1:bd05/19/2020:br5.17:svnGigabyteTechnologyCo.,Ltd.:pnB550MAORUSPRO:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnB550MAORUSPRO:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:
  dmi.product.family: Default string
  dmi.product.name: B550M AORUS PRO
  dmi.product.sku: Default string
  dmi.product.version: Default string
  dmi.sys.vendor: Gigabyte Technology Co., Ltd.
  etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'
  mtime.conffile..etc.apport.crashdb.conf: 2020-11-24T13:52:10.563946

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1929591/+subscriptions



