[Bug 2031383] Re: Performance issue on mdraid5 when the number of devices more than 4
Vladimir Khristenko
2031383 at bugs.launchpad.net
Tue Aug 15 12:09:30 UTC 2023
** Summary changed:
- RAID5 performance issue on mdraid5 when the number of devices more than 4
+ Performance issue on mdraid5 when the number of devices more than 4
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/2031383
Title:
Performance issue on mdraid5 when the number of devices more than 4
Status in mdadm package in Ubuntu:
New
Bug description:
Hi there.
I have encountered a significant increase in max latency with a 4k
random-write workload on mdraid5 when the number of devices in the
array is more than 4.
Environment:
OS: Ubuntu 20.04
kernel: 5.15.0-79 (HWE)
NVMe: 5x Solidigm D7-5620 1.6TB (FW: 9CV10410)
The group_thread_cnt and stripe_cache_size parameters are set via a udev rules file:
cat /etc/udev/rules.d/60-md-stripe-cache.rules
SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change", ATTR{md/group_thread_cnt}="6"
SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change", ATTR{md/stripe_cache_size}="512"
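Once the array is assembled, the applied values can be verified directly in sysfs (assuming the array is md0), and the rules can be re-triggered manually if needed:
cat /sys/block/md0/md/group_thread_cnt
cat /sys/block/md0/md/stripe_cache_size
udevadm trigger --subsystem-match=block --sysname-match='md*'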
mdraid5 on top of 4x NVMe drives:
#---------------
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 nvme3n1p1[3] nvme2n1p1[2] nvme1n1p1[1] nvme0n1p1[0]
4688040960 blocks super 1.2 level 5, 4k chunk, algorithm 2 [4/4] [UUUU]
bitmap: 0/12 pages [0KB], 65536KB chunk
#---------------
Then I run the fio tests:
for i in {1..3}; do echo test "$i"; fio --name=nvme --numjobs=8 --iodepth=32 --bs=4k --rw=randwrite --ioengine=libaio --direct=1 --group_reporting=1 --filename=/dev/md0p1 --runtime=600 --time_based=1 --ramp_time=0; done
fio results:
Test 1:
...
write: IOPS=250k, BW=976MiB/s (1023MB/s)(572GiB/600002msec);
lat (usec): min=58, max=9519, avg=1024.02, stdev=1036.23
Test 2:
...
write: IOPS=291k, BW=1138MiB/s (1193MB/s)(667GiB/600002msec); 0 zone resets
lat (usec): min=43, max=19160, avg=878.25, stdev=820.79
Test 3:
...
write: IOPS=301k, BW=1176MiB/s (1233MB/s)(689GiB/600003msec); 0 zone resets
lat (usec): min=48, max=7900, avg=850.05, stdev=763.24
...
Max latency is 19160 usec (test 2).
mdraid5 on top of 5x NVMe drives:
#---------------
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 nvme4n1p1[4] nvme3n1p1[3] nvme2n1p1[2] nvme1n1p1[1] nvme0n1p1[0]
6250721280 blocks super 1.2 level 5, 4k chunk, algorithm 2 [5/5] [UUUUU]
bitmap: 10/12 pages [40KB], 65536KB chunk
#---------------
Running the same test:
for i in {1..3}; do echo test "$i"; fio --name=nvme --numjobs=8 --iodepth=32 --bs=4k --rw=randwrite --ioengine=libaio --direct=1 --group_reporting=1 --filename=/dev/md0p1 --runtime=600 --time_based=1 --ramp_time=0; done
fio results:
Test 1:
...
write: IOPS=375k, BW=1466MiB/s (1537MB/s)(859GiB/600002msec); 0 zone resets
lat (usec): min=78, max=28966k, avg=681.56, stdev=3300.12
Test 2:
...
write: IOPS=390k, BW=1524MiB/s (1598MB/s)(893GiB/600001msec); 0 zone resets
lat (usec): min=77, max=63847k, avg=655.85, stdev=6565.15
...
Test 3:
...
write: IOPS=391k, BW=1526MiB/s (1600MB/s)(894GiB/600002msec); 0 zone resets
lat (usec): min=79, max=60377k, avg=654.74, stdev=6081.22
...
Final:
mdraid5 on top of 4x NVMe drives: max latency - 19160 usec.
mdraid5 on top of 5x NVMe drives: max latency - 63847k usec.
As you can see, the max latency increases dramatically, to 63847k
usec (test 2).
If I increase the runtime to 3600/7200 sec, I see a hung task in dmesg:
...
[11480.292296] INFO: task fio:2501 blocked for more than 120 seconds.
[11480.292320] Not tainted 5.15.0-79-generic #85-Ubuntu
[11480.292341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11480.292369] task:fio state:D stack: 0 pid: 2501 ppid: 2465 flags:0x00004002
...
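For reference, the kernel stack of the blocked task can be captured while the hang is in progress, for example (pid taken from the dmesg message above; sysrq must be enabled for the second command):
cat /proc/2501/stack
echo w > /proc/sysrq-trigger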
To rule out a problem with my NVMe drives, I built an array on top of RAM drives and got the same behavior.
modprobe brd rd_nr=6 rd_size=10485760
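Between the configurations below, the previous array can be stopped and the member superblocks cleared before creating the next one, e.g.:
mdadm --stop /dev/md0
mdadm --zero-superblock /dev/ram0 /dev/ram1 /dev/ram2 /dev/ram3 /dev/ram4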
mdraid5 on top of 3x RAM drives:
mdadm --create /dev/md0 --level=5 --chunk=4K --bitmap=internal --raid-devices=3 /dev/ram0 /dev/ram1 /dev/ram2
#---------------
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 ram2[3] ram1[1] ram0[0]
20953088 blocks super 1.2 level 5, 4k chunk, algorithm 2 [3/3] [UUU]
bitmap: 1/1 pages [4KB], 65536KB chunk
#---------------
for i in {1..3}; do echo test "$i"; date; fio --name=nvme --numjobs=16 --iodepth=32 --bs=4k --rw=randwrite --ioengine=libaio --direct=1 --group_reporting=1 --filename=/dev/md0 --runtime=600 --time_based=1 --ramp_time=0; done
fio results:
Test 1:
...
write: IOPS=497k, BW=1939MiB/s (2034MB/s)(1136GiB/600003msec); 0 zone resets
lat (usec): min=466, max=6171, avg=1030.71, stdev=39.32
...
Test 2:
...
write: IOPS=497k, BW=1941MiB/s (2035MB/s)(1137GiB/600003msec); 0 zone resets
lat (usec): min=461, max=6223, avg=1030.06, stdev=39.38
...
Test 3:
...
write: IOPS=497k, BW=1940MiB/s (2034MB/s)(1136GiB/600002msec); 0 zone resets
lat (usec): min=474, max=6179, avg=1030.68, stdev=39.29
...
Max latency is 6223 usec (test 2).
mdraid5 on top of 4x RAM drives:
mdadm --create /dev/md0 --level=5 --chunk=4K --bitmap=internal --raid-devices=4 /dev/ram0 /dev/ram1 /dev/ram2 /dev/ram3
#---------------
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 ram3[4] ram2[2] ram1[1] ram0[0]
31429632 blocks super 1.2 level 5, 4k chunk, algorithm 2 [4/4] [UUUU]
bitmap: 1/1 pages [4KB], 65536KB chunk
#---------------
for i in {1..3}; do echo test "$i"; date; fio --name=nvme --numjobs=16 --iodepth=32 --bs=4k --rw=randwrite --ioengine=libaio --direct=1 --group_reporting=1 --filename=/dev/md0 --runtime=600 --time_based=1 --ramp_time=0; done
fio results:
Test 1:
...
write: IOPS=438k, BW=1712MiB/s (1796MB/s)(1003GiB/600002msec); 0 zone resets
lat (usec): min=468, max=6902, avg=1167.45, stdev=46.17
...
Test 2:
...
write: IOPS=438k, BW=1711MiB/s (1794MB/s)(1002GiB/600004msec); 0 zone resets
lat (usec): min=470, max=7689, avg=1168.49, stdev=46.14
...
Test 3:
...
write: IOPS=438k, BW=1712MiB/s (1796MB/s)(1003GiB/600003msec); 0 zone resets
lat (usec): min=479, max=6376, avg=1167.40, stdev=46.18
...
Max latency is 7689 usec (test 2).
mdraid5 on top of 5x RAM drives:
mdadm --create /dev/md0 --level=5 --chunk=4K --bitmap=internal --raid-devices=5 /dev/ram0 /dev/ram1 /dev/ram2 /dev/ram3 /dev/ram4
#---------------
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 ram4[5] ram3[3] ram2[2] ram1[1] ram0[0]
41906176 blocks super 1.2 level 5, 4k chunk, algorithm 2 [5/5] [UUUUU]
bitmap: 0/1 pages [0KB], 65536KB chunk
#---------------
for i in {1..3}; do echo test "$i"; date; fio --name=nvme --numjobs=16 --iodepth=32 --bs=4k --rw=randwrite --ioengine=libaio --direct=1 --group_reporting=1 --filename=/dev/md0 --runtime=600 --time_based=1 --ramp_time=0; done
fio results:
Test 1:
...
write: IOPS=452k, BW=1764MiB/s (1850MB/s)(1034GiB/600001msec); 0 zone resets
lat (usec): min=13, max=68868k, avg=1133.11, stdev=79882.97
...
Test 2:
...
write: IOPS=451k, BW=1763MiB/s (1849MB/s)(1033GiB/600001msec); 0 zone resets
lat (usec): min=11, max=45339k, avg=1134.04, stdev=78829.34
...
Test 3:
...
write: IOPS=453k, BW=1770MiB/s (1856MB/s)(1037GiB/600001msec); 0 zone resets
lat (usec): min=12, max=63593k, avg=1129.34, stdev=84268.37
...
Max latency is 68868k usec (test 1).
Final:
mdraid5 on top of 3x RAM drives: max latency - 6223 usec.
mdraid5 on top of 4x RAM drives: max latency - 7689 usec.
mdraid5 on top of 5x RAM drives: max latency - 68868k usec.
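In case it helps with the analysis, stripe-cache pressure can be sampled during a run to check whether the latency spikes coincide with the stripe cache being exhausted (a minimal sketch, assuming the array is md0):
while true; do cat /sys/block/md0/md/stripe_cache_active; sleep 1; done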
I also reproduced this behavior on mdraid4 and mdraid5 on CentOS 7,
CentOS 9, and Ubuntu 22.04 with kernels 5.15.0-79 and 6.4 (mainline).
However, I cannot reproduce it on mdraid6.
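For reference, an equivalent mdraid6 array on the RAM drives can be created with the same parameters, e.g. (illustrative command mirroring the raid5 ones above, not necessarily the exact one I used):
mdadm --create /dev/md0 --level=6 --chunk=4K --bitmap=internal --raid-devices=6 /dev/ram0 /dev/ram1 /dev/ram2 /dev/ram3 /dev/ram4 /dev/ram5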
Could you please help me understand why this happens and whether there is any chance of fixing it?
Let me know if you need more detailed information about my environment or if I should run more tests.
Thank you in advance.
ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: mdadm 4.1-5ubuntu1.2
ProcVersionSignature: Ubuntu 5.15.0-79.86~20.04.2-generic 5.15.111
Uname: Linux 5.15.0-79-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu27.27
Architecture: amd64
CasperMD5CheckResult: skip
Date: Tue Aug 15 08:54:39 2023
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t:
Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
MDadmExamine.dev.sda:
/dev/sda:
MBR Magic : aa55
Partition[0] : 62914559 sectors at 1 (type ee)
MDadmExamine.dev.sda1: Error: command ['/sbin/mdadm', '-E', '/dev/sda1'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda1.
MDadmExamine.dev.sda2:
/dev/sda2:
MBR Magic : aa55
MDadmExamine.dev.sda3: Error: command ['/sbin/mdadm', '-E', '/dev/sda3'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda3.
MDadmExamine.dev.sda4: Error: command ['/sbin/mdadm', '-E', '/dev/sda4'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda4.
MachineType: VMware, Inc. VMware Virtual Platform
ProcEnviron:
LANGUAGE=en_US:
TERM=xterm
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.15.0-79-generic root=/dev/mapper/main-root ro quiet
ProcMounts: Error: [Errno 40] Too many levels of symbolic links: '/proc/mounts'
SourcePackage: mdadm
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/12/2020
dmi.bios.release: 4.6
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.ec.firmware.release: 0.0
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd11/12/2020:br4.6:efr0.0:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:sku:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.
etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/2031383/+subscriptions